It’s a great rule and applies to data projects just as much as programming ones. It’s admitting that you can’t create a perfect source, and then expect it to stay that way. Keeping a clean dataset, like keeping a clean campground, is a continual and team effort.
It’s like a programming project in that it’s constantly being added to, and needs to be continually refactored and cleaned.
Whether you’re cleaning or updating an Amazon Redshift cluster, a CSV file or CRM data, it’s all entirely important to keep it clean day in and out—even if you’re the only one going through the data.
According to Forbes, even though 57% of data scientists regard cleaning and organizing data as the least enjoyable part of their work, it’s an important and continual task for us.
While data cleansing can be seen as a sore spot for many of us, it’s an important part of our responsibilities and heavily contributes to the overall success of the business. So the next time you’re injecting more data, tweaking a model or find an inconsistency, remember the rule:
"Leave your data cleaner than you found it."