It takes more than just data to run a business
To run a business successfully, you need to use data to make important decisions. You go through a continual process of collecting, updating, and creating data in order to have the insights that help you grow and succeed. The quality of the data your company uses is essential to the reliability of your business analytics and business intelligence.
Your business intelligence system should give decision makers better insight into the business so they can make reliable decisions faster. It should also make your data easier to analyze, but simply having data is not enough to make those important decisions. The quality of the data is as important as the data itself, if not more so.
Data is never 100% clean
Data quality problems can arise from a number of sources, such as a database consolidation or human error. If your data is unreliable, you may need to consider a data cleaning (or data scrubbing) process. This may involve identifying incomplete, incorrect, inaccurate, or irrelevant data and then replacing, modifying, or deleting it.
The goal of data cleaning is to improve the quality of data to make it fit for use. Your data will never be 100% clean. But if the data doesn’t mean what you think it does, you may need to use a data cleaning process.
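As a rough illustration of what identifying and then modifying or deleting bad data can look like, here is a minimal Python sketch. The record layout, the field names, and the very loose email check are all assumptions made up for this example, not a recommended rule set; a real cleaning process would use rules agreed with the people who own the data.

```python
from typing import Optional

# Hypothetical customer records; the field names are assumptions for illustration.
records = [
    {"first_name": "Ada", "last_name": "Lovelace", "email": " ADA@example.com "},
    {"first_name": "", "last_name": "Hopper", "email": "grace@example.com"},
    {"first_name": "Alan", "last_name": "Turing", "email": "alan.example.com"},
]

# Fields this sketch treats as mandatory (an assumption).
REQUIRED = ("first_name", "last_name", "email")


def clean_record(record: dict) -> Optional[dict]:
    """Return a cleaned copy of the record, or None if it cannot be repaired."""
    # Modify: trim stray whitespace and lowercase the email address.
    cleaned = {key: value.strip() for key, value in record.items()}
    cleaned["email"] = cleaned["email"].lower()

    # Identify incomplete records: a required field is empty.
    if any(not cleaned[field] for field in REQUIRED):
        return None

    # Identify incorrect values: a deliberately rough email check.
    if "@" not in cleaned["email"]:
        return None

    return cleaned


cleaned_records = []
rejected_records = []
for record in records:
    result = clean_record(record)
    if result is None:
        # Set aside for review rather than silently deleting.
        rejected_records.append(record)
    else:
        cleaned_records.append(result)
```

Setting rejected records aside instead of dropping them outright keeps a trail you can review, which matters because not every flagged record is actually wrong.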
The six data quality dimensions
Understanding the key data quality dimensions will help you assess your data, determine the scope and root causes of your data quality problems, and decide whether you need to go through data cleaning.
According to SmartBridge, the six dimensions of data quality are:
Completeness: Are critical data values missing? A database with missing data values is not unusual, but when the missing information is critical, completeness is an issue. If a customer’s first name and last name are mandatory but the job title is optional, a record can be considered complete even if a job title is not available.
Conformity: Is the data following standard data definitions? For example, are dates in a standard format? Conformity to standard formats is important for maintaining a consistent structure and nomenclature for sharing and internal data management.
Accuracy: Is the data accurate to the “real-world” values expected? Incorrect spellings, misplaced decimals, and untimely or out-of-date data can lead to inaccurate analysis. If the sales from a customer are not the true sales or the email address of a contact is misspelled, the data is not accurate.
Timeliness: Is the data available when expected and needed? Timeliness depends on the user’s expectations and needs. If tracking information is delayed or a customer’s purchasing information is not updated in real time, then timeliness could be an issue.
Consistency: Does the data across several systems reflect the same information? If data is reported across multiple systems, it should have the same information. If one database reports a customer’s account as active while another reports the account as closed, the data set is not consistent.
Integrity: Is the data valid across the relationships, and can all the data in a database be traced and connected? For example, in a customer database there should be a valid customer/sales relationship. If there is sales data without a customer, then that data is not valid and is an orphaned record. The inability to link related records may introduce duplication across your systems.
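Several of these questions can be turned into simple automated checks. The sketch below is one possible way to do that in Python for the completeness, conformity, consistency, and orphaned-record examples described above. The tables, field names, status values, and the choice of YYYY-MM-DD as the standard date format are all hypothetical; accuracy and timeliness are left out because they usually need an outside source of truth or a timing expectation to compare against.

```python
from datetime import datetime

# Hypothetical sample data; every field name and value here is an assumption.
customers = [
    {"id": 1, "first_name": "Ada", "last_name": "Lovelace",
     "job_title": None, "signup_date": "2023-01-15"},
    {"id": 2, "first_name": "Grace", "last_name": None,
     "job_title": "Admiral", "signup_date": "15/01/2023"},
]
sales = [
    {"sale_id": 100, "customer_id": 1, "amount": 250.0},
    {"sale_id": 101, "customer_id": 3, "amount": 75.0},
]
# The same account status as reported by two different systems.
crm_status = {1: "active", 2: "active"}
billing_status = {1: "active", 2: "closed"}

# Job title is optional, so it is not listed as mandatory.
MANDATORY = ("first_name", "last_name")


def incomplete_ids(rows):
    """Completeness: ids of rows with a missing mandatory value."""
    return [row["id"] for row in rows
            if any(not row.get(field) for field in MANDATORY)]


def nonconforming_date_ids(rows, fmt="%Y-%m-%d"):
    """Conformity: ids of rows whose signup date does not parse with fmt."""
    bad = []
    for row in rows:
        try:
            datetime.strptime(row["signup_date"], fmt)
        except ValueError:
            bad.append(row["id"])
    return bad


def inconsistent_ids(system_a, system_b):
    """Consistency: ids present in both systems with different values."""
    shared = system_a.keys() & system_b.keys()
    return sorted(key for key in shared if system_a[key] != system_b[key])


def orphaned_sale_ids(sales_rows, customer_rows):
    """Orphaned records: sales that point at a customer id that does not exist."""
    known = {customer["id"] for customer in customer_rows}
    return [sale["sale_id"] for sale in sales_rows
            if sale["customer_id"] not in known]


report = {
    "incomplete_customers": incomplete_ids(customers),
    "nonconforming_dates": nonconforming_date_ids(customers),
    "inconsistent_accounts": inconsistent_ids(crm_status, billing_status),
    "orphaned_sales": orphaned_sale_ids(sales, customers),
}
```

With this sample data, customer 2 is flagged by the first three checks and sale 101 is flagged as orphaned. A report like this only tells you where to look; deciding how to fix each record is still part of the cleaning process.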
In my next post, I'll share some common causes of data errors and how to get started on your data cleaning strategy.