Data Quality Tips from the Pros (Part 1)
Data quality is something that every business should strive for. We now rely on data in ways that were unthinkable even 20 years ago. As part of our commitment to promoting data quality as a source of business value, we have developed a series of tips that organizations can use in an overall data quality program.
Tip #1: Never change your source data.
Source data is what the system captures in the originally created format. For auditing, among other purposes, it is essential that you can restore what you started with. Therefore, you should always clean a copy of the source data, not the source data itself.
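One way to sketch this principle: cleanse a deep copy of the records and leave the originals exactly as captured. The record layout, field names, and cleansing rules below are hypothetical, purely for illustration.

```python
import copy

# Hypothetical sample of raw source records (illustrative field names).
source_records = [
    {"customer_id": 1, "name": "  acme corp ", "country": "us"},
    {"customer_id": 2, "name": "Globex", "country": " DE"},
]

def cleanse(record):
    """Example rules: trim whitespace and upper-case country codes."""
    record["name"] = record["name"].strip()
    record["country"] = record["country"].strip().upper()
    return record

# Cleanse deep copies; source_records stays byte-for-byte as captured,
# so an audit can always restore what the system originally collected.
cleansed_records = [cleanse(copy.deepcopy(r)) for r in source_records]
```

After this runs, `source_records[0]["name"]` is still `"  acme corp "` while the cleansed copy holds the trimmed value, preserving the original for auditing.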
After your data cleansing process has been in place for a while, business users may decide that the source data needs to start off cleaner than it has been thus far.
Don’t be tempted to take the cleansed data and send it back to the source system. Instead, improve the source system itself: if the data is collected more cleanly in the first place, there will be less cleansing to do and fewer discrepancies with downstream systems.
Remember, for cleaner source data, it’s always preferable to fix the process that produces the source data rather than to send cleansed data back to the source system.
Tip #2: Version everything.
As you make changes, create copies of both the records and the rules you currently have. This ensures that you can always revert to an earlier state.
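A minimal sketch of this idea is an append-only history that snapshots both rules and records before each change. The structure and names here are assumptions for illustration, not a prescribed implementation.

```python
import copy
import datetime

versions = []  # append-only history of (timestamp, rules, records)

def snapshot(rules, records):
    """Store immutable copies before making changes, so any earlier state can be restored."""
    versions.append({
        "taken_at": datetime.datetime.now(datetime.timezone.utc),
        "rules": copy.deepcopy(rules),
        "records": copy.deepcopy(records),
    })

rules = {"strip_names": True}
records = [{"id": 1, "name": " Ann "}]
snapshot(rules, records)

# A later change turns out to be a mistake...
rules["strip_names"] = False

# ...so revert to the most recent snapshot.
rules = copy.deepcopy(versions[-1]["rules"])
```

In practice a version-control system or versioned tables would play this role; the point is simply that both the rules and the data they were applied to are captured together.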
In addition, the key structure of the data should never change. The primary key should travel with every record through the cleansing process and all the way into final tables in a data warehouse, data lake, or other downstream system. You may not need or use it every time, but it lets you trace the data back to its source, which is critical for data lineage.
Remember to also tag every level of a dimension or hierarchy with the primary key it comes from.
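A sketch of a cleansing step that always carries the source key forward might look like this. The field names and the `cleanse` function are hypothetical examples, not part of any specific product.

```python
# A raw record as captured by a hypothetical source system.
raw = {"customer_id": 42, "name": " Jane Doe ", "state": "ca"}

def cleanse(record):
    """Cleanse a record while keeping the source primary key attached."""
    return {
        "source_customer_id": record["customer_id"],  # the key travels with the record
        "name": record["name"].strip(),
        "state": record["state"].upper(),
    }

warehouse_row = cleanse(raw)
# Lineage: any downstream row can be traced back via source_customer_id.
```

Because `source_customer_id` ends up in the final table, any questionable value downstream can be traced back to the exact source record it came from.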
Tip #3: Publish data quality metrics.
Create a scorecard, so business users can see how much of their data meets data quality standards. Higher transparency means greater trust.
Some possible metrics to display are:
- Valid member count on a field
- Number of records that met data quality criteria vs. those that didn't
- Number of records that met data quality standards after rules were applied
- Number of records that needed to be manually validated or corrected
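The metrics above can be computed with simple counts. The records, flag names, and rule outcomes below are invented for illustration; in a real program these flags would be set by your data quality rules.

```python
# Hypothetical records with flags set by upstream data quality rules.
records = [
    {"id": 1, "state": "CA", "passed_rules": True,  "manually_fixed": False},
    {"id": 2, "state": "",   "passed_rules": False, "manually_fixed": True},
    {"id": 3, "state": "NY", "passed_rules": True,  "manually_fixed": False},
]

scorecard = {
    # Valid member count on a field (here: non-empty state codes).
    "valid_state_count": sum(1 for r in records if r["state"]),
    # Records that met data quality criteria vs. those that didn't.
    "passed": sum(1 for r in records if r["passed_rules"]),
    "failed": sum(1 for r in records if not r["passed_rules"]),
    # Records that needed manual validation or correction.
    "manually_corrected": sum(1 for r in records if r["manually_fixed"]),
}
```

Publishing a scorecard like this on a schedule gives business users a consistent, transparent view of how their data measures up.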
This information is valuable for data stewards: people who sit in the business unit, handle data quality issues and their resolution, and balance the needs of business and IT. The same metrics can be shared with the business, so they know what the norms, averages, and acceptable values are, and which values fall outside the norm. That awareness enables them to spot data quality issues that impact the business.
Finally, encourage input on what your data quality measures really mean. Businesspeople tend to understand the value of the data better than IT people; IT people tend to understand how rules work better than businesspeople. If businesspeople see a metric that doesn’t make sense to them, it may indicate that a data quality rule needs updating or a conversion process needs to be changed (e.g., a percentage figure might have been getting transferred to a real-number field). Talking to business users about the meaning of the metrics can be very beneficial to both business and IT users.
For more information on Data Quality, and leveraging modern platforms and code optimization techniques to minimize initial investment and total cost of ownership, visit the Omni-Gen Data Quality Edition site here, and also download our whitepaper, “The Real Cost of Bad Data: Six Simple Steps To Address Data Quality Issues.”