Data Quality Tips From the Pros (Part 2)
Data quality is something that every business should strive for. We now rely on data in ways that were unthinkable even 20 years ago. As part of our interest in supporting and promoting data quality to yield higher business value, we have developed a series of tips that organizations can use to assist them in an overall data quality program. This is the second installment of the series. You can read the first installment here.
Tip #4: Build Business Rules in a Cumulative Way
When building a new cleansing rule, it’s best to base it on the rules you’ve already created: let the initial rules run, and then add any new requirements to the end if possible. Also, ensure that you carry the original data along with the cleansed data throughout the process. This will help you spot where unwanted changes have occurred and will help you revert to earlier states of the data if needed.
Things to remember:
- If your data warehouse records include the operational system values as well as the new values, you can use the operational values in matching processes and don’t have to worry about re-cleansing the operational data
- If you’re going to put cleansed data back into your operational systems as well as into your data warehouse, make sure the cleansing rules don’t result in data that’s valid for one system but not for the other
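One way to picture the cumulative approach is the minimal Python sketch below. It is an illustration only, not the method of any particular product: the names `CleansedValue`, `apply_rules`, and the three sample rules are hypothetical. New rules are appended to the end of the existing list, and the original operational value travels with the cleansed value so that any step can be inspected or reverted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Rule = Callable[[str], str]


@dataclass
class CleansedValue:
    """Hypothetical container that keeps the original beside the cleansed value."""
    original: str   # operational value, never modified
    current: str    # value after the rules applied so far
    # (rule name, value after that rule) for every rule that has run
    history: List[Tuple[str, str]] = field(default_factory=list)


def apply_rules(value: str, rules: List[Rule]) -> CleansedValue:
    """Run the rules in order, recording the result of each step."""
    result = CleansedValue(original=value, current=value)
    for rule in rules:
        result.current = rule(result.current)
        result.history.append((rule.__name__, result.current))
    return result


# Sample rules, invented for this illustration.
def trim_whitespace(value: str) -> str:
    return value.strip()


def collapse_spaces(value: str) -> str:
    return " ".join(value.split())


def title_case(value: str) -> str:
    return value.title()


# The initial rules stay as they are; the new requirement goes at the end.
initial_rules = [trim_whitespace, collapse_spaces]
extended_rules = initial_rules + [title_case]

record = apply_rules("  acme   widgets  ", extended_rules)
print(record.original)  # untouched operational value
print(record.current)   # value after all rules
print(record.history)   # step-by-step trail for spotting unwanted changes
```

Because each step is recorded, an earlier state of the value can be recovered from `history` without re-running the rules.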
Tip #5: Don’t Leave Data Standards Management to IT
At least two business organizations should make sure data standards are acceptable. It may be best to have the chief data officer (CDO) oversee this, if your organization has one. The CDO can ensure that metadata is properly managed and that dimensions are properly structured. High-level oversight benefits standards such as spelling, format, standardization issues, hierarchy levels, proper norms for measures, date formats, and currency notations.
Once you have cleansed and standardized dimensions, publish them where the business can see them. The CDO should announce them so that all members of the team use consistent versions of the dimensions.
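As a rough illustration of what a published standard might look like once it is agreed, the Python sketch below keeps a date format and a currency notation in one shared place and checks values against them. The `STANDARDS` dictionary, its two entries, and the function names are assumptions made up for this example; your organization's actual standards will differ.

```python
import re
from datetime import datetime

# Hypothetical published standards; the specific choices are assumptions.
STANDARDS = {
    "date_format": "%Y-%m-%d",                   # e.g. 2024-03-01
    "currency_pattern": r"[A-Z]{3} \d+\.\d{2}",  # e.g. USD 19.99
}


def conforms_to_date_standard(value: str) -> bool:
    """Return True if the value parses under the published date format."""
    try:
        datetime.strptime(value, STANDARDS["date_format"])
        return True
    except ValueError:
        return False


def conforms_to_currency_standard(value: str) -> bool:
    """Return True if the whole value matches the published currency notation."""
    return re.fullmatch(STANDARDS["currency_pattern"], value) is not None


print(conforms_to_date_standard("2024-03-01"))
print(conforms_to_date_standard("03/01/2024"))
print(conforms_to_currency_standard("USD 19.99"))
print(conforms_to_currency_standard("$19.99"))
```

Keeping the standards in a single shared definition, rather than inside each team's scripts, is what lets everyone work from the same version.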
Tip #6: Know the Difference Between Cleansing and Mastering
When you are target-oriented, as opposed to source-oriented, you are leaning away from data cleansing and moving into data mastering. Steps that may look like data cleansing are actually master data management (MDM). For example, if three levels of incoming data represent categories of a retail brand, you would execute a match-merge process to find the missed items and place them within the correct brand. In this case, the data is being moved around in the hierarchy to get the right results.
It’s also important to understand the hierarchy and the golden record in order to determine how these brands should be organized and how the source data should be analyzed. This goes above and beyond cleansing.
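To make the match-merge idea more concrete, here is a small Python sketch that assigns incoming values to a master list of brands. It uses simple fuzzy string matching from the standard library's `difflib` as a stand-in; real MDM tools use their own, more sophisticated matching. The brand names, the `match_to_master` function, and the 0.8 cutoff are all invented for illustration.

```python
from difflib import get_close_matches
from typing import List, Optional

# Hypothetical master (golden) list of brands under one retail brand.
MASTER_BRANDS = ["Acme Outdoor", "Acme Kitchen", "Acme Kids"]


def match_to_master(source_value: str,
                    master: List[str] = MASTER_BRANDS,
                    cutoff: float = 0.8) -> Optional[str]:
    """Return the best-matching master brand, or None if nothing is close enough."""
    # Compare in lowercase, but return the master spelling.
    lookup = {name.lower(): name for name in master}
    candidates = get_close_matches(
        source_value.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[candidates[0]] if candidates else None


# Incoming source values, including a misspelling and an unrelated value.
incoming = ["acme outdoors", "ACME Kitchn", "Zenith Tools"]
assignments = {value: match_to_master(value) for value in incoming}

for value, brand in assignments.items():
    print(f"{value!r} -> {brand!r}")
```

Values that return `None` are the ones that would need review, since deciding where they belong in the hierarchy is a mastering decision and not a cleansing one.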
For more information on data quality, and on leveraging modern platforms and code optimization techniques to minimize initial investment and total cost of ownership, visit the Omni-Gen Data Quality Edition page, and download our whitepaper, “The Real Cost of Bad Data: Six Simple Steps To Address Data Quality Issues.”