Quick Doesn't Have To Mean Dirty

By Jake Freivald | February 02, 2011
in iWay, cartoons, Data Quality, Data Governance

Cartoon by Mark Anderson, www.andertoons.com. Freely distributable with attribution, provided no modifications are made.

BI software providers always talk about how fast their implementation times are, how fresh their data is, and how swiftly you can get value from their projects: Everything’s fast, fast, fast.

I talk like that, too, and those things are really important. But I don’t care how fast answers are if they’re wrong.

So how do you keep business intelligence implementations fast – in every sense – while focusing on getting the right answers?

Rapid implementation times come from having the right data in the right form. That’s why so many of these one-to-six-week implementations use a single data source from a system of record (which may or may not be a data warehouse): If the data’s dirty, at least it’s dirty in a way that users understand. People can accommodate the limitations of the BI system, and the value they get from dirty data beats having no BI at all.

That’s not a great answer, though.

One way to get clean data right from the start is to have a “data quality firewall.” As soon as data comes in – from a customer-facing application, from a B2B exchange, from an order-entry clerk at a terminal, whatever – it immediately gets cleansed, often before it enters any application or data warehouse. Not only does that mean you don’t have to build data quality into the BI system, but if more than one BI, analytics, or reporting application draws on the same source, you can be assured that the data is clean and reconciled from one to the next.
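To make that concrete, here’s a minimal sketch of the idea in Python. The field names and rules are illustrative assumptions, not any particular product’s API: cleanse every record the moment it arrives, and quarantine anything that fails validation before it reaches an application or warehouse.

```python
from dataclasses import dataclass, field

@dataclass
class FirewallResult:
    clean: list = field(default_factory=list)        # records safe to load
    quarantined: list = field(default_factory=list)  # records needing review

def cleanse(record: dict) -> dict:
    """Standardize a record before it touches any downstream system."""
    rec = dict(record)
    rec["name"] = rec.get("name", "").strip().title()
    rec["email"] = rec.get("email", "").strip().lower()
    # Digits only: a crude stand-in for real phone standardization.
    rec["phone"] = "".join(ch for ch in rec.get("phone", "") if ch.isdigit())
    return rec

def is_valid(record: dict) -> bool:
    """Reject records that no downstream consumer should ever see."""
    return bool(record["name"]) and "@" in record["email"]

def firewall(incoming) -> FirewallResult:
    """Cleanse at the point of entry; quarantine what can't be fixed."""
    result = FirewallResult()
    for raw in incoming:
        rec = cleanse(raw)
        (result.clean if is_valid(rec) else result.quarantined).append(rec)
    return result

feed = [
    {"name": "  ada lovelace ", "email": "ADA@EXAMPLE.COM", "phone": "(555) 010-1234"},
    {"name": "", "email": "not-an-email", "phone": ""},
]
result = firewall(feed)
print(len(result.clean), "clean,", len(result.quarantined), "quarantined")
```

The point isn’t the specific rules; it’s where they live. Because the cleansing happens at the point of entry, every downstream consumer sees the same standardized records.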

And although it’s beyond what some companies are able to do right now, leveraging a Master Data Management system as part of the data quality firewall will help ensure that different implementations, built from different systems, all agree.
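A rough sketch of what that cross-system agreement might look like, again with invented source names and keys: a master cross-reference resolves each source system’s key to a single golden ID, so every report counts the same customer the same way.

```python
# Hypothetical cross-reference table: every source system's key for a
# customer maps to one golden master ID, so reports built from CRM,
# orders, and billing all roll up to the same entity.
MASTER_IDS = {
    ("crm", "C-1001"): "CUST-42",
    ("orders", "9001"): "CUST-42",   # same customer, different source key
    ("billing", "B-77"): "CUST-42",
}

def to_master_id(source: str, source_key: str) -> str:
    """Resolve a source-system key to its master ID."""
    try:
        return MASTER_IDS[(source, source_key)]
    except KeyError:
        # No match yet: flag for data stewardship instead of inventing an ID.
        raise LookupError(f"unmatched {source} key {source_key!r}")

# Two different systems, one customer, one answer:
assert to_master_id("crm", "C-1001") == to_master_id("billing", "B-77")
```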

Those same steps help provide fresh data. As more applications and systems move to real-time information, you can’t wait for the traditional data warehouse batch run to cleanse the data as part of the extraction, transformation, and load (ETL) process. You need to trickle-feed the warehouse, and the data needs to be clean from the outset.
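Here’s a rough sketch of the difference, assuming a message queue as the event source and a simple append as the warehouse load: each event is cleansed and loaded the moment it arrives, instead of accumulating for a nightly batch run.

```python
import queue

def cleanse(event: dict) -> dict:
    """Minimal stand-in for the firewall rules sketched earlier."""
    return {k: str(v).strip() for k, v in event.items()}

def trickle_feed(events: queue.Queue, load) -> None:
    """Cleanse each event as it arrives and load it immediately,
    rather than holding it for a nightly batch ETL window."""
    while True:
        event = events.get()    # blocks until the next event arrives
        if event is None:       # sentinel: the feed has closed
            break
        load(cleanse(event))    # append one clean row; no batch window

# In-memory stand-ins for a real message bus and warehouse table:
warehouse_rows = []
events = queue.Queue()
events.put({"name": " Grace Hopper ", "email": "grace@example.com"})
events.put(None)  # close the feed
trickle_feed(events, warehouse_rows.append)
print(warehouse_rows)
```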

And that, of course, leads to faster time-to-value. If you can deliver fresh data that’s accurate and properly reconciled, you can do everything from shaving percentages off costs to changing culture.

Note that none of this requires additional time in the BI implementation itself. It requires some thinking early on, but it’s not about the BI tool you’re installing or the reports you’re writing – it’s work that can be done up front, before or in parallel with any BI projects you’re currently working on.

Do most BI vendors focus on these things when they make their claims about how fast everything is? Not in my experience. They focus on getting the data into memory so an analyst can slice it up – call it time-to-playtime if you like – or on connecting to that single system of record.

But quick doesn’t have to be dirty, and if you’re serious about getting good answers quickly, it pays to be quick and clean.