A data warehouse problem… and cuddling up with Bill Gates

Andy Hayler of Kalido is a great read for a no-nonsense perspective on business intelligence. He recently wrote:

"Meanwhile, a survey of 1,000 UK business managers at companies with over 250 staff, published by ICS, indicates a widespread need for better BI systems. The study found that over three quarters of respondents were forced to make decisions ’blind’ due to late or insufficient business information". By contrast, this is entirely believable, though not for the reason that the article gave. The critical issue is that you can have as many pretty reporting tools and dashboards as you like, but you need accurate and timely information to feed those systems coming from a data warehouse (unless you are one of the few brave souls using EII). The problem is that most data warehouses are entirely unable to keep up with the pace of business change (reorganisations, acquisitions etc) and so are constantly out of date. Consider a data warehouse with just ten source systems. A major change in one of its sources will impact the warehouse schema, and may take three months to fix the schema, the load routines and the reports that are impacted by the change (this is a pretty typical figure in my experience at Shell).

A major change of this type does not happen every day, but is almost certain to happen once a year to each of these source systems, maybe twice. There are then ten sets of separate changes, each taking three months' worth of changes needed to the warehouse every year. Even assuming that the changes are neatly spread over the year and that you have plenty of programming resources to fix the changes, so you can do these in parallel, you still have 15 months of change to fit into 12 months; basically the warehouse can never catch up. You may well have more than 10 sources for your data warehouse, so the problem could be even worse than this. This is indeed what happens in reality: the data warehouse is usually out of date, so armies of Excel jockeys in finance get the answers via email and have to manually number-crunch for anything really critical while the warehouse lumbers on with out of date information. This situation is not the fault of the BI tools; it is the fault of the data warehouses that feed the BI tools. Until companies admit that the status quo is failing and start abandoning custom-built warehouses this problem will persist. It is like with treating alcoholism: the first step is admitting that there is a problem.
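To make the "can never catch up" arithmetic concrete, here is a rough back-of-the-envelope sketch. It is not Andy's model, and the function name `out_of_date_fraction` and its parameters are hypothetical. It assumes what the quote assumes: changes arrive evenly spaced through the year, each takes the same time to fix, and fixes can run fully in parallel, so staffing is never the bottleneck. It then measures how much of a year the warehouse spends with at least one source change still unreflected.

```python
def out_of_date_fraction(changes_per_year: int, months_per_fix: float, years: int = 3) -> float:
    """Fraction of the last simulated year in which at least one source
    change is still waiting to be reflected in the warehouse.

    Hypothetical assumptions, mirroring the quoted scenario: changes arrive
    evenly spaced, every fix takes the same number of months, and fixes run
    fully in parallel, so staffing is never the bottleneck.
    """
    gap = 12.0 / changes_per_year
    # Each change leaves the warehouse stale from its arrival until its fix ships.
    stale = [(k * gap, k * gap + months_per_fix) for k in range(changes_per_year * years)]

    # Measure only the final year, by which point the pattern is steady.
    year_start, year_end = 12.0 * (years - 1), 12.0 * years
    covered, run_start, run_end = 0.0, None, None
    for start, end in stale:  # already sorted by start time
        start, end = max(start, year_start), min(end, year_end)
        if start >= end:
            continue  # this change is not outstanding during the final year
        if run_end is not None and start <= run_end:
            run_end = max(run_end, end)  # overlaps the current stale stretch
        else:
            if run_end is not None:
                covered += run_end - run_start
            run_start, run_end = start, end
    if run_end is not None:
        covered += run_end - run_start
    return covered / 12.0


if __name__ == "__main__":
    # Ten sources with one or two major changes each per year, three months per fix.
    for changes in (10, 20):
        print(changes, "changes/year:", out_of_date_fraction(changes, 3.0))
    # For comparison, a single change per year.
    print(1, "change/year:", out_of_date_fraction(1, 3.0))
```

Under these assumptions, ten changes a year at three months each means a new change arrives every 1.2 months, well before the previous fix ships, so the sketch reports the warehouse as stale for the entire year even with unlimited parallel effort. A single change a year would leave it stale for only a quarter of the time.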

We’ve long been skeptical of top-down analytics that try to centralize a business’s information and knowledge. Andy’s post highlights one problem with this approach: change happens. Here’s another: you don’t know what you don’t know when you build a warehouse.

In other news, I’ve cut my six degrees of separation to Bill Gates down to two. More later.