
Everything You Know About Business Intelligence, Data Warehousing and ETL is Wrong — Part I

April 16, 2010
by Marc D. Paradis, MS

A History of Yesterday

For at least the last 25 years or so, certainly ever since researchers Barry Devlin and Paul Murphy coined the term “business data warehouse”, various vendors and technologies have been carving up and attempting to lay exclusive claim to overlapping slices of the data warehouse ecosystem - the sum total of the tools and methods required to support a data warehouse from source systems to end-users. You are familiar with these slices; they go by names and acronyms like: Extract, Transform & Load (ETL); Extract, Load & Transform (ELT); Data Quality (DQ); Data Profiling (DP); Master Data Management (MDM); Datamarting and Cubing; Database Federation; Data Warehouse Appliances (DWA); Business Intelligence (BI); Decision Support Systems (DSS); Executive Information Systems (EIS); Query & Reporting (Q&R); Enterprise Information Integration (EII); Advanced Analytics (AA); and Visualization, among many others. For each of these you can probably name at least two or three distinct vendors off the top of your head. The thing is, though, that these are first and foremost marketing distinctions, driven by the need of these vendors to differentiate themselves; secondarily, these slices are historical atavisms, reflective of sometimes decades-old technological limitations. In truth, data warehousing begins with data and it ends with data, and there is nothing in between but data. To understand how fundamentally this should impact both your strategic and operational approaches to information architecture, data governance, and vendor management, we need a quick review of the history of data warehousing.

In the beginning, there were systems, usually mainframes, optimized for the processing of business transactions. These systems were rather straightforwardly known as OLTP, or on-line transaction processing, systems. OLTP systems were (and still are) great for handling large numbers of concurrent transactions which require the application of complex business rules. OLTP systems were (and still are) terrible at organizing, aggregating and trending either their input or output data values; in other words, they are terrible at actually reporting on the business processes they support. The effort required to collect, clean, organize, aggregate and store these input and output data values for reporting was dear in terms of time, people and dollars. Worse, the effort was often repeated independently for each new report. It was in response to this business pain that Devlin and Murphy in 1988 proposed an architecture for a “business data warehouse”.

Their architecture made use of some new and some old technologies – most notably dimensional data schemas, which had been around since the ‘60s, and the database management systems developed in the ‘70s that were optimized to query them. Within five years of Devlin and Murphy came a series of firsts: the first database optimized for data warehousing; the first software for developing data warehouses; the first book on data warehousing; and the first publication of the 12 rules of on-line analytical processing (OLAP), which have provided the conceptual and architectural underpinnings for OLAP systems ever since. By 1996 the two major philosophies of data warehousing were established and doing battle to the death: Bill Inmon’s top-down, subject-oriented, non-volatile and integrated corporate information factory versus Ralph Kimball’s bottom-up, departmentally-oriented, versioned and conforming datamarts.

The chip, memory, disk, bus and software architectures of the early- to mid-‘90s severely restricted both the size and the speed of the data warehouse relative to the amount of data that was available for collection and processing. Furthermore, the implementation of a data warehouse architecture created an absolute need to move and manipulate relatively large amounts of data between physical devices and logical schemas. This was the fertile soil in which a profusion of vendors and proprietary technologies germinated, each trying to define and grow into a niche from which to out-compete both its direct and next-nearest rivals. What had begun as a somewhat academic exercise in the ‘60s and ‘70s had become, by the turn of the millennium, a crowded and growing, multi-billion-dollar, worldwide market.

Comments

Marc,
I am looking forward to Part II (two). So far, this is a terrific contextualization which I've never seen brought together with such candor.

For me, the eight-hundred-pound gorilla in the room (and my spell checker suggested that, although I spelled gorilla correctly, I should consider Godzilla instead) is the people side of BI, DW, DSS, SaaS, etc. Whether it's the behavior of those entering the initial data in a workflow context, those using it in the course of overseeing departments, service lines and business units, or the CEO deciding what to do about it all (and, more importantly, how to address conflict avoidance), the BI data mechanics become only the gasoline. Not the business fire.

My friends working in this world would have started with Culture. See my current point on Culture and the Perils of Guesswork. Is this the flip side of your post's coin?

Yes, Joe,

Any end-user-facing IT implementation will live or die on the basis of its ability to make the end-user happy. Most end-users are very difficult to make happy. Behavior change, culture change and truly solving an unmet end-user need are the distinction between a killer app and a dead app.

All that being said, however, if the data behind a killer app is rotten (and by rotten I mean rotten relative to end-user standards, not rotten relative to internal IT standards), the killer app will soon commit hara-kiri.

When I was rowing, we used to have a saying about the coxswain: s/he can lose a race, but s/he can't win one. The assumption of every rower is that the coxswain is steering a perfect course, calling out a perfect strategy and perfectly assessing position and speed relative to the other shells. When all coxswains are performing this way, the best crew will win. When your coxswain is performing below perfection, the rowers have to pull that much harder, generate that much more boat speed and expend that much more energy to win the same race.

The coxswain is necessary but not sufficient, while the rowers are necessary and sufficient. In the same way, appropriate data governance and lifecycle management are necessary but not sufficient, while the end-users who do the heavy lifting are, like the rowers, necessary and sufficient. However, the energy and coordination necessary to win without a coxswain, or to succeed without data governance, are nearly prohibitive - unless of course you are competing against another boat without a coxswain, or against an organization habituated to using a system without data governance.

Marc, I appreciate the clarity of your model:

1) non-rotten data,
2) heavy-lifting, end-user "rowers",
3) a competent "coxswain", and
4) competitively matched (ideally superior in some area) data and staffing.

An old friend once shared with me, "There are a thousand ways to fail and only one way to succeed." It sounded very harsh. I think he was making the same point you did with your four components of a chain: failure of one link is failure of the whole chain.

My friend, by the way, subsequently softened and sharpened his view. He evolved the model into three requisite and sequential steps. Step one, develop mutual respect among those involved, internally within your company and externally with your customers. Neither side can exploit the other in the short or long run.

Step two, clearly define everyone's responsibilities. Step three, with the first two steps in place, then focus on results.

There's still a chain, but one that can be created and managed.

I think your model, Marc, is highly concordant, especially the respect for the rowers, the responsibilities (including that of the coxswain), and the fact that the result plays out in a competitive context. "The industry grades on a curve." In business, you don't have to be perfect. (In healthcare delivery, on the other hand, you often have to follow procedures perfectly.)

How many Parts will this series be?

Marc D. Paradis MS

Marc D. Paradis, MS, is Director of Strategic Data Services in Applied Informatics and a Manager...