A History of Yesterday
For at least the last 25 years or so, certainly ever since researchers Barry Devlin and Paul Murphy coined the term “business data warehouse”, various vendors and technologies have been carving up and attempting to lay exclusive claim to overlapping slices of the data warehouse ecosystem - the sum total of the tools and methods required to support a data warehouse from source systems to end-users. You are familiar with these slices, they go by names and acronyms like: Extract, Transform & Load (ETL); Extract, Load & Transform (ELT); Data Quality (DQ); Data Profiling (DP); Master Data Management (MDM); Datamarting and Cubing; Database Federation; Data Warehouse Appliances (DWA); Business Intelligence (BI); Decision Support Systems (DSS); Executive Information Systems (EIS); Query & Reporting (Q&R); Enterprise Information Integration (EII); Advanced Analytics (AA); and Visualization, among many others. For each of these you can probably name at least two or three distinct vendors off of the top of your head. The thing of it is though, these are first and foremost marketing distinctions, driven by the needs of these vendors to differentiate themselves; and secondarily these slices are historical atavisms, reflective of sometimes decades-old technological limitations. In truth, data warehousing begins with data and it ends with data, and there is nothing in between but data. To understand how fundamentally this should impact both your strategic and operational approaches to information architecture, data governance, and vendor management, we need a quick review of the history of data warehousing.
In the beginning, there were systems, usually mainframes, optimized for the processing of business transactions. These systems were rather straight-forwardly known as OLTP, or on-line transactional processing, systems. OLTP systems were (and still are) great for handling large numbers of concurrent transactions which require the application of complex business rules. OLTP systems were (and still are) terrible at organizing, aggregating and trending either their input or output data values, in other words, they are terrible at actually reporting on the business processes they support. The amount of effort required to collect, clean, organize, aggregate and store these input and output data values for reporting was dear in terms of time, people and dollars. What was worse was that the efforts were often repeated independently for each new report. It was in response to this business pain that Devlin and Murphy in 1988 proposed an architecture for a “business data warehouse”.
Their architecture made use of some new and some old technologies – most notably dimensional data schemas which had been around since the ‘60s, and the database management systems developed in the ‘70s which were optimized to query them. Within 5 years of Devlin and Murphy, a series of firsts: the first database optimized for data warehousing; the first software for developing data warehouses; the first book on data warehousing; and the first publication of the 12 rules of on-line analytical processing (OLAP) which has provided the conceptual and architectural underpinnings for every relational database management system since. By 1996 the two major philosophies of data warehousing were established and doing battle to the death. Bill Inmon’s top-down, subject-oriented, non-volatile and integrated corporate information factory versus Ralph Kimball’s bottom-up, departmentally-oriented, versioned and conforming datamarts.
The chip, memory, disk, bus and software architectures of the early- to mid-‘90s severely restricted both the size and the speed of the data warehouse relative to the amount of data that was available for collection and processing. Furthermore, the implementation of a data warehouse architecture created an absolute need for the movement and manipulation of relatively large amounts of data between physical devices and logical schemas. This was the fertile soil in which a profusion of vendors and proprietary technologies germinated, each trying to define and grow into a niche from which to out-compete both their direct and next-nearest rivals. What had begun as a somewhat academic exercise in the ‘60s and ‘70s was a crowded and growing, multi-billion dollar, world-wide market by the turn of the millennium.