
ICU Data Mart: A Non-IT Approach

October 25, 2011
by Vitaly Herasevich, Daryl J. Kor, Man Li, and Brian W. Pickering
A Team of Clinicians, Researchers and Informatics Personnel at the Mayo Clinic have Taken a Homegrown Approach to Building an ICU Data Mart
Schematic relationship of ICU and OR data marts to clinical systems. Source: Vitaly Herasevich

As technology consumers, we have come to expect a high level of functionality from the computerized systems we depend on for everyday tasks such as banking, parcel tracking, and airline ticketing. Unfortunately, that functionality does not extend into healthcare, which is often hobbled by technical problems such as fragmented source databases.

This is especially true of larger healthcare systems and academic hospitals, where there is a general lack of integration among multiple unique source databases. Indeed, most databases were built on legacy systems that were not designed to integrate with other software systems. While this lack of integration was initially the result of project-driven systems, the problem of “database silos” has persisted, largely because of commercial interests.

Recently, the concept of a fully integrated electronic medical record (EMR) has opened up the possibility of breaking down those silos. Underlying the idea of an integrated EMR is the need to address multiple unique medical informatics needs, all of which strive to integrate the various source systems and technologies into a single-window EMR.

While these efforts have occasionally been quite successful, they almost always have been operation-oriented. As a result, these newly integrated systems generally still lack reporting and research capabilities. These remaining problems are particularly important in time-sensitive, data-rich environments such as the operating room and intensive care unit, where the intensity of care is high and the need to understand both healthcare delivery processes and patient outcomes is substantial. In this setting, a traditional data warehousing approach is too inefficient to provide optimal results.


About five years ago, Ognjen Gajic, M.D., a critical care physician and researcher from Mayo Clinic, Rochester, Minn., had an important research question to be answered. Specifically, he was interested in evaluating the association between blood transfusion and a respiratory complication known as acute lung injury (ALI). In order for his research to move forward, he needed to be alerted about patients who had a blood transfusion order issued and who were at risk of ALI.

This seemingly straightforward task was complicated by the large number of transfusions administered at the participating institution. Moreover, the data needed to be extracted from multiple source databases to allow adjudication of the outcome of interest. Specifically, the detection of ALI required interrogation of the radiology reporting system, as well as laboratory results from the hospital's laboratory reporting system. Ultimately, the alert system required three separate data feeds from the EMR system. At that time, this was neither technically nor practically feasible.

The proposed solution to this study's unique informatics needs was a concept termed the “ICU data mart,” an integrated database where all pertinent data regarding critically ill patients would be stored in near real-time. In addition, the data within this ICU data mart could be queried readily. The integrated nature of the data mart would allow complex queries, including data from multiple non-integrated source databases.

When this concept was proposed to a group of our information technology colleagues, their response was rather straightforward: “Impossible.” Soon after, we began to build this integrated relational database ourselves “one brick at a time.” It has become a highly functional near real-time database servicing dozens of investigator-initiated data requests, quality improvement initiatives, and administrative needs. Moreover, it has been developed with minimal resources and at a very low cost.

We believe that the success of this project is in large part due to its “non-IT approach.” This doesn't mean that we avoided the use of computers and databases. Quite the opposite: the ICU data mart is physically a Microsoft structured query language (SQL) database. However, our approach was based on three key concepts (Legos, UNIX, and Matrix) that often run contrary to traditional informatics approaches.


No, this doesn't imply that the database was built from our children's Lego sets. Rather, it is the concept of building a project one piece at a time while maintaining a vision of what the final project will look like and, equally importantly, what the next piece will add to the whole. An important benefit of this piece-by-piece approach was that it allowed the existing data to be used before the final version of the database was completed.

Our initial piece for the ICU data mart was the reference table based on admission and demographic information. This was an essential starting point, because it allowed us to define a specific event: the ICU length of stay. Indeed, without a defined time interval, everything else becomes a mess. We then used a combination of the patient identification number and admission time as key links to other tables of interest.
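The reference table and its key links can be sketched as follows. This is only an illustration: Python's built-in sqlite3 module stands in for the actual Microsoft SQL database, and every table name, column name, and sample row (icu_admission, lab_result, patient P001, and so on) is hypothetical, since the real schema is not described here.

```python
import sqlite3

# In-memory SQLite stands in for the production SQL database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE icu_admission (            -- hypothetical reference table
    patient_id     TEXT NOT NULL,
    admit_time     TEXT NOT NULL,       -- 'YYYY-MM-DD HH:MM:SS'
    discharge_time TEXT,                -- NULL while the patient is still in the ICU
    age_years      INTEGER,
    sex            TEXT,
    PRIMARY KEY (patient_id, admit_time)
);
CREATE TABLE lab_result (               -- hypothetical table added as a later piece
    patient_id  TEXT NOT NULL,
    result_time TEXT NOT NULL,
    test_name   TEXT NOT NULL,
    value       REAL
);
""")

# Made-up sample data: one ICU stay and three lab results around it.
conn.execute(
    "INSERT INTO icu_admission VALUES (?, ?, ?, ?, ?)",
    ("P001", "2011-03-01 08:00:00", "2011-03-04 20:00:00", 67, "F"),
)
conn.executemany(
    "INSERT INTO lab_result VALUES (?, ?, ?, ?)",
    [
        ("P001", "2011-02-27 10:00:00", "PaO2", 88.0),  # before the ICU stay
        ("P001", "2011-03-02 06:30:00", "PaO2", 61.0),  # during the ICU stay
        ("P001", "2011-03-06 09:00:00", "PaO2", 90.0),  # after the ICU stay
    ],
)


def icu_length_of_stay_days(conn):
    """Return {(patient_id, admit_time): length of stay in days} for completed stays."""
    rows = conn.execute("""
        SELECT patient_id, admit_time,
               julianday(discharge_time) - julianday(admit_time)
        FROM icu_admission
        WHERE discharge_time IS NOT NULL
    """)
    return {(pid, admit): round(days, 2) for pid, admit, days in rows}


def labs_during_icu_stay(conn):
    """Link lab results to a stay by patient ID, keeping only results inside the stay."""
    return conn.execute("""
        SELECT a.patient_id, a.admit_time, l.result_time, l.test_name, l.value
        FROM icu_admission AS a
        JOIN lab_result AS l
          ON l.patient_id = a.patient_id
         AND l.result_time BETWEEN a.admit_time AND a.discharge_time
        ORDER BY l.result_time
    """).fetchall()


print(icu_length_of_stay_days(conn))
print(labs_during_icu_stay(conn))
```

Each linked row carries the (patient_id, admit_time) pair, so results from later tables can always be traced back to one specific ICU stay. Stays without a discharge time are skipped by both queries in this sketch.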

This was the start. We now had version 1.0! No beta versions, no releases. Of course, each new “piece” required careful testing and validation, which were performed by comparing our automated results against a manual review of the actual medical records. This step was mandatory before moving the newly developed data elements to a production stage. Additional statistical controls were also used to assess for unanticipated gaps in the data, as well as potential data outliers.
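The kind of statistical control mentioned above might look like the sketch below. The specific checks used for the ICU data mart are not described, so both functions are assumptions chosen for illustration: a scan for time gaps longer than a chosen limit, and an interquartile-range rule for outliers. The sample heart-rate feed is invented.

```python
from datetime import datetime, timedelta
from statistics import quantiles


def find_gaps(timestamps, max_gap):
    """Return (start, end) pairs where consecutive records are more than max_gap apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


def find_outliers(values, k=1.5):
    """Return values lying more than k interquartile ranges outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Invented example: hourly heart-rate records with a missing stretch and one suspect value.
start = datetime(2011, 3, 1, 0, 0)
record_times = [start + timedelta(hours=h) for h in (0, 1, 2, 6, 7)]
heart_rates = [72, 75, 71, 74, 73, 76, 70, 250]

print(find_gaps(record_times, max_gap=timedelta(hours=2)))
print(find_outliers(heart_rates))
```

Flagged gaps and values are candidates for manual review against the medical record, not automatic deletions; a flagged value may still be a true measurement.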

Having moved the initial piece into a production phase, we immediately began working on the next data element. Since we needed to identify arterial blood gas results, our next focus was the source database housing laboratory data. Piece by piece, the database grew (and continues to grow). All the while, previously tested and validated data have been available to the end users. Without this approach, it would have taken years to realize a functional “fully integrated EMR/database.” In contrast, this system was functional from the very beginning. Additional data are simply added to the existing database and the process continues to move forward.


While some people fondly remember the command line, most database end users prefer a Windows-based interface. Yet, although such an interface works well for standard office tasks, it is often inadequate for working with complex databases. Furthermore, the development of multiple interfaces adds layers of complexity, cost, and potential errors.

Indeed, complex database solutions often require a custom-built query interface. This interface generally translates its own, still algorithmic, query language into SQL commands. Database end users must not only understand the interface, but they also must learn the interface query language. Moreover, the varied interfaces often require additional resources such as web servers and a team that can develop, support, and improve the interface over time, which is an iterative, ongoing process.

For the ICU data mart, we chose to explore query building tools that reside in statistical software. Most of these embedded query building tools can interrogate databases using open database connectivity (ODBC). Microsoft Excel is an example of one such tool.

For most of our analytic needs, we have found that JMP statistical software (from SAS Institute Inc., Cary, N.C.) was quite adequate. Embedded query tools require no additional interfaces and no data export. The data are simply right there, residing within a powerful statistical program, and immediately available for the desired analyses. For those few circumstances where more robust analyses were needed, we used SAS Institute's SAS Data Management software.


Do you remember the nice green-on-black screen from the Wachowski brothers' movie, The Matrix? How the data visibly fell from one place to another? Beautiful, raw data! The concept of The Matrix is all about storing raw data: no pre-processing, no massaging, no normalizing. Only the original data are stored.

Don't get us wrong: data parsing, processing, and normalization are extremely important, but this process will vary depending on the specific data need. Moreover, pre-processing and normalization will result in an unnecessary loss of data. Often, this loss of data will prove to be a barrier when future data needs arise. In contrast, post-processing and normalization allow the end users (or applications) to tailor the data to their specific needs, while keeping the full complement of data elements available for future use.
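A small sketch of this store-raw, normalize-later idea follows. As before, Python's sqlite3 module is only a stand-in for the real database, and the raw_vital table, the temperature_celsius view, and the sample readings are hypothetical. The raw values and units are stored exactly as received, and the unit conversion lives in a view that can be changed at any time without touching the stored data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the production database (assumption)
conn.executescript("""
CREATE TABLE raw_vital (                -- hypothetical raw table: values kept as received
    patient_id    TEXT NOT NULL,
    recorded_time TEXT NOT NULL,
    vital_name    TEXT NOT NULL,
    raw_value     TEXT NOT NULL,        -- original text, not converted on the way in
    raw_unit      TEXT NOT NULL
);

-- Post-processing happens at query time, in a view that one end user might define.
CREATE VIEW temperature_celsius AS
SELECT patient_id,
       recorded_time,
       CASE raw_unit
           WHEN 'F' THEN (CAST(raw_value AS REAL) - 32.0) * 5.0 / 9.0
           ELSE CAST(raw_value AS REAL)
       END AS temp_c
FROM raw_vital
WHERE vital_name = 'temperature';
""")

# Invented readings from two sources that report in different units.
conn.executemany(
    "INSERT INTO raw_vital VALUES (?, ?, ?, ?, ?)",
    [
        ("P001", "2011-03-01 08:00:00", "temperature", "98.6", "F"),
        ("P001", "2011-03-01 12:00:00", "temperature", "37.2", "C"),
    ],
)

normalized = conn.execute(
    "SELECT recorded_time, round(temp_c, 1) FROM temperature_celsius ORDER BY recorded_time"
).fetchall()
untouched = conn.execute(
    "SELECT raw_value, raw_unit FROM raw_vital ORDER BY recorded_time"
).fetchall()

print(normalized)
print(untouched)
```

If a later request needs Fahrenheit, or needs to know which unit a source originally reported, the raw rows still hold that information; only the view has to change.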

Importantly, filtering data feeds may be necessary, as you will likely not need (or want) to store all aspects of the technical data. Rather, what you really want to store are the meaningful data. We advise that you take some time to determine which data elements are meaningful and which are unnecessary and can be filtered out. Ultimately, having the meaningful raw data available makes organizing, using, and summarizing the data far more powerful. For example, if report requirements change, it is much easier to modify existing code within the data mart than to modify the interfaces with the various source databases.

An additional key element regarding data acquisition is the timing of its availability. Due to the increasingly fast-paced nature of medicine, particularly in high-acuity environments such as the operating room and ICU, near real-time feeds are of increasing importance. However, real-time data feeds can come at a cost, particularly with regard to resource utilization and the stability of the source databases. Therefore, you must determine just how time-sensitive your data needs might be.

Generally, data requirements for quality initiatives, reports, and research do not require real-time data feeds. In most clinical systems, real-time data are not truly real-time; for example, “real-time” clinical notes appear only after they are transcribed and finalized by the authoring clinicians. ICD-9 codes are generally assigned only after a patient is discharged. Are these data sources ever truly “real-time”? Often, the ability to choose an appropriate time interval for data retrieval can save significant resources without sacrificing a system's usefulness.

In summary, our group of clinicians, researchers, and informatics personnel has developed an ICU data mart that contains a near real-time copy of pertinent ICU patient information covering 206 ICU beds, with an average of 15,000 ICU admissions per year. This includes historical data going back to 2003. Having been in existence now for almost five years, the approach taken by our team has proved efficient, adaptable, and very well-suited to time-sensitive environments such as the ICU.

The data elements within the ICU data mart relational database continue to expand, and now include details from the pre-ICU environment (e.g., emergency department and transportation), as well as post-ICU long-term outcomes. Given the success of this effort, we are now working to replicate the process in the perioperative environment as well.

While the OR data mart will clearly benefit from the approaches and experiences of the ICU data mart build, it will also serve as a valuable additional data source as the ICU data mart continues to grow. Ultimately, by securing detailed data from pre-ICU environments such as the ED and the OR, we believe systems such as this can help to find new ways to optimize healthcare delivery in the OR and ICU. Perhaps more importantly, technological strategies such as the ones described above may prevent patients from needing intensive care services in the first place.

Vitaly Herasevich, M.D., Ph.D., is assistant professor of medicine; Daryl J. Kor, M.D., is assistant professor of anesthesiology; Man Li, M.D., is senior analyst programmer, Anesthesia Clinical Research Unit; and Brian W. Pickering, M.D., is assistant professor of anesthesiology. All are with the Department of Anesthesiology, Multidisciplinary Epidemiology and Translational Research in Intensive Care (METRIC), Mayo Clinic, Rochester, Minn. Healthcare Informatics 2011 November;28(11):42-45
