
The Great Data Escape

August 1, 2007
by Sami Benmechiche, Carol Chouinard, Ross Christen, Deepak Goyal, Ajit Kumar, and Richard Kupcunas
Successfully mining clinical data depends on first constructing a proper data warehouse

Clinical data has been around as long as the medical profession itself. The practice of documenting observations, diagnostics, prescribed drugs, and procedures is an intrinsic part of providing healthcare. As healthcare companies move away from paper-based clinical records and deploy clinical information systems (CIS), more information is being stored electronically. Our research shows that operational systems for patient administration, orders, and results management, as well as laboratory, radiology, and pharmacy systems, are playing a major role in generating and storing raw clinical data.

Existing applications have struggled to store and process ever-increasing volumes of clinical data, leading to the emergence of clinical data warehouses. These large electronic repositories accrue information over time through the normal processing of healthcare systems. Use of data warehouses began with the production of mandatory regulatory reports. Today, they are evolving into a resource for sophisticated clinical and financial predictive analysis (see Figure 1).

Figure 1: Key Factors Contributing to Extensive Use and Sharing of Health Care Data

The rush is on

Our research indicates that several convergent factors are contributing to the exponential growth in clinical and financial healthcare data, leading to the evolution of clinical data warehouses:

  • Connectivity and networking between healthcare companies, resulting in connected local, regional, and national health information networks

  • Maturing interoperability standards, such as HL7, ANSI X12, and XML, which help integrate disparate systems

  • Maturity and standardization of coding, helping physicians document their diagnostic and procedure information in a consistent manner

  • Conversion of more documents to electronic storage: for example, historical paper-based charts converted to electronic format and radiology images converted from film to electronic format (see Figure 2)

  • More-sophisticated diagnostic methods, such as genetic testing, along with more affordable testing and increasingly complex procedures

  • Legal and regulatory pressures, requiring greater transparency and traceability throughout the care delivery life cycle

  • Greater acceptance of privacy- and security-related technologies and procedures, such as those required under HIPAA

Figure 2: EMR Adoption by Practice Size (EMR adoption percentage)

Together, these factors are presenting information management challenges for every stakeholder in the healthcare value chain — hospitals, health plans, government health bodies, patients, pharmaceutical companies, and biotechnology companies. The gold rush is on to tap the potential for building and exploiting clinical data warehouses.

Who gets the gold?

As more data is created and managed through clinical data warehouses, key questions emerge: Who is all this data for? And how will they use it? Here are some possible answers for key stakeholder groups.


Most hospitals have built a central data repository for continued data compilation from patient administration, clinical, financial, and claims submission systems (see Table 1).

Table 1: Current and Future Benefits for Hospitals

Current Usage

Future Trends

  • Store historical information

  • Analyze and forecast the level of utilization of their facilities

  • Support medical research

  • Provide medical management data

  • Perform case management

  • Provide reports to regulatory bodies

  • Create a secondary source of information for situations where the primary applications become unavailable

  • Analyze and reconcile cost and revenue

  • Transfer or obtain electronic medical records (for out-of-area or recently relocated patients)

  • Compare quality of care with comparable facilities

  • Submit reportable-disease data to authorities in real time

  • Refer patients to other providers and electronically book appointments

  • Perform analysis that includes the patient's family history

  • Utilize medical data to reduce medical errors

  • Perform cost containment and feasibility studies

  • Adapt to consumerism: use customer relationship management data to enhance the patient experience, and share anonymized data with potential business partners

Enormous untapped potential exists for large providers that have yet to aggregate data from multiple facilities or regions. Our research shows that many providers also have not yet used data in a preventive manner, such as using demographic data to schedule tests, influencing patient behavior (e.g., lifestyles and medication compliance), or comparing a patient's health history to the family's health history to better deal with possible negative outcomes.