
Managing the Data Explosion

October 1, 2013
by John DeGaspari
From large health systems to community hospitals and physician groups, how are provider organizations meeting their growing data storage needs?

As healthcare becomes more and more of a data-driven endeavor, it is opening up vast frontiers that promise to improve patient outcomes and extend research into whole new areas. That’s an exciting prospect in terms of patient care; yet underlying those potential gains is a nuts-and-bolts issue that cannot be ignored: how to manage and store all of this data, which is growing at a significant rate for provider organizations across the board.

That growth is coming from a variety of sources: organic growth from patient enrollment and the transition to electronic records, as well as technology advancements, particularly in imaging, where storage requirements are driven by both the number and the density of images. Add to this the advent of “Big Data,” which is driving major capacity expansions at large medical centers, and provider organizations of all kinds will have their work cut out for them for the foreseeable future.

Fortunately, the cost of storage media is coming down, which will help organizations keep pace with their expanding requirements. The cloud is another option getting serious attention, while advances in storage technology are giving provider organizations better tools to manage data in their own data centers. How are they meeting their growing data storage requirements? A variety of leading organizations offered insights into their strategies.

The Push to Accommodate ‘Big Data’

This month marks one year since the University of Pittsburgh Medical Center (UPMC) health system launched its enterprise analytics initiative, a five-year plan that it says will foster personalized medicine. Part of that plan is to build an enterprise data warehouse for the 20-plus hospital system that will bring together various types of data that so far have been difficult to integrate and analyze.

UPMC's Chris Carmody

Chris Carmody, UPMC’s vice president of enterprise infrastructure services, says data storage requirements in the healthcare sector are doubling every 18 months. UPMC currently stores five petabytes of enterprise data, a figure he expects to grow to 20 petabytes by 2016. That growth encompasses all types of data, from structured data in the electronic medical record (EMR) to unstructured data and imaging data, he says.
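Carmody’s two figures are consistent with each other: at a steady 18-month doubling time, three years of growth quadruples the baseline, taking 5 petabytes in 2013 to 20 petabytes in 2016. A minimal sketch of that projection (function name and parameters are illustrative, not from the article):

```python
# Projected capacity under exponential growth: volume doubles every
# `doubling_months` months. Figures from the article: 5 PB in late
# 2013, doubling every 18 months.

def projected_petabytes(current_pb: float, months_ahead: float,
                        doubling_months: float = 18.0) -> float:
    """Capacity after `months_ahead` months of steady doubling."""
    return current_pb * 2 ** (months_ahead / doubling_months)

# Three years (36 months) at an 18-month doubling time quadruples
# the 5 PB baseline, matching Carmody's 20 PB estimate for 2016.
print(projected_petabytes(5, 36))  # → 20.0
```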

“As a technologist supporting that environment, my focus is on the end-user—the doctor, the nurse, the researcher. That’s what we are preparing for and planning, to enable them as we have these new sets of applications and insights from our enterprise analytics program in our environment,” Carmody says.

Children's Hospital of Pittsburgh, part of the 20-hospital UPMC system. Photo: UPMC

To support its data analytics initiative, UPMC is building an enterprise data warehouse that will store data from many sources, including the EMR, laboratory systems and radiology systems. “We will pull that data in, and apply algorithms and analytics programs over that data to provide insights into what is happening with a specific patient or what’s happening with an entire population,” he says. The initiative will bring together data from sources that have never before been in one place, he says. The cloud is also part of UPMC’s strategy to meet its requirements, Carmody says, noting that there is organizational support for moving to the hybrid cloud model, which today UPMC uses only minimally.

A similarly ambitious data initiative is taking place at Memorial Sloan-Kettering Cancer Center (MSKCC) in New York, where vice president and CIO Pat Skarulis says the hospital is gearing up for the arrival of genomic data. “We are going to the next generation of sequencers, and they will put out a huge amount of data, which we need to save,” she says.

DNA sequence data will be processed by Memorial Sloan-Kettering in its laboratories and saved at every step of the way, says Skarulis, who notes that the sequencers themselves are getting faster and their output larger. According to David Barry, MSKCC’s manager of storage, the processed genomic data are conservatively projected to grow by a terabyte a week.

Patrick Carragee, Memorial Sloan-Kettering’s director of infrastructure and data center, says the organization plans to store the data on tape in its own data center.

The prospect of housing such large genomic data sets has prompted some changes in strategy. One, according to Barry, is a return to tape for this type of data, which is more economical than disk-based systems for long-term retention. While the data needed for high-speed computational work on sequencing will sit on faster media, the derived data that comes out of that processing will be stored on archival disk or tape, he explains.

Coping With Steady Organic Growth

Even excluding genomic data, Skarulis says data volume at Memorial Sloan-Kettering has been growing at a healthy 30 percent a year, which she attributes to normal business, such as adding patients to its databases. As a research institution, “all of the data is extremely valuable,” she says.
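To put that 30 percent figure in perspective: steady compound growth at that rate doubles a data set in well under three years, since 1.3³ ≈ 2.2. A quick sketch of the doubling-time arithmetic (the helper function is illustrative, not something the article describes):

```python
import math

# Doubling time under steady compound growth: solve
# (1 + rate) ** years == 2 for years.

def years_to_double(annual_rate: float) -> float:
    """Years for a volume to double at a steady compound growth rate."""
    return math.log(2) / math.log(1 + annual_rate)

# At the 30 percent organic growth rate Skarulis cites,
# storage needs double roughly every 2.6 years.
print(round(years_to_double(0.30), 2))  # → 2.64
```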