
At the University of California Irvine, a Big-Data Revolution

December 15, 2013
by Mark Hagland
A big-data initiative at UCI Medical Center is giving rise to a range of clinical data analytics projects

Change is in the air at the 350-bed University of California Irvine Medical Center (UCI Medical Center) in Irvine, California. There, Charles Boicey, information solutions architect at UCI Medical Center and the University of California Irvine School of Medicine, is helping to lead a broad data strategy that applies big data analytics to clinical operations and care delivery and, in the process, leverages open-source Hadoop technology to make the organization’s clinical data fully searchable and available.

The data initiative is focusing on reducing avoidable readmissions, speeding new research projects, and tracking patient vital signs in real time. Among other elements in the data initiative, Boicey and his colleagues are using Hadoop technology (specifically the Hortonworks Data Platform) to access more than 20 years of legacy data covering 1.2 million patients and more than 9 million records. In that context, one of the major sub-initiatives has been a project to predict the likelihood of hospital readmission within 30 days of discharge for patients with congestive heart failure (CHF). Working with a medical device integration partner, the hospital has developed a program that sends CHF patients home with a scale, prompts them to weigh themselves daily, and automatically and wirelessly transmits that weight data to Hadoop, where an algorithm determines which weight changes indicate risk of readmission and notifies clinicians about those cases.
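The article does not describe the algorithm itself, but the logic it implies is a trend check on the incoming daily weights. The Python sketch below shows one way such a rule could look; the threshold values, function name, and data layout are assumptions made for illustration, not UCI’s actual method.

```python
from datetime import date

# Hypothetical sketch of the kind of rule a CHF weight-monitoring algorithm might apply.
# The 2-lb/day and 5-lb/week thresholds are common clinical rules of thumb, used here
# purely for illustration; they are not UCI's published values.
DAILY_GAIN_LBS = 2.0    # assumed single-day weight-gain threshold
WEEKLY_GAIN_LBS = 5.0   # assumed seven-day weight-gain threshold


def flag_readmission_risk(readings):
    """readings: list of (date, weight_lbs) tuples, one per daily weigh-in.
    Returns the dates on which the weight trend suggests rising readmission risk."""
    readings = sorted(readings)
    alerts = []
    for i, (day, weight) in enumerate(readings):
        # Rule 1: large gain since the previous day's weigh-in.
        if i >= 1:
            prev_day, prev_weight = readings[i - 1]
            if (day - prev_day).days <= 1 and weight - prev_weight >= DAILY_GAIN_LBS:
                alerts.append(day)
                continue
        # Rule 2: large gain relative to the oldest reading in the past week.
        week_window = [w for d, w in readings[:i] if (day - d).days <= 7]
        if week_window and weight - week_window[0] >= WEEKLY_GAIN_LBS:
            alerts.append(day)
    return alerts


if __name__ == "__main__":
    weights = [
        (date(2013, 6, 1), 182.0),
        (date(2013, 6, 2), 182.5),
        (date(2013, 6, 3), 185.0),  # +2.5 lbs in one day -> alert
        (date(2013, 6, 4), 186.0),
    ]
    print(flag_readmission_risk(weights))
```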

Boicey, who has been in his position for four years, spoke recently with HCI Editor-in-Chief Mark Hagland regarding the work that he and his colleagues have been doing at UCI Medical Center, and his perspectives on the current initiative. Below are excerpts from that interview.

Can you tell me about your organization’s big data strategy?

Let me articulate it in the context of the CCD [continuity of care document]. Back in 2010, I had a hypothesis that I could store CCD documents by the hundreds of thousands and make them available for clinicians to run simple queries against. I was looking at NoSQL technologies, and chose to store the documents in MongoDB, a NoSQL database solution. What we were able to do was to ingest these CCD documents in their native form. Usually, you break things up inside a database; instead, we built a database on top of the native documents so the physician could type in, for example, “My patients who haven’t had an A1c within six months.” The clinician can’t go into the EMR and scan it for analytics, so this created that capability. And on the research side, this allows a researcher to query, for example, “45-to-50-year-old male with prostatectomy,” with identifiers removed. We were able to do that successfully in 2011, and we presented it at the Health Care Data Warehouse Association meeting in the summer of 2011.
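As a rough illustration of the kind of query that becomes possible once CCD-derived documents sit in MongoDB, the sketch below uses pymongo to find patients with no recent hemoglobin A1c result. The database, collection, and field names (“clinical”, “ccd_documents”, “labs”, and so on) are hypothetical, not UCI’s actual schema.

```python
from datetime import datetime, timedelta
from pymongo import MongoClient

# Minimal sketch, assuming each ingested CCD document carries hypothetical fields
# "patient_id", "provider", and a "labs" array whose entries have a "test" name
# and a "date". None of these names come from the article.
client = MongoClient("mongodb://localhost:27017")
ccds = client["clinical"]["ccd_documents"]

six_months_ago = datetime.utcnow() - timedelta(days=182)

# "My patients who haven't had an A1c within six months":
# exclude any document recording a hemoglobin A1c result newer than the cutoff.
cursor = ccds.find(
    {
        "provider": "Dr. Example",  # hypothetical provider identifier
        "labs": {
            "$not": {
                "$elemMatch": {
                    "test": "hemoglobin_a1c",
                    "date": {"$gte": six_months_ago},
                }
            }
        },
    },
    {"patient_id": 1, "_id": 0},
)

for doc in cursor:
    print(doc["patient_id"])
```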


So then I looked at other environments built on NoSQL technologies, including Twitter. The Twitter environment is a lot like a laboratory information system. And your LinkedIn profile has sections and subsections not unlike a radiology or pathology report. And then I looked at Facebook, which shares the same underlying architecture: you do all your postings over a month or a year, and Facebook stores them temporally, so you can retrieve them temporally. And I found out that Apache Hadoop is the underlying technology for all of this. So I went to Yahoo, which is where all the Apache Hadoop architecture originated back in 2006; it all came out of Yahoo. In reaching out to them, I wanted to understand scalability, and I learned that Yahoo has over 60,000 servers with over 160 petabytes of data; that was six to seven months ago.

And I started all this work in January 2012. Yahoo created this architecture and put it out in open source. Some have commercialized it, but you can go to Apache and get it in its complete open-source form. So I was pretty happy with that. Then I had to find a use case for it, to get UCI to fund it. UCI has actually been on an EMR since 1988, the TDS system.

But the old legacy EMR was in view-only form. So I knew I could print that data to text and then ingest it into this Hadoop environment, which reads and stores the data in its native form. So I ingested 9 million records of 1.2 million patients, spanning more than 22 years. That data is now searchable and viewable, and whatever information was available in the legacy system is now viewable within the current EMR. The key to using this Hadoop architecture is that it allows for complete viewability and searchability of data, while also allowing an organization to retain its legacy information in its entirety. The reality is that the complete backloading or migration of one system to another doesn’t usually work.
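A minimal sketch of that batch load step might look like the following, assuming the legacy TDS records have already been printed to plain-text files on local disk. The local and HDFS paths are hypothetical; the commands are the standard hdfs dfs file-system CLI.

```python
import subprocess
from pathlib import Path

# Sketch of loading printed-to-text legacy records into HDFS in their native form.
# Both paths below are assumptions for illustration, not UCI's actual layout.
LOCAL_EXPORT_DIR = Path("/data/tds_export")   # assumed location of the text exports
HDFS_TARGET_DIR = "/legacy/tds"               # assumed HDFS destination


def hdfs(*args):
    """Run an hdfs dfs subcommand and fail loudly on a non-zero exit code."""
    subprocess.run(["hdfs", "dfs", *args], check=True)


def load_legacy_records():
    # Create the target directory (no error if it already exists).
    hdfs("-mkdir", "-p", HDFS_TARGET_DIR)

    # Copy each exported text file into HDFS unparsed; downstream indexing can
    # then make the content searchable without restructuring it.
    for text_file in sorted(LOCAL_EXPORT_DIR.glob("*.txt")):
        hdfs("-put", "-f", str(text_file), f"{HDFS_TARGET_DIR}/{text_file.name}")


if __name__ == "__main__":
    load_legacy_records()
```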

So when did this go live to be viewable and searchable?

June 2013.

Are you the very first organization to ever do this?

Yes, I’ve been talking about this for over a year or so; but yes, we were the first to go live, and I’m one of the first to work within this environment. There are a couple of commercial vendors working within the Apache Hadoop world: one is Explorys, which came out of the Cleveland Clinic; the other is Apixio.

How many times has the legacy system been viewed or searched since June?

Any patient seen more recently than October 2009 would not be involved. We’re an Allscripts Sunrise client, so newer patients would not have anything within the TDS environment.

What has physicians’ reaction been to this innovation?