In June, just eight months into its $100 million, five-year enterprise big-data initiative, leaders at the vast University of Pittsburgh Medical Center (UPMC) health system in Pittsburgh, Pa., announced that researchers at the University of Pittsburgh and UPMC, using the foundational architecture of the recently created enterprise data warehouse, had for the first time electronically integrated clinical and genomic information on 140 patients previously treated for breast cancer. Adrian V. Lee, Ph.D., a renowned expert in the molecular and cellular biology of breast cancer and director of the Women’s Cancer Research Center at the University of Pittsburgh Cancer Institute, has been leading his colleagues in research on the differences between pre-menopausal and post-menopausal breast cancer. Now, leveraging the organization’s data warehouse capabilities and its core electronic health record (EHR), Dr. Lee and his colleagues are mining the breast cancer data available to them and applying genomic data to the care of 140 patients who have been treated at UPMC for breast cancer. Their work builds on the creation of the data warehouse, which in turn required collaboration among several vendor partners, including Oracle, IBM, Informatica, and dbMotion.
HCI Editor-in-Chief Mark Hagland spoke recently with Dr. Lee about the work that he is helping to lead in Pittsburgh, and about its implications for development of what is variously being called personalized and precision medicine. Below are excerpts from that interview.
Dr. Lee, when I met you in Pittsburgh in January, you were just beginning to roll out this initiative.
Yes, we had just started the project when you and I met. The goal grew out of the fact that we now have an incredible capacity to sequence patients, and are now creating the capability to impact patient care, but progress has occurred so quickly that we have no information infrastructure to store or analyze that data. So this is what we wanted to build with Oracle.
So, what were the basic building blocks of the program?
Oracle has installed an Exadata server here at Pitt and UPMC that can handle large amounts of data and provides fast access to it. We also use Cohort Explorer, a tool that encompasses something called the TRC, the Translational Research Center, which helps you do SQL queries on the database. And what I told you when you were here was that we were starting out with a very small, discrete use case. We took a unique set of patients, 140 breast cancer patients, whose tumors were sent to a national consortium that has sequenced them: the Cancer Genome Atlas, run by the National Institutes of Health. It’s the largest-ever effort to sequence and analyze the genome for cancer patients, and involves doing every molecular test possible on 10,000 tumors. Obviously, at that scale, only the NIH could manage an effort that big. What happens is that multiple sites submit tumors; we send off the actual tissue, frozen. They then send it out to the data centers, and each data center does something different; some sequence, some measure the gene expression. The three major centers are Baylor College of Medicine, the Broad Institute in Boston, and Washington University in St. Louis.
This is a very complicated structure; you have all these medical centers submitting tissue, with tons of analysis, etc. And it’s taken a lot of management to do all this. And it creates large data; the data now totals about 720 terabytes.
Yes. When we sequence a single tumor, it generates a terabyte of data. And that’s the raw data; once you start analyzing it, it gets worse. Looking at the sequencing centers, Wash U has already sequenced 14 petabytes of data. It’s like a little village. This is why we need new systems, because we are fundamentally changing the way data is used.
Are you involved in the Internet2 initiative?
We likely are, because we have a supercomputing center here. Pittsburgh has the largest supercomputing center in the world, with two times 15 terabytes of RAM. And the supercomputing center is an independent center.
So please tell me more about what you’ve been doing with these particular patients.
So we took those 140 patients… They’re special in that we sent their tumors to this consortium. The University of Pittsburgh and UPMC together are the largest single submitter of tumor tissue; we’ve submitted in all cancer areas. I submitted in breast cancer, because that’s my specialty. The 140 women are breast cancer patients; that was the first use case. But since then, 600 tumors have been submitted across all cancers.
So that makes us unique, because we have 140 patients for whom we know all of the clinical information that sits in the UPMC system, and all of the genomic or molecular information that sits within the consortium. And the nice thing about the consortium is that its data is all made public; it’s available to everyone.
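Editor's illustration: the integration Dr. Lee describes amounts to linking each patient's clinical record to that patient's genomic record and then querying across both. The minimal Python sketch below shows the idea with an in-memory SQLite database. Everything in it is an assumption made for illustration: the table names (`clinical`, `genomic`), the column names, the placeholder gene label `GENE_A`, and the three synthetic patients are hypothetical and do not reflect the actual UPMC warehouse schema, Cohort Explorer, or Cancer Genome Atlas data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical tables.

    'clinical' stands in for EHR-derived data and 'genomic' for
    sequencing-derived data. All rows are synthetic placeholders.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE clinical (
            patient_id TEXT PRIMARY KEY,
            menopausal_status TEXT      -- 'pre' or 'post'
        );
        CREATE TABLE genomic (
            patient_id TEXT,
            gene TEXT,
            mutated INTEGER             -- 1 if a mutation was called, else 0
        );
        """
    )
    conn.executemany(
        "INSERT INTO clinical VALUES (?, ?)",
        [("P001", "pre"), ("P002", "post"), ("P003", "post")],
    )
    conn.executemany(
        "INSERT INTO genomic VALUES (?, ?, ?)",
        [("P001", "GENE_A", 1), ("P002", "GENE_A", 0), ("P003", "GENE_A", 1)],
    )
    return conn


def mutation_counts_by_status(conn, gene):
    """Join clinical and genomic rows on the shared patient identifier and
    count, per menopausal status, how many patients carry a mutation."""
    query = """
        SELECT c.menopausal_status,
               COUNT(*)       AS n_patients,
               SUM(g.mutated) AS n_mutated
        FROM clinical AS c
        JOIN genomic AS g ON g.patient_id = c.patient_id
        WHERE g.gene = ?
        GROUP BY c.menopausal_status
        ORDER BY c.menopausal_status
    """
    return conn.execute(query, (gene,)).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for status, n_patients, n_mutated in mutation_counts_by_status(demo, "GENE_A"):
        print(status, n_patients, n_mutated)
```

The key design point is the shared patient identifier: once both data sets carry it, a comparison such as pre-menopausal versus post-menopausal patients becomes a single join and grouping rather than a manual chart review.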
All 140 patients have the same mutation?