A particularly exciting development in information technology has been the creation and continued evolution of Internet2, a community of U.S. and international leaders in research, academia, industry, and government who are collaborating to create new channels for communication in support of research and development.
As its website indicates, the collaborative known as Internet2 encompasses 220 U.S. universities, 60 leading corporations, 70 government agencies, 38 regional and state education networks, and “more than 100 research and education networking partners representing over 50 countries.”
Among the U.S. universities farthest along in connecting genomic research and patient care delivery is the University of Florida at Gainesville, where developers have been connecting genetic and genomic research with cancer treatment. As part of the technology backbone for this work, the University of Florida team and other Internet2 collaborative partners are being supported by the San Jose, Calif.-based Brocade.
Recently, HCI Editor-in-Chief Mark Hagland spoke with Gigi Lipori, MT, MBA, the University of Florida Health’s senior director for operational planning and analysis, and Erik Deumens, Ph.D., the University of Florida’s director of research computing, about the work that they and their colleagues are currently involved in. Below are excerpts from that interview.
Can you share with me a bit about the origins of Internet2?
Erik Deumens, Ph.D.: When the Internet was invented decades ago, it grew out of an academic and military background. But at the beginning of the 2000s, universities got together and said, we’ve got this Internet, but we can no longer do research on it. So they created a new consortium with 200 members and called it Internet2, for second-generation Internet. The first focus of activity was to get faster speeds.
So of the more than 400 organizations involved, how many are universities?
About 150 of them; the rest are companies, like Intel and Dell, as well as some research institutes like the Broad Institute, a group of people with an interest in moving data. The Internet2 collaborative is focused on increasing speed; what has happened in the last two years is that Internet2 received a grant from the federal government to create a 100-gigabit-per-second backbone. Typical Internet connections between cities are 10 or 20 gigabits per second. So our collaborative came up with the concept of the Internet2 innovation platform. Currently, about 26 universities and research institutions have committed to becoming an Internet2 innovation platform.
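To put those link speeds in perspective, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: the 1 TB data set size is an illustrative assumption, the function name `transfer_seconds` is made up for this example, and the result is an idealized lower bound that ignores protocol overhead, disk speed, and congestion.

```python
def transfer_seconds(size_bytes: float, link_gbps: float) -> float:
    """Idealized time to move size_bytes over a link of link_gbps.

    Uses decimal units (1 Gbps = 1e9 bits per second) and assumes the
    link is fully available to this one transfer.
    """
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9)


ONE_TERABYTE = 1e12  # decimal terabyte; illustrative data set size

# Compare the typical inter-city speeds with the 100 Gbps backbone.
for gbps in (10, 20, 100):
    seconds = transfer_seconds(ONE_TERABYTE, gbps)
    print(f"{gbps} Gbps: {seconds:.0f} s")
```

Under these assumptions, a terabyte takes on the order of 13 minutes at 10 gigabits per second and well under two minutes at 100. Real transfers are usually slower than the idealized figure.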
There are three conditions. First, you have to get from wherever you are to the closest backbone site. In the case of the University of Florida, we have to make a connection between Gainesville, where the university is, and Jacksonville, the nearest site. The second condition is to have a “science DMZ.” A DMZ (demilitarized zone) is a separate network where the security people allow certain things to happen: if researchers have a particular project, there’s a quick, separate approval process. Typically, an institution is protected by strong firewalls, so in the DMZ the firewalls are lowered somewhat so that the research can keep on happening. The reason for doing this is that if you tried to send a terabyte of data from one university to another, and it had to go through a firewall, and the firewall had to inspect every packet, that would be way too slow.
So if you know from the beginning that you are going to send a terabyte data set, all belonging to the same stream, and you inspect the first packet, you can say the whole data set is OK. The third requirement is that you have to do active research in software-defined networks, a new technology where you’re trying to create more efficient protocols. Currently, things are governed by IP, the Internet Protocol. If you send a packet, the first couple of bytes have the format of the address in them, and every router across the Internet has to read this address information. But we all know that the protocol was invented 30 years ago, and we really would like to make it more efficient. The Internet is very stable and has never yet crashed as a whole. Still, the Internet Protocol is dated, and we’d like to test a new protocol; you certainly cannot test it on the current production network, because it would create chaos. So with software-defined networks and DMZs, you’re trying to create a separate platform where researchers can innovate.
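The "inspect the first packet, then trust the rest of the stream" idea can be illustrated with a small Python sketch. This is not how the University of Florida's science DMZ is implemented, and real deployments typically rely on router access-control lists and dedicated transfer hosts rather than code like this. The class and field names (`FlowKey`, `FlowAwareFilter`, `allowed_peers`) are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowKey:
    """Identifies one data stream (hypothetical 5-tuple)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


class FlowAwareFilter:
    """Inspects the first packet of a flow, then fast-paths the rest."""

    def __init__(self, allowed_peers):
        # Pairs of (source, destination) hosts pre-approved for bulk transfer.
        self.allowed_peers = set(allowed_peers)
        self.approved_flows = set()
        self.full_inspections = 0  # counts how often the slow path runs

    def _inspect(self, key: FlowKey) -> bool:
        # Stand-in for an expensive per-packet inspection.
        self.full_inspections += 1
        return (key.src_ip, key.dst_ip) in self.allowed_peers

    def accept(self, key: FlowKey) -> bool:
        if key in self.approved_flows:
            return True  # fast path: flow was already approved
        if self._inspect(key):
            self.approved_flows.add(key)
            return True
        return False


# Usage: 1,000 packets from one approved stream trigger one inspection.
fw = FlowAwareFilter(allowed_peers=[("10.0.0.5", "10.1.0.9")])
flow = FlowKey("10.0.0.5", "10.1.0.9", 50000, 2811, "tcp")
results = [fw.accept(flow) for _ in range(1000)]
print(all(results), fw.full_inspections)
```

The design choice here is simply to cache the approval decision per flow, so the expensive check runs once per stream instead of once per packet.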
So then what happened is that the University of Florida made an investment and got a grant. We were trying to be early adopters; it turns out the University of Florida is the first university in the nation and the world to meet all three criteria of the innovation platform. Internet2 was pretty excited, and they issued a press release and said, we want to explore other areas the university is working on, in terms of the use of its science DMZ for healthcare. So the University of Florida (and I made this statement at one of the meetings) will now build a storage infrastructure that is HIPAA-compliant and that will allow people to do research and use fast connectivity for genomic and protected health information.
So how far are you on the healthcare side?
I basically told you my end; my background is nuclear physics and computational chemistry. I’ve been working for many years to build infrastructure for chemists, physicists, and engineers, and now I’m responsible for building these high-performing networks and storage systems. In the last year, we’ve made a push to get into the top 10 in the nation as far as research computing infrastructure is concerned. And now that it’s built, people want to use it to work with protected health information.
And Gigi has been working for multiple years on building infrastructure and protocols, for protected health information.
Gigi Lipori: We’ve been working for a long time on warehousing data and providing it in a way that gives researchers direct access to it in a HIPAA-compliant fashion. We’ve gone along slowly and steadily to reassure people that we’re doing it in a HIPAA-compliant way. As much as it’s a technical exercise to bring together research data, hospital data, anesthesia data, etc., in one place and work with it, the other part of it is sharing the data with researchers so they can form a hypothesis and subsequently go back to the IRB (Institutional Review Board for the Protection of Human Subjects) for approval to link research to patient care.
What we do is NIH-funded [funded by the National Institutes of Health] and is called I2B2 (Informatics for Integrating Biology and the Bedside). Researchers do their queries there, find a cohort, and say to the IRB, I want to do this study on this set of patients, say 300 or 500 patients. The IRB then determines whether it’s appropriate, and once it does, we can broker the data to them. So we gather data and put it into this integrated data repository, and we allow the researchers to query it using an I2B2 tool in a way that’s protected and blinded. When they have a good hypothesis, they go back to the IRB and ask for permission to get protected data. Then, upon receiving permission, we have an honest-broker process: there are five people in the university certified to give data back to them, and they’re all in my group. We broker that data back to them in a virtual environment that’s also safe, so they can’t drag that protected data anywhere else. The beauty of it is that, before, they couldn’t explore these hypotheses safely, so they would just blindly go in and inspect medical records, which is dangerous, because you could end up seeing things you weren’t supposed to. This is safer for the human subjects, and it’s better for the institution, because we can provide the data more efficiently.
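The two-stage workflow described here (a blinded cohort query first, then brokered release after IRB approval) can be sketched in Python. This is a toy model, not the I2B2 software or its API: the `Patient` fields, the `DataRepository` class, and both method names are hypothetical, and "blinded" is assumed here to mean that exploratory queries return only counts.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    diagnosis: str
    age: int


class DataRepository:
    """Toy stand-in for an integrated data repository."""

    def __init__(self, patients):
        self._patients = list(patients)

    def count_cohort(self, predicate) -> int:
        # Blinded exploration: the researcher learns only how many
        # patients match, never who they are.
        return sum(1 for p in self._patients if predicate(p))

    def release_cohort(self, predicate, irb_approved: bool,
                       broker_certified: bool):
        # Identified records leave the repository only when the IRB has
        # approved the study and a certified honest broker does the release.
        if not (irb_approved and broker_certified):
            raise PermissionError("IRB approval and a certified broker are required")
        return [p for p in self._patients if predicate(p)]


# Usage with made-up records.
repo = DataRepository([
    Patient("p1", "breast cancer", 54),
    Patient("p2", "breast cancer", 61),
    Patient("p3", "asthma", 33),
])


def is_breast_cancer(p):
    return p.diagnosis == "breast cancer"


cohort_size = repo.count_cohort(is_breast_cancer)
print(cohort_size)  # size of the cohort, with no identities exposed

released = repo.release_cohort(is_breast_cancer,
                               irb_approved=True, broker_certified=True)
print([p.patient_id for p in released])
```

Separating the counting path from the release path keeps hypothesis exploration cheap while making every identified release pass through an explicit approval check.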
Erik: This is very complex; it’s not something any institution can do in just a few months. And at Internet2, we were having these discussions; and everyone knows that it’s very, very hard, and we’ve been working on this for years. But we feel we’ve made some really good progress; and we hope that in the next couple of years, we will be able to scale this to the size that many of our health researchers are dreaming of doing. But you have to build all these details, make sure they all fit, and that everybody’s in agreement; because otherwise, if someone disagrees with it, the whole thing crashes again.
And this kind of development work could crash your EHR, correct?
Gigi: Yes, it could crash your EHR; and when you’re trying to look across a million patients to try to find a particular phenotype, there’s definitely an art to engineering all of this.
When will the average academic medical center be able to do all this?
Erik: I think it will be sooner rather than later, because there’s incredible pressure to get this done. There are several other universities doing something similar to what we’re doing. The creation of an Internet2 data path will allow institutions to talk to each other, and they’ll quickly figure out ways to share information and expertise; we will gladly do that. This could turn into a watershed moment. That is of course what Internet2 as an organization is hoping: that organizations can learn from each other and make progress. People will write white papers and documents and share software that you can download. So I think it could be within the next couple of years; definitely by 2020, this is going to be standard.
Do you see genomic data use as being accelerated soon?
Gigi: You have the holy grail of phenotype-genotype matching, which is what people are after. We pull in data from personalized medicine so that we have genomic data available for genotype-to-phenotype matching, particularly around clopidogrel (Plavix). And if researchers are trying to find all the breast cancer patients with a particular phenotype who have allowed their specimens to be used, we can address that as well. It’s all good when you’ve got a cohort of 500 patients, but you need 50 samples in storage.
From my perspective, the key takeaway here is that what’s allowed us to compete is that people with special expertise have built the components we’re bringing together. I’ve created this I2B2 integrated data repository piece, Erik’s created the high-performance computing infrastructure, and we’re bringing these pieces together to create something large that can be used as a service in the university. I think that’s what has allowed us to really go forward in the university.