In February, researchers at the National Institutes of Health (NIH) and George Washington University announced that scientists running genomic analyses at George Washington University’s Colonial One High Performance Computing Center would pilot ultra-high-speed, 40-gigabit-per-second data transfers from the NIH’s National Library of Medicine (NLM), using both organizations’ new 100-gigabit-per-second links to the Internet2 Network.
In a press release posted on Feb. 26, Don Preuss, who heads the systems group at the NLM’s National Center for Biotechnology Information, said, “Biomedical researchers need high-bandwidth access to the extremely large data sets used in today’s medical research. Our new 100-gigabit connection to the Internet2 backbone will provide researchers at GW and other research centers with state-of-the-art connectivity.” Raja Mazumder, Ph.D., associate professor of biochemistry and molecular medicine at the GW School of Medicine and Health Sciences and co-developer of the High-Performance Integrated Virtual Environment (HIVE), a genomic analysis platform, added, “High-speed transfers via Internet2’s Network will enable us to provide our genomic clients with faster results, ultimately hastening discovery and therapeutic decisions.”
Leaders and developers at Internet2 are working with the leaders and researchers at NIH/NLM and George Washington. Internet2® is a member-owned, advanced-technology community, founded by leading academic institutions in 1996, that provides a collaborative environment in which research and educational organizations can solve common technology challenges.
Shortly after the announcement, HCI Editor-in-Chief Mark Hagland interviewed Dr. Mazumder and Michael Sullivan, M.D., associate director of health sciences for Internet2. Dr. Sullivan is an emergency physician by clinical background, has served as the CEO of an emergency physician group, and has 25 years’ experience in medical informatics as a medical software developer and health IT consultant. Hagland spoke with Drs. Sullivan and Mazumder about recent developments at Internet2 and the George Washington University School of Medicine and Health Sciences, and about the implications of the work being done in the HIVE program and across Internet2 for genomically facilitated patient care in the future. Below are excerpts from that interview.
Can you share with us what you believe is the significance of this recent collaborative work involving Internet2, branches of the NIH, and George Washington University?
Michael Sullivan, M.D.: I think this particular collaboration is an example of two technologies supporting one another. As we were describing earlier, GW needs to exchange massive amounts of information with NCBI, and Internet2 provides a network connection between those two organizations that allows really massive amounts of information to be transferred quickly. And once the data arrives at GW, a genomics platform that Dr. Mazumder has developed accelerates the analysis of that data. So our network is fast, and his platform is fast.
Raja Mazumder, Ph.D.: At this point, the data flows from NCBI to us; my lab is not generating data yet.
So data is traveling through the Internet2 channel from NCBI to GW, and then, Dr. Mazumder, it arrives at your platform?
Sullivan: What we have in common is that both Internet2 and Dr. Mazumder’s platform, called HIVE (High-Performance Integrated Virtual Environment), remove bottlenecks.
Can you give us a sense of the timeframes around all this activity?
Sullivan: Internet2, again, is a coalition of 250 universities, government agencies, and industry partners, and was formed 15 years ago. It has connected those universities to NCBI for most of that time. NCBI recently upgraded its bandwidth, the size of its network connection to Internet2; that happened at the end of 2013. George Washington University has also upgraded its network, though I am less clear on those dates, and a new network has been created here in Washington that acts as a bridge between them. That has also happened in the past few months.
Mazumder: Dr. Sullivan is referring to the Capital Area Advanced Research and Education Network—CAAREN. In terms of specific elements of this, Colonial One launched in June 2013. CAAREN launched in December 2013. HIVE began testing on Colonial One in September 2013, and started using CAAREN for transfers in December.
We’ve been working on HIVE for a long time, more than a decade; so, like any complex software, there are bits and pieces that get updated all the time. We use it as a platform, and we keep adapting it as demands and needs change over the next decade. Here’s an example: we recently worked on a proposal where the requirements for data analysis and storage were 2.5 petabytes. And over the next few years, just for that one project, the usage and storage needs could increase ten to 100 times. We have to accommodate and adapt to those changes, but our platform handles them once the data is inside the system.
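To put the 2.5-petabyte figure next to the 40-gigabit-per-second rate being piloted, here is a rough back-of-the-envelope estimate. It is only a sketch under stated assumptions: decimal units (1 PB = 10^15 bytes), a sustained 40 Gbit/s for the whole transfer, and no protocol overhead, retries, or competing traffic, so real-world times would be longer.

```latex
\begin{aligned}
D &= 2.5\ \text{PB} = 2.5\times10^{15}\ \text{bytes}\times 8\ \tfrac{\text{bits}}{\text{byte}} = 2\times10^{16}\ \text{bits}\\
t &= \frac{D}{R} = \frac{2\times10^{16}\ \text{bits}}{4\times10^{10}\ \text{bits/s}} = 5\times10^{5}\ \text{s} \approx 139\ \text{h} \approx 5.8\ \text{days}
\end{aligned}
```

At the same assumed rate, the tenfold growth Dr. Mazumder mentions would push a single full transfer to about 58 days, and a 100-fold increase to roughly 579 days, which illustrates why both network bandwidth and on-platform processing matter as data sets grow.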
My research involves working with cancer genomic data to identify mutations, or biomarkers, that can be used for diagnostics or therapeutics. We recently put out a database called Biomuta. We use the HIVE platform and others, and are working with mutations associated with specific cancer types. As you can imagine, this type of database will become more and more useful as more genomic data becomes accessible. So you have a patient, you sequence their genome, and then you scan it against Biomuta to look for diagnostic markers.
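The workflow Dr. Mazumder describes (sequence a patient, then scan the results against a mutation database such as Biomuta) can be sketched as a simple lookup. This is a minimal illustration under loudly stated assumptions: the `Variant` record, the `MUTATION_DB` contents, and the `scan_variants` function are all hypothetical, do not reflect Biomuta’s actual schema or interface, and the entries are placeholders rather than real mutations. Real pipelines would also handle variant normalization, annotation, and clinical interpretation.

```python
from collections.abc import Iterable, Mapping
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes instances hashable, so they can be dict keys
class Variant:
    """Hypothetical single-nucleotide variant record (not Biomuta's schema)."""
    chrom: str
    pos: int
    ref: str
    alt: str


# Hypothetical mini-database mapping a variant to the cancer types it has been
# associated with. Entries are placeholders, not real database records.
MUTATION_DB: dict[Variant, set[str]] = {
    Variant("chr1", 1000, "A", "G"): {"cancer_type_A"},
    Variant("chr2", 2000, "C", "T"): {"cancer_type_B", "cancer_type_C"},
}


def scan_variants(
    patient_variants: Iterable[Variant],
    db: Mapping[Variant, set[str]],
) -> dict[Variant, set[str]]:
    """Return each patient variant found in db, with its associated cancer types."""
    return {v: db[v] for v in patient_variants if v in db}


if __name__ == "__main__":
    # Usage: one variant matches the database, the other does not.
    patient = [Variant("chr1", 1000, "A", "G"), Variant("chr3", 3000, "G", "A")]
    for variant, cancers in scan_variants(patient, MUTATION_DB).items():
        print(variant, sorted(cancers))
```

A hash-based lookup keeps each check constant-time on average, which is one reason exact-match scanning scales reasonably as the database grows; it deliberately ignores fuzzier matching that a production tool might need.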
It’s just the last couple of months in which all this has been working together?
Sullivan: The new pieces are the Capital Area Network and a new high-performance computing center called Colonial One. There is also a new connection from NCBI, through the Capital Area Network, to Colonial One, which creates a very fast pathway to HIVE, the genomics platform Dr. Mazumder developed.
Can you explain the technical aspects of how the data transfers take place?
Mazumder: HIVE manages its own data transfers using industry-standard protocols such as HTTP/HTTPS, FTP, and SCP, as well as proprietary tools such as Aspera Connect. Transfers from participating Internet2 partners automatically go over the new CAAREN connections.
How do all of these existing elements (Colonial One, CAAREN & Internet2) fit together to support genomics research and future clinical breakthroughs?
Mazumder: CAAREN enables high-speed transfer of research datasets, such as the genome sequences used in HIVE, to and from the GW campus networks. Colonial One provides a dedicated on-campus high-performance computing environment for GW researchers, and this includes work to integrate the HIVE application to provide efficient and simple genomics search tools for scientists. This combination provides faster time-to-results and allows researchers greater flexibility in partnering with external institutions.
And what will happen in the next year?
Mazumder: Now that the pieces are together, our group is running some tests using all of them (Colonial One, Internet2, and HIVE) and finding out how fast this runs compared with other channels. Over the next three months, we’ll be able to test this channel. I don’t know whether other faculty at GW are doing it, but my group is getting data from NCBI and using Internet2, Colonial One, and HIVE together, along with the CAAREN network.
What areas of cancer are involved?
Mazumder: Our group is interested in two things. One is cancer: we’re identifying mutations associated with certain cancer types and providing that data, with the goal of early cancer detection. The second is metagenomics. If there’s an infection in the upper respiratory tract and you want to identify its cause, you do metagenomic sampling and testing. I envision that this could be great for infections that are difficult to treat, or in which physicians suspect an organism they aren’t aware of might be present; in other words, for pathogen detection.
So looking at pathogen detection is on the horizon?
When might some of this research be widely available?
Mazumder: That’s a difficult question to answer. I was at a meeting sponsored by the National Cancer Institute, and there are different estimates as to when some of this biomarker-related research might be linked to patient care. It’s a time-consuming affair, and a lot of research is going on. Some mutation data is already used in the clinic, but large-scale application is difficult to predict. It could be a few years, or five to ten years. I don’t know; you’d have to survey scientists. But everyone expects that within the next decade this will be very common, especially for diseases that are difficult to diagnose and treat.
Sullivan: In talking to leaders at cancer centers, most of them have already begun plans to deploy high-performance computing facilities, or to partner with HPC centers, to do clinical genomics. And most of them expect that they will be doing genomic sequencing of basically all their cancer patients in the relatively near future—on the order of the next few years.