The University of Chicago has announced that it is collaborating with the National Cancer Institute (NCI) to create a comprehensive computational facility that will store and harmonize cancer genomic data generated through NCI-funded research programs.
The NCI Genomic Data Commons (GDC) aims to expand access to these data for scientists around the country, speeding up research with the goal of delivering faster discoveries for patients. The GDC will provide an interactive system for researchers, along with resources to help identify cancer subtypes and potential therapeutic targets.
“The Genomic Data Commons has the potential to transform the study of cancer at all scales,” Robert Grossman, director of the GDC project and professor of medicine at UChicago, said in a news release. “It supplies the data so that any researcher can test their ideas, from comprehensive ‘big-data’ studies to genetic comparisons of individual tumors to identify the best potential therapies for a single patient.”
NCI has funded a number of large research projects that have collected genomic data on tumor types from more than 10,000 patients. However, the data for these studies are scattered across different locations and are in different formats, making it challenging for researchers to perform analyses.
To address this, the GDC will provide an expandable, modern informatics framework that uses standards to make raw and processed genomic data broadly accessible. It will harmonize and centralize existing NCI datasets through an approach to data storage and analysis similar to that used by companies such as Google and Facebook. The GDC will eliminate a major chokepoint, streamlining access to data for researchers regardless of their institution’s size or budget and effectively democratizing access to the material. It will also enable collaborative efforts between scientists that were previously infeasible.
The GDC also lays a foundation for future cloud-based technologies that will one day allow researchers to analyze large-scale datasets and perform experiments remotely. Officials say the open-source software being developed by the GDC could become a model for data-intensive research on other diseases, such as Alzheimer’s and diabetes, which would benefit greatly from similar large-scale, data-driven approaches to developing cures.
The GDC builds upon the Bionimbus Protected Data Cloud, a pilot cloud-based system developed by Grossman that was the first to be approved by the National Institutes of Health to hold cancer genomic data from projects such as The Cancer Genome Atlas.