
Researchers Work to Define, Harmonize, Share EHR Phenotypes

September 7, 2016
by David Raths
NIH Research Collaboratory group seeks to ease use of EHR data to identify populations for research

As an early step in the development of a learning health system, the National Institutes of Health (NIH) is sponsoring large-scale pragmatic clinical trial demonstration projects that rely heavily on EHR data from multiple health systems. To promote transparency, reuse and data quality, informatics researchers and data analysts are working to identify best practices and to advocate for cultural and policy changes related to using EHRs to identify populations for research.

Rachel Richesson, Ph.D., M.P.H., associate professor of informatics in the Duke University School of Nursing, recently gave an online presentation to describe the work of the NIH Research Collaboratory’s Phenotypes, Data Standards, and Data Quality Core group.

First, Richesson described the clinical information system landscape that researchers face. “There is little that is standardized in terms of data representation in EHRs today,” she said. And what appears to be standard is not always so. Each health system has multiple sources of ICD-9 and ICD-10 codes, lab values, and medication data.

Also, EHRs have no standard representation or approach for phenotype definition, that is, a way to define populations with certain conditions such as chronic pain or uncontrolled diabetes.
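To make the idea of a phenotype definition concrete, here is a minimal Python sketch of what a computable rule for "uncontrolled diabetes" might look like. Everything in it is an assumption for illustration: the `Patient` record, the choice of ICD-10-CM prefixes, and the 9.0 percent HbA1c cutoff are hypothetical and are not the definition used by any Collaboratory trial.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Patient:
    """Hypothetical, simplified patient record."""
    patient_id: str
    diagnosis_codes: list[str] = field(default_factory=list)  # ICD-10-CM codes
    hba1c_results: list[float] = field(default_factory=list)  # percent, oldest first


# Illustrative criteria only; not a validated clinical definition.
DIABETES_PREFIXES = ("E10", "E11")  # ICD-10-CM type 1 and type 2 diabetes
HBA1C_THRESHOLD = 9.0               # assumed cutoff for "uncontrolled"


def has_uncontrolled_diabetes(p: Patient) -> bool:
    """True if the patient has a diabetes diagnosis AND a high latest HbA1c."""
    has_dx = any(code.startswith(DIABETES_PREFIXES) for code in p.diagnosis_codes)
    latest_high = bool(p.hba1c_results) and p.hba1c_results[-1] > HBA1C_THRESHOLD
    return has_dx and latest_high


patients = [
    Patient("A", ["E11.9"], [7.2, 9.4]),   # diabetes, latest HbA1c high
    Patient("B", ["E11.65"], [8.1, 6.9]),  # diabetes, latest HbA1c in range
    Patient("C", ["M54.2"], [9.8]),        # high HbA1c but no diabetes code
]
cohort = [p.patient_id for p in patients if has_uncontrolled_diabetes(p)]
print(cohort)  # ['A']
```

Changing either the code list or the lab cutoff changes who lands in the cohort, which is why two sites without a shared definition can report different populations for the same condition.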

A further challenge is that multi-site pragmatic clinical trials pull information from many ancillary systems, as well as from the EHRs, into a single research database to support the study. The common data warehouse process for this is extract, transform and load (ETL). It has to happen for each organization contributing data to the trial, and each run can introduce errors or miss data sources completely. One trial studying colon cancer has had trouble identifying colonoscopies done outside the health center because they are embedded in PDF or narrative reports rather than recorded as coded data.
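The per-site ETL step can be sketched roughly as follows. This is a toy Python illustration under stated assumptions: the site names, the local field names, and the common schema are all invented, and a real pipeline would involve far more validation. It shows how a procedure that exists only in a scanned report yields no coded value and falls out of the research dataset.

```python
# Hypothetical mapping from each site's local field names to a common schema.
SITE_FIELD_MAPS = {
    "site_a": {"mrn": "patient_id", "proc_code": "procedure_code", "proc_date": "date"},
    "site_b": {"PatientID": "patient_id", "cpt": "procedure_code", "service_dt": "date"},
}


def transform(site, row):
    """Rename a site's local fields to the common schema and tag the source site."""
    mapping = SITE_FIELD_MAPS[site]
    record = {common: row.get(local) for local, common in mapping.items()}
    record["site"] = site
    return record


def etl(extracts):
    """Transform every extracted row; keep coded records, set aside uncoded ones."""
    loaded, rejected = [], []
    for site, rows in extracts.items():
        for row in rows:
            record = transform(site, row)
            if record["procedure_code"]:
                loaded.append(record)
            else:
                rejected.append(record)  # e.g. procedure documented only in a PDF
    return loaded, rejected


# Made-up extracts; 45378 is the CPT code for a diagnostic colonoscopy.
extracts = {
    "site_a": [{"mrn": "001", "proc_code": "45378", "proc_date": "2016-03-01"}],
    "site_b": [
        {"PatientID": "900", "cpt": "45378", "service_dt": "2016-04-12"},
        {"PatientID": "901", "cpt": None, "service_dt": "2016-05-02"},
    ],
}
loaded, rejected = etl(extracts)
print(len(loaded), len(rejected))  # 2 1
```

Tracking the rejected rows, rather than silently dropping them, is one simple way to see how much each site's data is losing in the transform step.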

Besides co-leading the Phenotypes, Data Standards, and Data Quality Core, Richesson is also the co-lead of the Rare Diseases Task Force for the national distributed Patient Centered Outcomes Research Network (PCORnet). In that role she promotes standardized EHR-based condition definitions (“computable phenotypes”) for rare diseases and helps to develop a national research infrastructure that can support observational and interventional research for various types of conditions. Before joining the Duke faculty in 2011, Richesson spent seven years at the University of South Florida College of Medicine directing strategy for the identification and implementation of data standards for a variety of multi-national, multi-site clinical research and epidemiological studies housed within the USF Department of Pediatrics, including the NIH Rare Diseases Clinical Research Network (RDCRN) and The Environmental Determinants of Diabetes in the Young (TEDDY) study.

In her recent presentation, she gave a few examples of the use of EHRs in the Collaboratory trials:

• The Collaborative Care for Chronic Pain in Primary Care (PPACT) trial needs to identify patients with chronic pain for the intervention. This is done in different EHR systems using a number of “phenotypes” for inclusion – e.g., neck pain, fibromyalgia, arthritis, or long-term opioid use. Harmonizing that data has proven challenging. “They have had to monitor large groups of codes that represent these conditions, particularly after the change to ICD-10 to make sure there were no changes in coding behavior,” she said.

• The Strategies and Opportunities to Stop Colorectal Cancer (STOP CRC) trial needs to continually identify screenings for colorectal cancer from each site, so it must maintain a master list of codes (CPT and local codes) related to fecal immunochemical test orders across multiple organizations.

• The Trauma Survivors Outcomes and Support (TSOS) trial needs to screen patients for PTSD on Emergency Department admission. Yet the clinical information systems used in the 24 sites’ emergency departments vary widely in their ability to screen for substance-related disorders and mental health. (Richesson’s Ph.D. dissertation from the University of Texas Health Sciences Center in Houston involved the integration of heterogeneous data from multiple emergency departments.)

“These examples give an idea of how crucial EHR data is to the functioning of these trials and underscore the need to be active and iteratively reach out to IT staff to understand what data it collects and work flow,” Richesson said. “That is a universal experience of the projects in the Collaboratory.”
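The kind of master code list described for STOP CRC can be pictured with a short Python sketch. This is an assumption-laden illustration, not the trial's actual list: the clinic names and the local order codes are invented, and the single CPT entry (82274, the immunoassay fecal occult blood test) stands in for whatever standard codes a real list would hold.

```python
# Each entry: (code system, code, site). A site of None means the code applies
# at every site; local codes are only meaningful at the site that issued them.
MASTER_CODES = [
    ("CPT", "82274", None),
    ("LOCAL", "FIT-ORD-01", "clinic_x"),  # hypothetical local order code
    ("LOCAL", "LAB9921", "clinic_y"),     # hypothetical local order code
]


def build_lookup(master):
    """Group (system, code) pairs by the site where they are valid."""
    lookup = {}
    for system, code, site in master:
        lookup.setdefault(site, set()).add((system, code))
    return lookup


def is_fit_order(order, lookup):
    """True if the order's code is on the shared list or on its own site's list."""
    key = (order["system"], order["code"])
    return key in lookup.get(None, set()) or key in lookup.get(order["site"], set())


lookup = build_lookup(MASTER_CODES)
orders = [
    {"site": "clinic_x", "system": "CPT", "code": "82274"},
    {"site": "clinic_x", "system": "LOCAL", "code": "FIT-ORD-01"},
    {"site": "clinic_y", "system": "LOCAL", "code": "FIT-ORD-01"},  # another site's code
    {"site": "clinic_y", "system": "LOCAL", "code": "LAB9921"},
]
flagged = [is_fit_order(o, lookup) for o in orders]
print(flagged)  # [True, True, False, True]
```

Scoping local codes to their own site keeps an identically named code at another organization from being counted by mistake, and the list has to be revisited whenever a site changes its ordering system.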

Transparency and Reproducibility of Pragmatic Clinical Trials

Ultimately, these clinical trial demonstration projects are going to be reporting their results in journals and describing the characteristics of the patients in the intervention and control groups. They will need to point to definitions for diabetes, hypertension, etc. Today there is a wide variety of phenotype definitions based on lab codes, medication codes, or any combination of the two.

“There is huge variety in how conditions could be defined using EHR data,” Richesson explained. “To support transparency and reproducibility in these pragmatic clinical trials, we want to be able to allow readers and consumers to identify what was the definition and how data was obtained and used. Having an explicit phenotype definition would certainly be useful in this area. Our Core has been working toward explicit reporting in these trials.”