
Standardizing EHR Research Queries Across Health Systems

October 25, 2013
NIH Health Care Systems Collaboratory is working to identify best practices for using EHR phenotypes in research applications

Did you know that the way a health system in New York identifies patients with diabetes from data encoded in its electronic health records might be completely different from the way one in California or Texas does?

I didn’t realize that until I spoke last week with Rachel Richesson, Ph.D., M.P.H., an associate professor at the Duke University School of Nursing, whose research includes work on “phenotype definitions”: standard EHR queries for finding patients with a given condition (e.g., diabetes or chronic pain). If multiple health systems can learn to identify populations of patients with particular phenotypes, they can participate in multi-site research studies or apply management strategies and interventions that have been shown to improve care in similar populations.

First, what is EHR phenotyping? It is the use of data captured as a byproduct of healthcare delivery to identify individuals or groups of patients with specified conditions. Researchers must carefully define queries for specific conditions so that they can be applied across different data sets to locate patients who truly have the condition of interest, ensuring that the results of research conducted across multiple health systems are valid.

Richesson told me that the creation of standard phenotype definitions could streamline the development of patient registries and enable consistent inclusion criteria to support regional surveillance and the identification of rare disease complications.

So, for instance, the ability to identify people with diabetes across healthcare organizations by using a common definition would have value for clinical quality and research.

But because the way in which providers collect patient information in EHRs is not standardized, designing phenotype definitions is often difficult and time-consuming. When identifying patient conditions, health systems tend to look at three basic types of structured data: ICD-9 codes, lab results and medications. “But there are lots of issues with those data sources,” Richesson said. For instance, every healthcare system uses ICD-9 codes for billing and reporting, but they don’t necessarily use them in the same way. “There are dozens of possible codes for diabetes and they are often used quite differently,” she added. Also, health systems might identify medication use at different points in the care process, such as pharmacy orders, actual medication administration, or patient reports of medication use.
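To make the data-source problem concrete, here is a minimal Python sketch of a rule-based phenotype query that combines the three structured data types Richesson describes. Everything in it is a hypothetical illustration, not a validated definition and not one of the definitions from the Duke study: the `Patient` record, the medication list, the 6.5% HbA1c cutoff and the "evidence in at least two of three sources" rule are all assumptions. The one grounded detail is that ICD-9-CM diagnosis codes for diabetes mellitus fall under category 250.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, highly simplified patient record (real EHR data is far richer).
@dataclass
class Patient:
    pid: str
    icd9_codes: List[str] = field(default_factory=list)
    hba1c_values: List[float] = field(default_factory=list)       # percent
    med_orders: List[str] = field(default_factory=list)           # pharmacy orders
    med_administrations: List[str] = field(default_factory=list)  # doses actually given

# Illustrative assumptions only, NOT a validated phenotype definition.
DIABETES_MEDS = {"metformin", "glipizide", "insulin glargine"}
HBA1C_THRESHOLD = 6.5  # percent

def has_diabetes_code(p: Patient) -> bool:
    # ICD-9-CM category 250 is diabetes mellitus (e.g., "250.00").
    return any(code.split(".")[0] == "250" for code in p.icd9_codes)

def has_high_hba1c(p: Patient) -> bool:
    return any(v >= HBA1C_THRESHOLD for v in p.hba1c_values)

def has_diabetes_med(p: Patient, med_source: str = "orders") -> bool:
    # Which point in the care process counts as "medication use" is a design choice.
    meds = p.med_orders if med_source == "orders" else p.med_administrations
    return any(m.lower() in DIABETES_MEDS for m in meds)

def meets_phenotype(p: Patient, min_sources: int = 2, med_source: str = "orders") -> bool:
    # Count how many of the three data sources show evidence of diabetes.
    hits = [has_diabetes_code(p), has_high_hba1c(p), has_diabetes_med(p, med_source)]
    return sum(hits) >= min_sources

if __name__ == "__main__":
    # Hypothetical patient: a diabetes code plus a metformin order never recorded as given.
    p = Patient("A", icd9_codes=["250.00"], med_orders=["Metformin"])
    print(meets_phenotype(p, med_source="orders"))
    print(meets_phenotype(p, med_source="administrations"))
```

For this made-up patient, the query counts as a match when medication use is read from pharmacy orders but not when it is read from administration records. That is one way two health systems applying "the same" definition can end up with different cohorts.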

To illustrate this point, Richesson and her colleagues at Duke University conducted a study in which they applied inclusion and exclusion criteria from seven research phenotype definitions as query algorithms to select sets of patients from the Duke Medicine Enterprise Data Warehouse, which encompasses data generated in the care of more than 4.3 million patients within the Duke University Health System.

Their research found that each definition pulled a different population of Duke patients. “The variation among diabetes cohorts raises the question of whether heterogeneous definitions are identifying populations with different clinical disease profiles (ie, different phenotypes) or if they are failing to identify the ‘same’ clinical phenotype consistently,” Richesson and colleagues wrote in their paper published online in the Journal of the American Medical Informatics Association in September 2013. “This research underscores the outstanding and important need for clearly defined phenotype definitions and consistent application in multisite projects. Currently, there are no standard EHR phenotype definitions for most chronic conditions, including diabetes.”
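Comparing the cohorts pulled by different definitions is essentially a set-comparison problem. The Python sketch below applies two hypothetical diabetes definitions to five made-up records and measures their agreement with the Jaccard index (patients found by both definitions divided by patients found by either). The records, both definitions and the choice of Jaccard as the agreement measure are illustrative assumptions, not the method or data of the JAMIA paper.

```python
from typing import Callable, Dict, List, Optional, Set

# Made-up toy records; NOT data from the Duke Enterprise Data Warehouse.
RECORDS: List[Dict] = [
    {"pid": "p1", "icd9": ["250.00"], "hba1c": 7.2, "on_diabetes_med": True},
    {"pid": "p2", "icd9": ["250.02"], "hba1c": None, "on_diabetes_med": False},
    {"pid": "p3", "icd9": [], "hba1c": 6.8, "on_diabetes_med": True},
    {"pid": "p4", "icd9": ["401.9"], "hba1c": 5.4, "on_diabetes_med": False},
    {"pid": "p5", "icd9": ["250.00"], "hba1c": 6.1, "on_diabetes_med": True},
]

def code_only(r: Dict) -> bool:
    """Hypothetical definition A: any ICD-9-CM 250.xx diagnosis code."""
    return any(c.split(".")[0] == "250" for c in r["icd9"])

def lab_or_med(r: Dict) -> bool:
    """Hypothetical definition B: HbA1c >= 6.5% or any diabetes medication."""
    hba1c: Optional[float] = r["hba1c"]
    return (hba1c is not None and hba1c >= 6.5) or r["on_diabetes_med"]

def cohort(records: List[Dict], definition: Callable[[Dict], bool]) -> Set[str]:
    """Apply a definition as a query and return the matching patient IDs."""
    return {r["pid"] for r in records if definition(r)}

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared patients / patients found by either definition (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

if __name__ == "__main__":
    a = cohort(RECORDS, code_only)
    b = cohort(RECORDS, lab_or_med)
    print("only A:", sorted(a - b), "only B:", sorted(b - a))
    print("Jaccard:", jaccard(a, b))
```

On these toy records, definition A selects p1, p2 and p5 while definition B selects p1, p3 and p5, so the two cohorts share only half of the patients that either one identifies (a Jaccard index of 0.5), even though both are nominally "diabetes" definitions.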

Richesson also is involved in a larger nationwide effort called the NIH Health Care Systems Collaboratory, which is working to identify best practices for using EHR phenotypes in research applications. With $11.3 million in funding from NIH, the Collaboratory consists of seven demonstration projects and several problem-specific working group “cores,” aimed at leveraging the data captured in “real-world” environments for research, thereby improving the efficiency, relevance, and generalizability of clinical trials.

The eight awards funded as part of the Collaboratory include the HCS Research Collaboratory Coordinating Center (Duke University, Dr. Robert M. Califf, Principal Investigator) and the following seven pragmatic clinical trial demonstration projects:
•    Decreasing Bioburden to Reduce Health Care-Associated Infections and Readmissions, University of California, Irvine; Dr. Susan Huang, Principal Investigator
•    Strategies and Opportunities to Stop Colon Cancer in Priority Populations, Kaiser Foundation Hospitals, Portland, Ore.; Dr. Gloria Coronado, Principal Investigator
•    Pragmatic Trial of Population-based Programs to Prevent Suicide Attempt, Group Health Cooperative, Seattle; Dr. Gregory Simon, Principal Investigator
•    A Pragmatic Trial of Lumbar Image Reporting with Epidemiology (LIRE), University of Washington, Seattle; Dr. Jeffrey Jarvik, Principal Investigator
•    Nighttime Dosing of Anti-Hypertensive Medications: A Pragmatic Clinical Trial, University of Iowa, Iowa City; Dr. Gary Rosenthal, Principal Investigator
•    Collaborative Care for Chronic Pain in Primary Care, Kaiser Foundation Hospitals, Portland, Ore.; Dr. Lynn DeBar, Principal Investigator
•    Pragmatic Trials in Maintenance Hemodialysis, University of Pennsylvania, Philadelphia; Dr. Laura M. Dember, Principal Investigator.

More information is available at: