Standardizing EHR Research Queries Across Health Systems | David Raths | Healthcare Blogs

Standardizing EHR Research Queries Across Health Systems

October 25, 2013
NIH Health Care Systems Collaboratory is working to identify best practices for using EHR phenotypes in research applications

Did you know that how a health system in New York identifies patients with diabetes using data encoded in its electronic health records might be completely different from how one in California or Texas does?

I didn’t realize that until I had the chance to speak last week with Rachel Richesson, Ph.D., M.P.H., an associate professor of the Duke University School of Nursing, whose research involves work on “phenotype definitions” or standard EHR queries for finding patients with a given condition (e.g., diabetes or chronic pain) from EHRs. If multiple health systems can learn to identify populations of patients with particular phenotypes, they can participate in multi-site research studies or apply management strategies and interventions that have been shown to improve care in similar populations.

First, what is EHR phenotyping? It is the use of data captured as a byproduct of healthcare delivery to identify individuals or groups of patients with specified conditions. Researchers must carefully define queries for specific conditions so that they can be applied across different data sets to locate patients who truly have the condition of interest, ensuring that the results of research conducted across multiple health systems are valid.

Richesson told me that the creation of standard phenotype definitions could streamline the development of patient registries and enable consistent inclusion criteria to support regional surveillance and the identification of rare disease complications.

So, for instance, the ability to identify people with diabetes across healthcare organizations by using a common definition would have value for clinical quality and research.

But because the way in which providers collect patient information in EHRs is not standardized, designing phenotype definitions is often difficult and time-consuming. When identifying patient conditions, health systems tend to look at three basic types of structured data: ICD-9 codes, lab results and medications. “But there are lots of issues with those data sources,” Richesson said. For instance, every healthcare system uses ICD-9 codes for billing and reporting, but they don’t necessarily use them in the same way. “There are dozens of possible codes for diabetes and they are often used quite differently,” she added. Also, health systems might capture medication use at different points in the care process, such as pharmacy orders, actual medication administration, or patient reports of medication use.
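To make the idea concrete, here is a minimal Python sketch of how two phenotype definitions built from those three data types can select different patients. Everything in it is an illustrative assumption rather than a definition from Richesson's work: the `PatientRecord` structure, the four made-up patients, the short medication list, and the two rules are hypothetical. The only outside facts it leans on are that ICD-9 category 250 covers diabetes mellitus and that an HbA1c of 6.5% or higher is a commonly used diagnostic cutoff.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one patient's structured EHR data."""
    patient_id: str
    icd9_codes: set = field(default_factory=set)
    hba1c_results: list = field(default_factory=list)  # percent values
    medications: set = field(default_factory=set)


# Illustrative only; a real definition would use a curated drug list.
DIABETES_MEDS = {"metformin", "insulin"}


def has_diabetes_code(rec):
    # ICD-9 category 250.xx is diabetes mellitus.
    return any(code.startswith("250") for code in rec.icd9_codes)


def has_elevated_hba1c(rec, threshold=6.5):
    return any(value >= threshold for value in rec.hba1c_results)


def on_diabetes_med(rec):
    return bool({m.lower() for m in rec.medications} & DIABETES_MEDS)


def definition_codes_only(rec):
    """Hypothetical definition A: any diabetes billing code."""
    return has_diabetes_code(rec)


def definition_two_of_three(rec):
    """Hypothetical definition B: at least two of code, lab, medication."""
    evidence = [has_diabetes_code(rec), has_elevated_hba1c(rec), on_diabetes_med(rec)]
    return sum(evidence) >= 2


def select_cohort(patients, definition):
    return {p.patient_id for p in patients if definition(p)}


# Made-up records, not real patient data.
patients = [
    PatientRecord("p1", {"250.00"}, [7.2], {"Metformin"}),
    PatientRecord("p2", {"250.02"}, [5.4], set()),
    PatientRecord("p3", set(), [6.9], {"insulin"}),
    PatientRecord("p4", {"401.9"}, [5.1], set()),
]

cohort_a = select_cohort(patients, definition_codes_only)
cohort_b = select_cohort(patients, definition_two_of_three)
print(sorted(cohort_a), sorted(cohort_b))
```

In this toy data the code-only rule picks up a patient with a diabetes code but no supporting lab or medication, while the two-of-three rule picks up a patient who was never coded at all. That is the kind of divergence a shared definition is meant to remove.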

To illustrate this point, Richesson and her colleagues at Duke University conducted a study in which they applied inclusion and exclusion criteria from seven research phenotype definitions as query algorithms to select sets of patients from the Duke Medicine Enterprise Data Warehouse, which encompasses data generated in the care of more than 4.3 million patients within the Duke University Health System.

Their research found that each definition pulled a different population of Duke patients. “The variation among diabetes cohorts raises the question of whether heterogeneous definitions are identifying populations with different clinical disease profiles (ie, different phenotypes) or if they are failing to identify the ‘same’ clinical phenotype consistently,” Richesson and colleagues wrote in their paper published online in the Journal of the American Medical Informatics Association in September 2013. “This research underscores the outstanding and important need for clearly defined phenotype definitions and consistent application in multisite projects. Currently, there are no standard EHR phenotype definitions for most chronic conditions, including diabetes.”
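One simple way to quantify how much such cohorts diverge is to compare them pairwise. The Python sketch below is an assumption on my part and not the method used in the Duke paper: it takes sets of patient IDs returned by different definitions and reports the shared count, the counts unique to each side, and the Jaccard index (size of the intersection divided by size of the union). The definition names and patient IDs are invented for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Intersection over union; 1.0 means the cohorts are identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def compare_cohorts(cohorts):
    """Pairwise overlap summary for a dict of {definition_name: set_of_ids}."""
    report = {}
    for (name_a, ids_a), (name_b, ids_b) in combinations(sorted(cohorts.items()), 2):
        report[(name_a, name_b)] = {
            "shared": len(ids_a & ids_b),
            "only_first": len(ids_a - ids_b),
            "only_second": len(ids_b - ids_a),
            "jaccard": round(jaccard(ids_a, ids_b), 2),
        }
    return report


# Hypothetical cohorts, not results from the study.
cohorts = {
    "definition_a": {"p1", "p2", "p3", "p4"},
    "definition_b": {"p3", "p4", "p5"},
    "definition_c": {"p1", "p3"},
}

report = compare_cohorts(cohorts)
for pair, stats in report.items():
    print(pair, stats)
```

A low Jaccard value between two definitions that are supposed to describe the same condition is a signal worth investigating, though on its own it cannot say whether the definitions capture different clinical profiles or are inconsistently finding the same one.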

Richesson also is involved in a larger nationwide effort called the NIH Health Care Systems Collaboratory, which is working to identify best practices for using EHR phenotypes in research applications. With $11.3 million in funding from NIH, the Collaboratory consists of seven demonstration projects and several problem-specific working group “cores,” aimed at leveraging the data captured in “real-world” environments for research, thereby improving the efficiency, relevance, and generalizability of clinical trials.

The eight awards funded as part of the Collaboratory include the HCS Research Collaboratory Coordinating Center (Duke University, Dr. Robert M. Califf, Principal Investigator) and the following seven pragmatic clinical trial demonstration projects:
•    Decreasing Bioburden to Reduce Health Care-Associated Infections and Readmissions, University of California, Irvine; Dr. Susan Huang, Principal Investigator
•    Strategies and Opportunities to Stop Colon Cancer in Priority Populations, Kaiser Foundation Hospitals, Portland, Ore.; Dr. Gloria Coronado, Principal Investigator
•    Pragmatic Trial of Population-based Programs to Prevent Suicide Attempt, Group Health Cooperative, Seattle; Dr. Gregory Simon, Principal Investigator
•    A Pragmatic Trial of Lumbar Image Reporting with Epidemiology (LIRE), University of Washington, Seattle; Dr. Jeffrey Jarvik, Principal Investigator
•    Nighttime Dosing of Anti-Hypertensive Medications: A Pragmatic Clinical Trial, University of Iowa, Iowa City; Dr. Gary Rosenthal, Principal Investigator
•    Collaborative Care for Chronic Pain in Primary Care, Kaiser Foundation Hospitals, Portland, Ore.; Dr. Lynn DeBar, Principal Investigator
•    Pragmatic Trials in Maintenance Hemodialysis, University of Pennsylvania, Philadelphia; Dr. Laura M. Dember, Principal Investigator.

More information is available at:

These teams are working to demonstrate that pragmatic clinical trials can use EHR data and be conducted in partnership with healthcare systems to generate results that are applicable to “real world” patient populations. Special working groups within the Collaboratory are exploring strategies for assessing data quality and developing standards on how they query data from different EHRs. “The goal is to be able to look at Duke’s system and Kaiser’s system to identify populations with the same condition, and aggregate the data and make comparisons,” Richesson explained. “So far, that has been difficult. If research is important to us and we want to create a learning health system, we have to take these steps.”

In a web presentation in July, Dr. Robert Califf, Principal Investigator of the Collaboratory Coordinating Center, said that although interest in pragmatic clinical trials is enormous, there is still confusion about how to retrieve high-quality data from EHRs, especially across systems. But he noted that progress is being made on that front, including Mini-Sentinel, a project funded by the Food & Drug Administration that has been able to work through the “nuts and bolts” of national drug safety surveillance using existing electronic data from multiple healthcare systems.

Curating the data and getting it formatted for exchange is hard work, stressed Califf, who is Vice Chancellor of Clinical and Translational Research and Director of the Duke Translational Medicine Institute. “One of the most common problems is that there are a lot of people who are led to believe that if they buy the right information system, all of this will just happen. But actually what is involved is a lot of human effort to get the data ready in addition to having the right software.”

