Data extracted from cloud-based electronic health record (EHR) systems in combination with a machine learning algorithm can provide near real-time regional estimates of flu outbreaks, according to a study published in Nature Scientific Reports.
Researchers from Boston Children’s Hospital’s Computational Health Informatics Program, Harvard Medical School and the Harvard School of Engineering and Applied Sciences examined whether EHR data collected and distributed in near real time by athenahealth, an electronic health records and cloud services company, could accurately track real-time influenza activity at the regional scale in the United States when combined with historical patterns of flu activity through a suitable machine learning algorithm. Accuracy was measured against influenza activity as reported by the U.S. Centers for Disease Control and Prevention (CDC).
According to the researchers, up to 50,000 people in the U.S. die each year from influenza-like illness (ILI). Monitoring, early detection, and prediction of influenza outbreaks are therefore crucial to public health. “Disease detection and surveillance systems provide epidemiologic intelligence that allows health officials to deploy preventive measures and help clinic and hospital administrators make optimal staffing and stocking decisions,” the researchers wrote.
According to the researchers, many attempts have been made to design methods capable of providing real-time estimates of ILI activity in the US by leveraging Internet-based data sources that could potentially measure ILI in an indirect manner. “Google Flu Trends (GFT), a digital disease detection system that used Internet searches to predict ILI in the US, became the most widely used of these non-traditional methods in the past few years. In August of 2015, GFT was shut down, opening opportunities for novel and reliable methods to fill the gap,” the study authors wrote.
Researchers built a machine learning model that “optimally exploits the data by building a system as timely as GFT used to be, yet as stable and reliable as CDC validated data sources,” the study authors wrote. The model was named ARES, which stands for AutoRegressive Electronic health record Support vector machine.
For the study, researchers, in collaboration with athenahealth’s research team, used the vendor’s cloud network, which consists of patient-provider encounter data for more than 72,000 healthcare providers in medical practices and health systems nationwide. The database includes data for more than 64 million lives and electronic health records for more than 23 million lives. Researchers obtained weekly total visit counts, flu vaccine visit counts, flu visit counts, ILI visit counts and unspecified viral or ILI visit counts. The athenahealth ILI rates are based on visits to primary care providers on the athenahealth network, for the period between June 2009 and October 2015. The study authors noted that the athenahealth data was available at least one week ahead of the publication of the CDC’s ILI reports.
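As a rough illustration of how inputs like these could be arranged for an autoregressive model such as ARES, the sketch below turns weekly visit counts into an ILI rate and pairs each week's EHR-based rate with the CDC ILI values from the preceding weeks. The function names, the number of lags, and the feature layout are assumptions made for illustration; the article does not describe the study's actual feature set or model settings.

```python
from typing import List, Sequence, Tuple


def ili_rate(ili_visits: Sequence[int], total_visits: Sequence[int]) -> List[float]:
    """Weekly ILI rate: the share of all visits that were ILI visits."""
    return [i / t if t else 0.0 for i, t in zip(ili_visits, total_visits)]


def build_rows(
    cdc_ili: Sequence[float],
    ehr_rate: Sequence[float],
    n_lags: int = 2,  # hypothetical choice, not taken from the study
) -> Tuple[List[List[float]], List[float]]:
    """Build (features, targets) for a one-week-at-a-time estimate.

    For each target week t, the features are:
      - CDC ILI values for weeks t-n_lags .. t-1 (the autoregressive terms)
      - the EHR-based ILI rate for week t (available before the CDC report)
    The target is the CDC ILI value for week t.
    """
    if len(cdc_ili) != len(ehr_rate):
        raise ValueError("cdc_ili and ehr_rate must cover the same weeks")
    features: List[List[float]] = []
    targets: List[float] = []
    for t in range(n_lags, len(cdc_ili)):
        lags = list(cdc_ili[t - n_lags:t])
        features.append(lags + [ehr_rate[t]])
        targets.append(cdc_ili[t])
    return features, targets


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    rates = ili_rate([10, 20, 30, 40], [1000, 1000, 1000, 1000])
    X, y = build_rows([1.0, 1.5, 2.0, 2.5], rates, n_lags=2)
    for row, target in zip(X, y):
        print(row, "->", target)
```

Rows built this way could then be passed to a regression model, for example a support vector regressor such as scikit-learn's `SVR`, which is again an assumption about tooling rather than a detail reported in the article.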
The study authors concluded, “In this study we have shown that EHR data in combination with historical patterns of flu activity and a robust dynamical machine learning algorithm, are capable of accurately predicting real-time influenza activity at the national and regional scales in the US.”
The study authors also noted, “Our methodology provides timely flu estimates with the accuracy and specificity of sentinel systems like the CDC’s ILI surveillance network. This demonstrates the value of cloud-based electronic health records databases for public health surveillance at the local level.”