How can analyzing social media help public health departments track important trends? Last year I reported on a presentation at the Health Datapalooza conference by Bechara Choucair, M.D., the former commissioner of the Chicago Department of Public Health.
He said Chicago has 16,000 restaurants and only 32 safety inspectors. One day Choucair noticed a tweet by someone who said they got sick eating at a specific restaurant. He asked an inspector to check it out, but he also started using the social media management software HootSuite to search for other tweets about people getting sick in Chicago after eating out. He then worked with volunteer civic app developers at a hackathon to create an app, FoodBorneCHI, that automates searching Twitter and routing messages related to food-borne illnesses to inspectors, who follow up and encourage the Twitter user to file a complaint through the city’s Open311 system.
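To make the idea concrete, here is a minimal sketch of the kind of keyword filter such an app might apply before a person reviews a message. It is not FoodBorneCHI's actual code: the term lists, function names, and sample messages are all hypothetical, and it works on plain strings rather than calling Twitter.

```python
import re

# Hypothetical term lists; the article does not say which terms the real app uses.
SYMPTOM_TERMS = ("food poisoning", "got sick", "stomach ache", "vomiting")
PLACE_TERMS = ("chicago",)


def normalize(text):
    """Lowercase the text and collapse runs of whitespace for phrase matching."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def is_possible_report(text):
    """Return True when a message mentions both a symptom term and a place term."""
    clean = normalize(text)
    has_symptom = any(term in clean for term in SYMPTOM_TERMS)
    has_place = any(term in clean for term in PLACE_TERMS)
    return has_symptom and has_place


def route_to_inspectors(messages):
    """Return only the messages a human inspector should look at."""
    return [message for message in messages if is_possible_report(message)]


if __name__ == "__main__":
    # Made-up sample messages, for illustration only.
    sample = [
        "I got sick after dinner in Chicago",
        "Great pizza in Chicago!",
        "Food poisoning in Boston",
    ]
    for message in route_to_inspectors(sample):
        print(message)
```

A filter this simple would miss slang and flag jokes, which is one reason the workflow Choucair described still ends with an inspector following up.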
I was reminded of Choucair’s presentation when I saw that the Indiana University School of Nursing has opened a Social Network Health Research Lab on the Indiana University-Purdue University campus in Indianapolis. Staffed with researchers from the IU schools of nursing, informatics and computing, and liberal arts, the new lab’s goal is to leverage the advanced computing methods available at IU to better understand how health science researchers can make use of social network data.
I recently spoke with Chad Priest, R.N., the lab’s director and principal investigator. The lab is starting up because of a data-sharing agreement with ChaCha, a social media question-and-answer service, which has agreed to give IU researchers access to anonymous, de-identified questions submitted to the service in order to better understand a variety of health and cultural topics. The data, composed entirely of questions submitted by users on a wide range of subjects from 2009 to 2012, will give researchers a view of the public’s questions about health and wellness. The researchers hope to match anonymous social network information with aggregate health outcomes to map health-related dialogue occurring through social media networks.
“We were thrilled when ChaCha offered up their data set,” Priest said. “It provides a great picture of what people are interested in and allows us to build some research capacity,” he added. “It will allow interdisciplinary colleagues to begin thinking about how access to social network data can inform research questions and help us shape the things we are interested in exploring and how we can answer questions using the data itself. We hope to use this ChaCha data to catapult ourselves into doing some hypothesis-free knowledge generation and have the computer point us to insights.”
The lab team includes nurses, physicians, informaticists, data scientists, geographers, and an attorney, he said. Complex text-mining algorithms based on Hadoop computing platforms will be developed to facilitate research on the ChaCha data.
The lab’s initial projects will involve mining the ChaCha data set to understand how ChaCha users ask about health and wellness.
Priest said his past research has focused on how hospitals and health systems can remain resilient in the face of catastrophic events. “One of the things that got me into social media in general was using Twitter analytics to do predictive modeling around public safety. The problem is that the science isn’t really there yet,” he said. “So I went looking for a way to do early work with partners. What if we found something reportable and see if the social networks can predict it? So if there are sexually transmitted diseases (STDs) reported in a certain place, is there a correlation to people talking about disease symptoms?” Priest said there has been some research in this area, but they found that the terms teens use to describe STDs are not clinical. “We want to get to that authentic patient voice. One of our first projects is going to be to try to understand how adolescents describe and talk about sexual health in their language, and that will help us build a lexicon that can be used for further analysis.”
The lab also has a sleep researcher on its team. ChaCha includes temporal and geographic metadata, Priest said, with time zone and ZIP code level data. “What time of day are people asking about sleep issues? That is rare data to get,” he said. “We are talking about 2 billion pieces of data. We want to mine it as quickly as we can for research purposes and to allow other researchers to access it.”
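The time-of-day question Priest raises can be illustrated with a small sketch. It assumes, purely for illustration, that each record is a pair of a local-time ISO timestamp and the question text; the article does not describe ChaCha's actual data layout, and the term list, function name, and sample records below are invented.

```python
from collections import Counter
from datetime import datetime

# Hypothetical search terms; a real study would need a vetted lexicon.
SLEEP_TERMS = ("sleep", "insomnia")


def sleep_questions_by_hour(records):
    """Count sleep-related questions per hour of day (0-23).

    records is an iterable of (iso_timestamp, question_text) pairs,
    with timestamps assumed to be already in the asker's local time.
    """
    counts = Counter()
    for timestamp, question in records:
        text = question.lower()
        if any(term in text for term in SLEEP_TERMS):
            counts[datetime.fromisoformat(timestamp).hour] += 1
    return counts


if __name__ == "__main__":
    # Made-up records, for illustration only.
    sample = [
        ("2011-03-01T02:15:00", "Why can't I sleep?"),
        ("2011-03-01T02:45:00", "How to cure insomnia"),
        ("2011-03-01T14:00:00", "What is the capital of Ohio?"),
        ("2011-03-02T23:05:00", "Is 5 hours of sleep enough?"),
    ]
    for hour, count in sorted(sleep_questions_by_hour(sample).items()):
        print(f"{hour:02d}:00  {count}")
```

At the scale Priest describes, the same count-by-hour step would more likely run as a distributed job than as a single loop, but the logic is the same.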