
With Big Data Comes Big-Time Data Governance: UPMC’s Forward Push

August 16, 2013
by Mark Hagland and Rajiv Leventhal
Terri Mikol, UPMC's director of data governance, talks about the challenges and opportunities in data governance

Last October, the 20-plus-hospital University of Pittsburgh Medical Center (UPMC) health system launched a massive big-data initiative, one that will cost the organization more than $100 million over the next five years. With healthcare and healthcare IT leaders in that organization collaborating to embrace and leverage their massive store of data (senior vice president and CIO Dan Drawbaugh recently noted internally that UPMC has five petabytes of data enterprise-wide), data governance has come to the fore as an issue of urgent importance.

As a result, earlier this spring, the organization launched a formal Data Governance Program, with a Data Governance Council and a broad objective of supporting data policy-making across the massive, far-flung organization. Recently, Terri Mikol, director, data governance, at UPMC, spoke with HCI Editor-in-Chief Mark Hagland and Assistant Editor Rajiv Leventhal regarding the organization’s data governance initiative and its implications for data governance across healthcare. Below are excerpts from that interview.

From your perspective as one of the key leaders of the data governance initiative at UPMC, what do you see as the core challenges and needs in this area, and what is your vision and the vision of your team around this?

We’ve internally created five statements that really frame why we’re doing data governance, and how. We began our program to protect the investment we had made in data analytics; we were going to build a large data warehouse, and we needed a plan. The three elements of data governance are metadata, data integrity, and master data. So the crux of our program is that without these three components, we will create yet another pile of data. In healthcare, we have lots of piles of data; the challenge is to bring them to life and turn them into assets.

And how do you define those three elements?

Let’s start with master data. In our organization, we actually maintain a provider/physician master [directory] in multiple places, and the data we keep about our physicians doesn’t match. Take something as simple as a physician’s specialty: it’s a problem, because the terms don’t match across the system. So master data is primarily data that is maintained in multiple places but is not in sync, so you have to do mappings, consolidations, and harmonizations; the physician master is our best example in healthcare. We’re also working on master data for our patients: things like, who is their PCP, and what is a consistent name for a patient?
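To make the mapping-and-harmonization idea concrete, here is a minimal Python sketch of reconciling a physician's specialty across several source systems. It assumes a hypothetical crosswalk table (`CANONICAL_SPECIALTY`) keyed by source system and local term; the system names, physician IDs, and specialty labels are invented for illustration and do not reflect UPMC's actual systems, data, or tooling.

```python
from collections import defaultdict

# Hypothetical crosswalk: (source system, local term) -> agreed canonical specialty.
# All names and labels here are illustrative assumptions, not UPMC data.
CANONICAL_SPECIALTY = {
    ("credentialing", "Cardiology"): "Cardiovascular Disease",
    ("credentialing", "Internal Med"): "Internal Medicine",
    ("billing", "CARDIO"): "Cardiovascular Disease",
    ("billing", "IM"): "Internal Medicine",
    ("scheduling", "Heart/Vascular"): "Cardiovascular Disease",
}


def harmonize(records):
    """Group source records by physician ID and map each local specialty
    term to its canonical value; unmapped terms become None."""
    by_physician = defaultdict(set)
    for system, physician_id, local_term in records:
        by_physician[physician_id].add(
            CANONICAL_SPECIALTY.get((system, local_term))
        )
    return dict(by_physician)


def find_conflicts(harmonized):
    """Return physician IDs whose sources still disagree after mapping,
    or that use a term the crosswalk does not cover."""
    return sorted(
        pid for pid, values in harmonized.items()
        if len(values) != 1 or None in values
    )


records = [
    ("credentialing", "P001", "Cardiology"),
    ("billing", "P001", "CARDIO"),
    ("scheduling", "P001", "Heart/Vascular"),
    ("credentialing", "P002", "Internal Med"),
    ("billing", "P002", "CARDIO"),
    ("scheduling", "P003", "Sports Med"),
]

harmonized = harmonize(records)
print(harmonized["P001"])          # {'Cardiovascular Disease'}
print(find_conflicts(harmonized))  # ['P002', 'P003']
```

Keying the crosswalk by (system, term) rather than by term alone allows the same label to mean different things in different systems, and anything the crosswalk cannot resolve is surfaced for a data steward to review instead of being silently dropped.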

And how do you define metadata and data integrity?

Metadata has to do with information about our data: where is it in our organization? Where does it originate, where is it collected, where do we move it, and how do we use it? What are the definitions of the core metrics and data we use, and what do we send outside the organization to share? The tasks are endless. We estimate that we run about 1,200 different applications at UPMC, so we’re building an official inventory of those to figure out how they’re being used. The application inventory helps us determine where all of our data is collected; from there, we will move on to identifying where our data is being interfaced and shared. This is important in data governance, because every time data is moved, there’s the potential for it to become corrupted. We want to minimize movement and keep data sound.

In addition, we’re beginning to identify all the teams that do reporting at UPMC, and we’re going to have them publish lists of the core reports they provide. We won’t list every report that is produced, but this will help us know where everyone should go for report information. We’ll never be done with metadata, but that’s what we’re working on now.

In terms of data integrity, we’ll always have problems, because we have 1,200 applications running. So we’ll be working to expose those problems and make them transparent. We can’t fix them all, but we can start to make smart choices about what we fix. Most data integrity issues start at the collection point, and a lot of our applications are legacy systems that don’t have the best edits in place; changing some of these systems can be very costly, so we have to choose carefully.

What is the people-governance process around all this?

So, I have three bullet points around people. First, we’re building, for the first time, shared stewardship around our data. Historically, data has been seen as IT’s responsibility, but we are now sharing stewardship with the business, and the bulk of the ownership rests with people outside IT, both business and clinician leaders. So our hope is to build data analytics integrity here. Second, we hope to change our culture at UPMC over the next five years by growing the number of people who attain data analytics capability.