
All Systems Down

October 27, 2009
by Kate Huvane Gamble
As more applications go live, the pressure on CIOs to ensure server reliability is growing

Terry Evans


On Aug. 18, 2009, the scenario every CIO dreads became a reality for Chuck Podesta when Fletcher Allen Health Care suffered an outage that knocked the EMR offline for more than seven hours. And while it wasn't a hurricane or an earthquake that hit the Burlington, Vt.-based academic medical center, what happened was about as close to a perfect storm as an organization can get.

The 562-bed regional referral center was hit by one unpredictable event after another, all of which started when a tree fell on a power line on a sunny summer morning. “There was no car accident, no storm, nothing like that,” Podesta says. “That's kind of freaky.” Podesta says the team is still analyzing the incident, but it appears the downed line sent a surge through the system.

And that, he says, was only the beginning for Fletcher Allen, which went live with the Epic (Verona, Wis.) EMR in June. For the next few hours the staff scrambled to figure out what went wrong.

The first point of failure occurred when the batteries in both strings of the uninterruptible power supply (UPS) failed, sending an electrical spike. “When that happened, our hardware on the storage side shut down, which it should do, because it's protecting itself,” says Podesta. “The second thing that happened is the failover software - which is not part of the Epic EMR - malfunctioned.”

Podesta says the system is designed so that servers from each site constantly “ping” each other to make sure they are awake. Automatic failover is built in so that if one server doesn't respond, all capability switches over to the other. In this case, however, the servers were still running while the disk storage had failed, which prevented failover. Because this wasn't obvious right away, the staff had to analyze the failure before restoring service, which further delayed bringing the EMR back up.
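The mechanism Podesta describes can be sketched in a few lines. This is a minimal, hypothetical illustration (not the actual failover product involved in the outage): each site reports both server and storage health, because, as the incident showed, a server that still answers pings can be sitting on failed disks. The site dictionaries, names, and threshold here are all invented for the example.

```python
# Minimal heartbeat/failover sketch (illustrative only). A check
# that pinged only the server would have missed the storage failure
# described above, so this sketch probes both conditions.

MISSED_LIMIT = 3  # consecutive failed checks before failing over

def site_healthy(site):
    # A real check would open a network connection and run a
    # storage I/O probe; here each site is a plain dict.
    return site["server_up"] and site["storage_up"]

def choose_active(primary, secondary, missed_checks):
    """Return which site should be active after `missed_checks`
    consecutive failed health checks on the primary."""
    if site_healthy(primary):
        return primary
    if missed_checks >= MISSED_LIMIT and site_healthy(secondary):
        return secondary
    return primary  # stay put until the threshold is reached

# Hypothetical state mirroring the outage: server up, storage down.
primary = {"name": "dc1", "server_up": True, "storage_up": False}
secondary = {"name": "dc2", "server_up": True, "storage_up": True}
print(choose_active(primary, secondary, 3)["name"])  # dc2
```

The point of the two-part check is exactly the lesson of the outage: health is a property of the whole stack, not just of the server answering the ping.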

“If everything had worked the way it was supposed to, we should've had about a three to five minute downtime, because the failover would have gone to the second of our two data centers,” he says. But as the staff now knows, all plans can go out the window during an unplanned downtime. “If you audited this and looked at what we did with failover and mirroring, you'd say, ‘Wow, this thing is never going to go down.' And that's what we thought. When you line up the scenario that happened to us, you realize that you can't replicate this in a million years.”

For Podesta, the ordeal demonstrated just how critical server availability is for an integrated health system, and how vital it is that the proper plans are in place.

As organizations become more reliant on electronic records, the need to protect data is increasing, says Terry Evans, CIO at Thibodaux Regional Medical Center, a 185-bed regional facility in southeast Louisiana. “Clinical information has become more real-time and crucial,” says Evans. “The access of information has made server reliability almost top on your list. You cannot afford to be down.”

Uptime strategies

Gary Weiner, manager of performance improvement and interim management at Dearborn, Mich.-based ACS Healthcare Solutions, says data protection should be a vital part of the CIO's overall strategy. According to Weiner, the focus should be threefold: maintaining high availability, having a disaster recovery plan, and identifying how long a system can be down before it negatively impacts the environment. “You need to determine what it will take, what it will cost, and what you need to do to ensure 100 percent reliability in case of failure,” he says.
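Weiner's third question, how long a system can be down before it hurts the environment, is ultimately an arithmetic one. The sketch below is a back-of-the-envelope illustration; every dollar figure is hypothetical and invented for the example, not drawn from the article.

```python
# Back-of-the-envelope sketch of Weiner's downtime question.
# All dollar figures are hypothetical, for illustration only.

def outage_cost(hours_down, cost_per_hour):
    """Rough cost of an outage of a given length."""
    return hours_down * cost_per_hour

def redundancy_pays_off(expected_outage_hours_per_year,
                        cost_per_hour, annual_redundancy_cost):
    """True if avoiding the expected downtime is worth the spend."""
    return outage_cost(expected_outage_hours_per_year,
                       cost_per_hour) > annual_redundancy_cost

# Hypothetical: a seven-hour outage like the one described above,
# at an assumed $20,000 per hour of clinical disruption.
print(outage_cost(7, 20_000))  # 140000
```

Putting even rough numbers on an hour of downtime makes Weiner's "what it will take, what it will cost" question concrete enough to budget against.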

Of the various methods used to protect data, the one he sees gaining serious traction in the healthcare industry is virtualization, a technology that can lower costs while improving availability, redundancy and recovery time. “In a virtualized environment, your total cost of ownership over a three to five year period can be reduced by 40 to 60 percent,” he says. “So it is a tremendous opportunity to not only save money but to enhance your environment.”

Another strategy is server clustering, in which servers can be configured so that if one fails, the applications it was hosting continue to run on one of the remaining nodes. “It's becoming more and more common,” says Weiner.
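The clustering idea can be sketched simply. This is an illustrative toy, not how a production cluster manager works internally (real products handle placement, fencing, and restart automatically); the node and application names are invented.

```python
# Toy sketch of cluster failover: applications hosted on a failed
# node are redistributed round-robin across the surviving nodes.
# Node and application names are hypothetical.

def redistribute(cluster, failed_node):
    """Move the failed node's apps onto the surviving nodes."""
    survivors = [n for n in cluster if n != failed_node]
    for i, app in enumerate(cluster[failed_node]):
        target = survivors[i % len(survivors)]  # round-robin
        cluster[target].append(app)
    cluster[failed_node] = []
    return cluster

cluster = {"node1": ["emr"], "node2": ["lab"], "node3": ["pharmacy"]}
redistribute(cluster, "node1")
print(cluster["node2"])  # ['lab', 'emr']
```

The essential property is the one Weiner describes: the application keeps running, just on different hardware, so a single node failure never becomes an outage by itself.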

At Thibodaux Regional, the IT staff utilizes multiple virtual servers to ensure clinicians always have access to data from the Meditech (Westwood, Mass.) EMR. Evans says if one of the servers is down or undergoing maintenance, the others pick up the slack. “It's more than just duality - this technology has given us ways where we can have absolutely no downtime.”

It has also created an environment that, while dynamic, does require upkeep when new IT applications are added to the hospital. “When new systems come on board, we add storage and we add servers in the virtual environment,” Evans says. “This way, everything has a backup strategy if it fails.”

Layers of protection

For Carilion Clinic, an eight-hospital, 1,125-bed organization, planning an aggressive Epic EMR rollout meant reassessing data storage capabilities. “You can't go to an electronic record unless you have a fail-safe IT backbone on which to run it,” says Daniel Barchi, senior vice president and CIO of the Roanoke, Va.-based system. “We thought that was important - so important, in fact, that in addition to our primary data center, we built a secondary data center as we were rolling out the EMR.”