Getting Through a Ransomware Attack - Day 2
Episode 116 • 16th June 2021 • This Week Health: News • This Week Health

Transcripts

This transcription is provided by artificial intelligence. We believe in technology but understand that even the most intelligent robots can sometimes get speech recognition wrong.

Today in Health IT, the story is getting through a ransomware attack. My name is Bill Russell. I'm a former CIO for a 16 hospital system and creator of This Week in Health IT, a channel dedicated to keeping health IT staff current and engaged. Health Lyrics is my company, and I provide executive coaching and advisory services for health leaders around technology and IT.

If you wanna learn more, check out healthlyrics.com. All right, here's today's story. This is a continuation from yesterday; we have a three-part story going on here. I found a 50 minute video out on YouTube that is a recording from the CIO of Sky Lakes Medical Center in southern Oregon, talking about their six month journey after the Ryuk ransomware attack that hit their health system back in the fall.

Yesterday we described the experience, what it's like after a ransomware attack and what it's like going through an almost 30 day major system downtime. Today, we're going to walk through the steps that they took, and are taking, to bring everything back online. And tomorrow, the so what for this story.

What did we learn and what do we take away from it? Again, I wanna say to anyone who's thinking of sharing this kind of story that I'm surprised this video is out there. I was taught not to share our security posture with the media for public consumption. While I applaud the spirit of this, to help the industry understand and respond to future attacks, I would recommend using different types of forums, maybe not such a public forum.

Use CHIME or HIMSS, do non-recorded sessions with your peers. I'm also surprised that the legal team allowed this, considering that the health system may be facing class action lawsuits. I was never able to comment publicly on breaches within our health system. No one was, actually, for that matter, except for maybe one or two people for the entire health system.

The attack itself started when someone clicked on a phishing email. They talk about what actually happened in a fair amount of detail, but suffice it to say, the click went out to an external zero day site, which uploaded the payload to the workstation. The workstation then executed a PowerShell command and Cobalt Strike, and the ransomware was off and running. It propagated laterally across the network, and it propagated across all the major systems.
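To make that attack chain a little more concrete: a PowerShell loader like the one described is exactly the kind of thing incident responders go hunting for in endpoint logs during triage. Here's a minimal, hypothetical sketch of that kind of log sweep. It assumes PowerShell events have already been exported to a CSV with a CommandLine column; the file name, column names, and patterns are illustrative assumptions, not anything taken from the Sky Lakes response.

```python
# Hypothetical triage sketch: scan exported PowerShell event logs for the kinds of
# indicators described above (encoded commands, download cradles, in-memory loaders).
# File name, column names, and patterns are illustrative assumptions only.
import csv
import re

SUSPICIOUS_PATTERNS = [
    re.compile(r"-enc(odedcommand)?\s+[A-Za-z0-9+/=]{40,}", re.IGNORECASE),   # base64 payloads
    re.compile(r"downloadstring|invoke-webrequest|net\.webclient", re.IGNORECASE),  # download cradles
    re.compile(r"invoke-expression|\biex\b", re.IGNORECASE),                  # in-memory execution
]

def flag_suspicious_events(csv_path: str) -> list[dict]:
    """Return rows from an exported PowerShell log (CSV with a 'CommandLine' column)
    that match any of the suspicious patterns."""
    hits = []
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            command = row.get("CommandLine", "")
            if any(p.search(command) for p in SUSPICIOUS_PATTERNS):
                hits.append(row)
    return hits

if __name__ == "__main__":
    for event in flag_suspicious_events("powershell_events.csv"):  # placeholder export
        print(event.get("TimeCreated", "?"), event.get("Host", "?"), event.get("CommandLine", "")[:120])
```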

So now we're gonna talk about what they did. The first thing is they started working with their cyber insurance company. They identified two companies to come in: Talos, which is a Cisco company, to attempt to determine the root cause, and Kivu, K-I-V-U, to recover the offline systems.

They broke this down into five things. The first step they took was to segment the network and shut off all the systems. Right, you have to protect against any further infections, decrease the lateral movement, and make sure that no additional systems get infected.
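The video doesn't go into how they technically shut everything off, but to give a flavor of what scripted emergency isolation can look like, here's a rough sketch using the netmiko library to administratively shut down access-layer switch ports. The switch addresses, credentials, and interface ranges are placeholders, and this is an illustration of the general technique, not Sky Lakes' actual procedure.

```python
# Rough illustration only: emergency isolation by shutting down access-layer switch
# ports with netmiko. Management IPs, credentials, and port ranges are placeholders.
from netmiko import ConnectHandler

ACCESS_SWITCHES = ["10.0.10.2", "10.0.10.3"]          # placeholder management IPs
ISOLATION_COMMANDS = [
    "interface range GigabitEthernet1/0/1 - 48",       # placeholder edge ports
    "shutdown",
]

def isolate_switch(host: str, username: str, password: str) -> None:
    """Administratively shut down edge ports on one switch to stop lateral movement."""
    conn = ConnectHandler(
        device_type="cisco_ios", host=host, username=username, password=password
    )
    try:
        output = conn.send_config_set(ISOLATION_COMMANDS)
        print(f"{host}:\n{output}")
    finally:
        conn.disconnect()

if __name__ == "__main__":
    for switch in ACCESS_SWITCHES:
        isolate_switch(switch, username="netadmin", password="changeme")  # placeholders
```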

The next step was to notify Asante. Now, if you remember, they are a Community Connect site, which means that they run a sub-EHR off of a major EHR. Asante is the larger health system; they run the main Epic implementation, and Sky Lakes runs its Epic implementation off of that one. So they contacted Asante, and there was a significant time gap here.

I think they figured out that it was malware sometime in the middle of the night, and they notified Asante at 7:27 AM the next day. So they disconnected from Asante, and all of the access, Citrix, you name it, was disconnected. The next thing is they established a command center, which is what you normally do during these kinds of events, and one of the things they talk about at length here is that the communication mechanisms had been completely disabled, right?

No email, phone systems are struggling, you name it. So they're using cell phones, texting, and Webex Teams, which is a system external to the health system and which they had just finished installing. Otherwise, they would've just been on phone and text, and then a lot of face-to-face meetings, quite frankly.

So the next step was to figure out a plan. All right, how do we start bringing these things back online? And they started making all sorts of different lists around prioritizing things to bring back online. Now, southern Oregon is in an area that gets a fair amount of snow, right? And so they knew they needed to bring systems online, and they started by focusing on cancer,

'cause cancer doesn't take vacation. That's a quote from this. And sure enough, a snowstorm is coming, and people come to the CIO and say, look, all of our facilities have these systems, heating equipment and such to heat our sidewalks and whatnot, it's all controlled by computers, and we have no way of turning these things on.

And so again, you're gonna have those kinds of situations pop up all along the way. They were focused very much on the clinical systems, but there were also safety systems and those kinds of things, like the heating system, that have been computerized over time, so that has to be taken care of. The next thing was, hey, what's the process for bringing these things back online?

They had what they termed the dirty VLAN, the VLAN where the ransomware code actually exists. Then what they decided was, we're gonna make a staging VLAN, a clean VLAN that they could bring things into to see if they're able to bring those systems online safely without propagating the ransomware.

And then, if they were able to establish in that staging environment that a system was clean, they would move it to a clean VLAN where they could bring it back online. All right, so that was the process that they determined to use.
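Here's a small sketch of that dirty/staging/clean promotion flow, just to make the workflow concrete. The verification checks are stubbed out because the talk doesn't detail what a "clean" verdict required; everything here is illustrative.

```python
# Sketch of the dirty -> staging -> clean VLAN promotion flow described above.
# The check function is a stub; what a "clean" verdict actually required
# (AV scans, EDR checks, vendor sign-off) isn't detailed in the talk.
from dataclasses import dataclass

DIRTY, STAGING, CLEAN = "dirty-vlan", "staging-vlan", "clean-vlan"

@dataclass
class Host:
    name: str
    vlan: str = DIRTY

def passes_staging_checks(host: Host) -> bool:
    """Stub: rebuilt/reimaged, scanned, patched, new credentials applied."""
    return True  # placeholder; real checks would go here

def promote(host: Host) -> Host:
    """Move a host one step along the recovery pipeline, never skipping staging."""
    if host.vlan == DIRTY:
        host.vlan = STAGING                  # rebuild and observe it in isolation
    elif host.vlan == STAGING and passes_staging_checks(host):
        host.vlan = CLEAN                    # only verified-clean hosts rejoin production
    return host

if __name__ == "__main__":
    pc = Host("radiology-ws-01")             # hypothetical workstation name
    promote(pc)                              # dirty -> staging
    promote(pc)                              # staging -> clean (if checks pass)
    print(pc)
```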

Now you have to remember, they shut down every PC, every server, you name it. So they had to start rebuilding or replacing over 2,500 PCs, and there were, obviously, different images of Windows that had to be rebuilt. Each one of those devices took about 45 minutes to an hour and a half to rebuild, and they all had to be rebuilt: everything on site, offsite PCs, anything that was on the network at the time had to be rebuilt in order to be reconnected to the system.
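Just to put some rough math on that, using the numbers he quotes: 2,500-plus devices at 45 to 90 minutes each. The technician count and shift length below are assumptions for illustration.

```python
# Back-of-the-envelope math on the rebuild effort using the numbers quoted above:
# 2,500+ devices at roughly 45-90 minutes each. Technician count and hours per day
# are assumptions for illustration.
DEVICES = 2500
MIN_HOURS, MAX_HOURS = 0.75, 1.5     # 45 to 90 minutes per device
TECHS = 20                           # assumed number of people reimaging in parallel
HOURS_PER_DAY = 12                   # assumed shift length during the incident

low, high = DEVICES * MIN_HOURS, DEVICES * MAX_HOURS
print(f"Total effort: {low:,.0f} to {high:,.0f} technician-hours")
print(f"Calendar time with {TECHS} techs: "
      f"{low / (TECHS * HOURS_PER_DAY):.1f} to {high / (TECHS * HOURS_PER_DAY):.1f} days")
```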

All right? Then you had third party systems: smart pumps, diagnostic imaging, MRIs, Pyxis machines and whatnot. Those had to go through an extensive process to bring them back online. Part of the challenge here was they had to prove that the network was safe to be on once again.

In order to do that, they had to provide a lot of information back to the vendors they were working with. They had to provide information back to Asante, who was, again, their EHR partner, and others. The partners came back with some stipulations and some different things that they had to do.

They had to change their password policy and get that changed across all of their systems, which they did as they were bringing them back online. The first system that they really targeted was the cancer systems, which we talked about earlier, and on November 7th, they were able to bring some of those systems back online.
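On that password policy point: the talk only says the policy had to be tightened and rolled out across every system as it came back online, not what the new rules were. The sketch below shows what checking candidate passwords against a hardened policy could look like; the specific length, character-class, and deny-list rules are placeholders.

```python
# The talk only says the password policy had to be tightened across every system
# as it came back online; the rules below (length, character classes, deny-list)
# are placeholders, not the actual policy.
import re

MIN_LENGTH = 14
BANNED_SUBSTRINGS = ("password", "skylakes", "winter2020")   # illustrative deny-list

def meets_policy(candidate: str) -> bool:
    """Check a candidate password against an example hardened policy."""
    if len(candidate) < MIN_LENGTH:
        return False
    if any(bad in candidate.lower() for bad in BANNED_SUBSTRINGS):
        return False
    classes = [r"[a-z]", r"[A-Z]", r"[0-9]", r"[^A-Za-z0-9]"]
    return sum(bool(re.search(c, candidate)) for c in classes) >= 3

if __name__ == "__main__":
    for pw in ("Winter2020!", "crater-lake-7-Trailhead"):
        print(pw, "->", "ok" if meets_policy(pw) else "rejected")
```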

So they had to, again, prove to that vendor that they could bring those systems online safely: that they were not compromised, that their new passwords were in place, and that they were on a clean network that was not going to propagate the ransomware or potentially expose any of that information. The next thing, obviously, was that they were laser focused on the EHR itself, and that required them to work out a memorandum of understanding with Asante on what Asante would require for them to reconnect to the EHR.

And Asante gave them four bullet points, essentially. The first was a third party clean bill of health: a third party had to come in, look at the systems, and give them a clean bill of health. The second was that they had to agree to an annual risk assessment with pen testing, penetration testing. The third was more timely incident notification.

So what you saw was that gap of potentially five, six hours where the ransomware was propagating throughout their network; Asante was asking for more timely identification and notification of any kind of event that was going on on their network. And then finally, they wanted a NIST framework security posture and culture.

So they required that memorandum of understanding to be signed, to be worked on, and, I guess, to be verified moving forward in order to reconnect. But when you think about it, they'd been offline, on paper, for about 23 days, and if you can imagine, this is pretty much a new Epic go-live.

I mean, granted, there's a lot of information that's already there; they don't have to rebuild order sets and all that other stuff. But there was an awful lot of information being generated on a daily basis on the patients who were already in the hospital, and they also had to reconnect all the systems.

They had to verify all the systems, and it was a pretty arduous process. So they had to put a full go-live plan back together, share that with Asante, and move forward with it, and they were able to do that after about 23 days. Their PACS system was not as lucky. He actually tells the story that being hit with the ransomware attack voided the warranty, and he was sort of incredulous on this, like, you know, can you believe they abandoned us in our time of need?

The reality is that the PACS vendor itself can't control whether you're going to be hit by a ransomware attack. Now, if it was their fault that you were hit by a ransomware attack, I'm incredulous as well. But since it was not their fault, and it voids their warranty, you can almost understand their position, which is: there's a situation that evolved on your network, which essentially

did certain things to our PACS system and brought it offline, and we cannot be held responsible for that. So that was their position. It was a legacy PACS system, it was pretty old, and so they essentially decided to bring up a new PACS system, move whatever images they had across to it, and then start the process of either re-imaging

patients or restoring as much as they possibly could to that new system. And the good news is they brought that up fairly quickly after the attack. So that was one of the major systems they had to bring back online.
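He doesn't describe how the image migration to the new PACS was actually done. For illustration only, here's what bulk re-sending recovered studies over DICOM C-STORE can look like with the pydicom and pynetdicom libraries; the archive path, destination host, port, and AE title are placeholders, and a real migration would also have to handle compressed transfer syntaxes and failures far more carefully.

```python
# Illustration only: bulk re-sending recovered DICOM studies to a new PACS via
# DICOM C-STORE using pydicom/pynetdicom. The talk doesn't describe how the
# migration was done; the archive path, host, port, and AE title are placeholders.
from pathlib import Path

from pydicom import dcmread
from pynetdicom import AE, StoragePresentationContexts

ARCHIVE = Path("/archive/recovered_dicom")   # placeholder path to salvaged studies

def push_archive(host: str, port: int, ae_title: str) -> None:
    ae = AE(ae_title="MIGRATION_SCU")
    ae.requested_contexts = StoragePresentationContexts   # offer standard storage SOP classes
    assoc = ae.associate(host, port, ae_title=ae_title)
    if not assoc.is_established:
        raise RuntimeError("Could not associate with the destination PACS")
    try:
        for path in ARCHIVE.rglob("*.dcm"):
            ds = dcmread(path)
            status = assoc.send_c_store(ds)
            if not status or status.Status != 0x0000:
                print(f"Failed to store {path}")
    finally:
        assoc.release()

if __name__ == "__main__":
    push_archive("new-pacs.example.org", 11112, ae_title="NEW_PACS")  # placeholders
```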

So what's left? You're bringing your major clinical systems back online and whatnot, but what about email? Email was still offline. They brought that back online to a pilot group around day 30. There are some things they did not talk about in this, but I imagine they installed some other systems designed to protect against future phishing attacks and future ransomware attacks.

So at day 30, a pilot group was restored, and it doesn't really go into detail in terms of when everyone was given access to email again. I would imagine there was some training that also had to go on. The other thing they talk about is their shared drives, the P drive and the F drive, as they're affectionately called; they did not have access to those.

So when you think about where your policy documents are, your documents on your recovery plan and all that stuff, they're probably on shared drives, and that's also something to keep in mind. So for today, what I wanted to do was give you an idea of the steps that they went through in order to get through this and the amount of time it really took.

And I don't think they're all the way through this, to be honest with you. I imagine they're still trying to restore some other legacy systems. Most health systems have about, I don't know, four or five, six, seven hundred apps across the board. This is a smaller health system, so they're probably on the lower end of that scale, but either way, there are gonna be some apps that may never be brought back online, and there's gonna be data that's lost and data that has to get restored, and that's something else that they talk about.

But there isn't enough detail in here to really talk about it. They had to set up a process for taking all the information that had been generated during the downtime and bringing that back into the system. So you had the legacy data, restoring the legacy data; then you had the data that was generated during the downtime; and then you have the data from, obviously, the monitors and other things,

which in some cases will be able to be brought back into the record, but in other cases will not. So those are some of the steps that they took to get through this. And tomorrow what we're gonna do is take a look at what they identified as their findings, their learnings from this, and we're gonna talk a little bit about what my learnings are and what I think we should take away from it.

So that's all for today. If you know of someone that might benefit from our channel, please forward them a note. They can subscribe on our website, thisweekhealth.com, or wherever you listen to podcasts: Apple, Google, Overcast, Spotify, Stitcher. You get the picture. We are everywhere, or at least we're trying to be.

We wanna thank our channel sponsors who are investing in our mission to develop the next generation of health leaders: VMware, Hillrom, Starbridge Advisors, McAfee, and Aruba Networks. Thanks for listening. That's all for now.
