18 Aug Hope for the best, Plan for the Worst: Redundant Radiology Systems
This morning I read a forum post on Aunt Minnie about a popular teleradiology company that experienced a long technology outage during peak coverage hours. As an IT department leader, this piqued my interest. The issue was traced to an outside Internet Service Provider, a vendor unrelated to the teleradiology provider itself.
You and other IT leaders out there, like myself, are probably having two distinct reactions right now: 1) I feel bad for the poor IT guys at the teleradiology company who had to deal with the outage, the pressure of restoring service, and the customer complaints while the entire event was really outside of their immediate control, and 2) why didn’t they plan better? As IT leaders, we know that the key to avoiding situations like this is to change the mindset in the IT department from reactive to proactive.
Hope for the best, but plan for the worst! I have always pushed to do this in the departments I have been involved with, including the department here at ONRAD. We all know that you cannot think of everything, but you can think of a lot, especially those of us who have experience with these sorts of catastrophes. And, we all know that, when disaster strikes, your customers will not care the least bit about whose fault it is, which vendor is responsible, or how it was outside of your control – all they will care about is restoring service. This pressure is compounded in the industry that we are in, when patients are literally on emergency room tables waiting for a diagnostic interpretation of their medical imaging study from one of our doctors, who is dependent on the technology that our company has deployed. In that situation, “Sorry Mr. ED Doctor, and Mr. Patient, it is not us, it is our ISP,” just won’t cut it.
So, what do we do? How can we be proactive IT leaders? I don’t claim to have all the answers, but I do have a simple approach that has worked well for our team at ONRAD. It starts with a detail-oriented inventory of both your complete set of customers and your complete set of systems. If we are running a strong IT shop, we likely have this list already, hopefully in a CRM system (we use Salesforce.com at ONRAD and we love it). Next, we score each of our systems with a risk level (critical means the entire company is dead in the water if that system goes down, and low means that there is no impact to our customers if the system goes down). Finally, we give each system, and each customer, a redundancy score: this is basically just a yes or no indicating whether there is a redundant system, combined with a description of what it is.
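To make the scoring concrete, here is a minimal sketch of such an inventory in Python. It is only an illustration, not how ONRAD actually stores this data (we keep ours in our CRM). The class names, the two intermediate risk levels, and the sample systems are all assumptions made up for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1       # no customer impact if the system goes down
    MEDIUM = 2    # hypothetical intermediate level, not from the article
    HIGH = 3      # hypothetical intermediate level, not from the article
    CRITICAL = 4  # the entire company is dead in the water


@dataclass
class System:
    name: str
    risk: Risk
    redundant: bool            # the yes/no half of the redundancy score
    redundancy_note: str = ""  # the description half: what the backup is


def unprotected(systems, min_risk=Risk.HIGH):
    """Return systems at or above min_risk that have no backup, worst first."""
    gaps = [s for s in systems if not s.redundant and s.risk >= min_risk]
    return sorted(gaps, key=lambda s: s.risk, reverse=True)


# Hypothetical sample inventory
inventory = [
    System("primary ISP link", Risk.CRITICAL, True,
           "second ISP on a physically separate last-mile path"),
    System("VPN edge firewall", Risk.CRITICAL, False),
    System("dictation server", Risk.HIGH, False),
    System("internal wiki", Risk.LOW, False),
]

for s in unprotected(inventory):
    print(f"{s.risk.name}: {s.name} has no redundancy")
```

Sorting the gaps by risk puts the systems that most need a backup at the top of the list, which is where the audit conversation should start.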
With this documentation set up, we can now do a fault tolerance audit of our systems and of our customers. At ONRAD, we do both routinely. First, with each customer, we evaluate what would happen if any of the systems between the imaging modality acquiring the study and our radiologist dictating it were to fail. We do not ask, “Whose fault would that be?” Instead we ask, “What would happen if the system failed, and how would we maintain service to that customer?” This analysis has led us to set up alternate routes of connectivity to most of our customers who are willing to work with us on the project. That includes ensuring that there are redundant ISPs (with physically separate connections on the all-important “last mile” into our datacenter) and setting up alternate transmission routes for our VPN tunnels, which use physically separate firewall edge devices. In many cases we can even bypass the VPN entirely with an alternate method of encryption, using technologies like SSL or TLS. Our IT department advises our customers on how to configure these types of connections, and our technology is flexible enough to accommodate them.
However, there are situations where complete redundancy is not possible. In this case, it is important to communicate the risks to the customer and to your own internal management team. Everyone should know that in some situations, say a natural disaster near the customer’s facility that disables their power or connectivity, transmission of images is not possible. All stakeholders should understand and accept these risks up front.
As a final step in this preparation, and one that is sometimes overlooked, it is important to routinely test your redundant systems and connectivity, both to confirm that they operate as expected and as a training exercise so that your IT team gets used to using them.
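The per-customer audit can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions: each customer is modeled as an ordered chain of hops between the modality and the radiologist, and each hop lists the interchangeable components that can carry the traffic. The function names, the customer, and the component names are all hypothetical.

```python
def single_points_of_failure(path):
    """Return the hops in a customer's path that have fewer than two options."""
    return [hop for hop, options in path if len(options) < 2]


def still_served(path, failed):
    """True if every hop keeps at least one working option after `failed` go down."""
    return all(any(o not in failed for o in options) for _, options in path)


# Hypothetical customer: hops listed from the modality side to our datacenter
customers = {
    "Hospital A": [
        ("customer uplink", ["customer ISP"]),
        ("transport", ["VPN tunnel", "TLS transfer"]),
        ("datacenter ISP", ["ISP 1", "ISP 2"]),
        ("edge firewall", ["firewall 1", "firewall 2"]),
    ],
}

for name, path in customers.items():
    print(name, "single points of failure:", single_points_of_failure(path))
    # Ask "what would happen if...", not "whose fault would it be"
    print("  survives ISP 1 + VPN outage:", still_served(path, {"ISP 1", "VPN tunnel"}))
    print("  survives customer ISP outage:", still_served(path, {"customer ISP"}))
```

Any hop that shows up as a single point of failure is either a candidate for an alternate route or a risk to document and discuss with the customer up front.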
These steps as a whole, if implemented properly, will not necessarily mean that you, Mr. IT Leader, will avoid the predicament in the event that disaster strikes, but they will help mitigate the pressure as you will have the maximum level of practical redundancy in your systems, and your customers and internal management will understand and accept the risks of any of your systems that are potentially exposed.
Many of you will read this and think, “Of course, we should plan ahead and be proactive.” However, I challenge all of you who run IT departments, either at a hospital or elsewhere, to go through this fault tolerance exercise, and all of you customers out there to ask your vendors whether they have gone through this exercise. My guess is you may be surprised by what you discover. I am curious to hear your thoughts on this topic. Feel free to email me, or better yet, respond with a comment to this blog. We want to hear from you!
– Jesse Salen, CTO, ONRAD, Inc.