Disaster Recovery Journal Spring 2023

will present a multitude of issues. It could be a lack of staff who remember how the systems work, inability to get vendor support in the event of failure, or worse, the impossibility of restoring the application to service because the org lacks the hardware to do so. Making a conscious decision to ignore tech debt is to accrue interest on that debt. That interest comes from higher failure rates, increased maintenance costs in extended support contracts, and a lack of future feature support for the business, resulting in more kludge solutions to achieve business goals. Oh, and when I say “update,” I mean a “complete overhaul,” not simply inserting a nice-looking front end on a decrepit back-end system.

So how is this our Alamo moment? This opens a new opportunity for us to raise the alarms around technical debt. Southwest technology personnel knew the risks and challenges and raised them to management, but were overall unable to sway the executives. In a letter, Tom Nekouei, second vice president of SWAPA (union), said, “This meltdown was easily avoidable. It was predictable and it was predicted.” That single-mindedness led them to the scenario they’re in now. Technology, like any other investment, has a set lifespan; and when you exceed that, you invite disaster.

For the rest of us, we can use this moment to educate and demonstrate the true risks involved with taking on technical debt. Use this as a teaching moment, followed by an analysis of the risks and likelihoods, and that should raise the hackles of the hardest cost-conscious executives.

Tech debt systems have a variety of weaknesses which can become apparent if you know where to look. These data points can help you make the case about rationalizing your application platforms:

• Number of skilled staff in the system – Particularly egregious tech debt will see a proportion of the system support staff retire or leave the company. The number of retirees on a team can be a red flag that the platform is no longer supportable.

• Availability of skilled workers on the job market – If you find it difficult to hire personnel for a particular system, it might be a good sign you’re behind the technology curve. Then, if you manage to find someone willing to work on a dated system, they are often long tenured and extremely experienced – which is to say expensive to hire.

• Increasing number and severity of incidents – Routine incidents, or an increasing number of incidents over time, can indicate an application in distress. An increase in minor incidents can result in a large amount of time lost to maintaining a cantankerous application. In addition, incidents that increase in severity will require more extreme measures to respond.

• Increasing time to restore – When mean time to restore or triage time increases over a period of years, it can indicate that staff are no longer skilled enough to support the application, or that the system has become too complex to triage and manage effectively.

• Extended support contracts – Systems in extended support are beyond end of life from the vendor. Costs for these contracts increase exponentially, and if you haven’t thought about replacing this system, you need to immediately. It is unfortunate when vendors drop planned obsolescence on us, but the pain of not upgrading comes from the extended support costs, plus the loss of staff on the vendor side who support the product.

• Increases in vulnerabilities – Increased numbers of security vulnerability patches or other security weaknesses can be a sign that it’s time to kill an application. The amount of time dedicated to patching, testing the patches, and then rolling them out to production can quickly swallow entire product teams. Increased patch frequency can delay staff from doing other, more productive work.

Matt Doernhoefer is an experienced technical resilience professional with nearly 20 years in a variety of industries. He started his career working with federal government agencies such as DHS, DOD, and DOJ, where he worked with critical infrastructure protection, incident response, and technical disaster recovery planning. He transitioned afterward to Deloitte, where he served a variety of commercial clients in healthcare, fintech, heavy manufacturing, and shipping groups, consulting for all their resilience needs. Doernhoefer then moved on to work with United Airlines as their disaster recovery professional, where he was cited as the BCI Professional of the Year for the Americas. He now serves as the lead infrastructure resilience manager at USAA.

C

M

Y

CM

MY

CY

CMY

K

