Disaster Recovery Journal Winter 2024

IT teams are testing DR less

With as much hype as ransomware has in the media, as well as various regulatory mandates for cyber-resilience and board-level directives, one would assume testing of IT systems' recovery at scale would be on the rise. Unfortunately, the 2024 statistics do not show that. Many BC/CM teams saw additional testing and plan development efforts in the years immediately after COVID, based on teams' bandwidth and an inherent recognition of the organization's dependence on IT systems, even as some of those IT systems evolved from datacenter-centric to new architectures in support of remote workforces. Research from 2024 reveals IT teams are only testing recovery-at-scale (e.g. IT disaster recovery) every eight months, which is less often than the "every six months" in 2023 or "every five months" in 2022. When IT leaders were asked what percentage of their production systems was recoverable within expected SLAs during their last IT-DR test, only 58% of systems came online within their timeframe. While others likely simply resumed more slowly, one has to presume some percentage weren't recoverable at all.

IT-DR SLAs are not meeting expectations

As resiliency professionals, we know the secret to any successful resiliency plan is in the consistency by which it can be executed (barring unforeseen circumstance variation). Surprisingly, only 37% of IT teams utilize orchestrated workflows as part of their systems' recovery processes. That 37% can test their "recipe" for server/application recovery, examine which workflow steps exposed errors, and then optimize the workflow for higher reliability in the future. The other 63% are conducting manual recovery steps per workload, so SLA adherence will vary from test to test, with little to no opportunity to improve agility in future exercises or actual events. In fact, when enterprise IT leaders were asked how long they would anticipate the IT-DR recovery of a relatively simple 50-server environment to take (which isn't a lot in an enterprise environment), only 32% believed they could recover those 50 servers within a single business week.

Conclusions and recommendations

There are more statistics related to the potential for IT teams to meet the recovery expectations during "typical" disaster recoveries as well as ransomware events in the research cited above, including (data protection trends) and (ransomware trends). In the meantime, as business resiliency leaders, here are a few questions to discuss with your DR constituents in IT:

1. How do we ensure our backups are not affected when ransomware hits our production systems?
2. When was the last time we tested at scale (e.g. 50 servers or more)?
3. How much of the per-workload recovery is orchestrated with workflows that can be assessed and optimized?
4. Do we have a "clean room" or other staged restoration capability to reduce the risk of re-infection during restoration from a cyberattack?
5. When were these IT-DR capabilities last externally audited against our expected SLAs and recovery plans?

That last one ought to make the other four much more instructive.

Jason Buffington has spent 35 years in IT disaster recovery. He first earned his CBCP in 2003, has spoken at numerous DR and IT events over the years, and has been published in DRJournal and other periodicals. He is a VP of strategy at Veeam Software and his blogs can be found on http://ITDRblog.com.

