Disaster Recovery Journal Fall 2024
Artificial intelligence is the most popular business term of 2024. Generative AI models released in 2023 and 2024, such as ChatGPT, have demonstrated unimaginably cool results in some applications. However, many observers have also criticized them for hallucinations, expressed concerns about their safety, and voiced fears about their potential negative impact on society in general. Can businesses trust AI models to run critical processes? That is still an open question.

AI models and products can already help you write coherent text; this article, for example, could use AI tools to optimize its language and phrasing. Still, the AI model needs an initial prompt that formulates the problem to be solved. Similarly, some programming code can already be written by AI: engineers formulate tasks in plain English and then finalize the generated code to solve a specific problem.

Let’s consider disaster recovery (DR) as another area that could benefit from AI. DR is a critical part of any organization’s business continuity strategy, and its importance grows with the size and complexity of the infrastructure as well as with regulatory requirements. At the same time, DR is not urgent, or at least is often not considered urgent, until a disaster strikes. Disasters, whether cloud outages, cyberattacks, or misconfigurations, don’t happen very often. When they do happen, however, the results can be severe.

In Q1 2024, UnitedHealth Group (UHG), the largest health care company in the US and a processor of more than 50% of medical claims for almost one million physicians and one hundred million customers, experienced a ransomware attack that was successful for the hackers and devastating for UHG. The company has already reported an $872M loss in Q1 2024 and expects that number to almost double by year-end. It took more than 20 days to fully recover the UHG service, which shows the company was not ready for a disaster.
Leveraging AI to drive DR automation can help reduce recovery time 50-100x, from days to hours and minutes. In the UHG case, AI could have played an important role and could have saved hundreds of millions of dollars.

Disaster Recovery Today

The disaster recovery function and its associated processes tend to consume a great deal of technical and business labor. These resources are needed not only to build and manage programs and plans, but also to keep them current, tested, and relevant to the organization. Technical and business leaders usually have many high-priority projects under way and are not very excited about dealing with DR programs. Most DR program responsibilities are delegated downward within an organization. As a result, they often land with technical and business team members who already have a full workload, and DR tasks receive lower priority and less focus.

Here is one illustrative example of a traditional DR process, based on many medium-sized companies in our network.
A mid-sized company has an established DR process with a set of DR plans that should be tested and updated regularly. When the time comes, the project leader assembles a team to run the necessary tests. At that point, the team realizes the existing DR plans and scripts are out of date. Because they have not been kept current, they don’t reflect the current state of the infrastructure. Either the test fails and the team has to reschedule the test and its resources, or the team scrambles valuable resources to fix the issues and achieve a successful test. Regardless of how the team chooses to handle DR testing, the organization ends up with a squad of its best technical and business resources working day and night. They update plans and run several iterations of testing until they finally achieve acceptable recovery time objectives (RTOs) and other metrics. The DR testing project could last three to five weeks. During that time, all the best technical resources are focused on this effort, and other higher-priority jobs and company projects basically stop or are significantly delayed.

But what if the team realizes the existing DR plans and scripts are out of date only when a real disaster strikes? Now the team has no choice but to scramble valuable resources to fix the issues as fast as possible and get the organization back up and running. When a disaster strikes, out-of-date DR plans and scripts lead to recovery time actuals (RTAs) measured in hours and days, compared with objectives set in minutes and hours. Mistakes made during the recovery process often lead to bigger issues and greater financial impact for the organization. This is when any false sense of security that technical or business leaders have about their business continuity and disaster recovery capabilities is brought to light. The results of a DR test can be managed to limit negative exposure further up the organization, which creates a false sense of security at its upper levels. The unsatisfactory results of a real disaster, however, are reported to management and investors. This is when the organization, and potentially even individual job security, is negatively affected. Moreover, next year the infrastructure will likely become even more complex, and the company will face an even bigger “DR testing” problem and greater risk.

Can AI Help Here?
For those responsible for DR, it would be ideal to solve the DR problem once and for all with an internal or external solution that they trust and that enables them to create and manage a DR program more effectively. But can AI help them get there? The short answer is “yes,” but let’s first consider the four major areas that cause challenges with DR today: