A Disaster Recovery Strategy for Hybrid Networks

By Loke Tan, Director of Product Management at Skytap

As organizations increasingly move to a hybrid cloud model, how should their disaster recovery strategy change? IDC predicts that by 2022 over 90% of enterprises worldwide will rely on a mix of on-premises/dedicated private clouds, multiple public clouds, and legacy platforms. Cloud-based options for disaster recovery and backups offer several benefits compared to on-premises solutions, including cost savings, geographic distribution and resilience, and ease of use. Customers with a hybrid cloud model also need to account for the fact that they no longer control the connections between clouds, because that traffic crosses the public internet. This has several ramifications for disaster recovery, and there are multiple ways to manage the issue. Let's dig into these in detail.

The first question for organizations building a disaster recovery strategy, no matter their network environment, is whether to use the cloud. Disaster Recovery as a Service (DRaaS) solutions allow an organization to replicate data and IT infrastructure to a third-party cloud computing environment rather than to a backup server of its own. As with other "as-a-service" models, the cloud infrastructure provider manages the resources used for the backups, and the customer pays based on how much they use. This means users can reduce resources to only the level necessary to replicate data to the backup server under normal conditions and then "turn it on" when needed. From a cost perspective, this can be far more economical than buying and maintaining physical backup servers. The cloud also makes it easier to put backup servers in different regions for greater resilience, although companies subject to data sovereignty laws may be limited to providers that offer data center space in their country or region.

The pros and cons of using the cloud are the same regardless of whether an organization's infrastructure is on-premises or not. The major issue with a hybrid cloud model is that the organization no longer has control over its entire network infrastructure. Traffic between on-premises resources and a public cloud, for example, crosses the public internet. The speed and performance of this connection are no longer reliable; they will vary based on many factors beyond the organization's control. IT teams will need to account for this in their disaster recovery plan.

Because the performance of the public internet isn't reliable, copying data to a cloud backup server and restoring applications from that server may take longer in a hybrid cloud environment. Organizations will need to relax their Recovery Time Objective (RTO) and Recovery Point Objective (RPO) targets, that is, accept longer recovery times and a larger window of potential data loss, to account for this. IT should set these objectives carefully for each component of the environment, taking business as well as technical considerations into account. They will want input from business leaders who can quantify the cost of an outage or lost data to decide which systems need a higher priority. Understanding upstream and downstream application dependencies is also important in a hybrid cloud environment, where these relationships are likely to be more complex. When building a DR plan, IT needs to know whether an outage of one application will cause a full or partial service outage in others. You can find more useful information about designing a DR plan in this Red Hat blog post.
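To make the dependency question concrete, here is a minimal Python sketch that lists every application affected when one of them goes down. The application names and the dependency map are hypothetical, invented purely for illustration; a real inventory would come from whatever source of record the IT team maintains.

```python
from collections import deque

# Hypothetical dependency map: each application lists the upstream
# applications it depends on. None of these names come from a real system.
DEPENDS_ON = {
    "web-frontend": ["orders-api", "auth"],
    "orders-api": ["orders-db", "auth"],
    "reporting": ["orders-db"],
    "auth": [],
    "orders-db": [],
}


def impacted_by(failed, depends_on):
    """Return every application that directly or indirectly depends on `failed`."""
    # Invert the map so we can walk from an upstream app to its dependents.
    dependents = {app: [] for app in depends_on}
    for app, upstreams in depends_on.items():
        for upstream in upstreams:
            dependents.setdefault(upstream, []).append(app)

    # Breadth-first walk over the downstream applications.
    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for app in dependents.get(current, []):
            if app not in impacted:
                impacted.add(app)
                queue.append(app)
    return impacted


if __name__ == "__main__":
    for app in sorted(impacted_by("orders-db", DEPENDS_ON)):
        print(app)
```

In this made-up map, losing the database takes three other applications with it, which suggests it deserves a tighter recovery objective than its size alone would indicate.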

If organizations can't allow for higher RTOs and RPOs, they have a few options. First, they can use virtual private networks (VPNs) to connect all their infrastructure and regain control over network performance through the VPN tunnel. Second, they can split up large applications into microservices that each require only a small amount of data (for example, turning one application that needs to move 100 gigabytes of data into ten separate microservices that each need to move ten gigabytes). When applications and databases are rearchitected to support sharding, or splitting into smaller, more manageable chunks, each chunk will recover much more quickly. Provided the chunks can be restored in parallel, the combined recovery time will be much shorter than it would be for the original application.
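The 100-gigabyte example can be turned into a rough estimate. The Python sketch below is a simplified model, not a measurement: it assumes the chunks restore in parallel, that each transfer stream has its own throughput ceiling, and that all streams share one link. The function names and the throughput figures (200 Mbps per stream, 1,000 Mbps for the link) are assumptions chosen only to illustrate the arithmetic.

```python
def transfer_hours(size_gb, throughput_mbps):
    """Hours needed to move size_gb gigabytes at throughput_mbps megabits per second."""
    megabits = size_gb * 8 * 1000  # 1 GB = 8,000 megabits (decimal units)
    return megabits / throughput_mbps / 3600


def restore_hours(total_gb, chunks, per_stream_mbps, link_mbps):
    """Estimated wall-clock restore time when equal-sized chunks restore in parallel."""
    # Each stream is limited by its own ceiling and by its fair share of the link.
    effective_mbps = min(per_stream_mbps, link_mbps / chunks)
    return transfer_hours(total_gb / chunks, effective_mbps)


# Assumed figures for illustration only.
monolith = restore_hours(100, chunks=1, per_stream_mbps=200, link_mbps=1000)
sharded = restore_hours(100, chunks=10, per_stream_mbps=200, link_mbps=1000)

if __name__ == "__main__":
    print(f"one 100 GB restore:        {monolith:.2f} hours")
    print(f"ten 10 GB parallel chunks: {sharded:.2f} hours")
```

Under these assumptions the sharded restore finishes sooner because a single stream cannot fill the link on its own. In this model, if one stream could already saturate the link, splitting the data would not shorten the transfer, so it is worth measuring real throughput before committing to a rearchitecture.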

One other important consideration is how the disaster recovery program will scale to cover new clouds or workloads that may be added in the future. The amount of work required can vary widely depending on the technical capabilities of the cloud provider in question. The easier it is to clone network environments and move them to new data centers, the easier it will be to cover new workloads as they are added and the more future-proof the disaster recovery program will be.

When creating a disaster recovery plan, all organizations need to consider how much data they need to back up, how often it changes, what the desired recovery window is, what geographical and privacy requirements apply, and whether the cloud is a good fit for their needs. Those with hybrid networks need to consider all of the above while taking into account the lack of control over the connections between their various clouds. Given the steady increase in hybrid cloud environments across companies of all sizes, IT will need to grapple with this issue eventually; the only question is when.



Loke Tan 

Loke Tan is a Director of Product Management at Skytap, a cloud service for running IBM Power and x86 workloads natively in the public cloud. Loke combines a strong technical development background with experience in developer marketing, technology evangelism, and social media. He was a technical product manager and developer evangelist at Microsoft for ten years and has held similar positions at Avalara and Concur.

Published Thursday, July 29, 2021 7:44 AM by David Marshall
