By Loke Tan, Director of Product Management at Skytap
As organizations increasingly move to a hybrid cloud model (IDC predicts that over 90% of enterprises worldwide will rely on a mix of on-premises/dedicated private clouds, multiple public clouds, and legacy platforms by 2022), how should their disaster recovery strategy change? Cloud-based options for disaster recovery and backups offer several benefits compared to on-premises solutions, including cost savings, geographic distribution and resilience, and ease of use. Customers with a hybrid cloud model will also need to account for the fact that they no longer control the connections between clouds, which now traverse the public internet. This has several ramifications for disaster recovery, and there are multiple ways to manage the issue. Let's dig into these in detail.
The first question for organizations building a disaster recovery strategy, no matter their network environment, is whether to use the cloud. Disaster Recovery as a Service (DRaaS) solutions allow an organization to replicate data and IT infrastructure to a third-party cloud computing environment rather than to a backup server of its own. Like other "as-a-service" models, the cloud infrastructure provider manages the resources used for the backups, and the customer pays based on how much they use. This means users can keep resources at only the level needed to replicate data to the backup environment under normal conditions and then "turn it on" when needed. From a cost perspective, this can be far more economical than buying and maintaining physical backup servers. The cloud also makes it easier to put backup servers in different regions for greater resilience, although companies subject to data sovereignty laws may be limited to providers that offer data center space in their country or region.
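To make the "replicate under normal conditions" half of that model concrete, here is a minimal sketch of a scheduled job that copies a directory tree to cloud object storage. It uses boto3 against a bucket and source path that are purely hypothetical; a real DRaaS product automates this, along with the failover side.

```python
# Sketch: scheduled replication of local data to cloud object storage.
# Assumes boto3 is installed and credentials are configured; the bucket
# name and source directory are hypothetical placeholders.
import os
import boto3

s3 = boto3.client("s3")
BUCKET = "example-dr-backups"   # hypothetical bucket
SOURCE_DIR = "/var/data"        # hypothetical data to protect

def replicate(source_dir: str, bucket: str) -> None:
    """Upload every file under source_dir, keyed by its relative path."""
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            path = os.path.join(root, name)
            key = os.path.relpath(path, source_dir)
            s3.upload_file(path, bucket, key)
            print(f"replicated {path} -> s3://{bucket}/{key}")

if __name__ == "__main__":
    replicate(SOURCE_DIR, BUCKET)
```

In practice replication would be incremental and restores would be tested regularly; the point is that the standby copy in the cloud costs little until a disaster forces you to spin up full infrastructure around it.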
The pros and cons of using the cloud are the same regardless of whether an organization's infrastructure is on-premises or not. The major issue with a hybrid cloud model is that the organization no longer controls its entire network infrastructure. Traffic between on-premises resources and a public cloud, for example, crosses the public internet. The speed and performance of this connection are no longer guaranteed; they will vary based on many factors beyond the organization's control. IT teams will need to account for this in their disaster recovery plan.
Because the performance of the public internet isn't guaranteed, copying data to a cloud backup server and restoring applications from that server may take longer in a hybrid cloud environment. Organizations will need to relax, that is, lengthen, their Recovery Time Objective (RTO) and Recovery Point Objective (RPO) targets to account for this. IT should set these objectives carefully for each component of the environment, taking business as well as technical considerations into account. They will want input from business leaders who can quantify the cost of an outage or of lost data to decide which systems deserve higher priority.
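A back-of-the-envelope check shows why the network sets a floor on those targets. The numbers below (dataset size, change rate, throughput) are made up for illustration; substitute your own measurements.

```python
# Sketch: sanity-check RTO/RPO targets against network throughput.
# All inputs are illustrative assumptions, not measured values.

DATASET_GB = 500          # full dataset to restore after a disaster
DAILY_CHANGE_GB = 40      # data that changes (and must replicate) per day
THROUGHPUT_MBPS = 200     # sustained internet throughput, megabits/second

GB_TO_MBITS = 8 * 1000    # 1 GB ~= 8,000 megabits (decimal units)

# RTO floor: time to pull the full dataset back from the cloud.
rto_hours = DATASET_GB * GB_TO_MBITS / THROUGHPUT_MBPS / 3600
# RPO check: replication must keep pace with the daily change rate.
replication_hours_per_day = DAILY_CHANGE_GB * GB_TO_MBITS / THROUGHPUT_MBPS / 3600

print(f"Restoring {DATASET_GB} GB takes ~{rto_hours:.1f} h, so RTO must exceed that")
print(f"Replicating {DAILY_CHANGE_GB} GB/day takes ~{replication_hours_per_day:.1f} h/day")
```

If the computed restore time is already longer than the business can tolerate, no amount of process tuning will meet the RTO; either the target or the architecture has to change.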
Understanding upstream and downstream application dependencies is also important in a hybrid cloud environment, where these relationships are likely more complex. When building a DR plan, IT needs to know whether an outage of one application will cause a full or partial service outage in others. You can find more useful information about designing a DR plan in this Red Hat blog post.
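One lightweight way to reason about those dependencies is to model them as a graph and derive a recovery order from it. The applications and edges below are hypothetical; the technique, a standard topological sort, is what matters.

```python
# Sketch: derive a recovery order from application dependencies.
# The dependency map is hypothetical; "A depends on B" means B must
# be recovered before A.
from graphlib import TopologicalSorter  # Python 3.9+

depends_on = {
    "web-frontend": {"orders-api", "auth"},
    "orders-api":   {"database"},
    "auth":         {"database"},
    "database":     set(),
}

# static_order() only yields an application once everything it depends
# on has already appeared, so it is a valid recovery sequence.
print(list(TopologicalSorter(depends_on).static_order()))
# -> ['database', 'orders-api', 'auth', 'web-frontend'] (one valid order)
```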
If organizations can't accept higher RTOs and RPOs, they have a few options. First, they can use Virtual Private Networks to connect all their infrastructure and regain a measure of control over network performance through the VPN tunnel. Second, they can split large applications into microservices that each require only a small amount of data (for example, turning one application that needs to move 100 gigabytes of data into ten separate microservices that each need to move ten gigabytes). If applications and databases are rearchitected to support sharding, or splitting into smaller, more manageable chunks, each chunk recovers much more quickly, and the combined recovery time will be much shorter than it would be for the original application.
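The arithmetic behind that claim is worth making explicit. It holds when the shards restore in parallel against independent capacity rather than queuing on one saturated link; the numbers below are illustrative.

```python
# Sketch: monolith vs. sharded recovery time, assuming each shard
# restores in parallel over its own stream (illustrative numbers).

THROUGHPUT_MBPS = 200            # per-stream throughput assumption
MONOLITH_GB = 100
SHARDS = 10
SHARD_GB = MONOLITH_GB / SHARDS  # 10 GB per shard

def transfer_hours(gigabytes: float, mbps: float) -> float:
    """Hours to move `gigabytes` at `mbps` (decimal units)."""
    return gigabytes * 8000 / mbps / 3600

mono = transfer_hours(MONOLITH_GB, THROUGHPUT_MBPS)
# With full parallelism, total recovery time is one shard's time.
sharded = transfer_hours(SHARD_GB, THROUGHPUT_MBPS)

print(f"monolith: ~{mono:.2f} h, sharded (parallel): ~{sharded:.2f} h")
# -> monolith: ~1.11 h, sharded (parallel): ~0.11 h
```

Even when shards do share bandwidth and the total transfer time doesn't shrink, each service comes back online as soon as its own shard lands, so the most critical services can be restored first.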
One other important consideration is how the disaster recovery program will scale to cover new clouds or workloads added in the future. The amount of work required for this can vary widely depending on the technical capabilities of the cloud provider in question. The easier it is to clone network environments and move them to new data centers, the easier it will be to cover new workloads as they are added, and the more future-proof the system will be.
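What "easy to clone" usually looks like in practice is a declarative template: the environment is described as data, and standing it up in a new region is a parameter change. The sketch below is generic Python against a hypothetical provisioning call, not any particular provider's SDK.

```python
# Sketch: a DR environment as a declarative template that can be
# instantiated in any region. provision() stands in for whatever API
# your cloud provider exposes; it is hypothetical.
from dataclasses import dataclass

@dataclass
class EnvironmentTemplate:
    name: str
    vms: list[str]          # VM images to launch
    networks: list[str]     # subnets to recreate

def provision(template: EnvironmentTemplate, region: str) -> None:
    # Hypothetical provider calls would go here.
    print(f"cloning '{template.name}' into {region}: "
          f"{len(template.vms)} VMs, {len(template.networks)} subnets")

erp_dr = EnvironmentTemplate(
    name="erp-dr",
    vms=["app-server", "db-server"],
    networks=["10.0.1.0/24"],
)

# Covering a new region, or a new workload's template, is one more
# call rather than a re-architecture.
for region in ["us-west", "eu-central"]:
    provision(erp_dr, region)
```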
When creating a disaster recovery plan, all organizations need to consider how much data they need to back up, how often it changes, what the desired recovery window is, what geographical and privacy requirements apply, and whether the cloud is a good fit for their needs. Those with hybrid networks need to consider all of the above while also accounting for their lack of control over the connections between their various clouds. Given the steady increase in hybrid cloud environments across companies of all sizes, IT will need to grapple with this issue eventually; the only question is when.
## ABOUT THE AUTHOR
Loke Tan is a Director of Product Management at Skytap, a cloud service for running IBM Power and x86 workloads natively in the public cloud. Loke combines a strong technical development background with experience in developer marketing, technology evangelism, and social media. He was a technical product manager and developer evangelist at Microsoft for ten years and has held similar positions at Avalara and Concur.