By Faiz Khan, Founder/CEO, Wanclouds
There are two main approaches to setting up disaster recovery (DR) for the cloud infrastructure that hosts your production application.
- The first approach is to maintain a replica of the primary production infrastructure hosting your application, such as a VPC setup with network functions, security policies, nodes and storage. The replica can be in the same region or a different one, with active data synchronization running. A DNS switch-over, for example, can be used to redirect traffic to the backup site if the primary site becomes unavailable for any reason. While this is the ideal scenario, it can be very costly, because you pay for replicated infrastructure that sits idle most of the time.
- The second approach is to set up or restore your primary production environment on demand. This approach is cost-effective: if your application can tolerate a recovery delay of anywhere from a few minutes to an hour, you can save a lot by not paying for infrastructure that would sit idle most of the time.
In this article, we focus on the second scenario, where you want on-demand deployment or restoration of your production environment. This is sometimes referred to as the Cold DR approach. Various backup solutions are available for virtual machines and, more recently, for Kubernetes and data. These solutions typically work well when you restore into the same environment and its infrastructure, such as the Virtual Private Cloud (VPC) setup and the network functions and their configurations, has not changed. However, in a disaster scenario, such as a public cloud region going down or the VPC setup being broken by human error or other causes, restoring VMs and Kubernetes becomes a lengthy process, because the VPC design, network functions and security policies have to be set up first, before the VMs and Kubernetes clusters can be restored.
Now imagine that an entire cloud experiences an outage in one or more regions and you decide to move your application, infrastructure and network functions to a different cloud and restore them there. Unfortunately, this becomes a daunting task, and most customers will not consider this option, especially during a disaster, since it may take weeks and cost a lot in engineering effort and resources.
For comprehensive disaster recovery or business continuity, you should be prepared to restore your entire cloud infrastructure setup in minutes, not days or weeks. The following are examples of resources that need to be ready for restoration in a different region or a different cloud within minutes:
- VPC Setup
- VPC Zones construct
- IP addressing
- Subnets
- Load Balancing
- Routes
- Security Groups
- Access Control Lists
- Virtual Private Network gateway and setup
- Public Gateway
- Network Address Translation
- Content Delivery Network Information
- SSH Keys
- Names and Tags
- Policies
- Virtual Machines
- Storage volumes
- File storage bucket information
- DNS
- Kubernetes or OpenShift:
  - Kubernetes manifest files
  - Persistent Volumes
  - Applications and namespaces
  - Other Kubernetes configuration, such as Pods, Deployments and Services
  - Moving data across storage classes
Ensure that the relationships among these resources are understood, and that all of them can be restored across regions and across clouds during a disaster.
Also ensure that you can migrate any persistent data across storage classes.
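One way to make the relationships between these resources explicit is to record them as a dependency graph and derive the restore order from it. The sketch below is only an illustration under stated assumptions, not a description of any particular DR product: the resource names and dependencies in the hypothetical `RESTORE_DEPENDENCIES` map are simplified, and it uses Python's standard-library `graphlib` (Python 3.9+) to group resources into waves, where every resource in a wave depends only on resources from earlier waves.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified dependency map: each resource lists the resources
# that must already exist before it can be restored. Names are illustrative.
RESTORE_DEPENDENCIES: dict[str, set[str]] = {
    "vpc": set(),
    "zones": {"vpc"},
    "subnets": {"vpc", "zones"},
    "public_gateway": {"vpc", "subnets"},
    "routes": {"subnets"},
    "security_groups": {"vpc"},
    "acls": {"subnets"},
    "vpn_gateway": {"subnets", "routes"},
    "ssh_keys": set(),
    "virtual_machines": {"subnets", "security_groups", "ssh_keys"},
    "storage_volumes": {"zones"},
    "load_balancer": {"subnets", "virtual_machines"},
    "kubernetes_cluster": {"subnets", "security_groups"},
    "persistent_volumes": {"kubernetes_cluster"},
    "k8s_workloads": {"kubernetes_cluster", "persistent_volumes"},
    "dns": {"load_balancer"},
}


def restore_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group resources into waves; resources within one wave have no
    dependencies on each other, so they could be restored in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all deps of these are done
        waves.append(ready)
        sorter.done(*ready)  # mark the wave as restored
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(restore_waves(RESTORE_DEPENDENCIES), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Resources in the same wave could, for example, be handled by separate Terraform modules or Ansible plays running in parallel. If the graph contains a cycle, `prepare()` raises `graphlib.CycleError`, which surfaces a modeling mistake during planning rather than in the middle of a real disaster.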
A skilled cloud operations team can combine tools such as Terraform and Ansible with VM and Kubernetes DR tools to build a comprehensive DR solution. Companies like Wanclouds Inc. are also working to simplify DR with a comprehensive disaster-recovery-as-a-service (DRaaS) approach. Whatever multi-cloud or multi-region DR scenarios you create, the cloud ops team must test them and keep them up to date with any changes in the source production environment.
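Keeping a DR setup current largely comes down to regularly detecting drift between production and the DR definition. The following is a minimal sketch, assuming both environments have already been exported into plain dictionaries keyed by resource name; the `find_drift` helper and the `PRODUCTION` and `DR_TEMPLATE` data are hypothetical. In practice the inventories would come from your cloud provider's APIs or from infrastructure-as-code state, and for resources Terraform manages, `terraform plan` performs a comparable check.

```python
from typing import Any

# A resource inventory: resource name -> its configuration attributes.
Inventory = dict[str, dict[str, Any]]

# Hypothetical example inventories, hard-coded for illustration.
PRODUCTION: Inventory = {
    "subnet-app": {"cidr": "10.10.1.0/24", "zone": "zone-1"},
    "sg-web": {"ingress_ports": [443]},
    "vm-api": {"profile": "4x16", "subnet": "subnet-app"},
}
DR_TEMPLATE: Inventory = {
    "subnet-app": {"cidr": "10.10.1.0/24", "zone": "zone-1"},
    "sg-web": {"ingress_ports": [80, 443]},
    "vm-legacy": {"profile": "2x8", "subnet": "subnet-app"},
}


def find_drift(production: Inventory, dr_template: Inventory) -> dict[str, list[str]]:
    """Compare two inventories and report how the DR template has drifted."""
    prod_names = set(production)
    dr_names = set(dr_template)
    return {
        # Added to production but never captured in the DR template.
        "missing_in_dr": sorted(prod_names - dr_names),
        # Still in the DR template but already removed from production.
        "stale_in_dr": sorted(dr_names - prod_names),
        # Present in both, but with different configuration.
        "changed": sorted(
            name for name in prod_names & dr_names
            if production[name] != dr_template[name]
        ),
    }


if __name__ == "__main__":
    print(find_drift(PRODUCTION, DR_TEMPLATE))
```

Running a check like this on a schedule, and failing loudly when any of the three lists is non-empty, is one way to keep the DR definition from silently falling behind production.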
To learn more about cloud native technology innovation, join us at KubeCon + CloudNativeCon Europe 2021 - Virtual, which will take place May 4-7.
ABOUT THE AUTHOR
Faiz Khan, Founder/CEO, Wanclouds
Prior to founding Wanclouds, Faiz was an executive at Cisco, where he held multiple technology leadership roles. His most recent assignment there was leading the Global Cloud Automation and Orchestration organization. Before that, he built the Global Datacenter and Cloud practice and was the GM of the Emerging Markets Technology Practices organization.