By Dave Bermingham, Senior Technical Evangelist at SIOS Technology
When it comes to building out a SQL Server infrastructure
for high availability (HA), are there any cost advantages to staying on-prem
vs. moving to the cloud? If you need SQL Server to be available at least 99.99%
of the time, you're going to need backup infrastructure that can take over immediately
if the active instance of SQL Server suddenly goes offline - so it's not as
though moving to the cloud eliminates the cost of backup infrastructure.
But the cloud offers a far wider range of configuration
options that can, in fact, save you money when building out a SQL Server
infrastructure designed for HA. Let's start with core configuration
considerations. If you're configuring for HA on-premises, you'll want to build
an infrastructure designed for automatic failover among compute and storage
resources. That way, if the infrastructure supporting your active SQL Server
instance becomes unavailable (for whatever reason), the active infrastructure
will fail over to other infrastructure that can immediately take up the SQL
Server load and continue to support your operations.
On-Prem vs. Cloud Clusters
Organizations frequently use Windows Failover Clustering
Services (WFCS) to support this need, as WFCS enables the creation of a
multi-node SQL Server failover cluster instance (FCI) in which one node is
operating as the "active" node and one or more nodes - which can be located in
geographically different locations - act as "secondary" or "standby" nodes that
can be called into service immediately should something happen to the active
node. Organizations can use WFCS to configure for HA in the cloud too, but the
cloud offers some distinct cost-saving options.
First, if you're building your HA infrastructure on-prem,
you don't have a lot of sizing flexibility. You need to configure your FCI for
your anticipated workloads-even if those workloads vary significantly over
time. Your organization may experience spikes in traffic to your SQL Server
database at the end of the day, month, or quarter. If you're in retail, your
traffic may spike during the holidays and then fall back down to a much lower
level during other parts of the year. The problem on-prem is that your active
SQL Server infrastructure needs to be sized to accommodate those spikes when
they come, and that may mean costly investments in CPU and RAM that you don't
use during those off-peak times.
Moreover, it means identical investments in over-the-top CPU
and RAM capacity for each of the secondary nodes in your FCI - because those
backup nodes need to be powerful enough to take over when called upon. That
means two or more instances of high
end hardware (and appropriate software licenses), a good portion of which will
stand idle for the duration of the time that the active SQL Server
infrastructure is performing on par.
You can approach this a different way if you configure an
FCI in the cloud. To begin with, the elasticity of cloud offerings means that
you can configure the primary SQL Server node in your FCI to meet your
day-to-day performance needs (rather than peak performance needs). The cloud
infrastructure can automatically add cores and memory in response to spikes in
demand. You'll pay more for those added resources when you're using them, but
once the spikes subside those resources can be automatically turned off and you
continue to pay only for the resources you use.
That kind of elasticity delivers another benefit when
configuring for HA in the cloud: you don't need to configure your secondary
nodes as perfect mirrors of your primary node. The failover cluster node (or
nodes) that are standing by, waiting to be called into service in the event of
a failover, can be configured with fewer cores and far less RAM than your
production needs would actually require. However, because these are virtual
machines rather than physical machines, they can be reconfigured in an instant
and reconstituted upon reboot to have the same (or even more) CPU and RAM
capacity as the primary node - and in that reconstituted configuration they
fill in for the primary node that has failed. You pay only for the resources of
the under-configured system while it is in standby, pay for the additional
resources once you've rebooted the system as one appropriately sized to meet
the demands of the moment, and then you can go back to paying the lower amount
once you move your SQL Server workload back to the former primary node and
shrink the secondary node back to its under-configured state, which is
sufficient until it is subsequently called into service in response to the next
failover. Then, it's just rinse and repeat.
Saving on Software
One of the things you need to reassess in moving an HA
infrastructure to the cloud is the manner in which you ensure access to your
underlying SQL Server data. In an on-prem configuration you might have a shared
storage area network (SAN) that is separate from the application servers but
can be accessed by any the nodes in the failover cluster. In the cloud, a SAN
isn't an option. Each node in a failover cluster has its own storage, so
ensuring HA in the cloud requires that you configure a mechanism for
replicating data stored on the active SQL Server infrastructure to storage
attached to each of the backup infrastructures. That way, when a secondary node
is called into service, it's got a copy of all the data that was in use when the
primary node went offline.
In the Windows world, there are basically two ways to do
this. You can take advantage of a replication service built into SQL Server -
using the Basic or Always-On Availability Group (AG) feature - or you can take
advantage of a third-party SANless Clustering tool. Both approaches deliver the
core services required to replicate your SQL Server data from the primary to
secondary storage systems, but the cost of deploying these approaches may
differ dramatically.
If you rely on SQL Server Standard Edition (2012 or later),
you can use the Basic AG feature to replicate a single database per AG to a
single backup server in your failover cluster. However, if you have multiple
databases that you need to replicate, or if you intend to replicate one or more
SQL Server databases to multiple secondary servers, you'll need to upgrade to
SQL Server Enterprise Edition, which
is far more expensive. Your application
may not require all the features of SQL Server Enterprise Edition, but you'll
need that edition if you want to replicate more than a few SQL Server databases
to a secondary node or one or more SQL Server databases to more than one
failover node.
The alternate approach to HA data replication involves use
of a third-party SANless clustering tool. This approach provides you with
block-level storage replication from one storage system to another and is
independent of SQL Server itself. One benefit of this independence is that a
SANless clustering tool will replicate everything in storage (including any
number of databases), whereas the AG feature of SQL Server only replicates
user-defined SQL Server databases. A second benefit of this independence is
that the SANless clustering approach will work with whatever version of SQL
Server your organization requires - even back to SQL Server 2008.
That latter benefit can cut your operating costs
significantly. If your organization doesn't have a good business reason for
using the features of SQL Server Enterprise edition, the SANless cluster
approach to data replication enables you to continue to rely on SQL Server
Standard Edition while still ensuring the replication your SQL Server data
among as many nodes as your HA demands require. That translates into tangible
cost savings. On identically configured dual-node clusters with quad-core CPUs,
the cost of implementing HA is 58% lower using Standard edition of SQL Server
and a SANless clustering tool than when using SQL Server Enterprise edition and
the AG feature. Bump those configurations up to 24-core CPUs, and a SANless clustering
approach can cut your HA costs by more than 70% when compared to the cost of
running SQL Server Enterprise edition and an AG.
And more...
These examples only begin to suggest the cost differences of
configuring HA on-prem vs. in the cloud. Factor in the cost of administrative
and support personnel at multiple sites (unnecessary in the cloud), as well as
the cost of dedicated high-speed network links (again, already built into the
cloud), and it's clear that there are ample opportunities to reduce costs by
configuring your SQL Server for HA in the cloud rather than on-premises.
##
ABOUT THE AUTHOR
Dave Bermingham is the Senior Technical Evangelist at SIOS Technology.
He is recognized within the technology community as a high-availability expert
and has been honored to be elected a Microsoft MVP for the past 11 years: 6
years as a Cluster MVP and 5 years as a Cloud and Datacenter Management MVP.
Dave holds numerous technical certifications and has more than thirty years of
IT experience, including in finance, healthcare, and education.