Virtualization Technology News and Information
The Cost of Configuring SQL Server for High Availability On-Prem vs. the Cloud

By Dave Bermingham, Senior Technical Evangelist at SIOS Technology

When it comes to building out a SQL Server infrastructure for high availability (HA), are there any cost advantages to staying on-prem vs. moving to the cloud? If you need SQL Server to be available at least 99.99% of the time, you're going to need backup infrastructure that can take over immediately if the active instance of SQL Server suddenly goes offline - so it's not as though moving to the cloud eliminates the cost of backup infrastructure.

But the cloud offers a far wider range of configuration options that can, in fact, save you money when building out a SQL Server infrastructure designed for HA. Let's start with core configuration considerations. If you're configuring for HA on-premises, you'll want to build an infrastructure designed for automatic failover among compute and storage resources. That way, if the infrastructure supporting your active SQL Server instance becomes unavailable (for whatever reason), the active infrastructure will fail over to other infrastructure that can immediately take up the SQL Server load and continue to support your operations.

On-Prem vs. Cloud Clusters

Organizations frequently use Windows Failover Clustering Services (WFCS) to support this need, as WFCS enables the creation of a multi-node SQL Server failover cluster instance (FCI) in which one node is operating as the "active" node and one or more nodes - which can be located in geographically different locations - act as "secondary" or "standby" nodes that can be called into service immediately should something happen to the active node. Organizations can use WFCS to configure for HA in the cloud too, but the cloud offers some distinct cost-saving options.

First, if you're building your HA infrastructure on-prem, you don't have a lot of sizing flexibility. You need to configure your FCI for your anticipated workloads-even if those workloads vary significantly over time. Your organization may experience spikes in traffic to your SQL Server database at the end of the day, month, or quarter. If you're in retail, your traffic may spike during the holidays and then fall back down to a much lower level during other parts of the year. The problem on-prem is that your active SQL Server infrastructure needs to be sized to accommodate those spikes when they come, and that may mean costly investments in CPU and RAM that you don't use during those off-peak times.

Moreover, it means identical investments in over-the-top CPU and RAM capacity for each of the secondary nodes in your FCI - because those backup nodes need to be powerful enough to take over when called upon. That means two or more instances of high end hardware (and appropriate software licenses), a good portion of which will stand idle for the duration of the time that the active SQL Server infrastructure is performing on par.

You can approach this a different way if you configure an FCI in the cloud. To begin with, the elasticity of cloud offerings means that you can configure the primary SQL Server node in your FCI to meet your day-to-day performance needs (rather than peak performance needs). The cloud infrastructure can automatically add cores and memory in response to spikes in demand. You'll pay more for those added resources when you're using them, but once the spikes subside those resources can be automatically turned off and you continue to pay only for the resources you use.

That kind of elasticity delivers another benefit when configuring for HA in the cloud: you don't need to configure your secondary nodes as perfect mirrors of your primary node. The failover cluster node (or nodes) that are standing by, waiting to be called into service in the event of a failover, can be configured with fewer cores and far less RAM than your production needs would actually require. However, because these are virtual machines rather than physical machines, they can be reconfigured in an instant and reconstituted upon reboot to have the same (or even more) CPU and RAM capacity as the primary node - and in that reconstituted configuration they fill in for the primary node that has failed. You pay only for the resources of the under-configured system while it is in standby, pay for the additional resources once you've rebooted the system as one appropriately sized to meet the demands of the moment, and then you can go back to paying the lower amount once you move your SQL Server workload back to the former primary node and shrink the secondary node back to its under-configured state, which is sufficient until it is subsequently called into service in response to the next failover. Then, it's just rinse and repeat.

Saving on Software

One of the things you need to reassess in moving an HA infrastructure to the cloud is the manner in which you ensure access to your underlying SQL Server data. In an on-prem configuration you might have a shared storage area network (SAN) that is separate from the application servers but can be accessed by any the nodes in the failover cluster. In the cloud, a SAN isn't an option. Each node in a failover cluster has its own storage, so ensuring HA in the cloud requires that you configure a mechanism for replicating data stored on the active SQL Server infrastructure to storage attached to each of the backup infrastructures. That way, when a secondary node is called into service, it's got a copy of all the data that was in use when the primary node went offline.

In the Windows world, there are basically two ways to do this. You can take advantage of a replication service built into SQL Server - using the Basic or Always-On Availability Group (AG) feature - or you can take advantage of a third-party SANless Clustering tool. Both approaches deliver the core services required to replicate your SQL Server data from the primary to secondary storage systems, but the cost of deploying these approaches may differ dramatically.

If you rely on SQL Server Standard Edition (2012 or later), you can use the Basic AG feature to replicate a single database per AG to a single backup server in your failover cluster. However, if you have multiple databases that you need to replicate, or if you intend to replicate one or more SQL Server databases to multiple secondary servers, you'll need to upgrade to SQL Server Enterprise Edition, which is far more expensive. Your application may not require all the features of SQL Server Enterprise Edition, but you'll need that edition if you want to replicate more than a few SQL Server databases to a secondary node or one or more SQL Server databases to more than one failover node.

The alternate approach to HA data replication involves use of a third-party SANless clustering tool. This approach provides you with block-level storage replication from one storage system to another and is independent of SQL Server itself. One benefit of this independence is that a SANless clustering tool will replicate everything in storage (including any number of databases), whereas the AG feature of SQL Server only replicates user-defined SQL Server databases. A second benefit of this independence is that the SANless clustering approach will work with whatever version of SQL Server your organization requires - even back to SQL Server 2008.

That latter benefit can cut your operating costs significantly. If your organization doesn't have a good business reason for using the features of SQL Server Enterprise edition, the SANless cluster approach to data replication enables you to continue to rely on SQL Server Standard Edition while still ensuring the replication your SQL Server data among as many nodes as your HA demands require. That translates into tangible cost savings. On identically configured dual-node clusters with quad-core CPUs, the cost of implementing HA is 58% lower using Standard edition of SQL Server and a SANless clustering tool than when using SQL Server Enterprise edition and the AG feature. Bump those configurations up to 24-core CPUs, and a SANless clustering approach can cut your HA costs by more than 70% when compared to the cost of running SQL Server Enterprise edition and an AG.

And more...

These examples only begin to suggest the cost differences of configuring HA on-prem vs. in the cloud. Factor in the cost of administrative and support personnel at multiple sites (unnecessary in the cloud), as well as the cost of dedicated high-speed network links (again, already built into the cloud), and it's clear that there are ample opportunities to reduce costs by configuring your SQL Server for HA in the cloud rather than on-premises.



Dave Bermingham 

Dave Bermingham is the Senior Technical Evangelist at SIOS Technology. He is recognized within the technology community as a high-availability expert and has been honored to be elected a Microsoft MVP for the past 11 years: 6 years as a Cluster MVP and 5 years as a Cloud and Datacenter Management MVP. Dave holds numerous technical certifications and has more than thirty years of IT experience, including in finance, healthcare, and education.

Published Monday, March 22, 2021 7:39 AM by David Marshall
Filed under: ,
There are no comments for this post.
To post a comment, you must be a registered user. Registration is free and easy! Sign up now!
<March 2021>