In any of Microsoft's online communities, you'll find a lot of people offering thoughts, ideas,
and opinions. But when people go into the community looking for actionable
answers to questions they can't answer, those thoughts, ideas, and opinions are
routinely greeted with a bit of skepticism-unless
they're coming from someone Microsoft has recognized as a Most Valuable
Professional (MVP). The MPVs responses? They carry weight. Those
answers people are comfortable acting on.
So what do the MVPs have to say about maintaining and
optimizing SQL Server performance in a cloud-based high availability
environment? Dave Bermingham, Senior
Technical Evangelist at SIOS
Technology and a Microsoft Cloud and Datacenter Management MVP, sat
down with Melody Zacharias, Senior Microsoft Solutions Manager at Pure Storage
and a Microsoft MVP; Allan Hirt, Managing Partner at SQLHA and a Microsoft Cloud and
Datacenter Management and Data Platform MVP & VMware vExpert; Joey D'Antoni, principal
consultant at Denny Cherry and Associates Consulting and a Microsoft
Data Platform MVP; and Mike Walsh, founder of Straight Path Solutions and a SQL Server
Consultant and former Microsoft MVP to discuss that topic. This Q&A will
highlight their thoughts and provide readers with actionable dos and don'ts for maintaining and optimizing performance in a cloud-based
high availability environment.
##
Dave Bermingham: Let's start with fundamentals. What makes for a highly
available solution in the cloud?
Allan Hirt: When it comes to high availability, there's no real difference between the cloud and on-premises.
You're still looking for 99.99% availability. It's a little more abstracted in
the cloud because there's more than one way to deploy in the cloud-including infrastructure-as-a-service (IaaS)
offerings, where you're doing the work of configuring and maintaining SQL
Server on top of a cloud provider's VM, storage, and networking infrastructure,
and platform-as-a-service (PaaS)
offerings where the cloud provider is providing SQL Server and managing nearly
all aspects of your application infrastructure for you-but your expectations
for HA remain the same.
Joey D'Antoni: Well, it might be important to add that high availability
PaaS configurations may not always provide the flexibility that a customer
might want because they may only be focused on ensuring high availability
within a specific region. Some solutions may support multiple copies of
storage; others may not. Some documentation from cloud providers is clearer
than others when it comes to PaaS options and how they provide high
availability.
Dave Bermingham: Are there performance issues companies should know
about before they deploy SQL Server in a high availability cloud
infrastructure?
Joey D'Antoni: Companies may encounter performance issues, but that's
often because you do things differently in the cloud. Once you know that, you
can make adjustments to achieve the expected performance. Some people are
afraid of using SQL Server indexing in their on-prem deployments, for example,
but it's really important to do that when you're running SQL Server in the
cloud. Some people don't understand how cloud storage works and try to
configure their cloud-based infrastructures the same way they always configured
their on-prem storage infrastructures-and that won't be successful. They may
need to stripe multiple disc devices together, choose a different VM, or
perhaps over-allocate storage to get the throughput they are expecting.
Melody Zacharias: That's one of the biggest challenges that we see for people
who are moving to the cloud. People think it's just going to be an apples to
apples shift, but in reality it's more like
turning an apple into an orange. Microsoft has recently deployed a workload
planner for customers shifting to Azure, and that can help avoid some potential
performance problems. Companies can model their SQL Server workloads in a high
availability cloud setting and the workload planner will suggest which VM will
work best to provide the desired performance in the cloud.
Mike Walsh:
And picking the right VM is still only part of the story. A lot of SQL Server
performance depends on throughput. Joey alluded to some best practices for
configuring cloud storage, but many companies don't know what their throughput
requirements are and so have no idea now to choose and configure cloud storage
appropriately. This can create the impression of application performance issues
in the cloud, but they're not really performance limitations of the cloud per
se.
Allan Hirt: This is why it's so important for companies to do
baselining and benchmarking before anything else. If you don't understand your
application and performance footprint on-premises, you won't be able to
determine how best to achieve the performance levels you expect in the cloud.
Dave Bermingham: What kind of baselining and benchmarking should companies
do? And how should they do it?
Allan Hirt: When we help customers migrate to the cloud, we just use a
set of PerfMon counters. You don't get anything fancy. The performance tools
that are built into the box are fine.
Joey D'Antoni: I agree. You need to collect the on-prem performance
data, but you don't need to go overboard. It's not really a whole lot of data.
If you're moving from an on-prem VM to a cloud VM, you want to make sure you
have the same amount of memory. You'll want to know what your CPU utilization
was like on-prem, maybe know how old your CPUs were. If you're on an older CPU,
you might be able to get a much faster CPU in the cloud, and that might provide
an opportunity to downscale your VM and still see the performance you expect.
What companies really need to know are their IOPS and I/O throughput numbers.
And, if they're moving from a high availability configuration on-prem to a high
availability configuration in the cloud, it will be important to know network
throughput demands, especially if using synchronous replication services to
keep your SQL Server data synchronized across availability zones.
Dave Bermingham: For customers moving to the cloud, what's the first thing
they should do on a new high availability infrastructure?
Joey D'Antoni: Test your production workload at scale in the cloud before
you actually migrate. Don't just run a handful of queries and say, "Yeah,
it works." Figure out how to run your SQL Server instance at scale and see
how it performs before you actually migrate.
Dave Bermingham: Recommendations for how best to do that?
Joey D'Antoni: Generating the user load is always the hard part, but if
you can do it programmatically that's fine. I love doing synthetic benchmarks
because it gets you a good idea of what the performance is going to look like,
but you never know how the SQL code is going to perform until you run it.
Dave Bermingham: When ensuring SQL Server high availability in the cloud,
is there anything you do differently from what you'd do in an on-prem scenario?
Mike Walsh: Failover tools,
whether that's Availability Groups in SQL Server or SANless Clustering
services, tend to be very sensitive to network blips, and they are more
frequent in the cloud. Consequently, we proactively increase various failover
timeout settings-cross-subnet, same subnet, and others-in a cloud-based high
availability configuration to avoid unnecessary failovers. We don't want
"boy who cried wolf" failovers, but those will happen more frequently
in the cloud unless we proactively take steps to avoid them.
Melody Zacharias: If you're moving
legacy SQL applications to the cloud, make sure that you build in the ability
to reconnect after failover. Many legacy applications may not even have this
ability to reconnect automatically over the network. It might not have been an
issue on-prem, but you need to make sure you've got a way to make that happen
if you're moving to the cloud.
Allan Hirt:
I find that what we do similarly or differently depends on the cloud and the
region. In Azure, for example, there are certain high availability features
that are only available in certain regions or certain zones. Some things work
with sets, some things do not-and all that influences how you configure for
performance and availability. So, if a customer is looking at a specific
feature in the cloud, that may require you to rethink the way you're going to
ensure high availability.
Dave Bermingham: What are your go-to tips to optimize availability of an
application in a cloud environment?
Joey D'Antoni: To me, one of the most important things is to understand
the failure mechanisms of your cloud and try to understand what single points
of failure you have. You need to think about it holistically.
Allan Hirt:
You really need to look at the service level
agreements (SLAs) of all the services you're using. Your VMs may be configured
for 99.99% availability, but if your SQL Server database is sitting on a
storage solution whose SLA only guarantees 99.9% availability for writes, then
the effective SLA for your solution is only 99.9%. That's where you'd want to
move to a different kind of storage solution that provides an SLA that is
consistent with the 99.99% SLA you have with the VMs.
Dave Bermingham: Are there any cloud specific tools that you recommend to
help identify performance issues?
Melody Zacharias: Azure Monitor, that's a good one.
Joey D'Antoni: Agreed. I really
like the tools Azure provides. I'll add this: If you're on SQL 2016 or higher
and you're not using the query store, you're missing out on a lot of great
data. The other thing I'll mention, tying into Melody's comment on the
Azure Monitor component, is Microsoft's SQL Insights.
SQL Insights captures a lot of performance data and dumps it into a log
analytics account, which you can view in a nice dashboard. It's an easy way to
get a lot of insight about your instances and databases.
Mike Walsh: I think an APM [application performance monitoring] tool is
important. Sometimes the problem is the performance of the database, but other
times it's not at all, and an APM tool can help you pinpoint where you're
really encountering performance problems. New Relic is one, but there are a
number of good APM tools out there.
Dave Bermingham: Final question: What improves the application performance
in the cloud?
Allan Hirt:
Knowledge is key. If you don't know what your
application is doing and you haven't historically tracked where it is
trending-in terms of storage trends, IOPS, CPU cycles, and the like-it's going
to be very hard to size your environment properly in the cloud. That said, the
ability to scale resources in the cloud-up or down-contributes greatly to
application performance. Compared to an on-prem environment, the cloud is much
more responsive and can enable you to scale your environment instantly if
that's what is required to ensure on-going performance.
Mike Walsh:
In a lot of cases, performance improvement still depends on the basics. Fix
your configurations. That's easy. That doesn't require developers and QA. Fix
your VMs. Fix Windows. Look for your resource bottlenecks. Do your weight stats
analysis with whatever tool you want to use. Find your bad queries. Fix it with
indexes. Fix it with antipattern fixes. And try to stay away from changing
application and database designs as much as you can. We often spend the first
half of a consulting engagement-even with big companies with full IT
teams-fixing basic things, and it makes a world of difference. If you're moving
to the cloud, fix those things now, for it will make your migration much
easier.
Melody Zacharias: Just to piggyback on what Mike was saying, there are a
lot of tools that can help you find and fix these things, and that really is
important. You don't have to be an expert in T-SQL scripts to do this, either.
There are tons of free tools from Microsoft-which you can use even if you're
not going to the Microsoft cloud-that will help you find your bad code or code
that needs to be upgraded, that will help you determine what VMs to upgrade to,
and so on. I'm a big proponent of uncomplicating things and doing it the easy
way, so use the easy things that are available to you for free.
Joey D'Antoni: I agree with everything everyone has said and add only
this: Indexes are magic. You probably need more indexes rather than fewer
indexes. Secondly, use data compression. Data compression has been available on SQL Server Standard Edition since
2016 SP1. It improves memory and disk utilization, and these days, given that
it was designed for CPUs that were in use 15 years ago, it involves minimal CPU
overhead. So, it's well worth it to use compression to improve performance.
##