Virtualization Technology News and Information
Maintaining and Optimizing SQL Server Performance in a Cloud-based High Availability Environment: A Q&A with the Microsoft MVPs


In any of Microsoft's online communities, you'll find a lot of people offering thoughts, ideas, and opinions. But when people go into the community looking for actionable answers to questions they can't answer, those thoughts, ideas, and opinions are routinely greeted with a bit of skepticism-unless they're coming from someone Microsoft has recognized as a Most Valuable Professional (MVP). The MPVs responses? They carry weight. Those answers people are comfortable acting on.

So what do the MVPs have to say about maintaining and optimizing SQL Server performance in a cloud-based high availability environment? Dave Bermingham, Senior Technical Evangelist at SIOS Technology and a Microsoft Cloud and Datacenter Management MVP, sat down with Melody Zacharias, Senior Microsoft Solutions Manager at Pure Storage and a Microsoft MVP; Allan Hirt, Managing Partner at SQLHA and a Microsoft Cloud and Datacenter Management and Data Platform MVP & VMware vExpert; Joey D'Antoni, principal consultant at Denny Cherry and Associates Consulting and a Microsoft Data Platform MVP; and Mike Walsh, founder of Straight Path Solutions and a SQL Server Consultant and former Microsoft MVP to discuss that topic. This Q&A will highlight their thoughts and provide readers with actionable dos and don'ts for maintaining and optimizing performance in a cloud-based high availability environment.


Dave Bermingham: Let's start with fundamentals. What makes for a highly available solution in the cloud?

Allan Hirt: When it comes to high availability, there's no real difference between the cloud and on-premises. You're still looking for 99.99% availability. It's a little more abstracted in the cloud because there's more than one way to deploy in the cloud-including infrastructure-as-a-service (IaaS) offerings, where you're doing the work of configuring and maintaining SQL Server on top of a cloud provider's VM, storage, and networking infrastructure, and platform-as-a-service (PaaS) offerings where the cloud provider is providing SQL Server and managing nearly all aspects of your application infrastructure for you-but your expectations for HA remain the same.

Joey D'Antoni: Well, it might be important to add that high availability PaaS configurations may not always provide the flexibility that a customer might want because they may only be focused on ensuring high availability within a specific region. Some solutions may support multiple copies of storage; others may not. Some documentation from cloud providers is clearer than others when it comes to PaaS options and how they provide high availability.

Dave Bermingham: Are there performance issues companies should know about before they deploy SQL Server in a high availability cloud infrastructure?

Joey D'Antoni: Companies may encounter performance issues, but that's often because you do things differently in the cloud. Once you know that, you can make adjustments to achieve the expected performance. Some people are afraid of using SQL Server indexing in their on-prem deployments, for example, but it's really important to do that when you're running SQL Server in the cloud. Some people don't understand how cloud storage works and try to configure their cloud-based infrastructures the same way they always configured their on-prem storage infrastructures-and that won't be successful. They may need to stripe multiple disc devices together, choose a different VM, or perhaps over-allocate storage to get the throughput they are expecting.

Melody Zacharias: That's one of the biggest challenges that we see for people who are moving to the cloud. People think it's just going to be an apples to apples shift, but in reality it's more like turning an apple into an orange. Microsoft has recently deployed a workload planner for customers shifting to Azure, and that can help avoid some potential performance problems. Companies can model their SQL Server workloads in a high availability cloud setting and the workload planner will suggest which VM will work best to provide the desired performance in the cloud.

Mike Walsh: And picking the right VM is still only part of the story. A lot of SQL Server performance depends on throughput. Joey alluded to some best practices for configuring cloud storage, but many companies don't know what their throughput requirements are and so have no idea now to choose and configure cloud storage appropriately. This can create the impression of application performance issues in the cloud, but they're not really performance limitations of the cloud per se.

Allan Hirt: This is why it's so important for companies to do baselining and benchmarking before anything else. If you don't understand your application and performance footprint on-premises, you won't be able to determine how best to achieve the performance levels you expect in the cloud.

Dave Bermingham: What kind of baselining and benchmarking should companies do? And how should they do it?

Allan Hirt: When we help customers migrate to the cloud, we just use a set of PerfMon counters. You don't get anything fancy. The performance tools that are built into the box are fine.

Joey D'Antoni: I agree. You need to collect the on-prem performance data, but you don't need to go overboard. It's not really a whole lot of data. If you're moving from an on-prem VM to a cloud VM, you want to make sure you have the same amount of memory. You'll want to know what your CPU utilization was like on-prem, maybe know how old your CPUs were. If you're on an older CPU, you might be able to get a much faster CPU in the cloud, and that might provide an opportunity to downscale your VM and still see the performance you expect. What companies really need to know are their IOPS and I/O throughput numbers. And, if they're moving from a high availability configuration on-prem to a high availability configuration in the cloud, it will be important to know network throughput demands, especially if using synchronous replication services to keep your SQL Server data synchronized across availability zones.

Dave Bermingham: For customers moving to the cloud, what's the first thing they should do on a new high availability infrastructure?

Joey D'Antoni: Test your production workload at scale in the cloud before you actually migrate. Don't just run a handful of queries and say, "Yeah, it works." Figure out how to run your SQL Server instance at scale and see how it performs before you actually migrate.

Dave Bermingham: Recommendations for how best to do that?

Joey D'Antoni: Generating the user load is always the hard part, but if you can do it programmatically that's fine. I love doing synthetic benchmarks because it gets you a good idea of what the performance is going to look like, but you never know how the SQL code is going to perform until you run it.

Dave Bermingham: When ensuring SQL Server high availability in the cloud, is there anything you do differently from what you'd do in an on-prem scenario?

Mike Walsh: Failover tools, whether that's Availability Groups in SQL Server or SANless Clustering services, tend to be very sensitive to network blips, and they are more frequent in the cloud. Consequently, we proactively increase various failover timeout settings-cross-subnet, same subnet, and others-in a cloud-based high availability configuration to avoid unnecessary failovers. We don't want "boy who cried wolf" failovers, but those will happen more frequently in the cloud unless we proactively take steps to avoid them.

Melody Zacharias: If you're moving legacy SQL applications to the cloud, make sure that you build in the ability to reconnect after failover. Many legacy applications may not even have this ability to reconnect automatically over the network. It might not have been an issue on-prem, but you need to make sure you've got a way to make that happen if you're moving to the cloud.

Allan Hirt: I find that what we do similarly or differently depends on the cloud and the region. In Azure, for example, there are certain high availability features that are only available in certain regions or certain zones. Some things work with sets, some things do not-and all that influences how you configure for performance and availability. So, if a customer is looking at a specific feature in the cloud, that may require you to rethink the way you're going to ensure high availability.

Dave Bermingham: What are your go-to tips to optimize availability of an application in a cloud environment?

Joey D'Antoni: To me, one of the most important things is to understand the failure mechanisms of your cloud and try to understand what single points of failure you have. You need to think about it holistically.

Allan Hirt: You really need to look at the service level agreements (SLAs) of all the services you're using. Your VMs may be configured for 99.99% availability, but if your SQL Server database is sitting on a storage solution whose SLA only guarantees 99.9% availability for writes, then the effective SLA for your solution is only 99.9%. That's where you'd want to move to a different kind of storage solution that provides an SLA that is consistent with the 99.99% SLA you have with the VMs.

Dave Bermingham: Are there any cloud specific tools that you recommend to help identify performance issues?

Melody Zacharias: Azure Monitor, that's a good one.

Joey D'Antoni: Agreed. I really like the tools Azure provides. I'll add this: If you're on SQL 2016 or higher and you're not using the query store, you're missing out on a lot of great data. The other thing I'll mention, tying into Melody's comment on the Azure Monitor component, is Microsoft's SQL Insights. SQL Insights captures a lot of performance data and dumps it into a log analytics account, which you can view in a nice dashboard. It's an easy way to get a lot of insight about your instances and databases.

Mike Walsh: I think an APM [application performance monitoring] tool is important. Sometimes the problem is the performance of the database, but other times it's not at all, and an APM tool can help you pinpoint where you're really encountering performance problems. New Relic is one, but there are a number of good APM tools out there.

Dave Bermingham: Final question: What improves the application performance in the cloud?

Allan Hirt: Knowledge is key. If you don't know what your application is doing and you haven't historically tracked where it is trending-in terms of storage trends, IOPS, CPU cycles, and the like-it's going to be very hard to size your environment properly in the cloud. That said, the ability to scale resources in the cloud-up or down-contributes greatly to application performance. Compared to an on-prem environment, the cloud is much more responsive and can enable you to scale your environment instantly if that's what is required to ensure on-going performance.

Mike Walsh: In a lot of cases, performance improvement still depends on the basics. Fix your configurations. That's easy. That doesn't require developers and QA. Fix your VMs. Fix Windows. Look for your resource bottlenecks. Do your weight stats analysis with whatever tool you want to use. Find your bad queries. Fix it with indexes. Fix it with antipattern fixes. And try to stay away from changing application and database designs as much as you can. We often spend the first half of a consulting engagement-even with big companies with full IT teams-fixing basic things, and it makes a world of difference. If you're moving to the cloud, fix those things now, for it will make your migration much easier.

Melody Zacharias: Just to piggyback on what Mike was saying, there are a lot of tools that can help you find and fix these things, and that really is important. You don't have to be an expert in T-SQL scripts to do this, either. There are tons of free tools from Microsoft-which you can use even if you're not going to the Microsoft cloud-that will help you find your bad code or code that needs to be upgraded, that will help you determine what VMs to upgrade to, and so on. I'm a big proponent of uncomplicating things and doing it the easy way, so use the easy things that are available to you for free.

Joey D'Antoni: I agree with everything everyone has said and add only this: Indexes are magic. You probably need more indexes rather than fewer indexes. Secondly, use data compression. Data compression has been available on SQL Server Standard Edition since 2016 SP1. It improves memory and disk utilization, and these days, given that it was designed for CPUs that were in use 15 years ago, it involves minimal CPU overhead. So, it's well worth it to use compression to improve performance.


Published Monday, November 07, 2022 7:30 AM by David Marshall
Filed under: ,
There are no comments for this post.
To post a comment, you must be a registered user. Registration is free and easy! Sign up now!
<November 2022>