
Measuring the Effect of Windows Guest Server Optimization

A Contributed Article by Bob Nolan, president and CEO of Raxco Software

I recently attended VMworld in Las Vegas, where an impressive array of virtualization-related technology was on display. It was evident that virtualization has grown to the point where an entire industry is evolving to support the offerings of VMware, Microsoft, Citrix and other hypervisor providers. While infrastructure virtualization has been widely adopted by the IT community, a quick walk around the VMworld Solutions Expo would suggest it is not a panacea for all your IT problems. Companies like VKernel, Veeam and SolarWinds were hyping performance monitoring tools that aid in troubleshooting VMware performance bottlenecks. These are the same types of problems system administrators have dealt with on physical servers for years: mainly resource contention issues.

In discussions with vendor representatives, I learned that the performance bottlenecks being analyzed mostly concerned the hypervisor layer and storage. These products sort out information coming from vCenter and provide graphic displays and recommendations about what to do. The interesting thing is that neither VMware nor any of the performance monitoring tools touches the Windows guest system. Yet the guest server is where the work is being done: consuming CPU, memory and I/O. When there are resource contention issues, the Windows guest OS is often the source of the problem. Let's look at how Windows file system behavior can adversely impact hypervisor overhead, disk latency, queue utilization and overall performance in a way that would be missed or misinterpreted by performance monitoring tools.

The Windows file system (NTFS) is notorious for fragmenting files and free space. This is well documented in numerous blog articles and on Microsoft's TechNet, with articles like this one on Optimizing NTFS (http://technet.microsoft.com/en-us/library/cc767961.aspx) and this one on Disk Fragmentation and System Performance (http://blogs.technet.com/b/askperf/archive/2008/03/14/disk-fragmentation-and-system-performance.aspx) that highlight the negative impact fragmentation has on performance in a standalone environment. When you run multiple instances of Windows Server on the same host, the potential for resource contention on that host increases dramatically.
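
You can gauge how bad the problem is on any given server before doing anything about it: Windows' built-in defragmenter reports fragmentation statistics in analyze-only mode. A minimal sketch, wrapped in Python for convenience; it assumes Windows Server 2008 or later, where defrag <volume> /A /V performs a verbose, analysis-only pass without moving any data:

```python
# Analysis-only fragmentation report from inside the Windows guest.
# Assumes Windows Server 2008 or later, where "defrag <vol> /A /V"
# analyzes without modifying the disk. Run from an elevated prompt.
import subprocess

result = subprocess.run(["defrag", "C:", "/A", "/V"],
                        capture_output=True, text=True)
print(result.stdout)  # total files, fragmented files, free space fragments, etc.
```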

I spoke with SAN vendors, system integrators and many VMware users who simply did not understand how or why optimization of Windows guest systems would make any difference to virtualization performance. In many of these discussions the term "defragmenting the SAN" came up, and it is a bit of a misnomer. Disk optimization is performed within the Windows OS in the guest system, which is unaware of the underlying storage hardware.

How NTFS Works

The key is in understanding how NTFS works. When a disk is recognized by a Windows system, two pieces of information are passed along: the size of the disk and the cluster size. Windows creates a bitmap file ($Bitmap) comprised of enough bits to represent the disk. For example, a 100GB drive with a 4K cluster size would produce a $Bitmap file of about 25,000,000 bits, where each bit represents 4K of space. Now, let's say a user creates a 1GB file. NTFS creates a record for the file in the Master File Table (MFT) and then asks the $Bitmap file for 1GB of space. If NTFS can find 1GB of contiguous space, a single entry is made in the Extent List of the MFT record. This entry contains the starting Logical Cluster Number (LCN) and the length of the space $Bitmap allocated. As NTFS sees it, this file is contiguous because it is in one logical piece. NTFS conveys this address information and the data to the storage controller in the form of a single SCSI command. The SCSI command is mapped to the physical disks in the array by the controller software; a single SCSI command will map to anywhere from one to several physical I/Os, depending on the controller software, as shown in Figure 1.

[Figure 1: A contiguous file generates a single SCSI command, which the controller software maps to the physical disks in the array in one to several physical I/Os.]
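
The arithmetic here is easy to check. Below is a minimal sketch of the bookkeeping just described, using the article's decimal 100GB example; the extent_list structure is illustrative and not NTFS's actual on-disk format:

```python
# Illustrative model of the $Bitmap sizing and single-extent allocation
# described above; these are not NTFS's actual on-disk structures.

GB = 10 ** 9        # decimal gigabytes, matching the article's example
CLUSTER = 4096      # 4K cluster size

def bitmap_bits(volume_bytes, cluster_bytes=CLUSTER):
    """$Bitmap holds one bit per cluster on the volume."""
    return volume_bytes // cluster_bytes

print(f"{bitmap_bits(100 * GB):,} bits in $Bitmap")  # 24,414,062 -- about 25,000,000

# A contiguous 1GB file needs one (starting LCN, run length) pair in the
# MFT record's Extent List, and therefore a single SCSI command.
file_clusters = (1 * GB) // CLUSTER          # 244,140 clusters
extent_list = [(500_000, file_clusters)]     # hypothetical starting LCN
print(f"{len(extent_list)} extent -> {len(extent_list)} SCSI command")
```
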
In a second scenario, the same 1GB file is created and the same record is created in the MFT. The difference is that when space is requested from $Bitmap, it returns 100 fragments of various sizes to accommodate the file. This necessitates 100 entries in the Extent List of the MFT record. Since there is more than one extent, NTFS sees this as a fragmented file. NTFS conveys this address information and the data to the storage controller as 100 individual SCSI commands, each of which is independently mapped to the physical disks in the array by the controller. One hundred SCSI commands will result in at least 100 physical accesses to the disks, depending on the controller software, as shown in Figure 2.

Logical fragmentation starts with the Windows guest system. An increase in NTFS fragmentation on the guest server increases the number of SCSI commands needed to move the same amount of data across the virtualization layer. This increase in SCSI commands correspondingly increases the volume of physical I/O to the disks in the array with an impact on disk latency and throughput.  

[Figure 2: The same file in 100 fragments generates 100 individual SCSI commands, each mapped independently to the physical disks in the array.]
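
Under the same simplifying assumption used above (one SCSI command per extent, at least one physical access per command), a short sketch makes the contrast between the two scenarios concrete; the extent positions are random placeholders:

```python
# The same 1GB file under two allocation outcomes. Assumes one SCSI
# command per extent and at least one physical access per command;
# extent positions are random placeholders.
import random

GB, CLUSTER = 10 ** 9, 4096
file_clusters = (1 * GB) // CLUSTER

contiguous = [(500_000, file_clusters)]                  # 1 extent
fragmented = [(random.randrange(24_000_000), file_clusters // 100)
              for _ in range(100)]                       # 100 extents

for name, extents in (("contiguous", contiguous), ("fragmented", fragmented)):
    print(f"{name}: {len(extents)} SCSI command(s), "
          f"at least {len(extents)} physical I/O(s)")
```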

The Benefits of Windows Guest Optimization

The Windows guest server is the primary workhorse in most virtualized environments, but measuring the impact of Windows behavior on virtualization performance is not easy. We used VMware’s vscsiStats to compare disks optimized with Raxco’s PerfectDisk vSphere to baseline disks that were not optimized. The results showed improvements in some key performance metrics.

Lower Hypervisor Overhead

From previous discussions with VMware performance personnel, it is clear they believe the fewer I/Os that cross the virtualization storage stack, the better. Each I/O represents overhead for the hypervisor, so reducing SCSI commands reduces that overhead. In the illustrations above, we saw that a contiguous file on a virtualized server sends one SCSI command across the stack while the same file in 100 fragments sends 100 SCSI commands: a dramatic increase in workload for the virtualization layer to access the same file. In testing performed last year with the vscsiStats utility, we found that contiguous files reduced the number of I/Os across the stack by 28%. This is a significant reduction in IOPS, with a corresponding reduction in overhead for the hypervisor, freeing additional CPU and memory resources.

Larger I/O Packets

It stands to reason that if you perform the same work with fewer I/Os, each I/O must be bigger. Contiguous files facilitate larger I/Os. Going back to the illustrations above, we see how a single SCSI command results in just a few physical I/Os to the disk. Larger I/Os enable the transfer of more data and better disk mapping at the controller level. In the referenced test report, the vscsiStats utility measured the number of I/Os across several bucket sizes, the largest bucket being I/Os greater than 524K. The test results showed that the optimized disks produced 12 times as many of these larger I/Os as the un-optimized disks (2,959 vs. 247).
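
The arithmetic behind that claim is straightforward. In the quick check below, only the 28% reduction comes from the test; the 10GB workload size and baseline operation count are made-up round numbers:

```python
# Same bytes in 28% fewer operations means each operation is about
# 1 / 0.72, or roughly 39%, larger. Only the 28% figure is from the
# test; the workload size and operation count are round numbers.

workload_bytes = 10 * 10 ** 9      # assumed 10GB moved either way
baseline_ops = 100_000             # assumed baseline operation count
optimized_ops = round(baseline_ops * (1 - 0.28))

print(f"baseline average I/O:  {workload_bytes / baseline_ops / 1024:.0f} KB")   # ~98 KB
print(f"optimized average I/O: {workload_bytes / optimized_ops / 1024:.0f} KB")  # ~136 KB
```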

Disk Latency Improvement

Optimizing the Windows guests produces fewer, larger I/Os, which in turn affects disk latency: the time it takes an I/O to complete. Figure 1 shows that a contiguous file can be mapped by the controller in as little as one, and possibly just a few, physical I/Os to the disk. When all the data arrives in a single SCSI command, the controller software can map the data to the disks more efficiently, provided there is sufficient contiguous free space.

VMware has indicated organizations should be concerned about I/Os taking more than 15ms to complete, and that there is a real performance problem if an I/O takes over 30ms. Since optimized Windows guests generate fewer physical I/Os, disk latency improves. Going back to the referenced report, we find that vscsiStats showed the total number of I/Os taking longer than 30ms was reduced by a whopping 49%.


This vscsiStats histogram shows that Windows guest optimization reduced the number of I/Os in every time slice from 1ms to more than 100ms by almost 50%.
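
Counting the slow I/Os from such a histogram is easy once the data is exported. The sketch below assumes the latency histogram has been saved as two-column CSV rows of bucket upper bound (in microseconds) and I/O count; vscsiStats can print comma-separated histograms with its -c option, though the exact layout varies by ESX release, and the file names here are hypothetical:

```python
# Sum the I/O counts in every latency bucket above 30ms, the level
# VMware flags as a real problem. Assumes rows of
# "bucket_upper_bound_us,io_count"; file names are hypothetical.
import csv

def slow_ios(path, threshold_us=30_000):
    with open(path) as f:
        return sum(int(count) for bound, count in csv.reader(f)
                   if int(bound) > threshold_us)

before = slow_ios("baseline_latency.csv")    # hypothetical export
after = slow_ios("optimized_latency.csv")    # hypothetical export
print(f"I/Os over 30ms reduced by {(before - after) / before:.0%}")
```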


Better Sequential I/O

When disk controllers receive larger I/Os, they issue more sequential I/O, provided there is adequate consolidated free space. Sequential I/O saves resources by reducing the number of separate I/O requests. The test report showed that, on average, successive I/Os on the optimized Windows guest were separated by only one logical block 32.3% of the time, versus 21.3% of the time on the un-optimized disks: a 51% improvement.


This vscsiStats histogram shows the improvement in distance between successive commands across the spectrum of the disks.

Improved Queue Utilization

Queue utilization is adversely affected by the size and volume of I/O between the HBA and the LUN. VMware introduced Storage I/O Control (SIOC) to mitigate some of these problems by throttling throughput to improve disk latency. As the statistics here have shown, Windows guest optimization produces fewer, larger I/Os, which in turn can lead to better queue utilization.

Better System Throughput

The vscsiStats utility does a great job of measuring and sorting all of the I/O that comes through the virtualization layer, and these statistics are very helpful in assisting system administrators to understand the problems that pop up in their environments. One telling statistic we observed in our testing was measured with a stopwatch: elapsed time. The optimized disks completed the tests 28% faster than the un-optimized disks. The combined effect of fewer and larger I/Os, a reduction in the slowest I/Os and an improvement in sequential I/O delivered this result. This is good news for the system administrator.


SANs and RAID Arrays

As this article strives to point out, disk optimization occurs in the Windows guest and it is independent of the underlying storage.  The goal of optimization in a virtualized environment is twofold:

  • Reduce the number of SCSI commands sent across the storage stack.
  • Reduce the physical I/O to the disks that results when files are fragmented.

The storage disk controller processes the incoming SCSI commands and determines which disks the data will be written to and how many writes that will entail. Disk optimization software does not instruct a SAN or RAID configuration on where or how to write data to the disks; it can only reduce the number of SCSI commands the SAN or RAID controller has to process.

Since returning from VMworld, we have worked with a prospect whose SAN hosted an 805GB VMDK containing more than 5,000,000 files, of which 485,000 were fragmented into more than 6,800,000 fragments. This situation is due to the behavior of NTFS in the Windows guest and has nothing to do with how the SAN writes these files to the array. Optimizing this VMDK would reduce SCSI traffic to the SAN controller and the number of physical I/Os needed to access the files. The net effect would be a reduction in virtualization overhead and improved performance from the VMDK and on the SAN.
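
Some quick arithmetic on those numbers shows the scale of the problem:

```python
# Rough arithmetic on the prospect's VMDK described above.
fragmented_files = 485_000
total_fragments = 6_800_000

print(f"{total_fragments / fragmented_files:.0f} extents per fragmented file")
# ~14 extents, i.e. roughly 14 SCSI commands where one would do

extra = total_fragments - fragmented_files
print(f"~{extra:,} extra SCSI commands to read each fragmented file once")
# ~6,315,000 extra commands in a single pass over those files
```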

Summary

Virtualization offers corporate IT a host of benefits in terms of lower hardware and energy costs, a smaller data center footprint and fewer personnel needed to manage the systems. However, these benefits can only be fully realized if the guest systems are properly maintained. There are several things administrators can do to mitigate resource contention in their virtual environments. Some key steps include:

  • Pre-virtualization Optimization - As one VMware consultant put it: "If you virtualize poorly performing servers, you get poorly performing virtual servers." Optimizing before virtualization also gives you a smaller image, since the files aren't scattered all over the disk, and, with PerfectDisk, you also get consolidated free space, which slows file re-fragmentation.
  • Use vscsiStats to Analyze Resource Bottlenecks - The vscsiStats utility comes with VMware (http://communities.vmware.com/docs/DOC-10095) and is extremely useful in identifying problems; a minimal collection workflow is sketched after this list. In particular, look at the distribution of I/O by size and at disk latency. Lots of small I/Os can be an indicator of a fragmented VMDK; if the same disk also shows many I/Os taking more than 30ms to complete, you have a performance problem that warrants investigation.
  • Maintain Windows Guest System Health - Virtualized servers fragment files and free space with use, just like their physical counterparts. A proactive program that keeps virtual servers optimized can go a long way toward reducing the potential for I/O bottlenecks, minimizing CPU and memory consumption and streamlining throughput on the host.
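
As promised in the list above, here is a minimal vscsiStats collection workflow, wrapped in Python purely for illustration. The flags shown follow VMware's documentation for the utility (see the DOC-10095 link above), but verify them on your ESX release; the world group ID is a placeholder you would copy from the -l listing:

```python
# Minimal vscsiStats collection workflow, run on the ESX host itself.
# Flags follow VMware's documentation (DOC-10095); verify them against
# your ESX release. The world group ID below is a placeholder.
import subprocess

def vscsi(*args):
    return subprocess.run(["vscsiStats", *args],
                          capture_output=True, text=True).stdout

print(vscsi("-l"))              # list VMs and their world group IDs
world_id = "12345"              # placeholder: copy a real ID from -l output

vscsi("-s", "-w", world_id)     # start collecting for that VM
# ... let a representative workload run ...
print(vscsi("-p", "ioLength"))  # I/O size histogram: many small I/Os?
print(vscsi("-p", "latency"))   # latency histogram: I/Os over 30ms?
vscsi("-x")                     # stop collection
```
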
Disk optimization will not resolve all resource contention issues, but it can save valuable time in troubleshooting I/O problems if you know that file and free space fragmentation are not possible culprits. Just as you take measures to prevent viruses and malware, proactive disk optimization eliminates Windows guest system behavior as a potential problem source, increases end-user uptime and decreases helpdesk costs.

###

About the Author 

Robert Nolan is the president and CEO of Raxco Software, a Microsoft Gold ISV and a VMware Elite Technical Alliance Partner. He has over 25 years' experience with system management software on multiple platforms. Mr. Nolan's career includes positions in software development, product management, sales and corporate management. He frequently speaks at Windows User Groups, VMUGs and other venues about virtualization performance.

Published Wednesday, October 26, 2011 5:30 AM by David Marshall