Measuring the Effect of Windows Guest Server Optimization
A Contributed Article by Bob Nolan, president and CEO of Raxco Software
I recently attended VMworld in Las Vegas, where an impressive array of virtualization-related technology was on display.
It was
evident that virtualization has grown to the point where an entire industry is
evolving to support the offerings of VMware, Microsoft, Citrix and other
hypervisor providers.
While infrastructure
virtualization has been adopted by the IT community, a quick walk around the
VMworld Solutions Expo would suggest it is not a panacea for all your IT
problems.
Companies like VKernel, Veeam and
SolarWinds were promoting performance monitoring tools that aid in troubleshooting
VMware performance bottlenecks. These are the same types of problems system
administrators have dealt with on physical servers for years, mainly resource
contention issues.
In discussions with vendor representatives, I learned that
the performance bottlenecks being analyzed referred mostly to the hypervisor
layer and storage. These products sort out information coming from vCenter and
provide graphic displays and recommendations about what to do. The interesting
thing about this is that neither VMware nor any of the performance monitoring
tools touches the Windows guest system. The
guest server is where the work is being done: consuming CPU, memory and I/O. When
there are resource contention issues, the Windows guest OS is often the source
of the problem. Let’s look at how Windows file system behavior can adversely
impact hypervisor overhead, disk latency, queue utilization and overall
performance in a way that would be missed or misinterpreted by performance
monitoring tools.
The Windows file system (NTFS) is notorious for fragmenting
files and free space. This is well documented in numerous blog articles and in
Microsoft’s TechNet with articles like this one on Optimizing NTFS (http://technet.microsoft.com/en-us/library/cc767961.aspx)
and this one on Disk Fragmentation and System Performance (http://blogs.technet.com/b/askperf/archive/2008/03/14/disk-fragmentation-and-system-performance.aspx)
that highlight the negative impact fragmentation has on performance in a
standalone environment. When you run
multiple instances of Windows Server on the same host, there is the potential
for resource contention on that host to increase dramatically.
I spoke with SAN vendors,
system integrators and many VMware users who just did not understand how
or why optimization of Windows guest systems would make any difference to virtualization
performance. In many of my discussions, the term “defragmenting the SAN” was
used, and this is a bit of a misnomer.
Disk optimization is performed on the Windows OS in the guest system,
which is unaware of the underlying storage hardware.
How NTFS Works
The key is in understanding how NTFS works. When a disk is
recognized by a Windows system, two pieces of information are passed along: the
size of the disk and the cluster size. Windows creates a bitmap file ($Bitmap)
containing enough bits to represent every cluster on the disk. For example, a 100GB drive with a 4K cluster size would
produce a $Bitmap file of about 25,000,000 bits, where each bit represents 4K
of space. Now, let’s say a user creates
a 1GB file. NTFS creates a record for the file in the Master File Table (MFT)
and then asks the $Bitmap file for 1GB of space. If NTFS can find 1GB of contiguous
space, a single entry is made in the Extent List of the MFT record. This entry
contains the starting Logical Cluster Number (LCN) and the length of the space
$Bitmap allocated. The way NTFS sees it, this file is contiguous because it is
in one logical piece. NTFS conveys this address
information and the data to the storage controller in the form of a single SCSI
command. The SCSI command is mapped to the physical disks in the array by the
controller software. A single SCSI command will map to the physical disks in anywhere
from one to several physical I/O, depending on the controller software as shown
in Figure 1.
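The arithmetic above can be sketched in a few lines of Python. This is a simplified model, not NTFS internals; the starting Logical Cluster Number below is a made-up illustration.

```python
# A quick sketch of the $Bitmap arithmetic described above. The 100GB
# drive and 4K cluster size come from the example in the text.

def bitmap_bits(disk_bytes: int, cluster_bytes: int = 4096) -> int:
    """Bits NTFS needs in $Bitmap: one bit per cluster on the disk."""
    return disk_bytes // cluster_bytes

# 100GB drive, 4K clusters: 26,214,400 bits -- the "about 25,000,000" above.
print(bitmap_bits(100 * 1024**3))

# A contiguous 1GB file is one run of clusters, so the MFT extent list
# holds a single (starting LCN, length-in-clusters) entry: one SCSI command.
clusters_needed = (1 * 1024**3) // 4096
extent_list = [(0, clusters_needed)]  # hypothetical starting LCN of 0
print(len(extent_list))
```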

In a second scenario,
the same 1GB file is created and the same record is created in the MFT. The difference is that when space is
requested from $Bitmap, it returns 100 fragments of various sizes to
accommodate the file. This necessitates 100 entries in the Extent List of the
MFT record. Since there is more than one extent, NTFS sees this as a fragmented
file. NTFS conveys this address information
and the data to the storage controller as 100 individual SCSI commands. Each
SCSI command is independently mapped to the physical disks in the array by the
controller. A hundred SCSI commands
will result in at least 100 physical accesses to the disks, depending on the
controller software, as shown here in Figure 2.
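The fragmented case can be modeled the same way. The 100-piece split below is randomly generated purely for illustration:

```python
import random

# Model of the fragmented scenario: the same 1GB request satisfied from
# 100 free-space runs of varying sizes. Each run becomes its own extent
# list entry, and each entry is issued as a separate SCSI command.

random.seed(1)  # reproducible illustration
clusters_needed = (1 * 1024**3) // 4096  # 262,144 clusters for 1GB at 4K

def fragment(total_clusters: int, pieces: int) -> list:
    """Split a cluster count into `pieces` random-sized contiguous runs."""
    cuts = sorted(random.sample(range(1, total_clusters), pieces - 1))
    edges = [0] + cuts + [total_clusters]
    return [b - a for a, b in zip(edges, edges[1:])]

extents = fragment(clusters_needed, 100)
print(len(extents))                     # 100 extents -> 100 SCSI commands
print(sum(extents) == clusters_needed)  # True: same 1GB of data either way
```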

Logical fragmentation starts with the Windows guest system.
An increase in NTFS fragmentation on the guest server increases the number
of SCSI commands needed to move the same amount of data across the
virtualization layer. This increase in SCSI commands correspondingly increases
the volume of physical I/O to the disks in the array with an impact on disk
latency and throughput.
The Benefits of Windows Guest Optimization
The Windows guest server is the primary workhorse in most
virtualized environments, but measuring the impact of Windows behavior on
virtualization performance is not easy. We used VMware’s vscsiStats to compare disks
optimized with Raxco’s PerfectDisk vSphere to baseline disks that were not
optimized. The results showed improvements in some key performance metrics.
Lower Hypervisor Overhead
From previous discussions with VMware performance personnel,
it is clear they believe the fewer I/Os that cross the virtualization
storage stack, the better. Each I/O represents overhead for the hypervisor, so
reducing SCSI commands reduces that overhead.
In the illustrations above, we see that a contiguous file on a
virtualized server would send one SCSI command across the stack, while the same
file in 100 fragments would send 100 SCSI commands. This is a dramatic increase
in workload for the virtualization layer to access the same file. In testing
performed last year with the vscsiStats utility, we found that contiguous files
reduced the number of I/Os across the stack by 28%. This is a significant reduction in IOPS, with
a corresponding reduction in overhead for the hypervisor, freeing additional
CPU and memory resources.
Larger I/O Packets
It stands to reason that if you perform the same work with
fewer I/Os, then each I/O must be bigger. Contiguous files facilitate larger
I/O. Going back to the
illustrations above, we see how a single SCSI command results in just a few
physical I/Os to the disk. The larger I/O enables the transfer of more data and
better disk mapping at the controller level.
In the referenced test report, the vscsiStats utility measured the number
of I/Os across several bucket sizes (the largest bucket being I/O greater than
524K). The test results showed that the optimized disks produced 12 times as
many of these larger I/Os as the un-optimized disks (2,959 vs. 247).
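As a rough sketch of how such a comparison reads out of a vscsiStats I/O-size histogram: only the two large-bucket counts below (2,959 vs. 247) come from the text; the other bucket edges and counts are invented for illustration.

```python
# ioLength-style histograms: key = I/O size bucket boundary in bytes,
# value = I/O count. Only the 524K-bucket counts are from the test report.
baseline  = {4096: 9000, 65536: 4000, 262144: 1200, 524288: 247}
optimized = {4096: 3000, 65536: 2600, 262144: 1900, 524288: 2959}

largest_bucket = max(baseline)  # the >524K bucket
ratio = optimized[largest_bucket] / baseline[largest_bucket]
print(round(ratio))  # 12: twelve times as many large I/Os
```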
Disk Latency Improvement
Optimizing the
Windows guests produces fewer and larger I/Os.
This in turn has an effect on disk latency, the time it takes an I/O to
complete. Figure 1 shows that a contiguous
file can be mapped by the controller in as few as one physical I/O to the disk.
When all the data comes in a single SCSI command, the
controller software can map the data to the disks more efficiently, provided
there is sufficient contiguous free space.
VMware
has indicated organizations should be concerned about I/O taking more than 15ms
to complete and there is a real performance problem if an I/O takes over 30ms.
Since optimized Windows guests generate fewer physical I/O, disk latency
improves. Going back to the referenced report we find that vscsiStats showed the
total number of I/O taking longer than 30ms was reduced by a whopping 49%.
This vscsiStats histogram shows Windows guest optimization reduced
the numbers of I/O in every time slice from 1ms to more than 100ms by almost
50%.
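A reduction like the 49% above can be computed from a latency histogram of this shape. The bucket layout mimics vscsiStats latency output, but every count below is invented purely to illustrate the calculation:

```python
# Latency histograms: key = bucket upper bound in ms, value = I/O count.
# All numbers are illustrative, not the article's measured data.
baseline  = {1: 5000, 5: 3000, 15: 1500, 30: 800, 100: 600, float("inf"): 200}
optimized = {1: 2600, 5: 1500, 15: 760, 30: 410, 100: 300, float("inf"): 110}

def slow_ios(hist: dict, threshold_ms: float = 30) -> int:
    """Count I/Os that landed in buckets slower than the threshold."""
    return sum(count for edge, count in hist.items() if edge > threshold_ms)

before, after = slow_ios(baseline), slow_ios(optimized)
print(before, after)  # 800 410 -- roughly a 49% reduction in >30ms I/Os
```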
Better Sequential I/O
When
disk controllers receive larger I/O, they issue more sequential I/O, provided
there is adequate consolidated free space.
Sequential I/O saves resources by reducing the number of separate I/O requests. The test report showed that the
optimized Windows guest had only one logical block separating successive I/Os 32.3% of
the time, while on the un-optimized disks this occurred 21.3% of the time, a 51%
improvement.
This vscsiStats histogram shows the improvement in distance
between successive commands across the spectrum of the disks.
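The 51% figure is just the relative improvement between the two percentages quoted above, as this bit of arithmetic shows (rounding accounts for the small difference):

```python
# Share of I/Os arriving one logical block apart, from the test report.
baseline_pct, optimized_pct = 21.3, 32.3

relative_improvement = (optimized_pct - baseline_pct) / baseline_pct
print(f"{relative_improvement:.1%}")  # 51.6% -- the ~51% quoted in the text
```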
Improved Queue Utilization
Queue utilization is adversely affected by the size and
volume of I/O between the HBA and the LUN. VMware introduced Storage I/O
Control (SIOC) to mitigate some of these problems by throttling throughput to
improve disk latency. As we have seen in the statistics here, Windows guest
optimization produces fewer and larger I/Os, which in turn can lead to better
queue utilization.
Better System Throughput
The vscsiStats utility does a great job of measuring and sorting all
of the I/O that comes through the virtualization layer. These statistics are very
helpful in assisting system administrators to understand the problems that pop
up in their environment. One telling statistic we observed in our testing was
measured with a stopwatch: elapsed time. The optimized disks completed the tests 28%
faster than the un-optimized disks. The combined effect of fewer and larger
I/Os, a reduction in the slowest I/Os and an improvement in sequential I/O delivered
this result. This is good news for the system administrator.
SANs and RAID Arrays
As this article strives to point out, disk optimization
occurs in the Windows guest and it is independent of the underlying storage. The goal of optimization in a virtualized
environment is twofold:
- Reduce the number of SCSI commands across the storage stack
- Reduce the physical I/O to the disk that results when files are fragmented.
The storage disk controller processes the incoming SCSI
commands and determines which disks the data will be written to and how many
writes that will entail. Disk optimization software does not instruct a SAN or
RAID configuration on where or how to write data to the disks; it can only
reduce the number of SCSI commands the SAN or RAID controller has to process.
Since returning from VMworld, we worked with a prospect whose
SAN-hosted 805GB VMDK held 5,000,000+ total files, including 485,000 fragmented
files in 6,800,000+ fragments. This situation is due to the behavior of NTFS in
the Windows guest and has nothing to do with how the SAN writes these files to
the array. Optimizing this VMDK would
reduce SCSI traffic to the SAN controller and the number of physical I/O needed
to access the files. The net effect would be a reduction in virtualization
overhead and improved performance from the VMDK and on the SAN.
Summary
Virtualization offers
corporate IT a host of benefits in terms of lower hardware and energy costs, a
smaller footprint for data centers and fewer personnel to manage the
systems. However, these benefits can
only be realized if the guest systems are properly maintained. There are
several things administrators can do to mitigate resource contention in their
virtual environments. Some key steps
would include:
- Pre-virtualization optimization. As one VMware consultant put
it: “If you virtualize poorly performing servers, you get poorly performing
virtual servers.” Optimizing before
virtualization also gives you a smaller image, since the files aren’t
scattered all over the disk, and, with PerfectDisk, you also get
consolidated free space, which slows file re-fragmentation.
- Use vscsiStats to analyze resource bottlenecks
(http://communities.vmware.com/docs/DOC-10095). The vscsiStats utility comes with VMware and is extremely useful in
identifying problems. In particular, look at the distribution of I/O by
size and disk latency. If you see lots of small I/O, this might be an
indicator of a fragmented VMDK. If the same disk also shows lots of I/O
taking more than 30ms to complete, you have a performance problem that warrants
investigation.
- Maintain Windows guest system health. Virtualized servers will fragment files
and free space with use, just like their physical counterparts. A proactive
program that keeps virtual servers optimized can go a long way toward reducing
the potential for I/O bottlenecks, minimizing CPU and memory consumption and
streamlining throughput on the host.
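The vscsiStats advice in the second bullet can be turned into a simple screening heuristic. This is a hypothetical sketch: the function name, bucket shapes, thresholds and sample counts are all assumptions for illustration, not part of vscsiStats itself.

```python
# Heuristic from the checklist above: flag a disk whose I/O-size histogram
# is dominated by small I/O AND whose latency histogram has a heavy >30ms
# tail. All edges, thresholds and counts below are invented.

def looks_fragmented(size_hist: dict, latency_hist: dict,
                     small_cutoff: int = 16384,
                     slow_cutoff_ms: float = 30) -> bool:
    total = sum(size_hist.values())
    small = sum(n for edge, n in size_hist.items() if edge <= small_cutoff)
    slow = sum(n for edge, n in latency_hist.items() if edge > slow_cutoff_ms)
    # Majority small I/O plus a non-trivial slow tail: worth investigating.
    return small / total > 0.5 and slow > 0.05 * total

size_hist = {4096: 6000, 16384: 2500, 524288: 300}  # size edge -> count
latency_hist = {15: 7000, 30: 900, 100: 900}        # ms edge -> count
print(looks_fragmented(size_hist, latency_hist))    # True
```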
Disk
optimization will not resolve all resource contention issues, but it can save
valuable time in troubleshooting I/O problems if you know that file and free
space fragmentation are not possible culprits.
Just as you take measures to prevent viruses or malware, proactive disk
optimization eliminates Windows guest system behavior as a potential problem
source, increases end user uptime and decreases helpdesk costs.
###
About the Author
Robert Nolan is the president and CEO of Raxco Software, a
Microsoft Gold ISV and a VMware Elite Technical Alliance Partner. He
has over 25 years’ experience with system management software on multiple
platforms. Mr. Nolan’s career includes positions in software
development, product management, sales and corporate management. He
frequently speaks at Windows user groups, VMUGs and other venues about
virtualization performance.