How NTFS Causes IO Bottlenecks on Virtual Machines - Part 2

A Contributed Article By Robert Nolan, president and CEO of Raxco Software

Read Part 1 of How NTFS Causes IO Bottlenecks on Virtual Machines 

Part one of this article suggested that IO bottlenecks on virtual platforms may be due to the normal behavior of NTFS, the Windows file system. In a virtual environment, where multiple copies of NTFS run on the same physical server, the VMs can compete for access to the disk. Too much IO causes disk latency issues, which in turn choke throughput.

As a recap, let's look at what is happening inside NTFS on each VM. When a VM is created, its virtual disk is formatted by NTFS and the volume index ($MFT) is created along with another metadata file called $Bitmap. $Bitmap is a logical representation of the disk, with one bit for each logical cluster; each bit is on or off depending on whether the cluster is used or free. When a user creates a file, a record is created in the $MFT and space is allocated via $Bitmap. If $Bitmap finds space in a single string of logical clusters, the starting address and its length are recorded in the file record's Extent List in the $MFT. If $Bitmap cannot find space in a single string of logical clusters, it looks for space wherever it can find it until the whole file is allocated, and the starting address and length of each piece of allocated space are recorded in the Extent List. Multiple extent entries in the $MFT mean NTFS can fragment files before any user data is written to the disk.
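The allocation behavior described above can be sketched as a toy model. This is purely illustrative, assuming a simple first-fit allocator over a bitmap of clusters; the names and structures are simplified stand-ins, not the real on-disk NTFS formats:

```python
# Toy model of NTFS cluster allocation: a bitmap of used/free clusters
# and a first-fit allocator that records each contiguous run it uses as
# an extent, much as $Bitmap and the $MFT Extent List are described above.

def allocate(bitmap, clusters_needed):
    """Allocate clusters first-fit; return extents as (start, length)."""
    extents = []
    run_start = None
    i = 0
    while clusters_needed > 0 and i < len(bitmap):
        if not bitmap[i]:            # cluster is free: take it
            bitmap[i] = True
            if run_start is None:
                run_start = i
            clusters_needed -= 1
        elif run_start is not None:  # hit a used cluster: close the run
            extents.append((run_start, i - run_start))
            run_start = None
        i += 1
    if run_start is not None:
        extents.append((run_start, i - run_start))
    return extents

# A disk whose free space is broken up by a few used clusters.
disk = [False] * 16
for used in (2, 3, 7, 11):
    disk[used] = True

print(allocate(disk, 8))   # → [(0, 2), (4, 3), (8, 3)] -- three extents
```

An 8-cluster file lands in three extents before any user data is written, exactly the effect the paragraph describes; on an empty bitmap the same call would return a single extent.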

It is important to remember that NTFS is unaware of its underlying disk technology. It doesn't know if it is running on IDE, SCSI, any flavor of RAID, or in a SAN. NTFS allocates space for files based on availability as seen by the $Bitmap file, which is a logical map of the partition. The information in the $MFT Extent List represents where $Bitmap found enough space to allocate the file.

When a user needs a file, the $MFT is read and the logical address data in the file's Extent List is passed to the storage controller, which maps these logical addresses to physical blocks on the disk. The storage controller writes the file and updates its own index. Any subsequent access to the file (read or write) requires passing the entries in the Extent List to the disk controller. If NTFS allocated the file in a single extent, it passes a single entry to the disk controller; if it allocated the file in multiple extents, it must pass each extent, and the controller maps each one as it is received. A file in 2,000 extents requires 2,000 logical IOs, while a file in a single extent requires one.

So, why would $Bitmap allocate space in multiple extents? $Bitmap is created when the disk is formatted, and it simply indicates whether a logical cluster is used or free. Installing Windows Server and the applications to be used on the VM consumes a good deal of disk space. If you look at a disk after the software installation is complete, you will discover thousands of fragmented files. Even more significant is that the remaining free space on the disk is also fragmented, because installers clean up and delete their temporary files. Once the free space is fragmented, $Bitmap has no choice but to create multiple extents when it cannot locate sufficient contiguous space to allocate to a file. Free space plays a key role in how NTFS behavior affects VM performance.

Mr. David Goebel was one of the four original Microsoft engineers who wrote NTFS. In 2008, Mr. Goebel wrote a white paper that examined the impact of contiguous free space on NTFS performance. His tests looked at two identical disks, one with fragmented files and free space and the other with contiguous files and free space. The SYSMark benchmark test suite was run against both disks. We know that a logically contiguous file results in a single IO request from the file system to the disk controller. Ideally, the mapping of the file system data to the physical disk would be done in a single physical IO. Mr. Goebel developed a driver that counted each incoming logical IO request and its corresponding physical disk seeks. When a logical IO generated more than one physical seek, Goebel counted the extra seeks as "wasted seeks". In Figure 1, we see how the controller breaks logical IO from a VM into multiple physical seeks to the disk.

Goebel's test results compared the two disks and counted the "wasted seeks".  Figure 2 shows that the disk with fragmented free space wasted 769,963 physical seeks.  In the first installment of this article, we noted that VMware considers disk latency in excess of 15ms a cause for concern. If we apply a 15ms access time here, the wasted seeks on the disk with fragmented free space would take 192 minutes to complete, while the wasted seeks on the disk with consolidated free space would take just 45 minutes.  The entire Goebel white paper can be read at http://www.balder.com/ImpactofFreeSpaceConsolidation.pdf.
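The elapsed-time arithmetic above is easy to check. Note the 769,963 figure comes from Goebel's paper; the seek count for the consolidated disk is back-computed here from the stated 45 minutes, since the article does not quote it directly:

```python
# Checking the latency arithmetic: at 15 ms per seek, how long do the
# wasted seeks take to complete?

SEEK_MS = 15  # VMware's cause-for-concern latency threshold, per Part 1

def seek_minutes(wasted_seeks):
    return wasted_seeks * SEEK_MS / 1000 / 60

print(round(seek_minutes(769_963)))     # → 192 minutes (fragmented disk)
print(round(45 * 60 * 1000 / SEEK_MS))  # → 180000 seeks ~ 45 minutes
```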

 

In Goebel's test, the difference in elapsed IO completion time between the two disks was 147 minutes. That is over two hours of unnecessary IO load. If we assume this is happening on five VMs hosted on one physical server, you can see how IO bottlenecks develop. RAID and SAN configurations will break a single logical IO into multiple physical IOs as they spread data across multiple disks in an effort to optimize seeks. The one thing RAID and SAN configurations cannot do is reduce the number of logical IOs that originate with NTFS in each Windows guest.

We took the principles of Goebel's testing and set up a similar test on an ESX cluster. We took a disk with fragmented files and free space and imaged it; then we defragmented the files and free space on the image. Five VMs with access to the disk with fragmented files and free space were set up on the cluster, and MS Office and MS SQL were installed.

The test IO metrics were captured with VMware's vscsiStats utility, which counts and sorts every IO into buckets based on size, latency and other criteria.  The same test was then repeated on the VMs using the disk with defragmented files and free space.  The full VMware test report is available at http://www.raxco.com/user_data/white_papers/vmware_multi_test_new.pdf.  The results showed the disks with the defragmented files and free space averaged:

  • 29% reduction in total IO traversing the virtual storage stack
  • 49% improvement in disk latency
  • 58% improvement in sequential IO
  • 12x improvement in large IO (>524K)

A general mantra for virtualization should be "Less IO is better". Virtualization platforms incur overhead each time IO traverses the storage stack. As we have shown, normal NTFS behavior creates multiple logical IOs to the disk controller, and the controller is likely to generate multiple physical IOs as it maps each logical IO to the disk, increasing the total number of IOs through the system each time the file is accessed. This physical IO load increases disk latency, which in turn creates IO bottlenecks and chokes throughput.

While not a panacea for all virtualization IO ills, maintaining the IO health of Windows guest systems is a sensible first step in reducing IO contention. If the IO coming from the guests is optimized and you are still having issues, you can be fairly sure the problem lies in the infrastructure and not simply in IO volume, and storage engineers can then concentrate on the real problem affecting performance.

So, how do you get NTFS to reduce the number of logical IOs it is sending the storage controller? There are two possible solutions.

  • Reduce the number of VMs on the host, or
  • Defragment the VMs

For most, defragmenting the guest VMs is the more practical solution. All disk defragmentation occurs at the file system level: defragmentation software looks at a file's Extent List and reduces the number of extents to a single entry in each $MFT record. It does this by using a Microsoft API that coordinates the defrag software's strategy with NTFS, the memory manager and the cache manager, which is why defragmentation software can safely move open files. Since defragmentation works at the file system level, it does not undo the work of any RAID or SAN controller and it has no effect on LUNs. In virtual environments, it works discretely within each VM, reorganizing the file system's view of the disk at the logical level, not the physical.
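Conceptually, what the defragmenter achieves can be modeled as follows. This is a deliberately simplified sketch of extent consolidation, not the Microsoft defragmentation API the article mentions; the bitmap and extent structures are the same toy stand-ins, not real NTFS formats:

```python
# Toy model of extent consolidation: free the file's scattered clusters,
# find one contiguous free run big enough for the whole file, and collapse
# the Extent List to a single entry.

def defragment(bitmap, extents):
    size = sum(length for _, length in extents)
    for start, length in extents:            # release current clusters
        for i in range(start, start + length):
            bitmap[i] = False
    run = 0
    for i, used in enumerate(bitmap):        # first-fit contiguous run
        run = 0 if used else run + 1
        if run == size:
            start = i - size + 1
            for j in range(start, start + size):
                bitmap[j] = True
            return [(start, size)]           # one extent -> one logical IO
    raise RuntimeError("no contiguous free run large enough")

disk = [False] * 20
for used in (2, 3, 7, 11):                   # other files on the disk
    disk[used] = True
file_extents = [(0, 2), (4, 3)]              # a 5-cluster file in 2 extents
for s, length in file_extents:
    for i in range(s, s + length):
        disk[i] = True

print(defragment(disk, file_extents))        # → [(12, 5)] -- one extent
```

After consolidation the file's $MFT record would carry a single extent entry, so each subsequent access passes one entry to the controller instead of several.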

After defragmentation, most of the files on a VM can be accessed with a single logical IO; this means less physical IO to the disk.  In addition, consolidated free space allows new files to be created in a single logical IO, thereby reducing the "wasted seeks" as evidenced in Goebel's test.

In summarizing the two parts of this paper, we have shown:

  • NTFS can fragment a file before any user data is written to disk
  • Each file extent in the $MFT is a logical IO to the disk controller
  • Each logical IO is likely to produce multiple physical IOs when being written to the disk (wasted seeks)
  • NTFS behavior in multiple VMs on the same physical server can stress IO resources and impact CPU and memory
  • The most effective way to reduce logical IOs is to defragment the guests
  • Test results show defragmenting guests delivers dramatic improvements in throughput, disk latency and sequential IO
  • Defragmentation does not interfere with the physical storage layout on disk; the controller software decides where files are physically stored.

Virtualization offers tremendous economies of scale to organizations of all sizes. Lower energy costs, smaller data centers, fewer support personnel, and less downtime are some of the obvious benefits. Like any technology, virtualization needs proper maintenance if it is to deliver on these savings. As amazing as virtualization is, it still has to handle the workload presented to it by the guest machines. Properly maintain the health of your Windows guest VMs and you will get more from your virtualization investment dollar.

While Windows provides a disk defragmenter, it is not well-suited for use with virtual platforms. The Windows utility has no virtualization awareness, no central management, poor free space consolidation, and it does not move some key (and very large) system files.  There are third-party solutions available that are virtualization aware, have a more robust feature set and are easily managed from a central console.  

About the Author

Robert Nolan is the president and CEO of Raxco Software, a Microsoft Gold ISV and a VMware Elite Technical Alliance Partner. He has over 25 years' experience with system management software on multiple platforms. Mr. Nolan's career includes positions in software development, product management, sales and corporate management. He frequently speaks at Windows User Groups, VMUGs and other venues about virtualization performance.

Published Wednesday, December 22, 2010 7:00 AM by David Marshall