A Contributed Article By Robert Nolan, president and CEO of Raxco Software
It should not surprise anyone that virtualized platforms can suffer from resource contention issues. Standalone servers often have performance problems, so why wouldn't performance be an issue when there are multiple instances of Windows Server running on the same physical machine? Just like their physical counterparts, virtual machines run into performance issues because they compete for a finite pool of CPU, memory and disk resources. Of these, disk resources are the most important since the disk is the slowest component of the three. To the extent you are hammering away at the disk, you are also consuming excess CPU and memory, depriving other VMs of access to these resources. In this two-part article, we will look at one potential source of IO problems, its impact on virtual performance and possible solutions.
At EMC World this past May, Mr. Scott Drummonds (formerly VMware's performance guru and now with EMC) gave a presentation entitled Performance Best Practices for VMware vSphere 4. Scott's presentation cites disk latency as your best indicator of a performance problem and suggests that latency between 15 ms and 30 ms needs to be looked at, and that anything over 30 ms is a real problem. To combat this condition, VMware introduced Storage IO Control (SIOC). SIOC was designed to strike a balance between disk latency and throughput by limiting each VM's access to queue slots. Basically, this means trading throughput for better latency. Given what VMware could do to address this problem, SIOC was a practical workaround, but it doesn't solve the problem. If you have a headache, you want to take a remedy that makes it go away, not one that can cause dizziness, blurred vision or insomnia. At the end of the day, you want to get rid of the problem, not trade symptoms.
VMware viewed this IO problem seriously enough to develop SIOC. But why is disk latency increasing in the first place? I would like to suggest that resource contention in many virtualized environments is caused by the Windows NT File System (NTFS). In conversations with IT professionals, I have discovered that the relationship between NTFS and storage is misunderstood. There is usually a pretty good grasp of how the hardware works, but almost never an understanding of how the file system works. As a result, when IO performance issues arise, the tendency is to look at the storage as the problem. Look at it this way: the Windows guest is where the work is getting done. NTFS is in charge of creating files and allocating space, which generates logical IO activity to the disk controller. The disk controller takes what NTFS gives it and turns it into physical IO to the disk. If the IO load from the guest is creating a latency issue by saturating the queues, isn't it likely NTFS is the source?
To fully understand the relationship between NTFS and poor virtualization performance, we need to understand what is going on inside the Windows guest. The remainder of this discussion details the NTFS behavior that leads to IO bottlenecks and how this relates to disk latency and throughput issues.
Let's start with some basics. NTFS has no idea about its underlying disk technology. It doesn't know if it is IDE, SCSI, RAIDX or a SAN, and it has no idea if it is in a virtual environment or not. NTFS is hardware and virtualization agnostic.
NTFS needs two pieces of information to get started. It needs to know the size of the disk and the cluster size. It creates two metadata files we need to know about: $MFT and $Bitmap. The $MFT, or Master File Table, is the index to the volume and contains at least one record for every file on the disk. The $Bitmap file, as its name implies, is a logical representation of the disk space and it contains a bit for every logical cluster on the drive. The $Bitmap file indicates whether a logical cluster is used or free, and NTFS uses this information to allocate space to a file.
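To put the one-bit-per-cluster rule into numbers, here is a minimal Python sketch. The function name and the 100 GiB volume with 4 KiB clusters are illustrative assumptions, not figures from any particular system.

```python
from typing import Tuple


def bitmap_size(volume_bytes: int, cluster_bytes: int) -> Tuple[int, int]:
    """Return (cluster count, bitmap size in bytes) for a volume.

    Hypothetical helper: it only applies the rule stated above,
    one bit in $Bitmap for every logical cluster on the volume.
    """
    if volume_bytes <= 0 or cluster_bytes <= 0:
        raise ValueError("sizes must be positive")
    clusters = volume_bytes // cluster_bytes
    # Eight clusters are tracked per byte; round up for a partial byte.
    bitmap_bytes = (clusters + 7) // 8
    return clusters, bitmap_bytes


# Assumed example: a 100 GiB volume formatted with 4 KiB clusters.
clusters, bitmap_bytes = bitmap_size(100 * 2**30, 4096)
print(clusters, bitmap_bytes)  # 26214400 clusters, 3276800 bytes (about 3.1 MiB)
```

Even on a large volume the bitmap stays small, which is why searching it for free clusters is cheap compared with the disk IO that follows.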
When a new file is created, NTFS generates a 1KB record for that file in the $MFT. This record contains the file name, file ID and other attribute data. Next, the $Bitmap file is accessed to allocate space for the file. NTFS searches the bitmap for enough free clusters to accommodate the size of the new file. For this discussion there are two possible outcomes from this search.
- NTFS finds enough space in $Bitmap in a single contiguous string of logical clusters. Once this space is identified, the starting logical cluster number (LCN) and the Run Length (the number of consecutive clusters beginning at the starting LCN) are recorded in the Extent List of the new file's $MFT record. Or,
- NTFS cannot find enough space in $Bitmap in a single contiguous string of logical clusters. In this case, it allocates space wherever it finds it until sufficient space is found to accommodate the new file. Once this space is identified, the starting LCN and the Run Length for each piece allocated are recorded in the Extent List of the new file's $MFT record.
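The two outcomes above can be illustrated with a toy model in Python. This is a sketch under simplifying assumptions: the bitmap is a list of booleans (True means the cluster is in use), the search is a plain first-fit scan, and the names free_runs and allocate are hypothetical. The real NTFS allocation policy is more involved than this.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (starting LCN, run length)


def free_runs(bitmap: List[bool]) -> List[Extent]:
    """Return every run of free clusters as (starting LCN, run length)."""
    runs: List[Extent] = []
    start = None
    for lcn, used in enumerate(bitmap):
        if not used and start is None:
            start = lcn                      # a free run begins here
        elif used and start is not None:
            runs.append((start, lcn - start))
            start = None
    if start is not None:                    # free run reaches end of volume
        runs.append((start, len(bitmap) - start))
    return runs


def allocate(bitmap: List[bool], clusters_needed: int) -> List[Extent]:
    """Allocate clusters and return the resulting extent list."""
    runs = free_runs(bitmap)
    if sum(length for _, length in runs) < clusters_needed:
        raise ValueError("not enough free space")
    extents: List[Extent] = []
    for start, length in runs:
        if length >= clusters_needed:
            # Outcome 1: a single contiguous run is large enough.
            extents = [(start, clusters_needed)]
            break
    else:
        # Outcome 2: take space wherever it is found.
        remaining = clusters_needed
        for start, length in runs:
            take = min(length, remaining)
            extents.append((start, take))
            remaining -= take
            if remaining == 0:
                break
    for start, length in extents:            # mark the clusters as used
        for lcn in range(start, start + length):
            bitmap[lcn] = True
    return extents


# Free runs in this toy volume: LCN 1-2, LCN 4-6 and LCN 8.
volume = [True, False, False, True, False, False, False, True, False]
print(allocate(volume, 3))  # [(4, 3)]          one extent, contiguous
print(allocate(volume, 3))  # [(1, 2), (8, 1)]  two extents, fragmented
```

The second call shows the point of the discussion: the file is already in two pieces in the extent list before a single byte of user data has been written.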
Figure 1 illustrates a portion of a $MFT record showing the Extent List entries. These entries are made after the space is allocated from $Bitmap. The highlighted area shows a single Extent List entry. The Virtual Cluster Number (VCN) is the extent's starting position within the file, counted in clusters, so it identifies which piece of the file the extent is (1st, 2nd, 3rd, etc.). The LCN is the starting logical cluster number of the extent and the run length is the number of contiguous clusters in that extent.
Figure 1: $MFT Extent List Entries
This file has 30 total extent entries and you can see 18 of them here. If we could freeze the system right here, we would see that NTFS already treats this as a fragmented file (it is in 30 pieces) even though the user data has not yet been written to the disk.
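To make the role of the three Extent List fields concrete, the following Python sketch resolves a position in a file to a location on the volume. It assumes the VCN is the extent's starting cluster offset within the file; the Extent type, the vcn_to_lcn function and the sample values are hypothetical and are not taken from Figure 1.

```python
from typing import List, NamedTuple


class Extent(NamedTuple):
    vcn: int         # first file-relative cluster covered by this extent
    lcn: int         # starting logical cluster number on the volume
    run_length: int  # number of contiguous clusters in the extent


def vcn_to_lcn(extents: List[Extent], vcn: int) -> int:
    """Find the volume cluster that holds file cluster `vcn`."""
    for e in extents:
        if e.vcn <= vcn < e.vcn + e.run_length:
            # Offset into the extent, added to where the extent starts.
            return e.lcn + (vcn - e.vcn)
    raise ValueError(f"VCN {vcn} is outside the file")


# Made-up file in three extents, 16 clusters in total.
extent_list = [
    Extent(vcn=0, lcn=1000, run_length=4),
    Extent(vcn=4, lcn=5000, run_length=2),
    Extent(vcn=6, lcn=200, run_length=10),
]
print(vcn_to_lcn(extent_list, 5))  # 5001
print(vcn_to_lcn(extent_list, 6))  # 200
```

Every additional extent is one more entry to walk and one more place on the volume the file's data may live.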
The final step is to write the file to disk. NTFS reports the information from the $MFT record to the disk controller, including all of the extents in the Extent List and the user data. The controller software maps the file system data to physical blocks on the disk, writes the data to those blocks and updates its own index to point to the correct location. If the file has one extent, a single piece of data is passed to the controller. Our sample file in Figure 1 is in 30 extents, so NTFS needs to pass 30 separate sets of VCN, LCN and Run Length values to the disk controller. A file in 2000 extents (not an unusual occurrence) would need to pass all 2000 extents to the controller. Obviously, the longer the extent list, the longer it takes to read or write the file. This is the first of the delays NTFS introduces.
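A rough way to picture this cost is to turn an extent list into the separate transfers it implies. The Python sketch below is a deliberately simplified model: it assumes one request per extent and a 4 KiB cluster size, and the function name extent_requests is hypothetical. A real IO stack may split or merge requests further.

```python
from typing import List, Tuple


def extent_requests(
    extents: List[Tuple[int, int]], cluster_bytes: int = 4096
) -> List[Tuple[int, int]]:
    """Map (LCN, run length) extents to (byte offset, byte length) requests.

    Simplified model: each extent becomes exactly one request.
    """
    return [(lcn * cluster_bytes, run * cluster_bytes) for lcn, run in extents]


# The same 16-cluster (64 KiB) file, stored two different ways.
contiguous = [(1000, 16)]
fragmented = [(1000, 4), (5000, 2), (200, 10)]

print(len(extent_requests(contiguous)))  # 1
print(len(extent_requests(fragmented)))  # 3
print(extent_requests(contiguous))       # [(4096000, 65536)]
```

The amount of data is identical in both cases; only the number of pieces handed to the controller changes.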
Thus far we have shown that normal NTFS behavior creates logically fragmented files before any user data is written to the disk. The disk controller maps the extent information as it is received. As a result, a logically fragmented file is likely to produce a physically fragmented file. In a Microsoft TechNet article entitled Optimizing NTFS (http://technet.microsoft.com/en-us/library/cc767961.aspx), the author notes that “diligently maintaining a low level of file fragmentation on an NTFS volume is the most important way to improve volume performance”. In a virtual environment, this becomes more critical since a balance of resource usage between VMs is essential.
VMware has also weighed in on the negative effect of fragmented files in its ESX knowledge base article Verifying the Health of an Operating System where it states "Verify there is no disk fragmentation on the hard drive" http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=1003956&sliceId=1&docTypeID=DT_KB_1_1&dialogID=142756735&stateId=0%200%20142758277
In the next installment, we'll examine how logical IO turns into lots of physical IO. We will also use VMware's vscsiStats utility to capture performance data and quantify the impact NTFS behavior can have on VM performance. Options to address these IO problems will also be presented.
About the Author
Robert Nolan is the president and CEO of Raxco Software, which is a Microsoft Gold ISV and a VMware Elite Technical Alliance Partner. He has over 25 years of experience with system management software on multiple platforms. Mr. Nolan's career includes positions in software development, product management, sales and corporate management. He frequently speaks at Windows User Groups, VMUGs and other venues about virtualization performance.