Virtualization Technology News and Information
40 Tips for Optimizing VMware Horizon VDI's Blast Extreme Protocol

 

By Val King

A stated goal in the VMware Horizon documentation is "to provide an excellent out-of-the-box configuration for most local area network (LAN) use cases, negating the need to perform complex tuning or learn hundreds of policy settings."  VMware notes that "some use cases and situations require additional tuning, especially for wide area networks (WANs)."

There is almost no scenario where tuning Blast Extreme does not lead to a better experience. If you want to deliver the closest thing possible to a PC experience, some tuning should be in your future; otherwise, you are doing yourself, your end-users and the company relying on VDI to deliver work a disservice.

There are hundreds of Blast Extreme settings that can be manipulated to improve VDI performance or optimize resource utilization. Today I will cover the very basics. If you want to know how to optimize every setting, contact us and we can work through them all with you, explaining what each does.

The number one problem we see with VMware Horizon or Citrix Virtual Apps and Desktops (CVAD) environments is sizing. Forty percent (40%) of the time, environments are undersized, robbing end-users of the resources they need to do their jobs effectively. This creates an immediate, negative end-user experience and caps productivity, because end-users can work faster than their assigned VDI desktops allow.

CPU is normally the bottleneck in modern VDI infrastructures, limiting the user density of any given host in a VMware VDI environment. Pay careful attention to it and avoid two of the most common mistakes.

CPU utilization data is usually presented as an average. Most companies do not use their VDI environment around the clock with multiple shifts, so average CPU utilization will always be lower, often by a lot, than what actually happens during regular business hours.
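To see how much an all-day average can hide, the short Python sketch below compares a 24-hour average with a business-hours average. The hourly figures are hypothetical, chosen only to illustrate the effect; substitute the per-hour data exported from your own monitoring tool.

```python
# Hypothetical hourly CPU utilization (%) for one VDI host over a single day.
# Index 0 = midnight. Illustrative values only, not measured data.
hourly_cpu = [5, 5, 5, 5, 5, 5, 10, 35,        # 00:00-07:59
              70, 80, 85, 80, 75, 85, 85, 80,  # 08:00-15:59
              70, 30, 10, 5, 5, 5, 5, 5]       # 16:00-23:59


def average(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def business_hours(values, start=8, end=17):
    """Return the samples whose hour index falls in [start, end)."""
    return values[start:end]


daily_avg = average(hourly_cpu)
business_avg = average(business_hours(hourly_cpu))
peak = max(hourly_cpu)

print(f"24-hour average:        {daily_avg:.1f}%")
print(f"Business-hours average: {business_avg:.1f}%")
print(f"Peak hour:              {peak}%")
```

With these sample numbers the 24-hour average works out to about 35 percent, while the 08:00 to 17:00 average is about 79 percent, which is the load users actually experience.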

Some applications, like Microsoft Excel, or most specialized applications for the Architecture, Engineering, Construction, Media, Entertainment or Energy industries, are single-threaded, meaning they can fully utilize only one CPU core and will ignore the rest, no matter how many are assigned. Some CPUs have their own tricks to mitigate this problem to an extent. Without understanding how these applications use the CPU, it is easy to look at CPU performance and think there are loads of resources available when in reality there are not.
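The same averaging problem applies across cores. In this hypothetical example, one core of an 8-vCPU desktop is saturated by a single-threaded application while the others sit nearly idle. The thresholds in the helper are assumptions for illustration, not VMware guidance.

```python
# Hypothetical per-vCPU utilization (%) on an 8-vCPU virtual desktop while a
# single-threaded application recalculates. Illustrative numbers only.
per_core = [100, 6, 4, 5, 3, 4, 6, 4]


def single_thread_bottleneck(cores, saturated=95, light_aggregate=50):
    """Hypothetical check: one core pegged while the VM-wide average looks light.

    Both thresholds are illustrative assumptions, not vendor recommendations.
    """
    aggregate = sum(cores) / len(cores)
    return max(cores) >= saturated and aggregate < light_aggregate


aggregate = sum(per_core) / len(per_core)
print(f"Aggregate CPU: {aggregate:.1f}%")      # what a summary dashboard shows
print(f"Busiest core:  {max(per_core)}%")      # what the user actually waits on
if single_thread_bottleneck(per_core):
    print("Likely single-threaded bottleneck hidden by the aggregate average")
```

Here the desktop reports 16.5 percent aggregate CPU even though the user is waiting on a core running at 100 percent.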

Monitors with higher screen resolutions, and multi-monitor use cases, require more system resources, especially CPU and network bandwidth, to operate effectively. A 4K monitor, or multiple monitors with a combined resolution reaching 4K (3840 x 2160), can take approximately six times the resources of a standard HD 1080p display (1920 x 1080). A 4K frame contains about 8.3 million pixels, roughly 6.2 million more than HD, or four times as many.
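The pixel arithmetic is easy to check. This sketch computes full-frame pixel counts for the resolutions mentioned above, plus a hypothetical dual-1440p setup to show how multiple monitors add up toward 4K.

```python
def pixels(width, height):
    """Total pixels the display protocol must encode for one full frame."""
    return width * height


HD = pixels(1920, 1080)      # 2,073,600
UHD_4K = pixels(3840, 2160)  # 8,294,400

print(f"1080p pixels:            {HD:,}")
print(f"4K pixels:               {UHD_4K:,}")
print(f"Additional pixels at 4K: {UHD_4K - HD:,}")
print(f"4K / 1080p ratio:        {UHD_4K / HD:.1f}x")

# Multiple monitors add up the same way: two 2560x1440 panels
# (a hypothetical setup) come close to a single 4K display.
dual_qhd = 2 * pixels(2560, 1440)
print(f"Dual 1440p pixels:       {dual_qhd:,}")
```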

Use the lowest monitor resolution possible that still meets the end-user use case to find your optimal balance.

Windows operating systems are not optimized for VDI out of the box. VMware has developed a free tool, the VMware OS Optimization Tool, to optimize Windows desktops and servers. It takes a very high-level pass at turning off everything that only makes sense in physical server environments and tries to tune performance settings for virtual machines. This tool can get you part of the way there, but manual optimization is needed to squeeze maximum performance out of either a VMware or Citrix VDI environment.

VMware's documentation also provides some basic registry settings that are a good place to start your optimization journey.

 

Tuning Blast Extreme

Most VDI implementations consist of at least two different environments that must be considered separately to deliver the best user experience to all end-users. Decisions made for Local Area Networks, where the VDI environment is essentially local to the population using it, should be different from those made for Wide Area Networks, where end-users rely on the Internet or WAN circuits to connect to the VDI environment from a remote office, branch or work-from-home setting.

When tuning Blast Extreme for a Wide Area Network, overall link speed, latency and packet loss are the three primary factors to account for.

VMware provides some basic recommendations to get you started. Keep in mind that your WAN circuits can be a significant ongoing expense, so reducing the bandwidth required to deliver a good end-user experience can create incremental savings. However, if a bandwidth reduction comes at the expense of capping employee productivity, you could be spending a dollar to save a penny.

The Blast Codec, when used with non-multimedia workloads, consumes the least bandwidth of all the codec options in the VMware arsenal. If testing the Blast Codec in your environment is not successful, VMware directs you to use JPG/PNG.

With Horizon 8 (2111), VMware added the ability to adjust the size of the Blast Codec decoder cache, which defaults to 256MB. In VMware's testing, doubling the cache to 512MB reduced network bandwidth utilization by 12%. VMware is quick to point out that results will vary between VDI environments depending on application workloads and usage patterns.
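To put a percentage like VMware's 12% into capacity-planning terms, this sketch applies it to a hypothetical branch with 200 concurrent sessions averaging 2 Mbps each. Both the session count and the per-session bandwidth are assumptions; measure your own before relying on the result, since VMware's figure will not carry over unchanged to every environment.

```python
def wan_bandwidth_mbps(sessions, per_session_mbps, reduction=0.0):
    """Aggregate WAN bandwidth for concurrent sessions after a fractional reduction."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return sessions * per_session_mbps * (1.0 - reduction)


# Hypothetical branch: 200 concurrent sessions at an assumed 2 Mbps average each.
baseline = wan_bandwidth_mbps(200, 2.0)
with_larger_cache = wan_bandwidth_mbps(200, 2.0, reduction=0.12)

print(f"Baseline:           {baseline:.0f} Mbps")
print(f"With 12% reduction: {with_larger_cache:.0f} Mbps")
```

With these inputs the estimate drops from 400 Mbps to about 352 Mbps.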

While I rarely find it practical to tell end-users how best to use the VDI environments we provide, VMware testing shows that viewing multimedia content (e.g., YouTube) in a window rather than full screen can yield significant resource savings. In testing with a single 4K display, VMware notes that viewing a typical YouTube video in standard windowed mode used 53% less bandwidth and 23% less virtual desktop CPU than watching the same video full screen.

For Microsoft Teams, Zoom, GoToMeeting and other streaming video use cases, the H.264 codec is preferred, but it is not the best choice for other forms of content. The encoder switch built into Blast Extreme can dynamically switch between the Blast Codec, JPG/PNG and the H.264 codec, using the most efficient codec for the content being delivered.
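The encoder switch's internal logic is not spelled out here, so the following is only a toy model under stated assumptions. Each screen region is scored by how often it changed over recent frames (a hypothetical `change_rate`). Video-like regions go to H.264, and everything else goes to the Blast Codec, falling back to JPG/PNG when the Blast Codec has been ruled out. The 0.6 threshold is invented for illustration and is not a VMware value.

```python
from dataclasses import dataclass

# Hypothetical threshold: the fraction of recent frames in which a region
# changed. VMware does not publish its switching heuristics in this form.
VIDEO_CHANGE_RATE = 0.6


@dataclass
class Region:
    name: str
    change_rate: float  # 0.0 = static, 1.0 = changes every frame


def pick_codec(region: Region, blast_codec_ok: bool = True) -> str:
    """Toy encoder switch: video-like regions get H.264; the rest get the
    Blast Codec, or JPG/PNG if the Blast Codec was ruled out in testing."""
    if region.change_rate >= VIDEO_CHANGE_RATE:
        return "H.264"
    return "Blast Codec" if blast_codec_ok else "JPG/PNG"


screen = [
    Region("Teams video tile", 0.95),
    Region("Excel worksheet", 0.05),
    Region("Outlook reading pane", 0.10),
]
for r in screen:
    print(f"{r.name}: {pick_codec(r)}")
```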

Do not use client-drive redirection unless absolutely required. Set FileTransferState to 0 to turn off client-drive redirection.

Use Blast Extreme clipboard settings to restrict or block clipboard use. Set ClipboardState to 0 to turn off clipboard support.

Turn off audio unless absolutely required. Set AudioEnabled to 0 to turn off audio support.
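If you apply these values by hand across several golden images, a small script can keep them consistent. This Python sketch renders them as a .reg file. The key path `HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware Blast\Config` and the string (REG_SZ) value type are assumptions to verify against the Horizon documentation for your version, and the `to_reg` helper is hypothetical.

```python
# Hypothetical helper: renders Blast Extreme settings as Windows .reg file text.
# ASSUMPTION: verify the key path and REG_SZ value type against the VMware
# Horizon documentation for your version before importing anything.
BLAST_CONFIG_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware Blast\Config"

# Values from the tips above; 0 turns each feature off.
WAN_LOCKDOWN = {
    "FileTransferState": "0",  # client-drive redirection off
    "ClipboardState": "0",     # clipboard off
    "AudioEnabled": "0",       # audio off
}


def to_reg(key: str, values: dict[str, str]) -> str:
    """Render string values as .reg text (Registry Editor 5.00 header)."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, data in values.items():
        lines.append(f'"{name}"="{data}"')
    return "\r\n".join(lines) + "\r\n"


print(to_reg(BLAST_CONFIG_KEY, WAN_LOCKDOWN))
```

Review the output before importing it; if your documentation lists a different path or value type, only the constant and the quoting in `to_reg` need to change.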

Pictures of the kids or the family pet may be great for personalization, but they consume more CPU and require more bandwidth to deliver than a simple black background.

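Use Adobe Flash redirection to have the client download and execute Flash content locally instead of rendering it in the virtual desktop and sending it across the WAN. (Adobe ended support for Flash Player at the end of 2020, so this applies only to legacy environments that still depend on it.)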

Modern browsers, like Chrome, Edge and Firefox, are commonly near the top of the list of applications consuming the most resources. People are often surprised that the lowly browser can consume so much, but it does.

HTML5 multimedia redirection transfers browser multimedia content to the client as HTML5 code, which is more efficient than delivering it through the display protocol. According to VMware, HTML5 multimedia redirection can reduce desktop and per-user RDSH server CPU utilization by up to 60 percent and per-user session bandwidth by up to 80 percent. However, these server-side savings shift load onto the client.

Client CPU utilization can increase by up to 200 percent for the duration of the redirection (from an average of 8 to 24 percent on a sample test system according to VMware). It also causes some screen content to letterbox, which may impair user experience.
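The 200 percent figure is a relative increase: client CPU tripled, but it rose by only 16 percentage points. This sketch makes the distinction explicit using VMware's sample figures of 8 and 24 percent.

```python
def relative_change_pct(before, after):
    """Percent change relative to the starting value."""
    return (after - before) / before * 100


# VMware's sample client CPU figures (%) before and during HTML5 redirection.
client_before, client_after = 8, 24

print(f"Absolute change: {client_after - client_before} percentage points")
print(f"Relative change: {relative_change_pct(client_before, client_after):.0f}%")
```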

VMware's own testing reveals that limiting frame rate provides little to no reduction in bandwidth or CPU utilization for typical applications and use cases. Quoting VMware, "Typical Microsoft Office use, for example, results in a very low display protocol frame rate. And limiting frame rate for multimedia use cases such as streaming video simply impairs playback quality and user experience. It is better to leverage HTML5 multimedia redirection to optimize such use cases."

Leverage Network QoS (Quality of Service) on your Cisco, Meraki, Ubiquiti, etc. network infrastructure to prioritize Blast Extreme traffic above general traffic and below the most time-sensitive form of traffic in any environment, voice. In short, we don't want lower priority traffic, like a print job for example, taking priority over delivering an optimal VDI desktop experience.

VMware recommends configuring QoS to prioritize Blast Extreme one level below Voice over IP traffic, commonly the highest prioritized application. This is typically achieved using a Differentiated Services Code Point (DSCP) marking of AF41.

However, if the network also supports interactive video, Blast Extreme is often marked one level lower with a DSCP marking of AF31.
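The AF41 and AF31 names map to fixed DSCP numbers, which is what switch configurations and packet captures usually display. Per RFC 2597, AFxy has the value 8x + 2y, and voice is commonly marked EF (46, per RFC 3246). This sketch computes those values and the corresponding byte in the IP header; it only illustrates the numbering and does not configure anything.

```python
EF = 46  # Expedited Forwarding (RFC 3246), the usual marking for voice


def af_dscp(af_class: int, drop_precedence: int) -> int:
    """DSCP value for Assured Forwarding class AFxy (RFC 2597): 8*x + 2*y."""
    if af_class not in range(1, 5) or drop_precedence not in range(1, 4):
        raise ValueError("AF class must be 1-4 and drop precedence 1-3")
    return 8 * af_class + 2 * drop_precedence


def tos_byte(dscp: int) -> int:
    """DSCP occupies the upper six bits of the IPv4 ToS / IPv6 Traffic Class byte."""
    return dscp << 2


for label, dscp in [
    ("EF (voice)", EF),
    ("AF41 (Blast Extreme)", af_dscp(4, 1)),
    ("AF31 (Blast Extreme when interactive video uses AF41)", af_dscp(3, 1)),
]:
    print(f"{label}: DSCP {dscp}, ToS byte 0x{tos_byte(dscp):02X}")
```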

If the H.264 codec is used, set H264maxQP (the lowest starting H.264 quality) to 28. This will force H.264 to start at higher quality and prevent it from expending bandwidth to send initial low-quality screens.
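Assuming H264maxQP uses the standard H.264 quantization parameter scale (0 to 51, where a higher QP means coarser quantization and lower quality), capping it at 28 limits how much detail the encoder may discard. In H.264 the quantizer step size doubles for every increase of 6 in QP; this sketch shows the step size at QP 28 compared with higher values.

```python
# Base quantizer step sizes for QP 0-5 in H.264; the step doubles every 6 QP.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    """H.264 quantizer step size for a QP in the 0-51 range."""
    if not 0 <= qp <= 51:
        raise ValueError("H.264 QP must be between 0 and 51")
    return BASE_QSTEP[qp % 6] * 2 ** (qp // 6)


for qp in (28, 36, 51):
    print(f"QP {qp}: quantizer step {qstep(qp):g}")
```

A maximum QP of 28 keeps the step at or below 16, whereas an uncapped encoder could go as coarse as 224 at QP 51.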

VMware Horizon, Blast Extreme and NVIDIA virtual GPU

It is increasingly common to see NVIDIA virtual GPU (vGPU) incorporated into VDI environments to improve the multimedia and streaming content performance mentioned previously. At Whitehat, 66% of new VDI environments being deployed, or going through a hardware refresh, incorporate NVIDIA vGPU to meet new use cases or generally improve the end-user experience.

Without NVIDIA vGPU, the CPU must handle all processing, typically distributing the work across 18 to 64 CPU cores, whereas an NVIDIA A16 vGPU card has 5,120 CUDA cores, plus Tensor cores and RT cores to optimize the visual experience.

NVIDIA CUDA cores, RT cores and Tensor cores explained:

  • CUDA cores are what we traditionally think of as the workhorses of a graphics card.
  • RT cores, or ray tracing cores, rapidly calculate the effects of light rays bouncing around a scene in real time. Why would the normal person care? Because this takes flat scenes and makes them look more real to our eyes.
  • Tensor cores, used with NVIDIA's DLSS (Deep Learning Super Sampling), for example, can upscale lower-resolution images to higher-resolution ones with stunning results. The GPU does less work rendering the low-resolution image, and the Tensor cores sharpen it before it is sent to the monitor.

Tensor cores can rapidly denoise an image, to give you a high-res, pristine picture.

RTX Voice, as another example of Tensor cores in action, removes background noise from live audio feeds.

NVIDIA GPUs support hardware H.264 and High Efficiency Video Coding (HEVC) encoding, which can substantially increase session bandwidth because of the much higher graphical quality this hardware-enabled configuration provides. Keep in mind that this may complicate WAN use cases.

Offload H.264 and High Efficiency Video Coding (HEVC) encoding from the ESXi hosts.

Introduced in Horizon 8 (2106): Enable support for High Dynamic Range (HDR) color.

Support full-motion video at 4K display resolution or above without HTML5 redirection.

Enable build-to-lossless mode if you are supporting use cases such as non-diagnostic medical imaging, which requires the display to be transferred without a loss in quality. Note that this increases bandwidth and virtual desktop CPU utilization.

Enable High Color Accuracy (HCA) for H.264 if supporting an H.264 preferred use case that has exhibited display fuzziness, lack of font or image sharpness, or problems with color reproduction.

Introduced in Horizon 8 (2106): Leverage High Efficiency Video Coding (HEVC) with High Dynamic Range (HDR) encoding to provide higher graphical quality with improved color range and contrast. This configuration is ideal for digital photography, design, and video production.

Additional Optimizations for High-End Multimedia and Video Gaming

VMware recommends using NVIDIA Tesla or newer GPUs.

Increase virtual desktop resources. More than 8 virtual CPUs might be required to support the most demanding use cases, especially video gaming, even with NVIDIA hardware GPUs.

Increase the frame rate. By default, Blast Extreme is capped at 30 frames per second (FPS). You can increase the rate, up to 60 FPS, by using the Windows Registry setting EncoderMaxFPS.
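Frame rate also sets the time budget for each frame. Raising the cap from 30 to 60 FPS halves the time available to capture, encode and send each frame, which is one reason the extra vCPUs mentioned above may be needed. The `clamp_fps` helper is hypothetical, a guard for scripts that set EncoderMaxFPS, assuming the 60 FPS ceiling stated above.

```python
DEFAULT_FPS = 30  # Blast Extreme default cap, per the text above
MAX_FPS = 60      # highest rate the text says EncoderMaxFPS supports


def frame_budget_ms(fps: int) -> float:
    """Milliseconds available to capture, encode and send each frame."""
    return 1000.0 / fps


def clamp_fps(requested: int) -> int:
    """Hypothetical guard: keep a requested EncoderMaxFPS within 1..MAX_FPS."""
    return max(1, min(requested, MAX_FPS))


for fps in (DEFAULT_FPS, MAX_FPS):
    print(f"{fps} FPS -> {frame_budget_ms(fps):.1f} ms per frame")
print(f"Requested 120 FPS, clamped to {clamp_fps(120)} FPS")
```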

 

First introduced in 2016 with Horizon 7, VMware's Blast Extreme is one of the newer VDI protocols on the market, and it has certainly come a long way from its humble beginnings after VMware decided to part company with Teradici and stop licensing future versions of the PCoIP protocol.

Even with 40 optimizations listed here, this list is only scratching the surface of the 1,500+ optimizations we have cataloged over the years building and managing VDI environments.

For the best VDI performance, you will need to tune or optimize everything from the hardware BIOS through the most common VDI performance bottlenecks (Group Policy, SQL and storage) to the endpoints your end-users rely on to be productive and get their jobs done.

In ten years, we have yet to find a VDI environment operating at its maximum potential. Significant performance gains are possible in even the crustiest VDI environments.

VDI should unchain your employees from their desks and give them the freedom to work anywhere it makes sense for the company. When properly resourced, VDI is empowering: a strong recruiting tool, a way to improve security and protect intellectual property, and a means to find and hire the best talent anywhere, not just within commuting distance of a company office.

VDI should be fast, efficient and an effective tool to enable the companies that deploy it to be more agile in decision making because of the innate flexibility baked into the model.

If this sounds more like a dream than what you experience day-to-day, it does not have to be that way. We are here to advise, help, build and, if necessary, manage VDI so that it delivers an end-user experience that lets people work to their maximum capability, rather than imposing an arbitrary limit on productivity set by an average or underperforming VDI environment.

##

ABOUT THE AUTHOR


Val is the CEO of Whitehat Virtual Technologies and is responsible for day-to-day operations, as well as leading the company's product development and technology strategy. Val has 20 years of experience in technology, compliance and security in regulated industries, particularly financial services and healthcare. Val currently serves in a dual role as CIO for a regional healthcare system.

Published Tuesday, May 10, 2022 7:42 AM by David Marshall