By Val King
A stated goal in the VMware Horizon documentation is "to
provide an excellent out-of-the-box configuration for most local area network
(LAN) use cases, negating the need to perform complex tuning or learn hundreds
of policy settings." VMware notes that "some
use cases and situations require additional tuning, especially for wide area
networks (WANs)."
There is almost no scenario where tuning Blast Extreme does
not lead to a better experience. If you want to deliver the closest thing
possible to a PC experience, some tuning should be in your future. Otherwise
you are doing yourself, your end-users and the company relying on VDI to
deliver work a disservice.
There are hundreds of Blast Extreme settings that can be
adjusted to improve VDI performance or optimize resource utilization. Today I
will cover the very basics. If you want to know how to optimize every setting,
contact us and we can work through them all with you, explaining what each
does.
The number one problem we see with VMware Horizon or Citrix
Virtual Apps and Desktops (CVAD) environments is environment sizing. Forty
percent (40%) of the time environments are undersized, robbing end-users of the
resources they need to do their jobs effectively. This creates an immediate,
negative end-user experience and caps productivity, because end-users can work
faster than their assigned VDI desktops allow.
CPU is normally the bottleneck in modern VDI infrastructures,
limiting the user density of any given host in a VMware VDI environment. Pay
careful attention to this and avoid two of the most common mistakes.
First, CPU utilization data is presented as an average. Most companies do not
utilize the VDI environment 24/7 with multiple shifts, so the average includes
long idle periods. Average CPU utilization is always going to be lower than
what is happening in reality during regular business hours, often by a lot.
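To see how much an all-day average can hide, here is a minimal sketch. The hourly figures are hypothetical and chosen only for illustration; `business_hours_average` is an illustrative helper, not part of any VMware tool.

```python
from statistics import mean

# Hypothetical hourly CPU utilization (%) for one host over 24 hours.
# Index 0 is midnight. Illustrative values, not measured data.
hourly_cpu = [5] * 8 + [85] * 9 + [5] * 7  # idle 00-08, busy 08-17, idle 17-24


def business_hours_average(samples, start_hour=8, end_hour=17):
    """Average only the samples that fall inside the working day."""
    return mean(samples[start_hour:end_hour])


daily_avg = mean(hourly_cpu)
busy_avg = business_hours_average(hourly_cpu)

print(f"24-hour average:        {daily_avg:.1f}%")
print(f"Business-hours average: {busy_avg:.1f}%")
```

In this made-up example the 24-hour average suggests a lightly loaded host, while the same host is close to saturation for the entire working day. Sizing decisions should be based on the second number.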
Second, some applications, like Microsoft Excel, or most specialized
applications for the Architecture, Engineering, Construction, Media,
Entertainment or Energy industries are largely single-threaded, meaning they
can only utilize one CPU core to its fullest and will ignore the rest, no
matter how many are assigned. Some CPUs have their own tricks to help mitigate
this problem to an extent. Without understanding how these applications
utilize CPU, it is easy to look at CPU performance and think there are loads of
resources available, when the reality is there are not.
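A small sketch of why an averaged CPU figure can mislead for single-threaded workloads. The per-core numbers and the 95% saturation threshold are assumptions made for illustration.

```python
# Hypothetical per-core utilization (%) on a 4-vCPU desktop running a
# single-threaded application. Illustrative values only.
per_core = [100, 4, 3, 5]


def summarize(cores, saturation_threshold=95):
    """Return the overall average and the indexes of saturated cores."""
    average = sum(cores) / len(cores)
    saturated = [i for i, pct in enumerate(cores) if pct >= saturation_threshold]
    return average, saturated


average, saturated = summarize(per_core)

print(f"Average across cores: {average:.1f}%")
print(f"Saturated cores: {saturated}")
```

The desktop reports 28% average CPU, yet the application is already running as fast as it can because the one core it uses is pinned. Looking at per-core data, not just the average, exposes the real constraint.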
Monitors with higher screen resolutions, and multi-monitor
use cases, require more system resources, especially CPU and network bandwidth,
to operate effectively. A 4K monitor, or multiple monitors with a combined
resolution reaching 4K (3840 x 2160), can take approximately 6 times the
resources of a standard HD 1080p resolution of 1920 x 1080. 4K resolution means
roughly 8.3 million pixels need to be delivered, about 6.2 million more than
HD, or four times as many.
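The pixel counts behind this comparison can be checked directly. This is simple arithmetic, shown as a short sketch:

```python
def pixel_count(width, height):
    """Total pixels in a display of the given resolution."""
    return width * height


hd_pixels = pixel_count(1920, 1080)   # standard HD 1080p
uhd_pixels = pixel_count(3840, 2160)  # 4K

extra_pixels = uhd_pixels - hd_pixels
ratio = uhd_pixels / hd_pixels

print(f"HD 1080p: {hd_pixels:,} pixels")
print(f"4K:       {uhd_pixels:,} pixels")
print(f"4K delivers {extra_pixels:,} more pixels, {ratio:.0f}x as many as HD")
```

The same function works for multi-monitor setups: add the pixel counts of each display to estimate the combined load.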
Use the lowest monitor resolution possible that still meets
the end-user use case to find your optimal balance.
Windows operating systems are not optimized for VDI out of
the box. VMware has developed a free tool to optimize Windows desktops and
servers. This tool takes a very high-level pass at turning off everything that
only makes sense in physical server environments and tries to tune performance
settings for VM environments. This tool can get you part of the way there, but
manual optimization is needed if you want to squeeze maximum performance out of
either a VMware or Citrix VDI environment.
VMware provides some basic registry settings in the
documentation that are a good place to start your VDI optimization journey.
Tuning Blast Extreme
Most VDI implementations consist of at least two different
environments that must be considered separately to present the best
user experience to all end-users. The decisions made for Local Area Networks,
where the VDI environment is essentially local to the population using it,
should be different than those made for Wide Area Networks, where end-users
are relying on the Internet or WAN circuits to connect to the VDI environment
from a remote office, branch or work-from-home setting.
For the purposes of tuning Blast Extreme over a Wide Area
Network, overall link speed, latency and packet loss are the three primary
areas to adjust in the Blast Extreme protocol.
VMware provides some basic recommendations to get you
started. Your WAN circuit cost can be a significant ongoing expense, so
reducing the bandwidth required to deliver a good end-user experience can
create incremental savings. Keep in mind, however, that if a reduction in
bandwidth comes at the expense of capping the productivity of employees, you
could be spending a dollar to save a penny.
The Blast Codec, when used with non-multimedia workloads, consumes
the least bandwidth of all the codec options in the VMware arsenal. If testing
the Blast Codec in your environment is not successful, VMware directs you to
use JPG/PNG.
With Horizon 8 (2111), VMware added the ability to adjust the
size of the Blast codec cache, which defaults to 256MB. In VMware's testing,
doubling the decoder cache to 512MB achieved a 12% reduction in network
bandwidth utilization. VMware is quick to point out that individual VDI
environment results will vary depending on the application workloads and usage
patterns.
While I rarely find it practical to tell an end-user how to
best use the VDI environments we provide, VMware testing shows that
multimedia content (e.g., YouTube) viewed in a window rather than full screen
can present significant resource savings. In testing with a single 4K display,
VMware notes that viewing a typical YouTube video in standard windowed mode
used 53% less bandwidth and 23% less virtual desktop CPU than watching the same
video full screen.
For Microsoft Teams, Zoom, GoToMeeting, and other streaming
video content use cases, the H.264 codec is preferred, but this is not true for
multimedia or other forms of content. The encoder switch built into Blast
Extreme can switch dynamically between the Blast Codec, JPG/PNG, and the H.264
codec, leveraging the most efficient codec for the content being delivered.
Do not use client-drive redirection unless absolutely
required. Set FileTransferState to 0 to turn off client-drive redirection.
Use Blast Extreme clipboard settings to reduce or block
using the clipboard. Set ClipboardState to 0 to turn off clipboard support.
Turn off audio unless absolutely required. Set AudioEnabled
to 0 to turn off audio support.
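The three settings above could be packaged as a registry file for testing. The following is a minimal sketch, not VMware's tooling. The registry key path and the use of string values are assumptions that should be verified against the VMware documentation for your Horizon version; `render_reg_file` is a hypothetical helper name.

```python
# ASSUMPTION: the Blast configuration key lives at this path. Confirm it
# against the VMware documentation for your Horizon agent version.
BLAST_CONFIG_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware Blast\Config"

# Setting names and values as described in the text above.
# ASSUMPTION: values are stored as strings (REG_SZ).
WAN_SETTINGS = {
    "FileTransferState": "0",  # turn off client-drive redirection
    "ClipboardState": "0",     # turn off clipboard support
    "AudioEnabled": "0",       # turn off audio support
}


def render_reg_file(key, values):
    """Render a .reg file body that sets string values under one key."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, data in values.items():
        lines.append(f'"{name}"="{data}"')
    return "\n".join(lines) + "\n"


reg_text = render_reg_file(BLAST_CONFIG_KEY, WAN_SETTINGS)
print(reg_text)
```

Generating the file rather than editing the registry by hand keeps the change reviewable. Apply it to a test image first, and only for user populations that genuinely do not need these features.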
Pictures of the kids or the family pet as desktop wallpaper may be great for
personalization, but the trade-off is that they consume more CPU and require
more bandwidth to deliver than a simple black background.
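Use Adobe Flash redirection to direct the client to download
and execute Flash content locally instead of rendering it in the virtual
desktop and sending it across the WAN. Note that Adobe ended support for Flash
Player at the end of 2020, so this applies only to legacy environments that
still depend on Flash content.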
Modern browsers, like Chrome, Edge and Firefox, are commonly
near the top of the list of applications consuming the most resources. People are often surprised that the lowly browser
can have such high resource consumption. It is, however, a fact.
HTML5 multimedia redirection will transfer this screen
content to the client as HTML5 code, offering improved efficiency over traffic
delivered by the display protocol. According to VMware, HTML5 multimedia
redirection can reduce desktop and per-user RDSH server CPU utilization by up
to 60 percent and per-user session bandwidth by up to 80 percent. However, the
benefits achieved at the server level can come at a significant cost to the
client.
Client CPU utilization can increase by up to 200 percent for
the duration of the redirection (from an average of 8 to 24 percent on a sample
test system according to VMware). It also causes some screen content to
letterbox, which may impair user experience.
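The 200 percent figure follows from the sample numbers VMware quotes: going from 8 to 24 percent triples client CPU use, which is a 200 percent increase. A quick check, using an illustrative helper name:

```python
def percent_increase(before, after):
    """Relative increase from `before` to `after`, as a percentage."""
    return (after - before) / before * 100


# Client CPU utilization on VMware's sample test system, per the text above.
client_cpu_before = 8   # percent, without redirection
client_cpu_after = 24   # percent, during redirection

increase = percent_increase(client_cpu_before, client_cpu_after)
print(f"Client CPU increase: {increase:.0f}%")
```

On a capable endpoint, 24 percent CPU is rarely a problem. On thin clients or older hardware, the same relative increase may matter, so test on the weakest endpoint you support.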
VMware's own testing reveals that limiting frame rate
provides little to no reduction in bandwidth or CPU utilization for typical
applications and use cases. Quoting VMware, "Typical Microsoft Office use, for
example, results in a very low display protocol frame rate. And limiting frame
rate for multimedia use cases such as streaming video simply impairs playback
quality and user experience. It is better to leverage HTML5 multimedia
redirection to optimize such use cases."
Leverage Network QoS (Quality of Service) on your Cisco,
Meraki, Ubiquiti, etc. network infrastructure to prioritize Blast Extreme
traffic above general traffic and below the most time-sensitive form of traffic
in any environment, voice. In short, we don't want lower priority traffic, like
a print job for example, taking priority over delivering an optimal VDI desktop
experience.
VMware recommends configuring QoS to prioritize Blast
Extreme one level below Voice over IP traffic, commonly the highest prioritized
application. This is typically achieved using a Differentiated Services Code
Point (DSCP) marking of AF41.
However, if the network also supports interactive video,
Blast Extreme is often marked one level lower with a DSCP marking of AF31.
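Network equipment is often configured with the numeric DSCP value instead of the AF name. The sketch below converts Assured Forwarding names to their decimal values using the standard formula from RFC 2597 (8 times the class plus 2 times the drop precedence). The function names are illustrative.

```python
def af_to_dscp(name):
    """Convert an Assured Forwarding name such as 'AF41' to its decimal DSCP."""
    name = name.upper()
    if len(name) != 4 or not name.startswith("AF") or not name[2:].isdigit():
        raise ValueError(f"not an AF class name: {name!r}")
    af_class, drop_precedence = int(name[2]), int(name[3])
    if not (1 <= af_class <= 4 and 1 <= drop_precedence <= 3):
        raise ValueError(f"AF class must be 1-4 and drop precedence 1-3: {name!r}")
    return 8 * af_class + 2 * drop_precedence


def dscp_to_tos(dscp):
    """DSCP occupies the upper six bits of the IP ToS / Traffic Class byte."""
    return dscp << 2


for marking in ("AF41", "AF31"):
    dscp = af_to_dscp(marking)
    print(f"{marking}: DSCP {dscp}, ToS byte 0x{dscp_to_tos(dscp):02X}")
```

For reference, voice traffic is commonly marked Expedited Forwarding (EF, DSCP 46), which sits above both values. Confirm the markings your own network policy uses before applying them.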
If the H.264 codec is used, set H264maxQP (the lowest
starting H.264 quality) to 28. This will force H.264 to start at higher quality
and prevent it from expending bandwidth to send initial low-quality screens.
VMware Horizon, Blast Extreme and NVIDIA virtual GPU
It is increasingly common to see NVIDIA virtual GPU
incorporated into VDI environments to improve the multimedia and streaming
content performance mentioned previously. At Whitehat, 66% of new VDI
environments being deployed, or going through a hardware refresh, are
incorporating NVIDIA vGPU to meet new use cases or generally improve the
end-user experience.
Without NVIDIA vGPU, the CPU must manage all processing,
typically distributing the work across 18 to 64 CPU cores, whereas an NVIDIA
A16 vGPU card has 5,120 CUDA cores, plus Tensor cores and RT cores to optimize
the visual experience.
NVIDIA CUDA cores, RT cores and Tensor cores explained:
- CUDA cores are what we are traditionally using when we think of a graphics
  card.
- RT cores, or ray tracing cores, rapidly calculate the effects of light rays
  bouncing around a scene in real time. Why would the normal person care?
  Because this takes flat scenes and makes them more real to our eyes.
- Tensor cores used with NVIDIA's DLSS (Deep Learning Super Sampling), for
  example, can upscale lower-resolution images to higher resolution ones with
  stunning results. Leveraging these cores, the GPU doesn't work as hard
  creating the low-res image while the tensor cores shine it up a bit before
  sending it to the monitor.
- Tensor cores can rapidly denoise an image, to give you a high-res, pristine
  picture.
- RTX Voice, as another example of Tensor cores in action, removes background
  noise from live audio feeds.
NVIDIA GPUs support H.264 and High Efficiency Video Coding
(HEVC) encoding, which can substantially increase session bandwidth. This is
due to the much higher graphical quality this hardware-enabled configuration
provides. Keep in mind that this may complicate WAN use cases.
Offload H.264 and High Efficiency Video Coding (HEVC) encoding
from the ESXi hosts.
Introduced in Horizon 8 (2106): Enable support for High
Dynamic Range (HDR) color
Support full-motion video at 4K display resolution or above
without HTML5 redirection.
Enable build-to-lossless mode if you are supporting use cases
such as non-diagnostic medical imaging, which requires the display to be
transferred without a loss in quality. Note that this increases bandwidth and
virtual desktop CPU utilization.
Enable High Color Accuracy (HCA) for H.264 if supporting an
H.264 preferred use case that has exhibited display fuzziness, lack of font or
image sharpness, or problems with color reproduction.
Introduced in Horizon 8 (2106): Leverage High Efficiency
Video Coding (HEVC) with High Dynamic Range (HDR) encoding to provide higher
graphical quality with improved color range and contrast. This configuration is
ideal for digital photography, design, and video production.
Additional Optimizations for High-End Multimedia and Video Gaming
VMware recommends using NVIDIA Tesla or newer GPUs.
Increase virtual desktop resources. More than 8 virtual CPUs
might be required to support the most demanding use cases, especially video
gaming, even with NVIDIA hardware GPUs.
Increase the frame rate. By default, Blast Extreme is capped
at 30 frames per second (FPS). You can increase the rate, up to 60 FPS, by
using the Windows Registry setting EncoderMaxFPS.
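Raising the frame rate shortens the time available to capture, encode and deliver each frame, which is part of why more virtual CPUs or a GPU may be needed. A minimal sketch of that arithmetic follows. The helper names are illustrative, and the 60 FPS ceiling is taken from the text above.

```python
DEFAULT_FPS = 30  # default Blast Extreme cap, per the text above
MAX_FPS = 60      # highest rate the text says EncoderMaxFPS allows


def clamp_fps(requested, maximum=MAX_FPS):
    """Keep a requested EncoderMaxFPS value within 1..maximum."""
    return max(1, min(int(requested), maximum))


def frame_budget_ms(fps):
    """Milliseconds available to produce each frame at a given rate."""
    return 1000 / fps


for fps in (DEFAULT_FPS, clamp_fps(120)):
    print(f"{fps} FPS leaves {frame_budget_ms(fps):.2f} ms per frame")
```

Doubling the cap from 30 to 60 FPS halves the per-frame budget, so it is worth confirming the desktop has CPU or GPU headroom before raising it.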
First available in 2017, VMware's Blast Extreme is one of
the newer VDI protocols on the market and one that has certainly come a long
way from its humble beginnings after VMware decided to part company with
Teradici and discontinue licensing future versions of the PCoIP protocol.
Even with 40 optimizations listed here, this list is only
scratching the surface of the 1,500+ optimizations we have cataloged over the
years building and managing VDI environments.
For the best VDI performance, you will need to tune or
optimize everything from hardware BIOS through the most common VDI
performance bottlenecks, Group Policy, SQL and storage, through to the endpoint
your end-users will be engaging with in their efforts to be productive and get
their jobs done.
In ten years, we have yet to find a VDI environment that is at
its maximum potential. Significant increases in performance occur in even the
crustiest VDI environments.
VDI should unchain your employees from their desks and give
them the freedom to do work anywhere it makes sense for the company. VDI, when
properly resourced, is empowering: a strong recruiting tool and a way to
improve security, protect intellectual property, and find and hire the best
talent for the company anywhere, not just within commuting distance of a
company office.
VDI should be fast, efficient and an effective tool to
enable the companies that deploy it to be more agile in decision making because
of the innate flexibility baked into the model.
If this sounds more like a dream
than what you experience day-to-day, it does not have to be that way. We are
here to advise, help, build, manage if necessary, and make VDI deliver the
right end-user experience, so your people can work to their maximum capability
and not to an arbitrary limit on productivity set by an average or
underperforming VDI environment.
ABOUT THE AUTHOR
Val is the CEO of Whitehat Virtual Technologies and is
responsible for day-to-day operations, as well as leading the company's product
development and technology strategy. Val has 20 years of experience in
technology, compliance, and security in regulated industries, particularly in
financial services and healthcare. Currently Val serves in a dual role as CIO
for a regional healthcare system.