As the ongoing saga of security vulnerabilities known as Spectre continues to unfold, VMblog spoke with Satinder Sharma, Senior Systems Engineer at Tintri, to learn more about the key takeaways from these vulnerabilities and how they affect traditional HCI solutions.
VMblog: Many people are talking about it. So let's kick things off by asking, what is
Spectre?
Satinder Sharma: In brief, Spectre exploits a processor 'feature', speculative execution, that creates a vulnerability in affected processors. Since we're talking about Intel's processors, their reach means the issue is quite widespread. The vulnerability allows rogue processes to access privileged information and kernel memory. The 'feature' can be patched, but those patches can have a negative effect on processor performance.
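For readers who want to check where a given host stands, recent Linux kernels (4.15 and later) expose mitigation status under /sys/devices/system/cpu/vulnerabilities. The short Python sketch below simply reads and prints those reports; availability and wording vary by kernel and distribution, so treat it as a convenience check rather than a definitive audit.

```python
#!/usr/bin/env python3
"""Print the kernel's Spectre/Meltdown mitigation reports on a Linux host.

Each file under the sysfs directory contains a one-line status such as
"Mitigation: ...", "Vulnerable", or "Not affected".
"""
from pathlib import Path

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")

def report() -> None:
    if not VULN_DIR.is_dir():
        print("This kernel does not expose vulnerability status (pre-4.15?).")
        return
    for entry in sorted(VULN_DIR.iterdir()):
        # e.g. "spectre_v2    Mitigation: Retpolines, IBPB: conditional, ..."
        print(f"{entry.name:20s} {entry.read_text().strip()}")

if __name__ == "__main__":
    report()
```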
VMblog: What are the takeaways from these vulnerabilities?
Sharma: One key takeaway is the pain of
unpredictable performance. Lack of predictability is never a concern for Tintri
customers because we isolate storage from compute, and isolate every individual
application. We're the only system that can provide per-VM quality of service
(QoS) to guarantee performance resources for every VM. Per-VM QoS can be set
manually or handled autonomously.
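To make the per-VM QoS idea concrete, here is a minimal, hypothetical sketch (not Tintri's actual implementation or API): each VM carries a guaranteed IOPS floor and an enforced ceiling, and the system distributes available IOPS so floors are met first and no VM exceeds its ceiling.

```python
"""Conceptual illustration of per-VM QoS with min/max IOPS.
All names and numbers are illustrative assumptions."""
from dataclasses import dataclass

@dataclass
class VmQos:
    name: str
    min_iops: int   # guaranteed floor
    max_iops: int   # enforced ceiling

def allocate(vms: list[VmQos], total_iops: int) -> dict[str, int]:
    """Satisfy every VM's floor first, then hand out the remainder
    up to each VM's ceiling (assumes total_iops covers all floors)."""
    alloc = {vm.name: vm.min_iops for vm in vms}
    remaining = total_iops - sum(alloc.values())
    for vm in sorted(vms, key=lambda v: v.max_iops):
        if remaining <= 0:
            break
        extra = min(vm.max_iops - vm.min_iops, remaining)
        alloc[vm.name] += extra
        remaining -= extra
    return alloc

if __name__ == "__main__":
    vms = [VmQos("sql01", 5000, 20000), VmQos("vdi-pool", 1000, 8000)]
    print(allocate(vms, 12000))  # -> {'sql01': 5000, 'vdi-pool': 7000}
```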
The second takeaway is the risk inherent in hyperconverged infrastructure (HCI). Sure, it promised web scale, but the reality is that with mixed workloads, all sorts of problems surface beyond a few nodes. That includes the 'best practice' of keeping nodes balanced, the challenge of moving data across nodes and now the danger of shared systems.
These risks are exposed by the Intel bug, but they're certainly not new. The patching process mentioned above will have to be repeated for any widespread virus or vulnerability.
VMblog: What are the main issues with HCI solutions?
Sharma: It seems as if every few years
there's a new infrastructure approach that promises to revolutionize the
enterprise data center. We know from experience that many of these trends don't
live up to the initial hype and some even end up taking IT operations in the
wrong direction. So it is with conventional HCI today. Despite claims to the
contrary, conventional HCI can increase deployment costs in a number of ways: the requirement for balanced nodes, increased software licensing costs and increased storage costs. Conventional HCI architectures lag behind best-of-breed external storage systems, in both hybrid-flash and all-flash configurations, in terms of latency, IOPS (especially with all-flash) and predictability. If you're due for an infrastructure refresh, look carefully at performance before deploying HCI.
VMblog: What happened to my flash storage capacity?
Sharma: Many conventional HCI adopters have been surprised by how little
usable storage capacity they end up with. With two or three copies of every
data block, capacity gets consumed quickly. Additional sources of overhead also
exist in many of those HCI implementations:
- It is recommended to keep used capacity below 70% to avoid rebalancing
that adds performance overhead.
- For disk firmware upgrades, additional free space may be needed in the cluster, equivalent to the used capacity of the largest disk group.
- Additional node(s) with disks/SSDs may be needed as spares; IT users may need to keep one to two additional nodes with full storage capacity.
All of this
can drop the usable capacity in a conventional HCI environment below 30%.
That's a lot of wasted resources and spend, especially for all-flash
configurations. Flash is still an expensive resource at scale. To minimize the
impact, the tendency is to opt for just two copies of data to save capacity and
increase usable space, but that can spell disaster if a component fails during
maintenance on another node.
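As a rough, illustrative calculation (the figures below are assumptions, not measurements from any specific product): with three copies of every block, a 70% fill ceiling and about 10% of raw capacity held back for spares, usable capacity lands near 20% of raw, consistent with the sub-30% figure above.

```python
"""Back-of-the-envelope usable-capacity math for a conventional HCI
cluster, using the overheads cited above. All figures are illustrative."""

raw_tb = 100.0            # total raw flash across the cluster
copies = 3                # replication factor (two or three copies per block)
fill_ceiling = 0.70       # keep used capacity below ~70% to avoid rebalancing
spare_fraction = 0.10     # capacity held back for spare node(s) / rebuilds

effective_raw = raw_tb * (1 - spare_fraction)    # raw capacity after spares
usable = effective_raw * fill_ceiling / copies   # after fill ceiling and copies

print(f"Usable: {usable:.1f} TB of {raw_tb:.0f} TB raw "
      f"({usable / raw_tb:.0%})")
# -> Usable: 21.0 TB of 100 TB raw (21%)
```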
##