
Welcome to Virtualization and Beyond
Monitoring and Data Normalization: Clarity via a Single Pane of Glass
Written by Kevin M. Sparenberg, Product Manager, SolarWinds
As technology professionals, we live in an interruption-driven
world, and responding to incidents is part of the job. All our other job duties go
out the window when a new issue hits the desk. Having the right information and
understanding the part it plays in the organization is key to handling these incidents
with speed and accuracy. This is why it's critical to have the ability to compare
apples to apples when it comes to the all-important troubleshooting process.
What is our job as IT professionals?
Simply put, our job is to deliver services to end-users. It
doesn't matter if those end-users are employees, customers, local, remote, or some
combination of these. This may encompass things as simple as making sure a
network link is running without errors, a server is online and responding, a
website is handling requests, or a database is processing transactions. Of
course, for most of us, it's not a single thing; it's a combination of them. And
considering 95 percent of organizations report having migrated critical
applications and IT infrastructure to the cloud over the past year, according to
the SolarWinds IT Trends Report 2017, visibility into our
infrastructure is getting increasingly murky.
So, why does this matter? Isn't it the responsibility of
each application owner to make sure their portion of the environment is
healthy?
Yes and no. Ultimately, everyone is responsible for making
sure that the services necessary for organizational success are met. Getting
mean time to resolution (MTTR) down requires cooperation, not hostility.
Blaming any one individual or team will invariably lead to a room full of
people pointing fingers. This is counterproductive and must be avoided. There
is a better way: prevention via comprehensive
IT monitoring.
Solution silos
Monitoring solutions come in all shapes and sizes.
Furthermore, they come with all manner of targets. We can use solutions
specific to vendors or specific to infrastructure layers. A storage
administrator may use one solution while a virtualization and server
administrator may use another, and the team handling website performance may
even use a third solution. And, of course, none of these tools may be
applicable to the database administrators.
At best, monitoring infrastructure with disparate systems
can be confusing; at worst, it can be downright dangerous. Consider the simple
example of a network monitoring solution seeing traffic moving to a server at
50 megs/second, but the server monitoring solution sees incoming traffic at 400
megs/second. Which one is right? Possibly both, if the first tool means
50 MBps (megabytes per second) and the second means 400 Mbps (megabits per
second), since a byte is eight bits. This is just the start of the confusion. What
happens if your virtualization monitoring tool reports in Kb/sec and your
storage solution reports in MB/sec? And when a tool says "kilo," does it
mean 1,000 or 1,024?
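To make the arithmetic concrete, here is a minimal sketch of converting throughput readings to one common scale (bits per second). The unit labels and the `to_bits_per_second` helper are assumptions made for illustration; they are not taken from any particular monitoring product, and real tools may label their units differently.

```python
# Hypothetical unit table: multiplier that converts each unit to bits per second.
# Lowercase "b" means bits, uppercase "B" means bytes (8 bits).
# "K"/"M" are decimal (1,000-based); "Ki"/"Mi" are binary (1,024-based).
UNIT_TO_BITS_PER_SECOND = {
    "bps": 1,
    "Kbps": 1_000,
    "Mbps": 1_000_000,
    "KBps": 8 * 1_000,
    "MBps": 8 * 1_000_000,
    "KiBps": 8 * 1_024,
    "MiBps": 8 * 1_024 ** 2,
}


def to_bits_per_second(value, unit):
    """Convert a throughput reading to bits per second."""
    try:
        return value * UNIT_TO_BITS_PER_SECOND[unit]
    except KeyError:
        raise ValueError(f"unknown unit: {unit!r}") from None


# The two "conflicting" readings from the example above.
network_view = to_bits_per_second(50, "MBps")   # network tool: 50 megabytes/second
server_view = to_bits_per_second(400, "Mbps")   # server tool: 400 megabits/second
print(network_view == server_view)  # True: both are 400,000,000 bits per second
```

Once both readings are expressed in the same unit, the apparent disagreement disappears. The table also shows why the 1,000 versus 1,024 question matters: 1 KBps and 1 KiBps differ by 192 bits per second, and the gap grows with each larger prefix.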
You can see how the complexity of analyzing disparate metrics
can quickly get out of hand. In the age of hybrid IT, this gets even more
complex, since cloud monitoring is inherently different from monitoring
on-premises resources.
You shouldn't have to massage the monitoring data you
receive when troubleshooting a problem; doing so only lengthens MTTR.
Data normalization
In the past, I've worked in environments with multiple
monitoring solutions in place. During multi-team troubleshooting sessions,
we've had to handle the above calculations on the fly. Was it successful? Yes,
we were able to get the issue remedied. Was it as quick as it should have been?
No, because we were moving data into spreadsheets, trying to align timestamps,
and calculating differences in scale (MB, Mb, KB, Kb, etc.). This is what I
mean by data normalization: making sure everyone is on the same page with regard
to time and scale.
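As a rough illustration of what that spreadsheet work amounts to, the sketch below puts samples from two tools onto the same clock (UTC, truncated to the minute) and the same scale (bits per second) so they can be compared side by side. The sample data, the one-minute bucket, and the `normalize` helper are all hypothetical, chosen only to show the idea.

```python
from datetime import datetime, timezone


def to_utc_minute(timestamp):
    """Parse an ISO 8601 timestamp with a UTC offset and truncate it to the UTC minute."""
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        raise ValueError("timestamp needs an explicit UTC offset")
    return parsed.astimezone(timezone.utc).replace(second=0, microsecond=0)


def normalize(samples, bits_per_unit):
    """Return {utc_minute: bits_per_second} for (timestamp, value) samples.

    If several samples fall in the same minute, the last one wins.
    """
    return {to_utc_minute(ts): value * bits_per_unit for ts, value in samples}


# Hypothetical readings: one tool logs local time in MB/s, the other UTC in Mb/s.
network_samples = [("2017-06-01T09:00:12-05:00", 50), ("2017-06-01T09:01:10-05:00", 52)]
server_samples = [("2017-06-01T14:00:45+00:00", 400), ("2017-06-01T14:01:40+00:00", 416)]

network = normalize(network_samples, 8_000_000)  # megabytes/second to bits/second
server = normalize(server_samples, 1_000_000)    # megabits/second to bits/second

# Compare only the minutes that both tools reported.
for minute in sorted(network.keys() & server.keys()):
    print(minute.isoformat(), network[minute], server[minute])
```

With the timestamps and units aligned, the two tools turn out to report the same traffic. Doing this by hand in the middle of an incident is exactly the overhead that adds to MTTR.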
Single pane of glass
Having everything you need in one place with the timestamps
lined up and everything reporting with the same scale (a single pane of glass
through which you see your entire environment) is critical to effective
troubleshooting. Remember, our job is to provide services to our end-users and
resolve issues as quickly as possible. If we spend the first half of our
troubleshooting time trying to line up data, are we really addressing the
problem?
Read more articles like this from the Virtualization and Beyond Series.
About the Author
With 20 years of IT
systems engineering and support experience across multiple environments, Kevin
M. Sparenberg currently serves as a product manager for the SolarWinds
Orion Platform Online Demo. In this role, he is responsible for defining compelling stories that
IT professionals face within the publicly accessible demos. He is a THWACK
MVP and was named a VMware vExpert in 2017.