Welcome to Virtualization and Beyond
Finding the Right Data Among the Noise
Written By Chris Paap, Technical Product
Manager, SolarWinds
In today's world, nearly everyone
is inundated with constant data streams that have to be deciphered on the fly. As
a result, some decisions are made subconsciously, while
other data streams actively vie for our attention and require deliberate
thought before a decision can be made. Indeed, in our quest for more data to help
with decision-making, we have acquired an abundance (some may say an
overabundance) of data. We must now determine what is important to
us and what is not, and then isolate the important data from the noise.
While this is true across the
board, it's especially apparent in the realm of infrastructure monitoring, where
quickly isolating critical data from the noise is crucial for ensuring
infrastructure uptime, IT performance, and ultimately, business success.
Remember: a simple abundance of metrics does not equate to good infrastructure monitoring.
More often than not, it simply results in alerts being ignored and a lack of
clear insight into the actions required for remediation.
There are several keys to
making your monitoring more effective in this data-rich era:
- Simplicity
- Context
- Severity
- Correlation
Some enterprises are able to
dedicate multiple engineers just to the full-time care and attention needed for
the most effective monitoring. For others, however, there's a constant battle
between fighting fires and making progress on larger projects, all while ensuring
uptime. Having time to devote to complex monitoring solutions is simply a
luxury many do not have. This is why it's so important for monitoring and
alerting to be easy to maintain, helpful in quickly identifying issues that
require urgent attention, and proactive in assisting to avoid unplanned downtime.
In short: simplicity.
Context is
also critical for effective monitoring. This includes truly understanding what
it is you're monitoring. For example, suppose a piece of infrastructure is constantly
alerting as down while the application (where the rubber really meets the road) is
functioning normally with optimal performance. An alert like that is not helpful on
its own, because the story likely
doesn't end there. You must understand whether or not that piece of
infrastructure is critical to your application. If it is, you shouldn't ignore
the alerts even if the application is currently performing fine, because it
could indicate an issue is building over time that will eventually affect your
application performance. Without this context and understanding, monitoring and
alerts aren't nearly as helpful as they should be.
Also, keep in mind that not
all applications and infrastructure components are created equal. Some have
higher priority based on role and what would be affected if they were to go
down. This needs to be reflected in your monitoring and alerting by taking severity into account. It's all about
priorities.
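As a rough illustration of taking severity into account, the sketch below ranks open alerts by combining each alert's own severity with the priority of the component that raised it. The component names, the priority tiers, and the multiply-the-two scoring rule are all hypothetical assumptions made for the example; they are not drawn from any particular monitoring product.

```python
from dataclasses import dataclass

# Hypothetical priority tiers: a higher number means more business impact if down.
COMPONENT_PRIORITY = {
    "payments-db": 3,
    "web-frontend": 2,
    "test-lab-host": 1,
}

# Hypothetical severity levels for an individual alert.
SEVERITY = {"info": 1, "warning": 2, "critical": 3}


@dataclass
class Alert:
    component: str
    severity: str
    message: str


def score(alert: Alert) -> int:
    """Combine alert severity with component priority (assumed rule: multiply)."""
    # Components missing from the map fall back to the lowest tier.
    priority = COMPONENT_PRIORITY.get(alert.component, 1)
    return SEVERITY[alert.severity] * priority


def rank(alerts: list[Alert]) -> list[Alert]:
    """Return alerts ordered from most to least urgent."""
    return sorted(alerts, key=score, reverse=True)


alerts = [
    Alert("test-lab-host", "critical", "host unreachable"),
    Alert("payments-db", "warning", "disk latency rising"),
    Alert("web-frontend", "info", "certificate expires in 30 days"),
]
ranked = rank(alerts)
for a in ranked:
    print(score(a), a.component, a.severity, a.message)
```

Under these assumed weights, a warning on the high-priority database outranks a critical alert on a lab host, which is the kind of ordering the business priorities would have to confirm or adjust.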
Lastly, any monitoring and
alerting should be able to help you and your infrastructure teams correlate events and occurrences to
identify root cause. After all, root cause identification should always be your
goal; anything else is just a Band-Aid®. For
example, being able to trace a performance issue to a faulty
upstream switch rather than to the back-end storage can greatly
improve mean time to resolution.
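One simple way to picture this kind of correlation is with a dependency map: if several components are alerting at once, the ones with no alerting component upstream of them are the likeliest places to start looking. The sketch below is a minimal illustration under that assumption. The component names and the `DEPENDS_ON` map are hypothetical, and real correlation would typically weigh timing and metrics as well.

```python
# Hypothetical dependency map: each component lists what it depends on directly.
DEPENDS_ON = {
    "app-server": ["core-switch", "storage-array"],
    "storage-array": ["core-switch"],
    "core-switch": [],
}


def upstream(component, depends_on):
    """Return every component that `component` depends on, directly or indirectly."""
    seen = set()
    stack = list(depends_on.get(component, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(depends_on.get(node, []))
    return seen


def probable_root_causes(alerting, depends_on):
    """Keep only alerting components that have no alerting component upstream."""
    alerting = set(alerting)
    return sorted(c for c in alerting if not (upstream(c, depends_on) & alerting))


alerting_now = ["app-server", "storage-array", "core-switch"]
print(probable_root_causes(alerting_now, DEPENDS_ON))
```

With all three components alerting, only the switch survives the filter, pointing the investigation at the network rather than at the storage that merely sits behind it.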
At the end of the day, what's
most important is aligning your monitoring and alerting with your business's
priorities. This can be overwhelming, so simplicity, context, severity, and correlation
should be your monitoring and alerting objectives.
Read more articles like this from the Virtualization and Beyond Series.
About the Author
With 14 years of IT systems engineering experience
across multiple corporate environments, Chris Paap currently serves as a
technical product manager for hybrid
IT performance management software provider SolarWinds, where he
focuses specifically on the award-winning SolarWinds®
Virtualization Manager. In this role, he is responsible for defining the product
roadmap and identifying new key features to solve IT problems.