By Frank Reno, Sr. Product Manager, Humio, a CrowdStrike company
DevOps workflows and practices drive the
success and efficiency of most applications today. For DevOps teams to do their
jobs well, organizations need key metrics that bring insights into application
health and performance at each stage of the DevOps
process: planning, coding,
testing, building, deployment and monitoring. This information enables the
delivery of reliable, predictable and secure applications.
The key is to make sure that the necessary
information is easy to interpret and constantly available to DevOps engineers.
To get there, there are a few requirements:
- Data collection must be automated
and performed in real time.
- Ingested data must go through
normalization for ease of parsing.
- Engineers must have easy access to
intuitive reporting.
After all, if you can understand what's
happening under the hood, you can quickly and effectively automate,
troubleshoot, and optimize workflows. For example, telemetry may reveal a trend of a
database query running slowly. Combining that trend with other related
information, you can identify a suboptimal database design, then
optimize the query and deploy the fix in an automated way.
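The slow-query trend described above can be sketched in a few lines. This is a minimal illustration, not any particular product's feature: the record fields (`query`, `duration_ms`) and the 500 ms threshold are assumptions chosen for the example.

```python
from collections import defaultdict

# Hypothetical query-timing records extracted from database logs;
# the field names and values are illustrative only.
records = [
    {"query": "orders_by_customer", "duration_ms": 40},
    {"query": "orders_by_customer", "duration_ms": 1800},
    {"query": "orders_by_customer", "duration_ms": 2100},
    {"query": "product_lookup", "duration_ms": 12},
]

def slow_queries(records, threshold_ms=500):
    """Return queries whose average duration exceeds the threshold."""
    durations = defaultdict(list)
    for rec in records:
        durations[rec["query"]].append(rec["duration_ms"])
    return {
        name: sum(ds) / len(ds)
        for name, ds in durations.items()
        if sum(ds) / len(ds) > threshold_ms
    }

print(slow_queries(records))  # flags orders_by_customer
```

A trend like this is the trigger for the planning and coding stages that follow: the flagged query becomes the candidate for redesign.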
In this scenario, the observable data surfaces during the monitoring stage of
the DevOps process, and each step maps to a stage. Finding the suboptimal database
design is part of the monitoring stage; choosing to fix it is part of the
planning stage, while creating the optimized query is part of the coding stage.
Finally, there's the deployment stage, which includes automated provisioning of
the server that runs the query. In each stage, further data can indicate
whether the step was successful.
This article will focus on the relationship
between DevOps lifecycle effectiveness and observability. What we'll see is
that improvements to DevOps processes and workflows can only happen when you
start with the hard data that comes through observability. We'll consider the
benefits that observability through log management can bring to DevOps as well
as how these affect the requirements for an effective log management solution.
What Is Observability, and How Does It Relate to DevOps?
Let's begin by revisiting the definition of observability:
Observability is when you're able to understand the internal state of a
system from the data it provides, and you can explore that data to answer any
question about what happened and why.
In IT, observability describes a methodology
for efficient detection and diagnosis of operational, performance and security
issues. This requires gaining near-real-time visibility into all the layers of
a system.
While observability uses system logs, traces,
events and metrics, we'll focus primarily on logs to keep things simple. The
reason for this simplified approach is that logs are a prime source of
information for both sides of DevOps - development and operations - when
monitoring and troubleshooting. Logs also underpin tracing and metrics, making
them foundational to every aspect of observability.
For example, take the classic case of web server logs. When properly grouped
and plotted over time, these logs provide insight into the performance and
health of your web services. If you see the proportion of error responses
increase significantly, or the number of successful responses drop suddenly,
you can infer there is an anomalous situation. These same logs then allow you
to start identifying the issue as you explore which dimensions are relevant to
the trend.
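The error-proportion check can be sketched directly. This is a minimal sketch under assumptions: the access-log lines follow a Common Log Format-style layout, and the 25% alert threshold is an arbitrary example.

```python
from collections import Counter

# Illustrative access-log lines in a Common Log Format-style layout.
log_lines = [
    '10.0.0.1 - - [12/Jan/2024:10:01:01] "GET /api HTTP/1.1" 200 512',
    '10.0.0.2 - - [12/Jan/2024:10:01:02] "GET /api HTTP/1.1" 500 87',
    '10.0.0.3 - - [12/Jan/2024:10:01:03] "GET /api HTTP/1.1" 502 87',
    '10.0.0.4 - - [12/Jan/2024:10:01:04] "GET /api HTTP/1.1" 200 731',
]

def error_ratio(lines):
    """Fraction of responses whose status code is a 5xx server error."""
    # The status code is the first token after the closing quote.
    classes = Counter(line.split('" ')[1].split()[0][0] for line in lines)
    total = sum(classes.values())
    return classes.get("5", 0) / total if total else 0.0

ratio = error_ratio(log_lines)
if ratio > 0.25:  # alert threshold is an arbitrary example
    print(f"anomaly: {ratio:.0%} of responses are 5xx")
```

In practice a log management platform performs this grouping for you at ingest time; the point is that the raw material is just the logs themselves.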
This is a fine example of one use of logs, but
you may be asking yourself: How does "observability through logs" relate to
DevOps practices, processes and teams?
First, a development team may have different
functions and roles contributing to different DevOps lifecycle stages. For
example, a developer might write a code patch for our suboptimal query. But how
does she know this patch actually improves performance, or that the unit tests
and deployment were successful?
Logs: feedback at every stage of DevOps
It all
comes back to data. Without direct feedback, it's
difficult to ensure you haven't missed anything. Data - in this case, logs -
can help individuals and teams in each stage of the DevOps lifecycle.
Development
Developers write, test and commit their code
in the development stage. Code
instrumentation messages, exceptions, unhandled exception traces, custom code
library events, and compiler error logs can all provide valuable feedback in this
stage. The result is better quality code. For example, a developer can craft
custom log messages from their Java application using one of the Java-logging
APIs. These can help them troubleshoot issues before they deploy code into
production.
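The article mentions Java logging APIs; the analogous sketch below uses Python's standard logging module instead, for consistency with the other examples here. The function and field names are hypothetical.

```python
import logging

# Configure a simple structured-ish log format for the whole process.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("checkout")

def apply_discount(price, pct):
    """Apply a percentage discount, logging enough context to debug later."""
    log.debug("applying discount pct=%s to price=%s", pct, price)
    if not 0 <= pct <= 100:
        # An ERROR line here surfaces bad inputs before they reach production.
        log.error("invalid discount pct=%s", pct)
        raise ValueError("discount out of range")
    return price * (1 - pct / 100)

apply_discount(100.0, 15)  # emits a DEBUG line and returns the discounted price
```

Passing values as arguments (rather than pre-formatted strings) keeps the messages cheap when the level is disabled and easy to parse downstream.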
Testing
The testing
stage involves various disciplines like unit, integration and system
testing, all of which check code for proper functionality. Performance testing
measures the code's execution time or throughput against set thresholds.
All these tests can generate large volumes of
metrics, logs and traces. Involving multiple systems makes it even more
complex. However, when an observability solution collects and correlates this
data, and then shows meaningful results, the testing stage becomes easier. For
example, a side-by-side comparison of failed tests in the development
environment and the user acceptance testing (UAT) environment may reveal the
code in UAT is not the latest or perhaps that other parameters have changed.
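That side-by-side comparison reduces to a set difference once each environment reports its failed test names. The environment and test names below are hypothetical.

```python
# Hypothetical failed-test names reported by two environments.
failed_dev = {"test_checkout_total", "test_rounding"}
failed_uat = {"test_checkout_total", "test_rounding", "test_tax_rate"}

# Tests failing only in UAT often point to stale code or changed parameters.
uat_only = failed_uat - failed_dev
print(sorted(uat_only))  # ['test_tax_rate']
```

A single extra failure in UAT is exactly the kind of signal that prompts checking whether the deployed commit matches the one tested in development.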
Deployment
The deployment
stage releases the code into production. DevOps engineering typically
implements this with automated, continuous deployment pipelines. This is where
you provision new infrastructure stacks, update configurations and push
application code. Every tool in this space (such as Terraform, Jenkins or
Ansible) will have its own logs. Naturally, receiving comprehensive real-time
logs from deployment pipelines will facilitate the troubleshooting of any
problems.
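Because each pipeline tool emits its own logs, a common first step is grouping failures by tool. The log line layout below (tool name, separator, message) is an assumption made for the sketch, not the real output format of Jenkins, Terraform or Ansible.

```python
# Illustrative pipeline log lines; the "tool | message" layout is assumed.
pipeline_log = [
    "jenkins  | INFO  build #142 started",
    "terraform| ERROR Error acquiring the state lock",
    "ansible  | INFO  PLAY RECAP: ok=12 changed=3 failed=0",
    "jenkins  | ERROR stage 'deploy' failed after 3 retries",
]

def failures_by_tool(lines):
    """Group ERROR lines by the tool that emitted them."""
    out = {}
    for line in lines:
        tool, _, message = (part.strip() for part in line.partition("|"))
        if message.startswith("ERROR"):
            out.setdefault(tool, []).append(message)
    return out

print(failures_by_tool(pipeline_log))
```

In a real deployment, normalizing these heterogeneous formats at ingest is exactly the job of the log management platform.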
Monitoring
Once your application is deployed, the DevOps
lifecycle enters the monitoring stage.
In this stage, you'll want to know how each component of your solution is
performing. The volume of log streams from different sources - whether those
are APIs, service meshes, SIEM systems or others - will significantly increase
in the monitoring stage. And once again, you will need to collect, parse and
analyze this data to gain meaningful insights.
Why Do You Need a Log Management Solution for DevOps Observability?
A log management solution can be a core
observability component for DevOps processes and teams. Here are a few reasons
why.
Proactive issue remediation
You can configure a logging solution to detect
errors, warnings and anomalies in real time and warn appropriate teams through
alerts. This means DevOps teams can proactively address problems before they
become incidents. In this way, you can spare developers from having to
firefight many production issues. Log management solutions sometimes go one step
further, taking proactive steps like creating service management tickets or
executing a predefined runbook.
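Real-time alerting of this kind can be sketched as a sliding-window threshold check. This is a minimal illustration, not any product's alerting engine; the `notify` callback stands in for a ticketing or paging integration, and the window size and threshold are arbitrary examples.

```python
from collections import deque

class ErrorRateAlert:
    """Fire a notification when too many ERROR events land in a sliding window."""

    def __init__(self, window=10, max_errors=3, notify=print):
        self.recent = deque(maxlen=window)  # oldest events fall off automatically
        self.max_errors = max_errors
        self.notify = notify  # stand-in for a ticket/page/Slack integration

    def observe(self, level):
        self.recent.append(level)
        errors = sum(1 for lvl in self.recent if lvl == "ERROR")
        if errors > self.max_errors:
            self.notify(f"alert: {errors} errors in last {len(self.recent)} events")

alert = ErrorRateAlert(max_errors=2)
for level in ["INFO", "ERROR", "ERROR", "INFO", "ERROR"]:
    alert.observe(level)  # fires once the error count exceeds 2
```

Swapping `notify` for a function that opens a service management ticket turns this detection step into the proactive remediation described above.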
Context for issue discovery, insights for capacity planning
In many cases, DevOps teams don't have to depend
on developers if they have relevant information available. For example, if the
charts show that the number of 5XX errors is climbing sharply rather than
holding steady or rising slowly, this indicates a problem. Many log management solutions add
contextual information to alerts, showing historical trends and comparisons
over time. Automatic correlation between different log sources can also unearth
otherwise inconspicuous problems. Dashboards and charts give a bird's-eye view
of the overall health of an application. These advanced features can also help
plan for future capacity.
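A historical comparison of the kind described can be sketched as a simple baseline check. The hourly counts and the 2x factor are arbitrary examples, not a product default.

```python
# Hypothetical hourly 5xx counts: a historical baseline vs the current hour.
history = [4, 6, 5, 7, 5, 6]  # past hours
current = 19                  # current hour

baseline = sum(history) / len(history)

# Flag when the current count is well above the historical average;
# the 2x factor is an arbitrary illustrative threshold.
if current > 2 * baseline:
    print(f"5xx count {current} is {current / baseline:.1f}x the baseline {baseline:.1f}")
```

The same baseline arithmetic, applied to resource-usage logs instead of error counts, is what makes these tools useful for capacity planning.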
Improved security practices
Finally, a modern log management solution
helps foster security in DevOps practices and lets DevOps and security teams
work collaboratively. Logs ingested from across your security toolchain can
help identify (and therefore secure against) potential threats. Such insights allow
a DevOps team to bake security into its practices. For example, a Node.js
application can pass all the deployment pipeline tests but fail a security test
if one of its NPM packages contains vulnerabilities. Such preventive measures
come from DevOps teams regularly reviewing the vulnerability reports surfaced
in their logs.
Now, let's say the ITOps or SecOps team still
sees a new kind of vulnerability warning in a scanner log. This zero-day
vulnerability hasn't been tested in the deployment pipeline, so the DevOps team
is unaware of it. That's when ITOps can contact DevOps to check their
deployment history. Based on the deployment logs, the DevOps team can correlate
the latest warnings to a deployment made a week ago. They can now roll back the
change and work with ITOps to ensure the warnings are gone, and the system is
functioning correctly.
What Capabilities Should a Log Management Solution Have?
Handling the volume and velocity of data from today's systems
Data from even a moderately sized IT environment
can be massive, so the platform must be able to handle both the volume and
velocity of the data and store it efficiently. It also needs the ability to
filter out the data you don't want. Whatever information the data contains, you
should easily be able to search for events, create and save complex queries,
analyze and compare values, and correlate trends. The tool should also allow you to
create graphical representations of your analysis in charts and dashboards,
assign thresholds for field values you want to monitor, and integrate with your
team's communication channels like Slack, Teams, SMS, email or PagerDuty.
Seamless integration
To choose a great log management platform, you
should consider its capability to integrate seamlessly with most (if not all)
of your infrastructure, network and application resources. This means the
platform should natively capture data coming from your web servers, databases,
firewalls, third-party tools, operating systems, networking devices, etc.
Other advanced options include proactive
measures like running remediation steps, automatic anomaly and threat
detection, and predictive modeling.
Conclusion
We've looked at observability, its components,
and how it helps DevOps practices, lifecycles, and teams. You've also seen how
logs, a crucial part of observability, can make typical DevOps workflows
simpler and more efficient. A log management system is thus a necessary tool in
the DevOps arsenal.
##
About the Author
Frank
Reno is a Senior Product Manager for Humio, a CrowdStrike company, where he is
focused on all things DevOps and Observability. He is a Kubernetes and
OpenTelemetry enthusiast with a passion for building products that bring
Observability to everyone.