Virtualization Technology News and Information
Why Observability Has Become A DevOps Must-Have

By Frank Reno, Sr. Product Manager, Humio, a CrowdStrike company

DevOps workflows and practices drive the success and efficiency of most applications today. For DevOps teams to do their jobs well, organizations need key metrics that bring insights into application health and performance at each stage of the DevOps process: planning, coding, testing, building, deployment and monitoring. This information enables the delivery of reliable, predictable and secure applications.

The key is to make sure that the necessary information is easy to interpret and constantly available to DevOps engineers. To get there, there are a few requirements:

  • Data collection must be automated and performed in real time.
  • Ingested data must go through normalization for ease of parsing.
  • Engineers must have easy access to intuitive reporting.
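
As a minimal sketch of the normalization requirement, the snippet below parses one raw web server line into typed fields. It assumes the line follows the Common Log Format; the function name `normalize`, the output field names and the sample line are illustrative choices, not part of any particular product.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [time] "request" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def normalize(line):
    """Return a dict with typed fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "host": fields["host"],
        "timestamp": datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": fields["method"],
        "path": fields["path"],
        "status": int(fields["status"]),
        # A size of "-" means no body was sent; store it as 0 bytes.
        "bytes": 0 if fields["size"] == "-" else int(fields["size"]),
    }

# Hypothetical sample line, made up for illustration.
sample = '203.0.113.7 - - [08/Jun/2022:07:30:00 +0000] "GET /index.html HTTP/1.1" 200 1043'
event = normalize(sample)
```

Once every source is reduced to the same field names and types, queries and reports no longer need to know which system produced a given event.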

After all, if you can understand what's happening under the hood, you can quickly and effectively automate, troubleshoot, and optimize workflows. For example, telemetry may show a database query trending slower over time. You can then combine that trend with related information to identify a suboptimal database design, optimize the query, and deploy the fix in an automated way.

Here, the observable data is presented as part of the monitoring stage of the DevOps process. Finding the suboptimal database design is part of the monitoring stage; choosing to fix it is part of the planning stage while creating the optimized query is part of the coding stage. Finally, there's the deployment stage, which includes automated provisioning of the server that runs the query. In each stage, further data can indicate whether the step was successful.

This article will focus on the relationship between DevOps lifecycle effectiveness and observability. What we'll see is that improvements to DevOps processes and workflows can only happen when you start with the hard data that comes through observability. We'll consider the benefits that observability through log management can bring to DevOps as well as how these affect the requirements for an effective log management solution.

What Is Observability, and How Does It Relate to DevOps?

Let's begin by revisiting our definition of observability:

Observability is when you're able to understand the internal state of a system from the data it provides, and you can explore that data to answer any question about what happened and why.

In IT, observability describes a methodology for efficient detection and diagnosis of operational, performance and security issues. This requires gaining near-real-time visibility into all the layers of a system.

While observability uses system logs, traces, events and metrics, we'll focus primarily on logs to keep things simple. The reason for this simplified approach is that logs are a prime source of information for both sides of DevOps (development and operations) when monitoring and troubleshooting. Logs also provide the foundation for tracing and metrics, which makes them central to every aspect of observability.

Take the classic example of web server logs. These logs, when properly grouped and plotted over time, provide insight into the performance and health of your web services. If you see the proportion of error responses increase significantly, or the number of successful responses drop suddenly, you can infer there is an anomalous situation. The same logs then let you start identifying the issue as you explore which dimensions are relevant to the trend.
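
To make the "grouped and plotted over time" idea concrete, here is a small sketch that buckets responses by minute and computes the share of 5XX statuses in each bucket. The event tuples, the function name and the 0.5 cutoff are assumptions chosen for illustration; a real log platform would do this with its own query language.

```python
from collections import defaultdict

def error_rate_by_minute(events):
    """Map each minute bucket to the fraction of responses with a 5xx status."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for minute, status in events:
        totals[minute] += 1
        if 500 <= status <= 599:
            errors[minute] += 1
    return {minute: errors[minute] / totals[minute] for minute in sorted(totals)}

# Hypothetical (minute, status) pairs extracted from web server logs.
events = [
    ("07:30", 200), ("07:30", 200), ("07:30", 200), ("07:30", 500),
    ("07:31", 200), ("07:31", 503), ("07:31", 502), ("07:31", 500),
]
rates = error_rate_by_minute(events)

# Flag minutes where more than half of the responses were server errors.
anomalous = [minute for minute, rate in rates.items() if rate > 0.5]
```

Grouping by another dimension, such as path or host, is the same computation with a different key, which is how you narrow down where the anomaly comes from.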

This is a fine example of one use of logs, but you may be asking yourself: How does "observability through logs" relate to DevOps practices, processes and teams?

First, a development team may have different functions and roles contributing to different DevOps lifecycle stages. For example, a developer might write a code patch for our suboptimal query. But how does she know this patch actually improves performance, or that the unit tests and deployment were successful?

Logs: feedback at every stage of DevOps

It all comes back to data. Without direct feedback, it's difficult to ensure you haven't missed anything. Data - in this case, logs - can help individuals and teams in each stage of the DevOps lifecycle.

Development

Developers write, test and commit their code in the development stage. Code instrumentation messages, exceptions, unhandled exception traces, custom code library events, and compiler error logs can all provide valuable feedback in this stage. The result is better quality code. For example, a developer can craft custom log messages from their Java application using one of the Java-logging APIs. These can help them troubleshoot issues before they deploy code into production.
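
The same idea applies in any language with a logging API. The sketch below uses Python's standard `logging` module rather than Java; the logger name `orders.query`, the `run_query` helper and the half-second threshold are hypothetical, and the in-memory stream stands in for a file or log shipper.

```python
import io
import logging
import time

log_stream = io.StringIO()  # stands in for a file or a log shipper
handler = logging.StreamHandler(log_stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))

logger = logging.getLogger("orders.query")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the example's output in log_stream only

def run_query(fetch, slow_threshold_s=0.5):
    """Run fetch(), logging its duration and any exception it raises."""
    start = time.perf_counter()
    try:
        rows = fetch()
    except Exception:
        # logger.exception records the message plus the full traceback.
        logger.exception("query failed")
        raise
    elapsed = time.perf_counter() - start
    level = logging.WARNING if elapsed > slow_threshold_s else logging.INFO
    logger.log(level, "query returned %d rows in %.3fs", len(rows), elapsed)
    return rows

rows = run_query(lambda: [1, 2, 3])
```

Logging the duration at a higher level when it crosses a threshold gives later stages something specific to search and alert on.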

Testing

The testing stage involves various disciplines like unit, integration and system testing, all of which check code for proper functionality. Performance testing measures the code's execution time or throughput against set thresholds.

All these tests can generate large volumes of metrics, logs and traces. Involving multiple systems makes it even more complex. However, when an observability solution collects and correlates this data, and then shows meaningful results, the testing stage becomes easier. For example, a side-by-side comparison of failed tests in the development environment and the user acceptance testing (UAT) environment may reveal the code in UAT is not the latest or perhaps that other parameters have changed.
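
A side-by-side comparison of this kind can be as simple as a set difference. In the sketch below, the test names and the `compare_failures` helper are invented for illustration; the inputs would come from each environment's test logs.

```python
def compare_failures(dev_failures, uat_failures):
    """Split failed test names into those shared and those unique to one environment."""
    dev, uat = set(dev_failures), set(uat_failures)
    return {
        "both": sorted(dev & uat),
        "only_dev": sorted(dev - uat),
        "only_uat": sorted(uat - dev),
    }

# Hypothetical failed-test names collected from each environment's logs.
report = compare_failures(
    dev_failures=["test_checkout_total", "test_login_timeout"],
    uat_failures=["test_login_timeout", "test_invoice_pdf", "test_search_index"],
)
```

Failures that appear only in UAT are the ones most likely to point at stale code or changed parameters in that environment.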

Deployment

The deployment stage releases the code into production. DevOps engineering typically implements this with automated, continuous deployment pipelines. This is where you provision new infrastructure stacks, update configurations and push application code. Every tool in this space (such as Terraform, Jenkins or Ansible) will have its own logs. Naturally, receiving comprehensive real-time logs from deployment pipelines will facilitate the troubleshooting of any problems.

Monitoring

Once your application is deployed, the DevOps lifecycle enters the monitoring stage. In this stage, you'll want to know how each component of your solution is performing. The volume of log streams from different sources - whether those are APIs, service meshes, SIEM systems or others - will significantly increase in the monitoring stage. And once again, you will need to collect, parse and analyze this data to gain meaningful insights.

Why Do You Need a Log Management Solution for DevOps Observability?

A log management solution can be a core observability component for DevOps processes and teams. Here are a few reasons why.

Proactive issue remediation

You can configure a logging solution to detect errors, warnings and anomalies in real time and notify the appropriate teams through alerts. This means DevOps teams can proactively address problems before they become incidents, which spares developers from firefighting many production issues. Some log management solutions go one step further and take action themselves, such as creating service management tickets or executing a predefined runbook.
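
One way such an alert rule might work is a sliding window over error events. The sketch below is an assumption about the general technique, not a description of any specific product: the class name, the threshold of three errors per 60 seconds and the list used as a notification channel are all illustrative.

```python
from collections import deque

class ErrorRateAlert:
    """Fire a callback when too many ERROR events arrive within a time window."""

    def __init__(self, threshold, window_s, notify):
        self.threshold = threshold
        self.window_s = window_s
        self.notify = notify
        self._timestamps = deque()

    def observe(self, timestamp, level):
        if level != "ERROR":
            return
        self._timestamps.append(timestamp)
        # Drop errors that have fallen out of the window.
        while self._timestamps and timestamp - self._timestamps[0] > self.window_s:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.threshold:
            self.notify(f"{len(self._timestamps)} errors in the last {self.window_s}s")

alerts = []  # stands in for a chat channel, pager or ticketing system
alert = ErrorRateAlert(threshold=3, window_s=60, notify=alerts.append)

# Hypothetical (seconds, level) events in arrival order.
stream = [(0, "ERROR"), (10, "INFO"), (20, "ERROR"),
          (100, "ERROR"), (110, "ERROR"), (115, "ERROR")]
for ts, level in stream:
    alert.observe(ts, level)
```

Swapping `alerts.append` for a function that opens a ticket or starts a runbook is what turns detection into the automated remediation described above.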

Context for issue discovery, insights for capacity planning

In many cases, DevOps engineers don't have to depend on developers if they have the relevant information available. For example, if the charts show that the number of 5XX errors is no longer flat but climbing, even slowly, this indicates a problem. Many log management solutions add contextual information to alerts, showing historical trends and comparisons over time. Automatic correlation between different log sources can also unearth otherwise inconspicuous problems. Dashboards and charts give a bird's-eye view of the overall health of an application. These advanced features can also help plan for future capacity.

Improved security practices

Finally, a modern log management solution helps foster security in DevOps practices and lets DevOps and security teams work collaboratively. Logs ingested from across your security toolchain can help identify (and therefore secure against) potential threats. Such insights allow a DevOps team to bake security into its practices. For example, a Node.js application can pass all the deployment pipeline tests but fail a security test if one of its npm packages contains vulnerabilities. Such preventive measures can be the result of DevOps teams regularly dealing with vulnerability messages.

Now, let's say the ITOps or SecOps teams still see a new kind of vulnerability warning in a scanner log. This zero-day vulnerability hasn't been tested for in the deployment pipeline, so the DevOps team is unaware of it. That's when ITOps can contact DevOps to check their deployment history. Based on the deployment logs, the DevOps team can correlate the latest warnings to a deployment made a week ago. They can now roll back the change and work with ITOps to ensure the warnings are gone and the system is functioning correctly.
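
The correlation step can be sketched as a lookup of the latest deployment that precedes the first warning. The release names, timestamps and the `deployment_before` helper below are hypothetical; in practice both inputs would come from queries against deployment and scanner logs.

```python
from bisect import bisect_right
from datetime import datetime

def deployment_before(deployments, first_warning):
    """Return the latest (time, name) deployment at or before first_warning, or None."""
    ordered = sorted(deployments)
    times = [t for t, _ in ordered]
    index = bisect_right(times, first_warning)
    return ordered[index - 1] if index else None

# Hypothetical deployment history taken from pipeline logs.
deployments = [
    (datetime(2022, 5, 20, 9, 0), "release-41"),
    (datetime(2022, 6, 1, 14, 30), "release-42"),
    (datetime(2022, 6, 7, 11, 0), "release-43"),
]

# Hypothetical time at which the scanner first logged the new warning.
first_warning = datetime(2022, 6, 1, 16, 5)
suspect = deployment_before(deployments, first_warning)
```

The result is only a candidate for rollback; confirming it still requires checking that the warnings stop once the change is reverted.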

What Capabilities Should a Log Management Solution Have?

Handling the volume and velocity of data from today's systems

Data from even a moderately sized IT environment can be massive, so the platform must be able to handle both the volume and velocity of the data and store it efficiently. It also needs the ability to filter out the data you don't want. Whatever information the data contains, you should be able to easily search for events, create and save complex queries, analyze and compare values and correlate trends. The tool should also allow you to create graphical representations of your analysis in charts and dashboards, assign thresholds for field values you want to monitor, and integrate with your team's communication channels like Slack, Teams, SMS, email or PagerDuty.

Seamless integration

To choose a great log management platform, you should consider its capability to integrate seamlessly with most (if not all) of your infrastructure, network and application resources. This means the platform should natively capture data coming from your web servers, databases, firewalls, third-party tools, operating systems, networking devices, etc.

Other advanced options include proactive measures like running remediation steps, automatic anomaly and threat detection, and predictive modeling.

Conclusion

We've looked at observability, its components, and how it helps DevOps practices, lifecycles, and teams. You've also seen how logs, a crucial part of observability, can make typical DevOps workflows simpler and more efficient. A log management system is thus a necessary tool in the DevOps arsenal.


About the Author


Frank Reno is a Senior Product Manager for Humio, a CrowdStrike company, where he is focused on all things DevOps and Observability. He is a Kubernetes and OpenTelemetry enthusiast with a passion for building products that bring Observability to everyone.

Published Wednesday, June 08, 2022 7:30 AM by David Marshall