Virtualization Technology News and Information
Topology-based observability: the hows, whats, and whys

With the rise of cloud native development, the terms observability and monitoring are often used interchangeably. Monitoring has been around for ages and needs no formal introduction. Observability, according to the CNCF Glossary, is the property of a system that defines the degree to which it can generate actionable insights. These insights help users understand the system's state and enable them to take corrective action if needed.

As a natural evolution of traditional data collection methods, observability relies on several specialized tools to track and measure low-level signals from the system. Some of these tools are listed in the observability and analysis section of the cloud native landscape.

However, in cloud native systems, relying solely on observability data to troubleshoot and resolve issues is insufficient. System architectures are not only more complex and distributed, they are also ephemeral, unlike traditional infrastructure, which is long-lived and mutable. This is where the capabilities introduced by topology-based observability shine.

What is topology-based observability?

Topology-based observability is an approach that not only tracks metrics, logs, and traces but also maps how the components of a system interact with each other. This allows users to visualize and analyze the relationships between different components in a system, giving them a more contextual understanding of performance and reliability issues.

Fundamentally, the approach answers the following three questions:

  • How do the different services within a workload interact with each other?
  • What are their dependencies?
  • Where are the potential failure points or bottlenecks?

Answering these questions empowers organizations to make more informed decisions, both about service recovery during an outage and about architectural improvements.
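To make the three questions concrete, here is a minimal Python sketch that treats a workload's topology as a plain dependency graph. The service names and the DEPENDS_ON mapping are hypothetical and exist only for illustration. A real topology would be discovered from telemetry rather than written by hand.

```python
from collections import deque

# Hypothetical topology: each service maps to the services it calls directly.
DEPENDS_ON = {
    "frontend": ["cart", "catalog"],
    "cart": ["redis"],
    "catalog": ["postgres"],
    "checkout": ["cart", "payments"],
    "payments": [],
    "redis": [],
    "postgres": [],
}


def dependencies(service, graph):
    """Return all direct and transitive dependencies of a service (BFS)."""
    seen = set()
    queue = deque(graph.get(service, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited; also guards against cycles
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen


def dependents(service, graph):
    """Return every service that directly or transitively depends on `service`."""
    return {s for s in graph if service in dependencies(s, graph)}


def fan_in(graph):
    """Count direct callers per service; a high count hints at a shared failure point."""
    counts = {s: 0 for s in graph}
    for callees in graph.values():
        for callee in callees:
            counts[callee] = counts.get(callee, 0) + 1
    return counts
```

In this sketch, dependencies("frontend", DEPENDS_ON) covers the first two questions, while dependents("redis", DEPENDS_ON) and fan_in(DEPENDS_ON) give a rough view of which services would be affected by a failure and which ones many others rely on.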

How does topology-based observability help?

In the world of cloud native, where dynamic, microservices-based workloads are the norm, topology-based observability offers several advantages that are in sync with these modern practices.

Improved recovery times and root cause analyses

In cloud-native environments, failures can cascade quickly through interconnected services, making it difficult to pinpoint the problem area. This complicates root cause analysis and severely impacts service recovery times. Topology-based observability allows teams to visualize service dependencies and interconnections between components, helping them quickly identify and isolate faults. It also lets them adapt their monitoring strategies dynamically by providing real-time updates on changes in service interactions and dependencies as they occur, along with insights into vulnerabilities and malicious actors in the threat landscape.
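One way topology narrows down a cascading failure can be sketched in a few lines of Python. The heuristic below is an illustrative assumption, not a method prescribed by any particular tool: among the services currently reporting as unhealthy, it keeps only those whose direct dependencies are all healthy, since those are the likeliest origin of the cascade. The CALLS graph and the service names are hypothetical.

```python
# Hypothetical topology: each service maps to the services it calls directly.
CALLS = {
    "frontend": ["cart", "catalog"],
    "cart": ["redis"],
    "catalog": ["postgres"],
    "redis": [],
    "postgres": [],
}


def root_cause_candidates(graph, unhealthy):
    """Return unhealthy services that have no unhealthy direct dependency.

    A service that is failing while everything it calls is healthy is a
    plausible starting point for the cascade; services that are failing
    only because a dependency is failing are filtered out.
    """
    return sorted(
        service
        for service in unhealthy
        if not any(dep in unhealthy for dep in graph.get(service, []))
    )


# Alerts fire for three services, but only one sits at the bottom of the chain.
print(root_cause_candidates(CALLS, {"frontend", "cart", "redis"}))  # ['redis']
```

Without the topology, all three alerts look equally urgent. With it, the investigation can start at the single service the others depend on.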

Enhanced contextual information without reinventing the wheel

While projects like OpenTelemetry provide the raw telemetry data required for observability, such as metrics, logs, and traces, topology-based observability adds a layer of context to the existing information by illustrating how the different components within a workload are connected. This visualization helps teams discover where issues occur and points to the underlying problem by presenting the state of the dependencies and interactions between various services.

Effective resource optimization

Cost and resource management are crucial to developing and maintaining infrastructure efficiently, whether cloud-native or otherwise. By combining the benefits of traditional observability metrics with a clear understanding of service dependencies, topology-based observability enables organizations to make informed scaling decisions, ensuring that only the required resources are used and avoiding the trap of over-provisioning.

Implementing topology-based observability

As is evident, leveraging topology-based observability can significantly enhance an organization's ability to monitor and troubleshoot complex cloud-native applications. Let's look at how you can leverage existing tools in the open source landscape to implement topology-based observability for your workloads.

Define your observability goals

Before discussing any tools or their implementation, you must ascertain what metrics, logs, and traces are most relevant to your context. Always allow your goals to define your tool choice and not vice versa.

Choose your core observability stack

Of course, one look at the number of options on the CNCF Landscape is enough to intimidate even the steeliest of SREs. However, setting up a basic observability stack involves meeting the following requirements:

  • Collecting raw telemetry data with projects like OpenTelemetry
  • Storing the collected data
  • Visualizing the data and creating separate views with projects like Grafana
  • Analyzing data

Create and visualize dependency maps

However, what about the added layer of context? This is where projects like Jaeger and GUAC (Graph for Understanding Artifact Composition) shine. By presenting information about service and application dependencies, respectively, these projects enable you to create dependency maps that add a rich layer of contextual information over and above the raw telemetry data you collected with the basic observability stack. Even better, you can plug these insights into a Grafana dashboard for a visual representation of these dependencies.
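The general idea of deriving a service dependency map from trace data can be sketched in plain Python. This is a simplified illustration under stated assumptions, not how Jaeger or GUAC compute their graphs: the span records below use hypothetical field names (span_id, parent_id, service), and an edge is recorded whenever a span's parent belongs to a different service.

```python
from collections import Counter

# Hypothetical, simplified span records. Real tracing data models carry
# many more fields; these three are enough to illustrate the idea.
SPANS = [
    {"span_id": "a", "parent_id": None, "service": "frontend"},
    {"span_id": "b", "parent_id": "a", "service": "cart"},
    {"span_id": "c", "parent_id": "b", "service": "redis"},
    {"span_id": "d", "parent_id": "a", "service": "frontend"},  # internal span
    {"span_id": "e", "parent_id": "d", "service": "catalog"},
]


def service_edges(spans):
    """Count caller -> callee edges between services from parent/child spans."""
    by_id = {span["span_id"]: span for span in spans}
    edges = Counter()
    for span in spans:
        parent = by_id.get(span["parent_id"])
        if parent is None:
            continue  # root span, or parent missing from this batch
        if parent["service"] != span["service"]:
            # A parent in a different service means a cross-service call.
            edges[(parent["service"], span["service"])] += 1
    return edges


for (caller, callee), count in sorted(service_edges(SPANS).items()):
    print(f"{caller} -> {callee}: {count} call(s)")
```

The resulting edge counts are the kind of data a dependency map is drawn from: nodes are services, edges are observed calls, and the counts can be used to weight them.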

But building out this system isn't enough. Continually monitoring, maintaining, and updating the entire stack is equally crucial, so that it keeps pace with changes in service interactions and with any new vulnerabilities. Fostering a collaborative culture around the stack you've built is also vital to reaping the benefits of this approach.

We believe that topology-based observability is the next step to improving your observability game. It will enable you to gain valuable insights into your software ecosystems, allow for proactive management of dependencies, and improve overall system performance. Have you or your organization implemented it yet? Did we miss something important? We'd love to learn about your approaches and swap stories at the SUSE booth (D2) during the upcoming KubeCon + CloudNativeCon in Salt Lake City.

To learn more about Kubernetes and the cloud native ecosystem, join us at KubeCon + CloudNativeCon North America, in Salt Lake City, Utah, on November 12-15, 2024.

ABOUT THE AUTHOR

Divya Mohan, Principal Technology Advocate at SUSE

Divya is a Principal Technology Advocate at SUSE, where she contributes to Rancher's cloud native open source projects. She co-chairs the documentation for the Kubernetes & LitmusChaos projects & has previously worked extensively in the systems engineering space during her tenure with HSBC & IGate Global Solutions Pvt Ltd. A co-creator of the KCNA exam & a CNCF ambassador, she is invested in making technical communities & technologies more accessible & inclusive.

Published Tuesday, October 22, 2024 7:30 AM by David Marshall