Virtualization Technology News and Information
Kubernetes Prometheus Best Practices

Prometheus is an open-source toolkit you can use to monitor microservices and containers. It is designed to be a customizable and lightweight solution for alerting and metrics. Prometheus is best known for its use in Kubernetes, an open-source container orchestration platform. In this article, you'll learn the why and how of using Prometheus for Kubernetes monitoring.

Prometheus Features

Prometheus was developed by SoundCloud in 2012 and joined the Cloud Native Computing Foundation (CNCF) in 2016. It became the second CNCF "graduated" project in 2018.

Features of Prometheus include:

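  • Kubernetes integration: built-in Kubernetes service discovery lets Prometheus find and monitor dynamically scheduled services without a manually maintained list of targets.
  • Alerting: Prometheus evaluates alerting rules and sends the resulting alerts to its companion Alertmanager, which groups, deduplicates, and silences them and routes them to customizable notification channels such as email or chat.
  • Supports white-box and black-box monitoring: provides client libraries, instrumentation, and exporters for both styles, from exposing an application's internal state to probing endpoints from the outside.
  • Pull-based metrics: applications expose metrics on an HTTP endpoint, and the Prometheus server scrapes (pulls) them at a regular interval.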
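
To make the pull model concrete, here is a minimal sketch of an application that exposes a /metrics endpoint using only the Python standard library. It is an illustration under stated assumptions, not production code: the metric name demo_requests_total and port 8000 are hypothetical, and a real application would normally use an official client library instead of hand-rolling the text format.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process state; a real app would use a client library.
_lock = threading.Lock()
_requests_total = 0


def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format."""
    with _lock:
        value = _requests_total
    return (
        "# HELP demo_requests_total Requests handled by this demo app.\n"
        "# TYPE demo_requests_total counter\n"
        f"demo_requests_total {value}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        global _requests_total
        if self.path == "/metrics":
            # Prometheus pulls this page on every scrape.
            body = render_metrics().encode("utf-8")
            content_type = "text/plain; version=0.0.4; charset=utf-8"
        else:
            # Any other path counts as ordinary application traffic.
            with _lock:
                _requests_total += 1
            body = b"ok\n"
            content_type = "text/plain; charset=utf-8"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the demo quiet


def make_server(port: int = 8000) -> HTTPServer:
    """Create (but do not start) a server that exposes /metrics on `port`."""
    return HTTPServer(("0.0.0.0", port), MetricsHandler)
```

Calling make_server(8000).serve_forever() starts the app; a Prometheus scrape job pointed at <host>:8000 then pulls /metrics (the default metrics path) on each scrape interval. Note that the application never pushes anything; it only answers when Prometheus asks.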

Why Use Prometheus for Kubernetes Monitoring?

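Kubernetes does not ship Prometheus by default, but its core components, such as the API server and the kubelet, already expose their metrics in the Prometheus format. Beyond that natural fit, there are several reasons you should use Prometheus for Kubernetes monitoring.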

To understand the benefits of Prometheus, it is helpful to first understand two technology shifts that affected modern monitoring standards.

  • Rise of DevOps: teams need to be able to monitor complex pipelines and multiple layers of the tech stack. Any monitoring solution they use needs to centralize data and integrate with existing continuous integration / continuous delivery (CI/CD) tools.
  • Containerization: container environments are highly dynamic, with many varied, often short-lived components and extensive networks. This makes container deployments difficult to monitor and calls for real-time analysis and alerting.

To help teams adapt to these changes and meet these new demands, Prometheus offers:

  • A multidimensional data model: each time series is identified by a metric name and a set of key-value pairs called labels. You can select, slice, and aggregate these series along any label using the Prometheus Query Language (PromQL).
  • Accessible format: metrics are published in a human-readable text format over standard HTTP, so you can inspect them in any web browser.
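
The sketch below models the idea behind the multidimensional data model: a series is identified by its metric name plus its complete set of label pairs, and an equality selector such as http_requests_total{method="GET"} matches every series whose labels include those pairs. It is a conceptual illustration only, not how the Prometheus storage engine works, and the metric, label names, and values are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# A series is identified by metric name + the complete set of label pairs.
SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


def series_key(name: str, labels: Dict[str, str]) -> SeriesKey:
    return (name, frozenset(labels.items()))


def select(store: Dict[SeriesKey, float], name: str,
           **matchers: str) -> List[Tuple[Dict[str, str], float]]:
    """Return (labels, value) for every series of `name` whose labels
    contain all the given equality matchers, like name{k="v"} in PromQL."""
    results = []
    for (metric, label_set), value in store.items():
        labels = dict(label_set)
        if metric == name and all(labels.get(k) == v for k, v in matchers.items()):
            results.append((labels, value))
    return results


store = {
    series_key("http_requests_total", {"method": "GET", "code": "200"}): 1027.0,
    series_key("http_requests_total", {"method": "GET", "code": "500"}): 3.0,
    series_key("http_requests_total", {"method": "POST", "code": "200"}): 41.0,
}

# Similar in spirit to the PromQL selector: http_requests_total{method="GET"}
for labels, value in select(store, "http_requests_total", method="GET"):
    print(labels, value)
```

Adding or changing any label value creates a different key, which is why each new labelset is a new time series.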

Metrics to Focus on

There are numerous metrics available to you, which can make setting up your monitoring strategies overwhelming. The following metrics are a good place to start.

Cluster Monitoring

To ensure that your Kubernetes deployment is running smoothly, you need to monitor the health of your clusters. This includes monitoring the number and capacity of nodes in operation, the status of applications operating on each node, and your cluster resource utilization.

Specific metrics to focus on include:

  • Resource utilization: focus on disk, CPU, and memory utilization, and network bandwidth. These metrics help you identify bottlenecks and show where you can optimize.
  • Nodes: focus on node health and the number of nodes available. These numbers help you confirm that enough capacity is available without paying for idle resources.
  • Pods: at the cluster level, focus on the total number of running pods and the number per node. This shows whether you have sufficient node resources and helps you predict how workloads will shift if a node fails.
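
As a starting point, the cluster-level metrics above can be expressed as PromQL queries and run through the Prometheus HTTP API (GET /api/v1/query). This sketch assumes that node_exporter and kube-state-metrics are installed and scraped, because the metric names come from those exporters, and that Prometheus is reachable at the hypothetical address http://localhost:9090.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

PROMETHEUS_URL = "http://localhost:9090"  # assumption: adjust for your setup

# Cluster-level starting points. Metric names come from node_exporter and
# kube-state-metrics; check them against the versions you deploy.
CLUSTER_QUERIES = {
    # Share of CPU time each node spends non-idle, averaged over 5 minutes.
    "node_cpu_utilization":
        '1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))',
    # Share of memory in use on each node.
    "node_memory_utilization":
        "1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes",
    # Number of nodes currently reporting Ready.
    "ready_nodes":
        'sum(kube_node_status_condition{condition="Ready",status="true"})',
    # Number of pods scheduled on each node.
    "pods_per_node": "count by (node) (kube_pod_info)",
}


def parse_vector(payload: dict) -> List[Dict]:
    """Convert an instant-query JSON response into [{"labels", "value"}] rows."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample looks like {"metric": {...labels}, "value": [time, "value"]}.
    return [
        {"labels": s["metric"], "value": float(s["value"][1])}
        for s in data["result"]
    ]


def instant_query(expr: str, base_url: str = PROMETHEUS_URL) -> List[Dict]:
    """Run one PromQL expression through GET /api/v1/query."""
    query = urllib.parse.urlencode({"query": expr})
    url = f"{base_url}/api/v1/query?{query}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_vector(json.load(resp))
```

Looping over CLUSTER_QUERIES and calling instant_query(expr) for each entry gives a quick snapshot of cluster health; in practice you would usually put the same expressions into dashboards or alerting rules.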

Pod Monitoring

Pods are the heart of your services and workloads, making their operation an obvious focus of your monitoring efforts. When monitoring pods, there are three areas you should focus on:

  • Kubernetes metrics: focus on how Kubernetes handles your pod deployments. This includes the number of pod instances running versus expected, pod health, traffic flow, and the progress of any rolling updates.
  • Container metrics: focus on network, CPU, and memory utilization, and compare these values with the configured limits. These metrics are often gathered through integrations with other tools, such as cAdvisor (built into the kubelet) and, in older clusters, Heapster, which has since been retired.
  • Application metrics: the specific metrics you monitor depend on the type of application. For example, for an eCommerce application, you could focus on the number of users, the percentage of revenue lost to authentication charges, or purchases per time period.
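
For the Kubernetes-level pod metrics, a common check compares the replicas a Deployment wants with the replicas that are actually available. In PromQL this can be written as kube_deployment_spec_replicas - kube_deployment_status_replicas_available > 0, assuming kube-state-metrics is installed. The sketch below performs the same comparison in Python on samples you have already fetched, keyed by namespace and deployment; the values are hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (namespace, deployment)


def missing_replicas(desired: Dict[Key, int],
                     available: Dict[Key, int]) -> List[Tuple[Key, int]]:
    """Return (deployment, shortfall) for every deployment with fewer
    available replicas than desired, largest shortfall first."""
    gaps = []
    for key, want in desired.items():
        # Unlike the PromQL expression, which drops unmatched series,
        # a missing sample here is treated as zero available replicas.
        have = available.get(key, 0)
        if want > have:
            gaps.append((key, want - have))
    return sorted(gaps, key=lambda item: item[1], reverse=True)


# Hypothetical values, e.g. from kube_deployment_spec_replicas and
# kube_deployment_status_replicas_available.
desired = {("shop", "checkout"): 3, ("shop", "catalog"): 2, ("ops", "ingress"): 2}
available = {("shop", "checkout"): 1, ("shop", "catalog"): 2, ("ops", "ingress"): 1}

for (namespace, deployment), gap in missing_replicas(desired, available):
    print(f"{namespace}/{deployment}: {gap} replica(s) missing")
```

A persistent shortfall usually points to failing pods, a stuck rolling update, or insufficient node capacity, which ties the pod view back to the cluster metrics above.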

Kubernetes Prometheus Best Practices

When using Prometheus in Kubernetes, there are several best practices you can adopt to gain the greatest benefit.

Restrict Your Label Use

Labels let you slice the data behind a metric along dimensions such as method, status code, or region. Each unique combination of label values (a labelset) is a separate time series, and each one costs RAM, CPU, disk space, and network bandwidth. At a small scale this is negligible, but at a large scale it can generate significant resource costs.

As a general guideline, try to keep the cardinality of each metric (the number of distinct labelsets it produces) below 10; most of your metrics shouldn't need labels at all. If a metric has a cardinality above 100, or could grow that large, consider reducing its dimensions or moving that analysis out of your monitoring system and into a dedicated, general-purpose analysis tool.
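
Because the cost is paid per time series, it helps to estimate cardinality before adding a label. The number of series a metric can produce is, at most, the product of the number of distinct values each label can take. The back-of-the-envelope sketch below uses hypothetical label names and value counts to show how a single unbounded label, such as a user ID, dwarfs everything else.

```python
from math import prod
from typing import Dict


def max_series(label_value_counts: Dict[str, int]) -> int:
    """Upper bound on the time series one metric can create: the product of
    the number of distinct values each label can take (1 with no labels)."""
    return prod(label_value_counts.values())


# Hypothetical labels on an HTTP request counter.
bounded = {"method": 2, "status_class": 4}  # e.g. GET/POST and 2xx-5xx
print(max_series(bounded))  # 8

# A per-user label makes cardinality scale with the user base.
with_user = {**bounded, "user_id": 10_000}
print(max_series(with_user))  # 80000
```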

Use Timestamps Carefully

When you need to track when an event happened, export the Unix timestamp of the event rather than the time elapsed since it. A timestamp never needs to be updated after it is set, so you avoid extra update logic and reduce the chance of error. To get the elapsed time, calculate it at query time with time() - my_timestamp_metric.
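
Here is a minimal sketch of the pattern, assuming a batch job that records when it last succeeded. The exported value is a Unix timestamp that nothing has to refresh between runs, and the age is derived later, just as time() - my_timestamp_metric does in PromQL. The class and metric names are hypothetical; official clients offer the same idea directly, for example Gauge.set_to_current_time() in the Python prometheus_client.

```python
import time
from typing import Optional


class LastSuccessGauge:
    """Hypothetical helper holding the Unix timestamp of the last success."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.value: Optional[float] = None

    def mark_success(self, now: Optional[float] = None) -> None:
        # Record *when* it happened; nothing needs to tick this forward.
        self.value = time.time() if now is None else now

    def exposition(self) -> str:
        """One line of text exposition format, or nothing if never set."""
        return f"{self.name} {self.value}\n" if self.value is not None else ""


def seconds_since(timestamp: float, now: Optional[float] = None) -> float:
    """Client-side equivalent of the PromQL expression time() - metric."""
    return (time.time() if now is None else now) - timestamp


gauge = LastSuccessGauge("my_job_last_success_timestamp_seconds")
gauge.mark_success(now=1_700_000_000.0)
print(gauge.exposition(), end="")
print(seconds_since(gauge.value, now=1_700_000_090.0))  # 90.0
```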

Protect Your Inner Loops

Be careful about adding metrics to code that is performance-critical or runs very often (100k+ times a second). In Java applications, incrementing a counter takes around 12-17ns, depending on contention. That is small for a single call, but the overhead adds up when many metrics are updated on a hot path.

Limit the number of metrics you update in your inner loops, and avoid labels there where possible. If you do need labels, cache the labeled child metric (the result of the label lookup) outside the loop instead of resolving the labels on every call.

You should also take care with metrics that measure time or durations, since reading the clock often involves a syscall. To confirm that your instrumentation still meets your performance requirements, benchmark the affected code before and after each change.
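
The sketch below illustrates the label-caching advice with a hypothetical labeled counter written in plain Python, and uses timeit as the benchmark. Resolving labels(...) on every iteration repeats validation, hashing, and locking, while fetching the child once outside the hot loop avoids that work. This is not the API of any particular library (although official clients, such as the Java and Python ones, expose a similar labels() call), and absolute timings depend on your machine and runtime.

```python
import functools
import threading
import timeit
from typing import Dict, Tuple


class Child:
    """One labeled series of a counter."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        with self._lock:
            self.value += amount


class LabeledCounter:
    """Hypothetical counter keeping one Child per combination of label values."""

    def __init__(self, *label_names: str) -> None:
        self.label_names = label_names
        self._children: Dict[Tuple[str, ...], Child] = {}
        self._lock = threading.Lock()

    def labels(self, *values: str) -> Child:
        # Validation, key hashing, and a lock on every call: cheap, not free.
        if len(values) != len(self.label_names):
            raise ValueError("wrong number of label values")
        with self._lock:
            child = self._children.get(values)
            if child is None:
                child = self._children[values] = Child()
            return child


requests_total = LabeledCounter("method", "route")


def uncached(n: int) -> None:
    for _ in range(n):
        requests_total.labels("GET", "/cart").inc()  # label lookup every time


def cached(n: int) -> None:
    child = requests_total.labels("GET", "/cart")  # look up the labels once
    for _ in range(n):
        child.inc()


def benchmark(n: int = 100_000, repeat: int = 3) -> None:
    """Time both variants; compare the numbers on your own hardware."""
    for fn in (uncached, cached):
        seconds = timeit.timeit(functools.partial(fn, n), number=repeat)
        print(f"{fn.__name__}: {seconds:.3f}s for {n * repeat:,} increments")
```

Running benchmark() prints one timing per variant; the same before-and-after comparison applies when you add or remove instrumentation in real code.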

Understand Available Metrics

There are four core metric types: counter, gauge, summary, and histogram. Choosing the right type for each measurement keeps your metrics accurate and makes them easier to query.

  • Counter: a cumulative value that can only increase, or be reset to zero when the process restarts. This type is useful for counting events, such as requests served, tasks completed, or errors.
  • Gauge: a value that can go both up and down. This type is useful for point-in-time values such as memory use, temperature, or in-progress requests.
  • Histogram: samples observations, such as request durations, and counts them in configurable buckets, along with a sum and a count of all observed values. Because buckets can be aggregated across instances, this type is useful for aggregating data and for calculating quantiles on the server with histogram_quantile() in PromQL.
  • Summary: like a histogram, tracks the count and sum of observations, but also calculates configurable quantiles on the client over a sliding time window. This type is useful if you need accurate quantiles for a single instance, but summary quantiles cannot be meaningfully aggregated across instances.
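
To make the differences concrete, here is a stripped-down sketch of a counter, a gauge, and a histogram with cumulative buckets, written in plain Python. It mirrors the semantics of each type (histogram buckets are upper bounds, labeled le, ending with +Inf, plus a running sum and count) but is not a real client library; use an official client in production. Summaries are left out because streaming quantile estimation needs considerably more code.

```python
import bisect
import math
from typing import List


class Counter:
    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value

    def inc(self, amount: float = 1.0) -> None:
        self.value += amount

    def dec(self, amount: float = 1.0) -> None:
        self.value -= amount


class Histogram:
    def __init__(self, buckets: List[float]) -> None:
        # Upper bounds ("le"), always ending with +Inf.
        self.bounds = sorted(buckets) + [math.inf]
        self.counts = [0] * len(self.bounds)  # per bucket, not cumulative
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        # First bucket whose upper bound is >= value (le semantics).
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value
        self.count += 1

    def cumulative(self) -> List[int]:
        """Cumulative count per bound, as exposed in the *_bucket series."""
        out, running = [], 0
        for c in self.counts:
            running += c
            out.append(running)
        return out


latency = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.3, 0.5, 2.0):
    latency.observe(seconds)
print(latency.cumulative())  # [1, 3, 3, 4] for le=0.1, 0.5, 1.0, +Inf
```

Cumulative buckets are what make histograms aggregatable: bucket counts from many instances can simply be summed before a quantile is estimated.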


Prometheus can give you better visibility into your Kubernetes operations, but only if you configure your monitoring processes carefully; otherwise, you can end up with a confusing sprawl of metrics. Use the standard metrics for cluster and pod monitoring rather than reinventing the wheel with a brand-new classification system. Remember that in an agile ecosystem there is often more than one collaborator, and the monitoring system needs to make sense to everyone involved in the project.


About the Author


Gilad David Maayan is a technology writer who has worked with over 150 technology companies including SAP, Samsung NEXT, NetApp and Imperva, producing technical and thought leadership content that elucidates technical solutions for developers and IT leadership.
Published Monday, February 24, 2020 7:29 AM by David Marshall