Virtualization Technology News and Information
Is Kubernetes Optimization the Key to a Greener Cloud?

In an era where cloud computing is experiencing exponential growth, with Gartner projecting global end-user spending to reach $600 billion this year, the industry finds itself at a critical juncture. As cloud usage skyrockets, so do concerns about its carbon footprint. In fact, recent estimates suggest that data centers have a greater carbon footprint than the entire commercial aviation industry.

These two increasingly important concerns, financial and environmental, converge on the same goal: reducing resource overallocation. This convergence is giving rise to new tools designed to measure, monitor, and mitigate the energy consumption of cloud infrastructure. The first step in optimizing cloud usage is gaining visibility into resource allocation and utilization patterns.

At the heart of this optimization effort lies Kubernetes. With 84% of organizations using or evaluating Kubernetes, targeting sustainability in Kubernetes can make a significant difference in cutting both costs and carbon emissions.

A common misconception is that managing carbon emissions and energy use is only the cloud provider's responsibility. While cloud providers should continue working to reduce their environmental footprint, they supply the computing infrastructure that customers request. Under the shared responsibility model, once customers provision their Kubernetes infrastructure, they are also responsible for using those resources efficiently.

There are many incentives, both cost and environmental, to reduce waste in Kubernetes clusters. There are also many great patterns for building green software that may or may not reduce cost, such as demand-shifting, which involves moving compute to regions with lower carbon intensity. Here, the focus is on waste reduction. After all, the greenest energy is the energy we don't use.

Kepler: Shedding Light on Cloud Energy Consumption

One of the most pressing challenges is the measurement of energy consumption in cloud environments. Traditional methods fall short in the complex, virtualized world of cloud computing. This is where projects like Kepler (Kubernetes Efficient Power Level Exporter) come into play. Kepler, a CNCF sandbox project, uses eBPF technology to attribute power to processes and pods, providing engineers with crucial data to optimize their workloads for energy efficiency.

Kepler works by aggregating energy metrics, using either RAPL (Running Average Power Limit) or an estimation model based on machine learning when RAPL is not available. This allows for the collection of both pod-level and node-level energy metrics, which can then be exported to Prometheus for further analysis using the open source Kubernetes Monitoring Helm Chart.
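As a rough illustration, the cumulative joule counters that Kepler exports can be turned into an average power draw per namespace with a Prometheus recording rule. This is a minimal sketch: the metric name `kepler_container_joules_total` and its `container_namespace` label match recent Kepler releases, but check the metric names exposed by your version before using it.

```yaml
groups:
  - name: kepler-energy
    rules:
      # Average power (watts) per namespace over the last 5 minutes,
      # derived from Kepler's cumulative per-container joule counter.
      - record: namespace:container_power_watts:sum
        expr: sum by (container_namespace) (rate(kepler_container_joules_total[5m]))
```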

Once deployed, engineers can use tools like Grafana to visualize their cloud carbon footprint. Tracking the energy consumption of Kubernetes components makes it possible to translate resource utilization into estimated carbon emissions, from the cluster level down to individual containers. You can then monitor emissions per cluster, or power usage per pod or tenant, to see how much energy individual applications or customers consume.


Metrics about cost, energy, and resource utilization not only enable software engineers to contribute to environmental sustainability but also offer potential cost savings. As global awareness and environmental consciousness grow, monitoring the energy consumption and carbon emissions of Kubernetes workloads is becoming an emerging practice. There are many open source groups leading the innovation around this in a cloud native context, such as the CNCF Environmental Sustainability Technical Advisory Group.

Resource utilization metrics can be a proxy for optimizing around both carbon and cost. They help by quantifying the "idle ratio" of a fleet of Kubernetes clusters, so that the impact of optimizations can be measured. On the Platform team at Grafana Labs, we monitor the idle ratio of our fleet of clusters for each cloud service provider. The idle ratio is calculated by dividing the cost of a cluster's unused CPU and memory by the cost of its full capacity. These metrics help Kubernetes users identify and eliminate resource waste through techniques such as right-sizing and bin-packing.
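As a simplified sketch of the idea, the unused share of CPU can be computed from standard kube-state-metrics series. Note this ignores the cost weighting and memory dimension of the full calculation; the recording rule below is illustrative, not our production query.

```yaml
groups:
  - name: cluster-idle-ratio
    rules:
      # Fraction of allocatable CPU not requested by any pod (CPU only).
      # The full idle ratio also factors in memory and per-resource cost.
      - record: cluster:cpu_idle_ratio
        expr: |
          1 - (
            sum(kube_pod_container_resource_requests{resource="cpu"})
            /
            sum(kube_node_status_allocatable{resource="cpu"})
          )
```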

Practical Steps Towards a More Sustainable Kubernetes

Beyond energy measurement, Kubernetes' dynamic nature and ability to efficiently manage resources make it a powerful tool for optimizing cloud usage. However, it can also introduce complexity that requires sophisticated monitoring solutions. When done correctly, monitoring Kubernetes clusters helps:

  • Identify and eliminate resource wastage: By monitoring CPU, memory, and energy utilization across pods and nodes, teams can spot overprovisioned resources and rightsize their deployments, potentially leveraging tools such as the Kubernetes Vertical Pod Autoscaler or Descheduler.
  • Optimize horizontal autoscaling: Proper monitoring allows for fine-tuning of horizontal scaling through the Horizontal Pod Autoscaler (HPA) and Kubernetes-based Event Driven Autoscaler (KEDA), ensuring that resources scale efficiently based on actual demand.
  • Detect and resolve performance bottlenecks: Quick identification of issues like CPU throttling, Out of Memory (OOM) errors, or memory leaks helps balance optimal performance with minimal resource usage.
  • Implement intelligent workload scheduling: With detailed metrics on node utilization and workload patterns at hand, engineers can safely and reliably implement more energy-efficient scheduling policies through bin-packing, potentially leveraging features like the Kubernetes Scheduler's MostAllocated scoring strategy or Karpenter to consolidate workloads onto fewer nodes.
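To make the autoscaling point concrete, a minimal HorizontalPodAutoscaler that scales on CPU utilization might look like the following; the Deployment name `web`, the replica bounds, and the 70% target are hypothetical values to be tuned against observed demand.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web            # hypothetical target Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above ~70% average CPU
```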


This information enables organizations to make informed decisions to optimize their infrastructure and reduce unnecessary energy consumption. By fine-tuning resource allocation and identifying inefficiencies, companies can significantly reduce their carbon footprint while improving overall system performance.

The Road Ahead: Platform Capacity Management

Platform Capacity Management optimizes cloud infrastructure by enhancing resource utilization and cutting costs while preventing incidents. Tools that help include those that right-size resources at the workload level (e.g., VPA) and those that optimize Kubernetes scheduling decisions through bin-packing at the cluster level (e.g., Karpenter, GKE Autopilot, Kubernetes descheduler). These tools help to ensure resources align with actual demand.

Right-sizing practices like setting CPU and memory requests and limits are crucial for effective resource use, affecting both reliability and efficiency. In fact, 37% of organizations report that 50% or more of their workloads require container rightsizing. Simple adjustments, such as setting memory requests and limits, can reduce costs and environmental impact. It is important to have monitoring and alerting in place when tuning CPU and memory so that issues arising from the changes, such as OOM errors, are caught quickly.
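A minimal example of such an adjustment, for a hypothetical container named `app`: one common pattern is to set the memory limit equal to the memory request (so a pod never relies on overcommitted node memory) and to set a CPU request without a CPU limit to reduce throttling risk. Whether this pattern fits depends on your workloads.

```yaml
# Container resources fragment from a Deployment's pod template.
containers:
  - name: app
    image: example.com/app:1.0   # hypothetical image
    resources:
      requests:
        cpu: 250m        # guaranteed scheduling share
        memory: 256Mi
      limits:
        memory: 256Mi    # equal to the request; no CPU limit set
```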

Taking this a step further, engineering teams can benefit from tools that automate right-sizing, such as VPA. VPA provides built-in mechanisms to react when an optimization goes too far (for example, after an OOM error), helping teams prevent incidents or respond to them faster.
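A sketch of a VerticalPodAutoscaler targeting a hypothetical Deployment named `web` is shown below. Setting `updateMode: "Off"` first is a cautious way to review VPA's recommendations before allowing it to apply changes automatically.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                  # hypothetical target Deployment
  updatePolicy:
    updateMode: "Auto"         # use "Off" to only surface recommendations
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:            # illustrative floor to avoid starving pods
          cpu: 50m
          memory: 64Mi
```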

Keep in mind that a resource utilization of 60-80% is a realistic optimal range for managing cloud costs. Pushing to 80-90% (for both CPU and memory) requires careful skill to avoid risks while maximizing efficiency. At the cluster level, bin-packing tools can help tune the Kubernetes scheduler to "pack" as many workloads onto as few nodes as possible.

The goal for us in our Platform is to reduce this idle ratio across our fleet of Kubernetes clusters. Cluster-level configurations that can help with this include the Kubernetes descheduler's HighNodeUtilization strategy, Karpenter's disruption strategies, and the Kubernetes scheduler's MostAllocated scoring strategy.
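For instance, the MostAllocated scoring strategy can be enabled through the kube-scheduler's configuration file, making the scheduler prefer nodes that are already well utilized. The weights below are illustrative defaults, not a recommendation for any specific cluster.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated   # favor fuller nodes (bin-packing)
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
```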

For a deep dive into our journey with bin-packing on EKS, make sure to read a previous post from our team: How Grafana Labs switched to Karpenter to reduce costs and complexities in Amazon EKS.

A Greener Cloud on the Horizon

Moving forward, we must embrace "stubborn optimism": acknowledging the enormity of the sustainability challenge while maintaining the belief that collective efforts can make a real difference. Stubborn optimism is the input we need to keep advocating for environmental awareness in our industry. The future of cloud computing must be sustainable, and it's up to the entire tech community to make it so.

Engineers and developers can take immediate steps to contribute to this mission by adopting monitoring tools, visualizing energy consumption, and leveraging platform capacity tools. And given the cost benefits, it's also a smart business decision.

To learn more about Kubernetes and the cloud native ecosystem, join us at KubeCon + CloudNativeCon North America, in Salt Lake City, Utah, on November 12-15, 2024.

##

ABOUT THE AUTHOR

Niki Manoledaki, Software Engineer at Grafana Labs


Niki Manoledaki is a Software Engineer at Grafana Labs. In the open-source ecosystem, she advocates for cloud-native environmental sustainability through the CNCF Environmental Sustainability Technical Advisory Group, OpenGitOps, Kepler, and SustainabilityCon.

++

Vasil Kaftandzhiev, Staff Product Manager at Grafana Labs


Vasil Kaftandzhiev is a staff product manager at Grafana Labs where he leads development of the company's Kubernetes and AWS monitoring solutions.

Published Wednesday, October 23, 2024 7:32 AM by David Marshall