Virtualization Technology News and Information
Declarative Computing: The Future of Cloud Infrastructure Optimization
The cloud computing industry is facing a critical challenge: how to efficiently allocate resources and optimize costs at scale. As organizations increasingly rely on cloud platforms like Databricks to power their data and AI initiatives, managing the complexity and cost of cloud infrastructure has become a major pain point. Enter Sync Computing, an innovative startup pioneering a new paradigm called "declarative computing" to tackle this problem head-on.

Founded by MIT and UC Berkeley alums with backgrounds in high-performance computing, Sync Computing has developed a machine learning-powered platform that aims to revolutionize how organizations manage their cloud resources. In a recent briefing at the 58th IT Press Tour in Boston, Sync Computing CEO and co-founder Jeff Chou shared insights into the company's technology and vision.

The Resource Allocation Problem

Chou outlined what he calls the "resource allocation problem" in cloud computing:

"Anytime you want to spin up any resources, even on-prem or on the cloud, the old way of doing it was always, you have your code, you have your data and then you always have to specify the compute resources. You have to say, I want this many nodes of this instance type, this storage, this network, etc. And you specify it and then you run it and then you get that on the cloud and then the output is always these things - these kind of business metrics that everyone sees, that job costs you a hundred dollars, that took one hour to run, your latency was 300 milliseconds, etc. And this is literally how the entire world works today."

This traditional approach leads to three major business problems:

  1. High compute costs
  2. Inability to tune infrastructure at scale
  3. Difficulty meeting SLA deadlines

As Chou explained, "Compute costs are just incredibly high. Cloud costs as everyone knows is a huge problem and everyone wants to be more efficient. Two, you can't tune at scale. These are kind of more application specific settings and compute things are very complicated and companies might have thousands or tens of thousands of pipelines running. And no amount of employees can manage and optimize the compute at that scale. And the third bucket is more on a performance side which is like you can't hit SLA deadlines."

Introducing Declarative Computing

To address these challenges, Sync Computing is proposing a new paradigm called "declarative computing." The key idea is to flip the traditional model on its head:

"Why can't we flip it around and say, instead of the outputs being the cost and runtime, why can't the input be what we want the business performance to be?" Chou asked. "For example, here's the code. And instead of specifying low-level compute resources, I want to specify high-level business goals because that's all I really care about."

With declarative computing, users simply specify their desired business outcomes - such as a maximum cost, a maximum runtime, or a target latency - and an intelligent system then figures out the infrastructure configuration needed to meet those goals.
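To make the inversion concrete, here is a minimal sketch of the two models side by side. The field names (`instance_type`, `max_cost_usd`, and so on) are illustrative assumptions, not Sync Computing's actual API:

```python
from dataclasses import dataclass
from typing import Optional

# Traditional model: the user pins low-level compute resources up front.
traditional_request = {
    "instance_type": "m5.xlarge",  # hypothetical instance type
    "num_workers": 8,
    "storage_gb": 500,
}

# Declarative model: the user states business goals, and the platform
# is responsible for choosing resources that satisfy them.
@dataclass
class BusinessGoals:
    max_cost_usd: float                      # "this run may cost at most ..."
    max_runtime_minutes: int                 # SLA-style runtime ceiling
    target_latency_ms: Optional[int] = None  # optional latency goal

goals = BusinessGoals(max_cost_usd=5.0, max_runtime_minutes=60)
```

The point of the flip is that the second form carries no instance types or worker counts at all; those become outputs of the optimizer rather than inputs from the user.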

Closing the Feedback Loop with Machine Learning

The secret sauce behind Sync Computing's approach is a closed-loop feedback system powered by machine learning. As Chou described:

"What's actually missing in computing at all in the entire industry is a feedback loop. There is no feedback loop going from 'hey this job ran' and how did it do and should we try to improve things? There is nothing today. The entire cloud industry is completely static and the resources are fixed."

Sync Computing's platform continuously monitors workloads as they run, collecting data and using machine learning models to understand the relationship between infrastructure choices and performance outcomes. This allows the system to automatically optimize and tune infrastructure over time.

"The key concept that we're doing is we have this closed-loop feedback that goes between the customer environment and then the Sync environment, and we're actually closing this loop and actually managing customers' infrastructure," Chou explained.
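Sync Computing's actual models are proprietary, but the closed-loop idea itself can be sketched with a toy hill-climbing loop: run the job, observe the cost, perturb the configuration, and keep any change that improves the outcome. The cost and runtime curves below are invented stand-ins, not real Databricks pricing:

```python
import random

def run_job(num_workers: int) -> tuple:
    """Stand-in for executing a batch job: returns (cost_usd, runtime_min).

    Toy model: runtime shrinks with more workers (with diminishing returns);
    cost is worker time plus a fixed per-hour platform fee, so there is a
    sweet spot rather than "fewer workers is always cheaper."
    """
    runtime = 120.0 / (num_workers ** 0.7)              # minutes
    cost = num_workers * (runtime / 60.0) * 0.50        # $0.50 per worker-hour
    cost += (runtime / 60.0) * 2.00                     # $2.00/hour fixed fee
    return cost, runtime

def optimize(initial_workers: int, iterations: int = 30) -> int:
    """Minimal closed feedback loop: observe cost, nudge the config,
    and keep only the changes that improve it."""
    best = initial_workers
    best_cost, _ = run_job(best)
    for _ in range(iterations):
        candidate = max(1, best + random.choice([-2, -1, 1, 2]))
        cost, _ = run_job(candidate)
        if cost < best_cost:        # the feedback: measured outcome drives config
            best, best_cost = candidate, cost
    return best
```

Because every iteration compares a measured outcome against the best seen so far, the loop can only hold or improve on the starting configuration - which is exactly the property a static, fire-and-forget resource request lacks.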

Gradient: Automated Infrastructure Optimization for Databricks

[Image: sync-computing-gradient]

Sync Computing's flagship product, called Gradient, applies this declarative computing approach specifically to Databricks environments. Gradient integrates with customers' Databricks jobs and automatically optimizes infrastructure settings to reduce costs and improve performance.

Chou demonstrated how Gradient works:

[Image: sync-computing-gradient-1]

"This is our product page. And this is a, for example, a job, a Databricks job. Let's say it runs once an hour. The top graph is the total cost of your job. This is both the Databricks cost and your cloud cost. So we kind of put it all together for users which people really like because that's very hard to do. The bottom graph is the runtime."

He showed how Gradient goes through an initial learning phase for each job, then switches to an optimization phase where it dramatically reduces costs:

[Image: sync-computing-gradient-2]

"Once it kind of figures it out, we have these proprietary ML models behind the scenes where we say, okay, I got it. And then it switches to green so that it can start optimizing. And it might go back and forth between green and gray because you might want to explore more search opportunities, but eventually, it'll figure it out and then drop costs tremendously."

In one example, Gradient reduced the cost of a job from $8.31 to $0.90 - an 89% cost savings - while maintaining similar runtime performance.

Beyond Cost Optimization

While cost savings are a major benefit, Chou emphasized that Gradient's capabilities go beyond just reducing spend. The platform can also help organizations hit specific performance targets and SLAs.

He demonstrated how users can set a target runtime for a job, and Gradient will automatically reconfigure the infrastructure to meet that goal:

"All the user has to do is come in here to our settings and change the SLA from zero to five, click save, and now the algorithm says, alright, instead of optimizing for cost, you're going to go back to that declarative computing concept. Your goal is to cut down runtime."

This allows organizations to make intelligent tradeoffs between cost and performance based on their business needs.
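A rough sketch of that objective switch, assuming the optimizer holds a set of candidate configurations with predicted cost and runtime (the tuples and the `sla_minutes` knob are hypothetical, loosely mirroring the settings change Chou described):

```python
def choose_config(configs, sla_minutes=None):
    """Pick a cluster configuration according to the declared goal.

    configs: list of (num_workers, predicted_cost_usd, predicted_runtime_min).
    With no SLA set, minimize cost outright; with an SLA, minimize cost
    among only the configs predicted to finish within the deadline.
    """
    if sla_minutes is not None:
        feasible = [c for c in configs if c[2] <= sla_minutes]
        if not feasible:                      # SLA unreachable: go as fast as possible
            return min(configs, key=lambda c: c[2])
        return min(feasible, key=lambda c: c[1])
    return min(configs, key=lambda c: c[1])

candidates = [(2, 0.90, 42.0), (8, 2.60, 11.0), (16, 4.80, 6.0)]
cheapest = choose_config(candidates)                   # no SLA: pure cost
on_deadline = choose_config(candidates, sla_minutes=12)  # cheapest that fits the SLA
```

Changing a single setting swaps the objective function, while the search over configurations stays the same - the tradeoff lives entirely in the declared goal.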

Key Differentiators and Competitive Landscape

When asked how Sync Computing's approach differs from traditional job schedulers or auto-scaling, Chou highlighted several key differentiators:

  1. Custom ML models for each workload: "If we're monitoring a thousand Databricks jobs, for example, there are a thousand different models. Each one custom tuned for each job."

  2. Intelligence beyond simple rules: "Auto scaling is literally in a one-line if statement on what are the rules to add and remove workers. And what we're trying to do is next level, which is much more intelligence measurement data-driven analysis."

  3. Focus on batch workloads: "One kind of technical requirement of ours is it's batch workloads. Meaning it runs, it finishes, it ends. As opposed to streaming for example where it's just always on all the time."
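The contrast between the "one-line if statement" of autoscaling and a per-workload model can be sketched as follows. Both pieces are illustrative assumptions (Gradient's real models are proprietary); the per-job version simply keeps separate history per job ID and recommends the cheapest configuration observed so far:

```python
def autoscale_rule(cpu_utilization: float, workers: int) -> int:
    """The classic threshold rule: add a worker when busy, drop one when idle."""
    return workers + 1 if cpu_utilization > 0.80 else max(1, workers - 1)

class PerJobModel:
    """Toy version of 'a thousand jobs, a thousand models': one record of
    observed (workers, cost) pairs per job ID, each tuned independently."""

    def __init__(self):
        self.history = {}  # job_id -> list of (workers, observed_cost_usd)

    def record(self, job_id, workers, cost):
        self.history.setdefault(job_id, []).append((workers, cost))

    def recommend(self, job_id, default=4):
        runs = self.history.get(job_id)
        if not runs:
            return default                       # no data yet: learning phase
        return min(runs, key=lambda r: r[1])[0]  # cheapest observed config
```

The threshold rule reacts only to the current instant; the per-job model accumulates evidence across runs, which is what makes it a fit for batch workloads that start, finish, and run again.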

Chou noted that while Databricks recently launched a serverless offering that aims to address some of the same challenges, it is still very new and optimized primarily for performance rather than cost. He believes Sync Computing's more flexible, ML-driven approach provides additional value.

Looking Ahead: Expansion Beyond Databricks

While Sync Computing is currently focused on optimizing Databricks environments, Chou sees significant opportunity to expand to other platforms and use cases:

"Databricks is just strategically for us. We wanted to be very specific. We get requests all the time. Snowflake is probably one of our top requests. But then other kind of more general computing like on AWS - containers, Kubernetes, Lambda functions, ECS, EKS, these kind of compute resources that are used all the time in batch workloads."

He indicated that the company plans to gradually expand its offerings each quarter based on customer demand and market opportunity.

The Road Ahead

As cloud adoption continues to accelerate and organizations grapple with rising infrastructure costs, solutions like Sync Computing's declarative computing approach are likely to become increasingly critical. By leveraging machine learning to automate infrastructure optimization, Sync Computing aims to give engineers and data scientists more time to focus on building products and deriving insights rather than managing compute resources.

While the company is still in its early stages, with its Gradient product only becoming commercially available earlier this year, Chou believes they are just scratching the surface of what's possible:

"We're really just getting started on what we can do. And then we want to generalize this because there are some shades of this you can apply to Snowflake for example, Kubernetes containers, etc. But the challenge is building a really good model and that's kind of what we focused on."

As organizations increasingly seek ways to tame cloud costs without sacrificing performance, Sync Computing's innovative approach to infrastructure optimization positions them as a company to watch in the evolving cloud computing landscape.


Published Tuesday, October 22, 2024 7:30 AM by David Marshall