Galileo emerged from stealth with
the first machine learning (ML) data intelligence platform for unstructured
data, giving data scientists the ability to inspect, discover and fix
critical ML data errors 10x faster across the entire ML
lifecycle, from pre-training to post-training to post-production. The platform
is currently in private beta with Fortune 500 companies and startups across
multiple industries.
"There are many MLOps platforms available on the market today,
each fully capable of orchestrating the model lifecycle," said Bradley Shimmin,
Chief Analyst of AI Platforms, Analytics and Data Management. "However, when it
comes to addressing the complex problem of inspecting and fixing the data -- especially
for unstructured data -- many platforms still presume that enterprise
practitioners work with data they already know and trust across the ML
lifecycle. This couldn't be further from the truth and is one of the biggest
bottlenecks for ML adoption today. What they need are tools that elevate the
importance of data from the outset, putting data with a capital 'D' back into
Data Science. Galileo is tackling this critical need head on."
More than 80% of the world's data today is unstructured (text,
image, speech, etc.) and has historically gone largely untapped for ML. Recent
advancements have made it easy for any data scientist to plug and play complex
models for unstructured data, leading to a surge in their adoption across
industries.
It is common for data scientists to use spreadsheets and Python
scripts to inspect and fix their unstructured training data. This 'data
detective work' consumes more than 50% of a data scientist's time and is ad hoc,
manual and error-prone. It also leads to poor data transparency across the
organization, causing avoidable mispredictions and biases in production models.
Galileo takes a unique approach to this problem: with just a
few lines of code added by the data scientist while training a model, Galileo
auto-logs the data, applies statistical algorithms the team has developed
and then surfaces the model's failure points along with actions
and integrations to fix them immediately, all within one platform. This cuts
the time needed to find critical errors in ML data across
training and production models from weeks to minutes.
Galileo goes a step further by acting as a collaborative system
of record for the data scientist's training runs, bringing transparency into
how specific data and model parameter changes impact overall performance, which
is key for ML teams to be truly data-driven.
"The motivation for Galileo came from our personal experiences
at Apple, Google and Uber AI and from conversations with hundreds of ML teams
working with unstructured data where we noticed that, while they have a long
list of model-focused MLOps tools to choose from, the biggest bottleneck and
time sink for high quality ML is always around fixing the data they work with.
This is critical, but prohibitively manual, ad-hoc and slow, leading to poor
model predictions and avoidable model biases creeping into production for the
business," said Vikram Chatterji, co-founder and CEO of Galileo. "With
unstructured data across the enterprise being generated at an unprecedented
scale and now rapidly leveraged for ML, we are building Galileo with the goal
of being the intelligent data bench for data scientists to systematically and
quickly inspect, fix and track their ML data in one place."
Galileo Founded by Engineering Leaders from Apple, Google and Uber AI
The co-founding team at Galileo spent more than a decade building ML products,
where they faced first-hand the challenges that ML with unstructured data
presents.
- Vikram Chatterji (CEO) led product management at Google
AI where his team worked on building models with unstructured data but
spent weeks analyzing the data across the ML workflow, often using Google
Sheets and scripts. This was a massive under-utilization of an expensive
resource (the data scientist) and led to poor model outcomes due to ad-hoc
tooling.
- Atindriyo Sanyal (CTO) led engineering at Uber AI
(Michelangelo) where he was a co-architect of Uber's feature store and
spearheaded company-wide ML data quality tooling, leading to huge
prediction performance improvements across thousands of production models.
He was also an early member of the Siri team at Apple where he built
foundational infrastructure for better ML data management.
- Yash Sheth (VP of Engineering) led the speech recognition
platform team at Google. He was instrumental in growing speech recognition
800x across more than 20 consumer products at Google and across thousands
of businesses globally through their cloud speech API.
Galileo Focused on Data-Driven ML Research
Half of the Galileo team comprises researchers from Apple, Google and Stanford
AI who are focused on pushing the envelope of data-centric research that is
then baked into the Galileo platform for any ML team to leverage. The other
half of the team is focused on building novel systems that can perform
extremely low-latency, in-memory computations on millions of data points using
minimal system resources. This combination allows Galileo customers to get
quick, intelligent data insights throughout the entire ML workflow.
Galileo Raises $5.1 Million in Seed Funding
Today Galileo also announced that it has raised $5.1 million in seed funding.
The Factory led the round, with participation from Anthony Goldbloom (co-founder
and CEO at Kaggle) and other angel investors. Company advisers include Amy Chang
(Disney, P&G board member) and Pete Warden (one of the creators of
TensorFlow).
"Finding and fixing data errors is one of the biggest
impediments for effective ML across the enterprise. The founders of
Galileo felt this pain themselves while leading ML products at Apple, Google
and Uber," said Andy Jacques, investor at The Factory and Galileo board member.
"Galileo has built an incredible team, made product innovations across the
stack and created a first of its kind ML data intelligence platform. It has
been exciting to see rapid market adoption and positive reactions with one of
the customers even calling the product 'magic'!"
The company plans to use the funding to hire across all
departments and accelerate research and development, meeting industry demand
for a purpose-built product that finds and fixes ML data blind spots across
the workflow for teams working with unstructured data.