Galileo announced Galileo Community Edition, a free version of its platform
that enables data scientists working on Natural Language Processing (NLP) to
build high-performing ML models quickly with better-quality training data. The
free edition is available today and will be showcased during the Galileo Demo
Hour on November 15: https://hopin.com/events/galileo-demo-hour.
More than 80% of the world's data today is unstructured (text,
image, speech, etc.). Before Galileo launched six months ago, there was no
tool on the market for debugging and fixing unstructured data during the ML
workflow, so data scientists spent the vast majority of their time debugging
data in Excel sheets and Python scripts, and putting high-quality models into
production took months.
"While data powers ML, debugging unstructured data is incredibly
manual and time-intensive. My co-founders Atindriyo Sanyal, Yash Sheth and I
noticed a complete absence of data focused tooling for unstructured data ML
while at Apple, Google and Uber AI. We repeatedly heard the same from data
science teams across the globe. This is why we started Galileo - to build ML
unstructured data tooling. Today we are making Galileo available for free
through the Galileo Community Edition for any data scientist to sign up and get
the superpowers to fix their ML data instantly," said Vikram Chatterji,
co-founder and CEO of Galileo.
Galileo instantly surfaces erroneous unstructured data (mislabeled
samples, class imbalance, drifted data, etc.) along with actions and
integrations to fix it, all within one platform. By fixing data errors and
selecting the highest-value production data, data scientists can cut the time
needed to curate a high-quality training dataset from weeks to minutes.
Users on Galileo:
- Pratik Bhavsar, founding engineer at Enterpret, said:
"Using Galileo quickly gave us a 10% absolute bump in our F1 score! It is
like having an expert on the team that identifies data errors that would
otherwise go unnoticed."
- Talal Alqadi, data scientist at involve.ai, said:
"Galileo helped us dramatically reduce the time required to find errors in
our training data and improve our Named Entity Recognition (NER) model's
F1 score by ~50%, reduce false negatives by 2x and create a model that was
able to generalize across multiple domains!"
- Viktoria Rojkova, vice president of data science at
MasterControl, said: "Galileo is a very intuitive and powerful tool that
helped us quickly curate a high quality training dataset ready for the
real world. Galileo has been clearly created by fellow data scientists for
data scientists."
- Loreto Parisi, head of ML at Musixmatch, said: "Galileo
has enabled us to build an NLP pipeline that instantly inspects training
data and improves prediction quality, to assist human data curation across
the entire ML lifecycle."
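The F1 figures quoted above mix absolute and relative improvements, which are easy to confuse. As a minimal sketch, using hypothetical confusion-matrix counts that are not taken from any Galileo customer, the following Python snippet computes F1 before and after a data cleanup and reports both kinds of gain:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1 = 2*TP / (2*TP + FP + FN), or 0.0 if the denominator is zero."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical counts for illustration only (not real customer results).
f1_before = f1_score(tp=60, fp=40, fn=40)  # before fixing training data
f1_after = f1_score(tp=70, fp=30, fn=30)   # after fixing training data

absolute_gain = f1_after - f1_before       # difference in F1 points
relative_gain = absolute_gain / f1_before  # gain as a fraction of the old F1

print(f"F1 before: {f1_before:.2f}, after: {f1_after:.2f}")
print(f"Absolute gain: {absolute_gain:.2f}, relative gain: {relative_gain:.1%}")
```

With these made-up counts, F1 rises from 0.60 to 0.70, which is a 0.10 absolute gain but only about a 17% relative gain.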
With Galileo Community Edition, anyone can sign up for free and add
a few lines of code, either while training a model on labeled data or during
an inference run on unlabeled data. They can then use the Galileo UI to
instantly inspect, find and fix data errors, or to select the right data to
label next.
Galileo's Demo Hour
Galileo's online event kicks off at 10 a.m. PT on November 15 with a fireside
chat with Anthony Goldbloom (founder of Kaggle), lightning talks by customers
on how they are instantly debugging their unstructured data and building better
ML models, and a live demo of Galileo Community Edition.
Galileo Raises $18 Million in Series A Funding; Dharmesh Thakker and Lip-Bu Tan Join the Board
Today Galileo also announced that it has raised $18 million in Series A
funding, bringing the total raised to $23.1 million. The round was led by
Battery Ventures, with participation from previous investor The Factory, new
investors Walden Catalyst and FPV Ventures, and industry luminaries Anthony
Goldbloom, Pegah Ebrahimi (former COO at Morgan Stanley) and Wesley Chan
(former general partner at Google Ventures). Galileo plans to use the new
funding to continue growing its engineering and go-to-market teams and to
expand its platform to support new data modalities such as Computer Vision (CV).
"It's no secret that the ML training and data quality problems
are ballooning along with the rise in ML adoption. The Galileo team has been
laser focused on this problem and has taken a unique approach to provide quick
time-to-value with a category-defining product. Going forward, ML data
intelligence will be table stakes for ML teams, and we feel Galileo is
extremely well positioned to capitalize on this trend," said Dharmesh Thakker,
a general partner at Battery Ventures and Galileo board member.
"At Walden Catalyst, we've observed an exponential adoption of
ML with unstructured data in enterprises as models get commoditized and ML
accuracy is now increasingly dependent on the quality of the data the models
are fed. At Apple, Google and Uber AI, the founders of Galileo faced the
challenges of not having any solutions while working with unstructured data to
find and fix ML data errors fast. They are tackling this fundamental problem
head on with a first to market solution. This is a huge and critical problem in
a rapidly growing enterprise market and we are excited to back them," said
Lip-Bu Tan, founding managing partner of Walden Catalyst and Galileo board
member. Tan also sits on Intel's board and has seen 130 of the companies he
has invested in go public.