VMblog spoke with Michel
Tricot, co-founder and CEO of Airbyte, a company which
started in 2020 as an open-source data integration platform with a vision of
commoditizing data integration pipelines across all industries and
organizations. Read on to learn more about the company, how they are solving data integration issues for their customers, and what he sees as the future of data movement.
VMblog: First of all, give us an overview
of Airbyte and a bit of its history.
Michel Tricot: Airbyte
is the fastest-growing open-source data integration platform. Airbyte was
co-founded by myself (former director of engineering and head of
integrations at LiveRamp and RideOS) and John Lafleur (serial dev tool
entrepreneur) and has raised total funding of $181.2 million with a valuation
of $1.5 billion since its inception in mid-2020.
With
our growing community of 9,000 data practitioners and 400 contributors, Airbyte
is redefining the standard for moving and consolidating data from different
sources like APIs and databases to destinations like data warehouses and data
lakes in a process called extract, load, and transform (ELT). Over the past
year and a half, more than 30,000 companies have used Airbyte to sync data from
sources such as Postgres, MySQL, Facebook Ads, Salesforce, and Stripe to
destinations that include Redshift, Snowflake, Databricks, and BigQuery.
Airbyte's
open-source data integration solves two big problems. First, it removes the need
for companies to build or maintain data connectors. Second, it provides access to
hundreds of long-tail, out-of-the-box connectors.
VMblog: Why has Airbyte put Open Source at
the forefront of your mission to solve data integration?
Tricot: No company with a
closed-source solution will be able to keep up with a community-powered
open-source platform when it comes to the speed at which we can build and
maintain reliable connectors. If you're using a closed-source solution, it will
only support your needs for a limited number of connectors. Outside of those,
you will need an in-house data engineering team to build and maintain
connectors, which most of us know is not a walk in the park. This is why Open
Source is the only future-proof way to solve data integration, especially as we
use more and more tools to move data.
VMblog: What are the problems with data
integration today?
Tricot: Data integration
and reliable piping of data are EXTREMELY hard problems. But they always
disguise themselves as simple. Most people eventually realize that while
building data integration might be easy, maintaining data integration grows
more complex every month. There will always be something missing
or needing to be fixed, and this quickly becomes unsustainable as illustrated
in our "just a little script" article.
VMblog: What problem are you solving for
your users? How do you solve them better?
Tricot: This inability to
maintain a growing number of data integrations is why many closed-source
solutions plateau at about 150 connectors. The cost of adding and maintaining
new connectors grows continually as the demand for those long-tail connectors
drops off rapidly. We would argue that the ROI quickly deteriorates after about
50 connectors. You can see examples of this with closed-source companies that
attempt to address the long tail of connectors, but quickly run into quality
issues by spreading their dev teams too thin. This demonstrates that they can't
maintain connectors at scale on their own.
With Airbyte, we
have built an amazing community of core users, with over 40,000 deployments and
over 400 contributors. Every day, we receive dozens of pull requests (PRs) for
connector improvements, documentation requests, and schema changes. Because of
this, we are able to evolve the Airbyte offering at an exponential rate. It's a
textbook flywheel effect: more contributors = more capabilities and connectors
= even more contributors, and so on. This is how we will achieve our mission of
making data available and actionable to everyone, everywhere.
VMblog: How is Open Source fundamental to
solving these problems?
Tricot: Open Source gives
control, freedom, and agency to a data engineer. If a connector breaks, either
the fix has already been addressed by the community or you can fix it on your
own time, in your own way, and share the improvement with everyone else.
VMblog: What do you see as the future of
data movement?
Tricot: There
are three main components to a data stack: Extract-Load (Airbyte), the
processing engine (Snowflake, Databricks, BigQuery, and others), and activation
(BI, Dashboarding, reverse-ETL).
Of the three main
components, only Processing & BI/Dashboarding have mature solutions.
Everything related to integrations (Extract-Load and reverse-ETL) can't be
considered mature if the solution is closed-source and doesn't address the
custom and long-tail needs of all companies.
The same applies
to reverse-ETL, and Airbyte is in a great position to tackle it. All companies
want to know where their data comes from, and want access to data lineage,
which reverse-ETL-only solutions can't provide. Airbyte can fill this gap by
knowing where the data came from to begin with.
This explains our
investors' reasoning behind our Series-B valuation: Open Source will not only
dominate EL, but also reverse-ETL. That's our mission: making data available
and actionable to everyone, everywhere.
VMblog: What's next for Airbyte?
Tricot: These past few
years haven't been the calmest for startups. In a way, we've been fortunate
to be born in turmoil (COVID outbreak, the war in Ukraine, the new economic climate) since it has been a forcing function for us
to remain focused on building our product and delivering value to our users. We
are growing on top of strong fundamentals: our product, our community, our team
and our customers. This is what we've been busy doing over the past two years
and this is what will keep us busy for many more years.
We like to think
of Airbyte as a sequence of steps:
- Become the leading EL open-source platform and vendor: address all your EL
needs (customizability, long-tail, giving agency), via open source or Cloud.
- Become the leading reverse-ETL open-source platform and vendor: address all
your reverse-ETL needs (customizability, long-tail, giving agency), via open
source or Cloud, with all the necessary metadata (e.g., data lineage).
- Become the operating system of data: EL, reverse-ETL, and the intelligence
layer on top of it: data lineage, discovery/cataloging, quality, privacy, etc.
A lot of it will
be built by the Airbyte team and our community, and other parts will be
developed by integrating with existing solutions that have deep expertise and
very strong products already.
Today, we are at
step 1. We released the first version of our Cloud offering back in April, and
adoption is already way higher than our open-source adoption in its early days.
The usage pattern is very similar to what we see in Open Source, and we already
have over 2,000 companies using Airbyte Open Source as their primary EL
platform. Stay tuned for steps 2 and 3.
We
are on a mission to make data available and actionable to everyone, everywhere.
It is a problem that will become harder and harder as data fragmentation
continues to grow, so we won't solve it alone. We will solve it with our
community, customers, contributors, partners, and users.