Virtualization Technology News and Information
VMblog Expert Interview: Michel Tricot Talks Airbyte, Open Source, and Data Integration


VMblog spoke with Michel Tricot, co-founder and CEO of Airbyte, a company that launched in 2020 as an open-source data integration platform with a vision of commoditizing data integration pipelines across all industries and organizations. Read on to learn more about the company, how it is solving data integration issues for its customers, and what he sees as the future of data movement.

VMblog:  First of all, give us an overview of Airbyte and a bit of its history. 

Michel Tricot:  Airbyte is the fastest-growing open-source data integration platform. Airbyte was co-founded by myself (former director of engineering and head of integrations at LiveRamp and RideOS) and John Lafleur (serial dev tool entrepreneur). Since its inception in mid-2020, the company has raised $181.2 million in total funding at a valuation of $1.5 billion.

With our growing community of 9,000 data practitioners and 400 contributors, Airbyte is redefining the standard for moving and consolidating data from different sources like APIs and databases to destinations like data warehouses and data lakes in a process called extract, load, and transform (ELT). Over the past year and a half, more than 30,000 companies have used Airbyte to sync data from sources such as Postgres, MySQL, Facebook Ads, Salesforce, and Stripe to destinations that include Redshift, Snowflake, Databricks, and BigQuery.

Airbyte's open-source data integration solves two big problems. First, it removes the need for companies to build or maintain data connectors. Second, it provides access to hundreds of long-tail connectors out of the box.

VMblog:  Why has Airbyte put Open Source at the forefront of your mission to solve data integration?

Tricot:  No company with a closed-source solution will be able to keep up with a community-powered open-source platform when it comes to the speed at which we can build and maintain reliable connectors. If you're using a closed-source solution, it will only support your needs for a limited number of connectors. Outside of those, you will need an in-house data engineering team to build and maintain connectors, which most of us know is not a walk in the park. This is why Open Source is the only future-proof way to solve data integration, especially as we use more and more tools to move data.

VMblog:  What are the problems with data integration today?

Tricot:  Data integration and reliable piping of data are EXTREMELY hard problems, but they always disguise themselves as simple. Most people eventually realize that while building data integration might be easy, maintaining it grows more complex every month. There will always be something missing or needing to be fixed, and this quickly becomes unsustainable, as illustrated in our "just a little script" article.

VMblog:  What problem are you solving for your users? How do you solve them better?

Tricot:  This inability to maintain a growing number of data integrations is why many closed-source solutions plateau at about 150 connectors. The cost of adding and maintaining new connectors grows continually as the demand for those long-tail connectors drops off rapidly. We would argue that the ROI quickly deteriorates after about 50 connectors. You can see examples of this with closed-source companies that attempt to address the long tail of connectors, but quickly run into quality issues by spreading their dev teams too thin. This demonstrates that they can't maintain connectors at scale on their own.

With Airbyte, we have built an amazing community of core users, with over 40,000 deployments and over 400 contributors. Every day, we receive dozens of pull requests (PRs) for connector improvements, documentation requests, and schema changes. Because of this, we are able to evolve the Airbyte offering at an exponential rate. It's a textbook flywheel effect: more contributors = more capabilities and connectors = even more contributors, and so on. This is how we will achieve our mission of making data available and actionable to everyone, everywhere.

VMblog:  How is Open Source fundamental to solving these problems?

Tricot:  Open Source gives control, freedom, and agency to a data engineer. If a connector breaks, either the fix has already been addressed by the community or you can fix it on your own time, in your own way, and share the improvement with everyone else.

VMblog:  What do you see as the future of data movement?

Tricot:  There are three main components to a data stack: Extract-Load (Airbyte), the processing engine (Snowflake, Databricks, BigQuery and others), and activation (BI, Dashboarding, reverse-ETL).

Of the three main components, only Processing & BI/Dashboarding have mature solutions. Everything related to integrations (Extract-Load and reverse-ETL) can't be considered mature if the solution is closed-source and doesn't address the custom and long-tail needs of all companies. 

The same applies to reverse-ETL, and Airbyte is in a great position to tackle it. All companies want to know where their data comes from and want access to data lineage, which reverse-ETL-only solutions can't provide. Airbyte can fill this gap because it knows where the data came from to begin with.

This explains our investors' reasoning behind our Series-B valuation: Open Source will not only dominate EL, but also reverse-ETL. That's our mission: making data available and actionable to everyone, everywhere.

VMblog:  What's next for Airbyte?

Tricot:  These past few years haven't been the calmest for startups. In a way, we've been fortunate to be born in turmoil (COVID outbreak, the war in Ukraine, the new economic climate) since it has been a forcing function for us to remain focused on building our product and delivering value to our users. We are growing on top of strong fundamentals: our product, our community, our team and our customers. This is what we've been busy doing over the past two years and this is what will keep us busy for many more years.

We like to think of Airbyte as a sequence of steps:

  1. Become the leading EL open-source platform and vendor: address all your EL needs (customizability, long-tail, giving agency), via open source or Cloud.
  2. Become the leading reverse-ETL open-source platform and vendor: address all your reverse-ETL needs (customizability, long-tail, giving agency), via open source or Cloud, with all the necessary metadata (e.g., data lineage).
  3. Become the operating system of data: EL, reverse-ETL, and the intelligence layer on top of it: data lineage, discovery/cataloging, quality, privacy, etc.

A lot of it will be built by the Airbyte team and our community, and other parts will be developed by integrating with existing solutions that have deep expertise and very strong products already. 

Today, we are at step 1. We released the first version of our Cloud offering back in April, and adoption is already way higher than our open-source adoption in its early days. The usage pattern is very similar to what we see in Open Source, and we already have over 2,000 companies using Airbyte Open Source as their primary EL platform. Stay tuned for steps 2 and 3.

We are on a mission to make data available and actionable to everyone, everywhere. It is a problem that will become harder and harder as data fragmentation continues to grow, so we won't solve it alone. We will solve it with our community, customers, contributors, partners, and users.


Published Tuesday, September 13, 2022 7:30 AM by David Marshall