Virtualization Technology News and Information
Airbyte 2023 Predictions: The Future of Data Integration


Industry executives and experts share their predictions for 2023. Read them in this 15th annual VMblog.com series exclusive.

The Future of Data Integration

By Michel Tricot - Co-founder and CEO, Airbyte

Data integration has evolved a lot since data became centralized in data warehouses and data lakes, an architecture now often referred to as the modern data stack. The issue with "modern" is that today's modern becomes tomorrow's outdated. Gartner estimates that by 2025, 80% of organizations seeking to scale digital business will fail because they do not take a modern approach to data and analytics governance.

So how will data integration evolve in 2023? Let's look specifically at the issues around ELT and reverse-ETL, which sit at the bottom of the modern data stack.


The data pyramid of data needs

Growth in in-house connectors

I predict that organizations will drastically increase the number of in-house connectors that they have to build and maintain in 2023 for two specific reasons.

  1. Long-tail needs: Most ELT solutions cannot keep up with the number of tools that companies use; they plateau at around 150 to 200 data connectors. The reason is simple: the hard part of data integration is not just building connectors but maintaining them. Any cloud-based, closed-source solution is constrained by ROI. Supporting the long tail of connectors isn't profitable, so vendors focus only on the most popular integrations.
  2. Custom needs: Every company has different data needs. Sooner or later, teams get stuck because their ELT solution is missing one API stream, or because they need a synchronization method tailored to their business. In that case, they have no choice but to build and maintain the connector themselves.

Consolidation of ELT and reverse-ETL

I predict organizations will consolidate their ELT and reverse-ETL processes. ELT moves data from tools and APIs into the warehouse, and reverse-ETL moves it from the warehouse back into operational tools; the technological differences between the two are small enough that this consolidation should be straightforward.

This consolidation brings two significant benefits:

  • Teams want to know where their data comes from (its data lineage): for example, which API, source, campaign, or channel it originated in. Unfortunately, standalone reverse-ETL solutions don't have access to that metadata, so data teams need to add it to their syncs manually, which means even more data engineering work.
  • Monitoring all your pipelines within one platform reduces data engineering effort and context-switching between multiple tools and platforms.
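To make the lineage benefit concrete, here is a minimal Python sketch, assuming a single hypothetical platform that runs both directions through the same `run_sync` function. Each sync appends the name of the system the record was read from, so by the time a row reaches an operational tool through reverse-ETL, its origin travels with it instead of being re-added by hand. `Record`, `run_sync`, the `_lineage` field, and the in-memory "warehouse" and "CRM" are all illustrative stand-ins, not Airbyte's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List

@dataclass
class Record:
    data: dict
    lineage: List[str] = field(default_factory=list)  # systems this record passed through

Source = Callable[[], Iterable[Record]]
Destination = Callable[[Iterable[Record]], None]

def run_sync(source_name: str, source: Source, destination: Destination) -> None:
    """One engine for both ELT and reverse-ETL: copy records and record where they came from."""
    def tagged() -> Iterator[Record]:
        for record in source():
            # Copy rather than mutate, so records already stored upstream stay unchanged.
            yield Record(dict(record.data), record.lineage + [source_name])
    destination(tagged())

# Hypothetical in-memory systems standing in for a warehouse and a CRM.
warehouse: List[Record] = []
crm: List[dict] = []

def ads_api() -> Iterable[Record]:
    yield Record({"email": "a@example.com", "campaign": "spring_sale"})

def warehouse_reader() -> Iterable[Record]:
    return iter(warehouse)

def crm_writer(records: Iterable[Record]) -> None:
    for r in records:
        # Lineage lands next to the business fields in the operational tool.
        crm.append({**r.data, "_lineage": " -> ".join(r.lineage)})

run_sync("ads_api", ads_api, warehouse.extend)        # ELT: API -> warehouse
run_sync("warehouse", warehouse_reader, crm_writer)   # reverse-ETL: warehouse -> CRM

print(crm[0]["_lineage"])  # prints "ads_api -> warehouse"
```

Because both directions share one engine, lineage captured at ingestion is simply carried forward; a separate reverse-ETL tool would only see the warehouse table and would have to reconstruct that context.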

The advent of open source

Open-source software has been rising across technologies, and I predict there will be a significant open-source push within the ELT industry. As discussed earlier, closed-source ELT vendors can't address every company's long-tail and custom needs. In my opinion, the only way to address them is through open source, with an active community of contributors publishing their own data connectors for the benefit of all. This is how the industry can get from 200 to thousands of data connectors.

Open source can also maintain a high level of reliability across all those connectors, in two ways:

  1. By providing tools that make building and maintaining data connectors easy, abstracting away everything that is not specific to the integration itself. In other words, the open-source tool takes care of the craft of moving data securely, quickly, and reliably, and the maintainer only has to configure the connector's endpoints and handle the API-specific logic.
  2. By rewarding contributors, financially or through recognition, for maintaining the connectors they contributed. This could even take the form of a connector marketplace where individuals and companies publish their connectors, much like an App Store.
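The division of labor in the first point can be sketched in a few lines of Python. This is a hypothetical illustration, not the Airbyte CDK: the maintainer declares only API-specific details in a `StreamConfig` (endpoint path, which response field holds records, which holds the next-page token), while generic framework code in `read_stream` handles pagination and retries with backoff for every connector. The names, the config fields, and the injected `fetch` function (used here in place of real HTTP calls so the sketch runs offline) are all assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterator, Optional

@dataclass
class StreamConfig:
    """What a connector maintainer declares: only API-specific details (hypothetical schema)."""
    path: str           # endpoint path, e.g. "/users"
    records_key: str    # response field holding the list of records
    next_page_key: str  # response field holding the next-page token (None when done)

Fetch = Callable[[str, Optional[str]], dict]  # (path, page_token) -> parsed JSON response

def read_stream(config: StreamConfig, fetch: Fetch, max_retries: int = 3) -> Iterator[dict]:
    """Generic framework code shared by every connector: pagination plus retries."""
    token: Optional[str] = None
    while True:
        for attempt in range(max_retries):
            try:
                page = fetch(config.path, token)
                break
            except ConnectionError:
                if attempt == max_retries - 1:
                    raise  # give up after the last attempt
                time.sleep(0.01 * 2 ** attempt)  # short exponential backoff
        yield from page[config.records_key]
        token = page.get(config.next_page_key)
        if token is None:
            return

# A fake two-page API whose first call fails transiently.
PAGES = {
    None: {"data": [{"id": 1}, {"id": 2}], "next": "p2"},
    "p2": {"data": [{"id": 3}], "next": None},
}
calls = {"n": 0}

def flaky_fetch(path: str, token: Optional[str]) -> dict:
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("transient failure")
    return PAGES[token]

users = StreamConfig(path="/users", records_key="data", next_page_key="next")
print([r["id"] for r in read_stream(users, flaky_fetch)])  # [1, 2, 3]
```

A new connector in this model is just another `StreamConfig`; improvements to retry or pagination logic in the shared code benefit every connector at once, which is what keeps a long tail of community-maintained connectors reliable.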

A new operating system of data pipelines

Just as Snowflake is fast becoming the data cloud, the place where all operations on your data are performed, I predict that the same will happen with data integration.

Such a platform will offer:

  • ELT and reverse-ETL with open-source connectors that you can customize at will
  • An active data engineering community incentivized to maintain the long tail of connectors
  • Data lineage at the reverse-ETL level
  • Observability across all data pipelines
  • Orchestration with other third-party data tools

This platform, the operating system of data pipelines, will make the current modern data stack obsolete. It is what all data teams are striving for: they don't want to stitch together lots of tools; they want a single platform that does it all while offering the customizability to address every need.

The need for change is here

These predictions are based on a need we see growing. In 2008, only 3 of the 10 most valuable enterprises were actively taking a data-driven approach; by 2021, that number had more than doubled to 7 out of 10. Why? Because this strategy pays off: data-driven organizations are 162% more likely to exceed their revenue goals than their non-data-driven counterparts. Let's see what 2023 makes of this.


ABOUT THE AUTHOR


Michel Tricot is co-founder and CEO of Airbyte, which started in 2020 as an open-source data integration platform with a vision of commoditizing data integration pipelines across all industries and organizations. He has worked in data engineering for the past 15 years and was previously head of integrations and director of engineering at LiveRamp (NYSE: RAMP).

Published Monday, December 19, 2022 7:36 AM by David Marshall