Industry executives and experts share their predictions for 2023. Read them in this 15th annual VMblog.com series exclusive.
The Future of Data Integration
By Michel Tricot, Co-founder and CEO, Airbyte
Data integration has evolved considerably since data
became centralized in data warehouses and lakes, an architecture now often
referred to as the modern data stack. The issue with "modern" is that today's modern becomes
tomorrow's outdated. Gartner estimates that by 2025, 80% of organizations
seeking to scale digital business will fail because they do not take a modern
approach to data and analytics governance.
So, how will data
integration evolve in 2023? Let's have a look specifically at the issues
related to ELT and reverse-ETL at the bottom of a modern
data stack.
[Figure: The data pyramid of data needs]
Growth in in-house connectors
I predict that
organizations will drastically increase the number of in-house connectors that
they have to build and maintain in 2023 for two specific reasons.
- Long-tail needs: Most ELT solutions
cannot keep up with the number of tools that companies use. ELT solutions
plateau at around 150 to 200 data connectors. The reason is simple:
the hard part of data integration is not only building connectors but
also maintaining them. Any cloud-based, closed-source solution will be
restricted by ROI. It isn't profitable to support the long tail of
connectors, so the focus is only on the most popular integrations.
- Custom needs: Companies all have different data needs. Sooner or later, they get
stuck because their ELT solution is missing one API stream, or because they
need a synchronization method tailored to their business.
In this case, there is no other choice but to build and maintain it
themselves.
Consolidation of ELT and reverse-ETL
I predict
organizations will consolidate their ELT and reverse-ETL processes. The
technological differences between the two are small enough that this
consolidation should be straightforward.
There are several
significant benefits to this consolidation:
- Teams want to know where the data
comes from (also called data lineage), for example, which API, source,
campaign, channel, etc. Unfortunately, reverse-ETL solutions don't have access
to that metadata. Data teams need to add that information manually to their
sync, which requires even more data engineering work.
- It's easier to monitor all your
pipelines within one platform to reduce data engineering efforts and
context-switching between multiple tools and platforms.
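To illustrate the lineage point, here is a minimal sketch of what a consolidated pipeline could look like, assuming the ELT step tags every record with its origin so that the reverse-ETL step can reuse that metadata instead of having it added by hand. The names `Lineage`, `Record`, `extract_load` and `reverse_sync` are hypothetical and do not come from any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lineage:
    """Where a record came from (hypothetical fields for illustration)."""
    source: str
    stream: str


@dataclass(frozen=True)
class Record:
    data: dict
    lineage: Lineage


def extract_load(source: str, stream: str, raw_rows: list[dict]) -> list[Record]:
    """ELT side: attach lineage once, at ingestion time."""
    return [Record(data=row, lineage=Lineage(source, stream)) for row in raw_rows]


def reverse_sync(records: list[Record], destination: str) -> list[dict]:
    """Reverse-ETL side: lineage travels with the record, no manual mapping."""
    return [
        {
            "destination": destination,
            **record.data,
            "_source": record.lineage.source,
            "_stream": record.lineage.stream,
        }
        for record in records
    ]


# Example data is made up; "ads_api" and "crm" are placeholder names.
warehouse = extract_load("ads_api", "campaigns", [{"campaign": "spring", "clicks": 120}])
payload = reverse_sync(warehouse, "crm")
```

Because both steps share one record type, the reverse-ETL output already knows which source and stream each row came from.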
The advent of Open Source
While open source
software has been rising across technologies, I predict there will be a
significant open-source push within the ELT industry. As
discussed earlier, closed-source ELT vendors can't address every company's
long-tail and custom needs. In my opinion, the only way to address them is
through open source with an active contributor community engaged in publishing
their own data connectors for the benefit of all. This is how the industry can
get from 200 to thousands of data connectors.
Open source can
maintain a high level of reliability across all those connectors. There are two
ways this will be done:
- By providing
tools that make building and maintaining data connectors easy by
abstracting everything that is not specific to the integration itself. In
other words, the craft of moving data in a secure, fast and reliable way
will be taken care of by the open-source tool; the maintainer will just
have to configure the data connector endpoints to handle the API logic.
- By rewarding open-source contributors who maintain their
contributed connectors, whether financially or through recognition. This
could even take the form of a marketplace for connectors, where individuals
and companies could publish their connectors, much like an App Store.
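The first point can be sketched roughly as follows. This is only an illustration under stated assumptions, not Airbyte's actual API: `StreamConfig`, `read_stream` and `make_fake_api` are hypothetical names, and an in-memory function stands in for a real HTTP API. The idea is that the generic work (pagination, retries) lives in shared code, while the maintainer only declares what is specific to the API.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional

# A fetch function takes (path, cursor) and returns one page of results.
Fetch = Callable[[str, Optional[str]], dict]


@dataclass(frozen=True)
class StreamConfig:
    """The only part a connector maintainer writes: API-specific details."""
    name: str
    path: str
    records_key: str = "records"
    cursor_key: str = "next"


def read_stream(config: StreamConfig, fetch: Fetch, max_retries: int = 3) -> Iterator[dict]:
    """Generic part owned by the shared tool: pagination and retries."""
    cursor = None
    while True:
        for attempt in range(1, max_retries + 1):
            try:
                page = fetch(config.path, cursor)
                break
            except ConnectionError:
                if attempt == max_retries:
                    raise  # give up after the last allowed attempt
        for record in page[config.records_key]:
            yield {"stream": config.name, "data": record}
        cursor = page.get(config.cursor_key)
        if cursor is None:
            return


def make_fake_api() -> Fetch:
    """In-memory stand-in for a paginated API that fails once, then recovers."""
    pages = {
        None: {"records": [{"id": 1}, {"id": 2}], "next": "p2"},
        "p2": {"records": [{"id": 3}], "next": None},
    }
    state = {"failed": False}

    def fetch(path: str, cursor: Optional[str]) -> dict:
        if cursor == "p2" and not state["failed"]:
            state["failed"] = True
            raise ConnectionError("transient failure")
        return pages[cursor]

    return fetch


# The connector itself is just configuration; the path is a placeholder.
customers = StreamConfig(name="customers", path="/v1/customers")
rows = list(read_stream(customers, make_fake_api()))
```

Adding another stream in this sketch means adding another `StreamConfig`, which is what keeps the maintenance cost of a long-tail connector low.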
A New Operating System of Data Pipelines
Just as Snowflake is
fast becoming the data cloud - the place where all operations are performed on
your data - I predict that a single platform will play the same role for data integration.
Such a platform
will provide the following:
- ELT and reverse-ETL with
open-source connectors that you can customize at will
- An active data engineering
community incentivized to maintain the long tail of connectors
- Data lineage at the reverse-ETL
level
- Observability across all data
pipelines
- Orchestration with other
third-party data tools
This platform -
the operating system of data pipelines - will make the current modern data
stack obsolete. This is what all data teams are striving for. They don't want
to integrate lots of tools; they want a platform that does it all while
remaining customizable enough to address all their needs.
The need for change is here
These predictions are based on a need we see
growing. In 2008, only 3 of the 10 most valuable enterprises
were actively taking a data-driven approach. That number more than doubled,
reaching 7 out of 10 in 2021. Why? This strategy pays off. Data-driven
organizations are 162% more likely to exceed revenue goals than
their non-data-driven counterparts - let's see what 2023 makes of this.
##
ABOUT THE AUTHOR
Michel Tricot is co-founder and CEO of
Airbyte, which started in 2020 as an open-source data integration platform with
a vision of commoditizing data integration pipelines across all industries and
organizations. He has been working in data engineering for the past 15 years
and previously was head of integrations and director of engineering at LiveRamp
(NYSE: RAMP).