Industry executives and experts share their predictions for 2022. Read them in this 14th annual VMblog.com series exclusive.
The Meltano Open Source Community Weighs In on the Future of Data
By Amanda Folson, Meltano
What does 2022 hold for data, especially data integration and data tooling? I asked the people I know who are most passionate about helping organizations realize the full potential of their data: the Meltano community.
Meltano was born inside GitLab in 2018 as an open source tool built for GitLab's data and analytics team, who wanted an end-to-end data platform to address all their needs throughout the data science lifecycle. (In fact, that lifecycle is how Meltano got its name: it's an acronym that stands for model, extract, load, transform, analyze, notebook, orchestrate.)
In June of this year, Meltano spun out of GitLab with its own seed funding and a single focus: to make the power of data integration available to all by making Meltano the foundation of every team's ideal data stack.
So, what did the fast-growing Meltano community (now more than 1,900 people in Slack!) see when they looked into the crystal ball? Their collective wisdom falls into three predictive themes:
1. Advancements in data tooling will play a key role in democratizing the use of data and redefining roles within enterprises.
As solutions like Meltano advance and fulfill the mission of "enabling everyone to realize the full potential of their data," we'll see once-siloed data science knowledge and skills permeate the organization more broadly. As one community member put it, "Everything will continue to compress in a good way. Technical people will understand more of the business context, and business folks will be able to do more technical things." In addition, people will be able to redirect energy and intellect from the mundane to the innovative and impactful. For example, the focus on data assets will increase because less time will be needed to manage pipelines. Similarly, data engineering will become more important, but the nature of the work will change. And DataOps evangelists will join the ranks of DevOps evangelists as DevOps principles reshape how data operations are run.
2. Tomorrow's "Modern Data Stack" won't look like today's.
Members of the Meltano community described a variety of ways they expect the data stack to evolve:
- Defragmentation (bundling) of the data stack will continue to grow.
- The data stack will become more collaborative.
- Teams will focus on building data management foundations, overhauling their data stacks, and adopting data governance platforms.
- The "Modern Data Stack" will (continue to) spill into the broader backend of SaaS. Businesses will use it internally to ingest, model, and aggregate data for display to their end customers.
- Backend platform services will also spill over into the modern data stack, particularly more "exotic" datastores and services like ClickHouse that platform/backend teams may already have adopted. Because those tools are already in use, a data team may encounter much less friction when adopting them more widely within the company, and doing so could yield big savings or access to more detailed data.
Meltano Founder and CEO Douwe Maan
agrees that the data stack will not only become broader in scope and
capabilities, but also be built upon a unified, flexible platform that
incorporates DataOps and DevOps best practices. He writes:
...while we
talk about the 'modern data stack' as if it has a clear definition, every
team's actual ideal data stack will look different based on the tools they've
chosen and exactly how they're all hooked up. With all the competition and
rapid iteration, data teams have gained amazing abilities, and it's become
clear that no one-size-fits-all tool will be able to compete with the
pick-and-choose approach that allows teams to use the best tool for the job at
every stage and finely tune their stack to their own unique needs. . . . The
many tools and choices have also made it a daunting task to set up a new data
stack from scratch, integrate the various components, and manage configuration
and deployment, especially for small teams and people new to data. DataOps and
other end-to-end functionality like observability, governance, and lineage
cannot effectively be implemented separately in each individual tool that makes
up the stack. Nor is it realistic that they can be achieved through a
one-size-fits-all platform that supplants everything. It is time for something
new: something complementary that you can add to your stack to enable new
functionality and fill in the gaps between components.
Already today Meltano
is a great open source ELT solution, offering the ability to easily extract
data from any SaaS tool or database and load it into any data warehouse or file
format. Going forward, we will build an open source DataOps OS by incrementally
adding plugin support for all of the tools in the modern data stack that are
compatible with DataOps best practices.
3. Sources and applications of data will continue to multiply and become more distributed, escalating the challenge of data integration.
The Meltano community
offered several examples of how data and data sources will continue to
proliferate.
- I think streaming setups will continue to become more common, so simpler ways to integrate and ingest high-volume or low-latency event streams in data tooling will hopefully follow as a side effect. I'm not sure what the implications are for big providers like Snowflake, because they tend to be cost-prohibitive, so I think you'll see things start to segment somewhat, with folks keeping large data warehouses but also running dedicated analytics/event stores.
- People will want to leverage the data sitting in their data warehouses even more. "Reverse ELT" will start to become commoditized, though not as extensively as EL.
- SQL warehouse providers will keep trying to attract the data scientist persona with support for unstructured data and more statistical capabilities built into their SQL dialects.
In Gartner's Hype Cycle for Data Management, 2021, the authors write, "Today, data is even more
distributed than ever in multicloud, intercloud, and hybrid cloud
architectures. As a result, the technologies that support these ecosystems and
their integration must evolve. Furthermore, there is a need for new solutions
that address current data management issues in innovative and unprecedented
ways."
The Meltano community is embracing this challenge. To learn more about our strategy for the future of data, read this blog post by Taylor Murphy, Head of Product & Data. Stay tuned for the innovative and unprecedented ways we will enable everyone to realize the full potential of their data.
##
ABOUT THE AUTHOR
From humble beginnings as a PHP4 web developer in school, Amanda Folson now works as a Developer Relations Manager at Meltano, where she gets to share her passion for technology with others.
Outside of tech, you'll find her playing guitar, riding a horse, or spending
far too much time on TikTok.