Virtualization Technology News and Information
Meltano 2022 Predictions: The Meltano Open Source Community Weighs In on the Future of Data


Industry executives and experts share their predictions for 2022.  Read them in this 14th annual VMblog.com series exclusive.

The Meltano Open Source Community Weighs In on the Future of Data

By Amanda Folson, Meltano

What does 2022 hold for data, especially data integration and data tooling? I asked the people I know who are most passionate about helping organizations realize the full potential of their data: the Meltano community.

Meltano was born inside GitLab in 2018 as an open source tool built for GitLab's data and analytics team, who wanted an end-to-end data platform to address all their needs throughout the data science lifecycle. (In fact, that lifecycle is how Meltano got its name: it's an acronym that stands for model, extract, load, transform, analyze, notebook, orchestrate.)

In June 2021, Meltano spun out of GitLab with its own seed funding and a singular goal: to make the power of data integration available to all by making Meltano the foundation of every team's ideal data stack.

So, what did the quickly growing Meltano community (now more than 1,900 people in Slack!) see when they looked into the crystal ball? Their collective wisdom falls into three predictive themes:

1.  Advancements in data tooling will play a key role in democratizing the use of data and redefining roles within enterprises.

As solutions like Meltano advance and fulfill the mission of "enabling everyone to realize the full potential of their data," we'll see once-siloed data science knowledge and skills permeate the organization more broadly. As one community member put it, "Everything will continue to compress in a good way. Technical people will understand more of the business context, and business folks will be able to do more technical things." In addition, people will be able to redirect energy and intellect from the mundane to the innovative and impactful. For example, the focus on data assets will increase, because less time will be needed to manage pipelines. Similarly, data engineering will become more important, but the body of work will change. And DataOps evangelists will join the ranks of DevOps evangelists as DevOps principles are used to transform DataOps processes.

2.  Tomorrow's "Modern Data Stack" won't look like today's. 

The Meltano community offered a variety of ways they expect the data stack to evolve:

  • Defragmentation of the data stack (bundling) will continue to grow
  • Collaborative data stack
  • Focus on developing a data management foundation / data stack overhauls / data governance platforms
  • The "Modern Data Stack" will (continue to) spill into the broader backend of SaaS. Businesses will use it internally to ingest, model and aggregate data for display to their end customers.
  • Backend platform services will also spill over into the modern data stack, particularly when it comes to more "exotic" datastores and services like ClickHouse, which platform/backend teams might already have adopted. Wider adoption within a company by a data team might then encounter a lot less friction, and doing so could yield big savings or access to more detailed data.

Meltano Founder and CEO Douwe Maan agrees that the data stack will not only become broader in scope and capabilities, but also be built upon a unified, flexible platform that incorporates DataOps and DevOps best practices. He writes:

...while we talk about the 'modern data stack' as if it has a clear definition, every team's actual ideal data stack will look different based on the tools they've chosen and exactly how they're all hooked up. With all the competition and rapid iteration, data teams have gained amazing abilities, and it's become clear that no one-size-fits-all tool will be able to compete with the pick-and-choose approach that allows teams to use the best tool for the job at every stage and finely tune their stack to their own unique needs. . . . The many tools and choices have also made it a daunting task to set up a new data stack from scratch, integrate the various components, and manage configuration and deployment, especially for small teams and people new to data. DataOps and other end-to-end functionality like observability, governance, and lineage cannot effectively be implemented separately in each individual tool that makes up the stack. Nor is it realistic that they can be achieved through a one-size-fits-all platform that supplants everything. It is time for something new: something complementary that you can add to your stack to enable new functionality and fill in the gaps between components.

Already today Meltano is a great open source ELT solution, offering the ability to easily extract data from any SaaS tool or database and load it into any data warehouse or file format. Going forward, we will build an open source DataOps OS by incrementally adding plugin support for all of the tools in the modern data stack that are compatible with DataOps best practices.

3.  Sources and applications of data will continue to multiply and be more distributed, escalating the challenge of data integration.

The Meltano community offered several examples of how data and data sources will continue to proliferate.

  • I think streaming setups will continue to become more common, so simpler ways to integrate and ingest high-volume or low-latency event streams in data tooling will hopefully be a side effect. I'm not sure what the implications are for big providers like Snowflake, because they tend to be cost-prohibitive, so I think you'll see things start to segment somewhat, with folks having large data warehouses but also dedicated analytics/event stores.
  • People will want to leverage data sitting in their data warehouses even more. "Reverse ELT" will start commoditizing, though not as extensively as EL.
  • SQL Warehouse providers will continue trying to attract the Data Scientist persona with unstructured data and more statistical capabilities built into their SQL dialects.

In the Gartner Hype Cycle for Data Management 2021, the authors write, "Today, data is even more distributed than ever in multicloud, intercloud, and hybrid cloud architectures. As a result, the technologies that support these ecosystems and their integration must evolve. Furthermore, there is a need for new solutions that address current data management issues in innovative and unprecedented ways."

The Meltano community is embracing this challenge. To learn more about our strategy for the future of data, read this blog by Taylor Murphy, Head of Product & Data. Stay tuned to see the innovative and unprecedented ways we enable everyone to realize the full potential of their data.

##

ABOUT THE AUTHOR

Amanda Folson 

From humble beginnings as a PHP4 web developer in school, Amanda Folson now works as a Developer Relations Manager at Meltano where she gets to share her passion for technology with others. Outside of tech, you'll find her playing guitar, riding a horse, or spending far too much time on TikTok.

Published Friday, January 28, 2022 7:33 AM by David Marshall