Datafold 2022 Predictions: Ever-Expanding Tooling and Automation for Data Quality and Reliability


Industry executives and experts share their predictions for 2022.  Read them in this 14th annual VMblog.com series exclusive.

Ever-Expanding Tooling and Automation for Data Quality and Reliability

By Gleb Mezhanskiy, Founder and CEO, Datafold

2021 saw the modern data stack go mainstream. Companies of all sizes recognized the need for a modular approach to data, adopting the best tool or platform for a given stage in the data flow. With data warehouses and lakes making storage less of a problem, most data teams shifted focus to building the best data stack for their unique use cases, centered primarily on data warehouses, ETL (or ELT) tools, and BI platforms. Now, with data scale and usage expanding even more, there is renewed focus on the governance, reliability, and quality of that data.

Based on the trends and movements set in motion throughout 2021, these are my predictions for 2022.

Hiring and Efficiency

The labor market shifted in 2021, making it hard to hire and retain data practitioners. As companies sharpen their focus on culture and on the impact of data across the rest of the business, these roles have become integral to business success. Hiring and retention challenges will continue in 2022, intensifying competition for qualified data experts. Indie job boards and communities, such as the dbt and Locally Optimistic Slack groups, will become essential channels for finding and recruiting top talent.

As a result, I anticipate an increase in the average salary for data scientists, analysts, and engineers; job openings are already trending toward listing meaningful salary ranges, which is promising and in line with rising compensation for data practitioners. This could also price some early-stage companies out of the market for highly skilled data practitioners.

The lack of readily available employees to fill these roles will force data teams to be even more efficient and productive with limited resources. To stay agile, businesses will continue to move away from legacy systems and embrace the modern data stack for improved output from smaller-than-desired data teams.

Manual Work as a Stumbling Block

As reported throughout 2021, manual work related to finding, profiling, testing, and monitoring data for correctness will continue to be the biggest frustration for data teams. Many routine tasks, such as testing changes to ETL code or tracing data dependencies, can take days without proper automation.
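To make that pain concrete, consider what validating an ETL change typically involves: comparing the rebuilt table against its production counterpart. The sketch below shows one way that comparison could be scripted rather than eyeballed; the table names, join key, and SQLite connection are illustrative assumptions, not a reference to any specific platform or product.

    # Minimal sketch: compare a staging table (built from a proposed ETL change)
    # against its production counterpart. Table names, the join key, and the
    # SQLite connection are hypothetical placeholders for a real warehouse.
    import sqlite3  # stand-in for any DB-API 2.0 compatible connection

    def basic_diff(conn, prod_table, staging_table, key):
        """Return coarse signals of drift between two versions of a table."""
        cur = conn.cursor()
        cur.execute(f"SELECT COUNT(*) FROM {prod_table}")
        prod_rows = cur.fetchone()[0]
        cur.execute(f"SELECT COUNT(*) FROM {staging_table}")
        staging_rows = cur.fetchone()[0]
        # Rows whose key exists in production but is missing from staging
        cur.execute(
            f"SELECT COUNT(*) FROM {prod_table} p "
            f"LEFT JOIN {staging_table} s ON p.{key} = s.{key} "
            f"WHERE s.{key} IS NULL"
        )
        missing_in_staging = cur.fetchone()[0]
        return {
            "prod_rows": prod_rows,
            "staging_rows": staging_rows,
            "missing_in_staging": missing_in_staging,
        }

    if __name__ == "__main__":
        conn = sqlite3.connect("warehouse.db")  # hypothetical local snapshot
        print(basic_diff(conn, "orders", "orders_staging", key="order_id"))

Even a rough report like this turns an afternoon of ad hoc spot-checking queries into a pass-at-a-glance result, which is exactly the kind of leverage automation provides.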

Numerous data analysts report spending hours or even days doing manual work. Not only is manual work highly error-prone and potentially inaccurate, but it can also reduce team morale and overall productivity and performance. When data teams are bogged down with manual work, they are less likely to have time for meaningful contributions to insights and data-driven decisions.

Data Quality KPIs With Tools to Match

Heading into 2022, I predict that KPIs/OKRs focused on data quality and observability will continue to increase, with tools expanding to match. Right now, data teams are recognizing that data quality and reliability KPIs are vital to ensure that businesses can move fast with trustworthy data. However, much of that data quality is based on manual processes and procedures that bog down teams and reduce output.

The data quality and observability market has expanded at a blazing-fast pace, with numerous startups emerging in the past few years to offer a range of tools. As a founder in this space, I've had numerous conversations with analysts and investors who all see the same gap in the market, one being filled by a wide variety of solutions, each shaped by its company's understanding of the data practitioner's workflow. In many ways, data engineering appears to be following the path paved by software engineering, with expanding tools and processes to automate as much manual work as possible.

The Rise of Data Quality and Reliability Engineering

In the same way that software engineering gave rise to site reliability engineering, the growth in data at every organization will lead to data quality and reliability engineering. Depending on the size of the organization and the maturity of its data practices, data reliability engineering can be a dedicated role or embedded in the workflow of analytics engineers, much like the DevOps practice in software engineering. This new model of improving data quality and reliability through dedicated practices and tooling-driven automation will allow data teams to work lean and focus on higher-value activities.

As more data practitioners demand these tools and automation, we will see platforms and solutions provide improved data observability, quality, and reliability across every stage of the data pipeline. My hope is that data teams become more proactive in 2022, adopting a "shift left" mentality of catching problems as early in the process as possible to avoid major fallout from data incidents down the line.
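In practice, shifting left often means running data tests automatically before a change ships, for example as a step in CI that blocks a merge when a basic quality check fails. The snippet below is a minimal sketch under that assumption; the threshold, table, column, and SQLite connection are illustrative and not tied to any particular vendor or to the predictions above.

    # Minimal "shift left" sketch: a data test meant to run in CI, failing the
    # build before a change is deployed. Threshold, table, column, and the
    # SQLite connection are illustrative assumptions.
    import sqlite3
    import sys

    NULL_RATE_THRESHOLD = 0.01  # fail if more than 1% of user_id values are NULL

    def null_rate(conn, table, column):
        """Fraction of rows in `table` where `column` is NULL."""
        cur = conn.cursor()
        cur.execute(
            f"SELECT COUNT(*), "
            f"SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) FROM {table}"
        )
        total, nulls = cur.fetchone()
        return (nulls or 0) / total if total else 0.0

    if __name__ == "__main__":
        conn = sqlite3.connect("warehouse.db")  # hypothetical warehouse snapshot
        rate = null_rate(conn, "events_staging", "user_id")
        if rate > NULL_RATE_THRESHOLD:
            print(f"FAIL: user_id null rate {rate:.2%} exceeds {NULL_RATE_THRESHOLD:.0%}")
            sys.exit(1)  # non-zero exit blocks the pipeline or PR merge
        print(f"PASS: user_id null rate {rate:.2%}")

Because a check like this runs before deployment rather than after a dashboard breaks, the cost of a data incident is paid in a failed build instead of a broken report.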

To summarize, data practitioners will continue to be hard to hire in 2022, leading to an increase in average salaries as well as companies demanding more from leaner teams. Those lean data teams will continue to struggle with manual work as their biggest constraint, especially regarding data quality, which will lead to greater adoption of tools and automation to streamline data quality and reliability - an increasingly important KPI. 2022 will be the year of stopping data issues before they start, as tooling solutions and platforms help data teams "shift left" for improved data quality and trust.

##

ABOUT THE AUTHOR

Gleb Mezhanskiy 

Gleb Mezhanskiy is the founding CEO of Datafold, a data reliability platform that helps data teams deliver reliable data products faster. He has led data science and product areas at companies of all stages. As a founding member of the data teams at Lyft and Autodesk and head of product at Phantom Auto, Gleb built some of the world's largest and most sophisticated data platforms, including essential tools for data discovery, ETL development, forecasting, and anomaly detection.

Published Friday, December 31, 2021 7:35 AM by David Marshall