Industry executives and experts share their predictions for 2022. Read them in this 14th annual VMblog.com series exclusive.
Ever-Expanding Tooling and Automation for Data Quality and Reliability
By Gleb Mezhanskiy, Founder and CEO, Datafold
2021 saw the modern data stack go mainstream.
Companies of all sizes recognized the need for a modular approach to data,
adopting the best tool or platform for a given stage in the data flow. With
data warehouses and lakes making storage less of a problem, most data teams
shifted focus to building the best data stack for their unique use cases, centered primarily on data warehouses, ETL (or ELT) tools, and BI platforms.
Now, with data scale and usage expanding even more, there's renewed focus on
governance, reliability and quality of the data.
Based on the trends and movements set in
motion throughout 2021, these are my predictions for 2022.
Hiring and Efficiency
The labor market shifted in 2021, making it hard to hire and retain data practitioners. As companies put more emphasis on data culture and on the impact of data across the rest of the business, these roles have become integral to business success. Hiring and retention challenges will continue in 2022, intensifying competition for qualified data experts. Indie job boards, like those in the dbt and Locally Optimistic Slack communities, will become essential to finding and recruiting top talent.
As a result, I anticipate an increase in average salaries for data scientists, analysts and engineers. Job openings have also trended toward including meaningful salary ranges, which is promising and consistent with rising compensation for data practitioners. This could price some early-stage companies out of the market for highly skilled talent.
The lack of readily available employees to
fill these roles will force data teams to be even more efficient and productive
with limited resources. To stay agile, businesses will continue to move away
from legacy systems and embrace the modern data stack for improved output from
smaller-than-desired data teams.
Manual Work as a Stumbling Block
As reported in 2021, manual work related to finding, profiling, testing and monitoring data for correctness will continue to be the biggest frustration for data teams. Routine tasks such as testing changes to ETL code or tracing data dependencies can take days without proper automation.
Numerous data analysts report spending hours or even days on manual work. Not only is manual work highly error-prone, it also drags down team morale, productivity and performance. When data teams are bogged down with manual work, they have less time for meaningful contributions to insights and data-driven decisions.
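To make this concrete, here is a minimal sketch of what automating one such routine check might look like: profiling a table before and after an ETL change and flagging metrics that drift. The "orders" table, the columns, the tolerance and the in-memory SQLite stand-in for a warehouse are all illustrative assumptions, not a description of any particular tool.

```python
# A hedged sketch, not any specific product: profile a table before and
# after an ETL change, then flag metrics that drift beyond a tolerance.
# "orders" and the sqlite3 in-memory database are illustrative stand-ins
# for a real warehouse connection.
import sqlite3

def profile_table(conn, table, columns):
    """Collect simple quality metrics: row count and null fraction per column."""
    (total,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    metrics = {"row_count": float(total)}
    for col in columns:
        (nulls,) = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {col} IS NULL"
        ).fetchone()
        metrics[f"null_frac:{col}"] = nulls / total if total else 0.0
    return metrics

def diff_profiles(before, after, tolerance=0.05):
    """Return human-readable alerts for metrics that moved more than `tolerance`."""
    alerts = []
    for key, old in before.items():
        new = after[key]
        if abs(new - old) / (old or 1.0) > tolerance:
            alerts.append(f"{key}: {old:.3f} -> {new:.3f}")
    return alerts

# Demo: an ETL change introduces an extra row with a NULL amount.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 9.99), (2, 24.50), (3, None)])
before = profile_table(conn, "orders", ["id", "amount"])
conn.execute("INSERT INTO orders VALUES (4, NULL)")
after = profile_table(conn, "orders", ["id", "amount"])
alerts = diff_profiles(before, after)
print("\n".join(alerts) or "no significant drift")
```

Even this toy version shows the pattern: once checks like these run automatically on every change, hours spent eyeballing tables become minutes spent reviewing alerts.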
Data Quality KPIs With Tools to Match
Heading into 2022, I predict that KPIs/OKRs focused on data quality and observability will continue to proliferate, with tools expanding to match. Right now, data teams are recognizing that data quality and reliability KPIs are vital to ensuring that businesses can move fast with trustworthy data. However, much of that data quality work still relies on manual processes and procedures that bog down teams and reduce output.
The data quality and observability market has expanded at a blazing pace. Numerous startups have emerged in the past few years, offering a range of tools. As a founder in this space, I've had numerous conversations with analysts and investors who all see a gap in the market being filled by a wide variety of solutions, each shaped by a company's understanding of the data practitioner's workflow. In many ways, data engineering appears to be following the path paved by software engineering, with expanded tools and processes to automate as much manual work as possible.
The Rise of Data Quality and Reliability Engineering
In the same way that software engineering gave rise to site reliability engineering, the growth in data at every organization will lead to data quality and reliability engineering. Depending on the size of the organization and the maturity of its data practices, data reliability engineering can be a dedicated role or embedded in the workflow of analytics engineers - a pattern similar to the DevOps practice in software engineering. This new model, improving data quality and reliability through dedicated practices and automated tooling, will allow data teams to work lean and focus on higher-value work.
As more data practitioners demand these tools and automation, we will see platforms and solutions provide improved data observability, quality and reliability across every stage of the data pipeline. My hope is that data teams become more proactive in 2022, adopting the "shift left" mentality of catching problems as early in the process as possible to avoid major fallout from data incidents down the line.
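As a hedged illustration of what "shifting left" can mean in practice, the sketch below runs a couple of data assertions against a staging copy of a changed table inside CI and fails the build on any violation. The checks, the orders_staging table and the sqlite3 stand-in are assumptions made for the example, not a reference to any specific platform's API.

```python
# Illustrative "shift left" gate: assert data invariants against a staging
# table in CI and fail the build before a bad change reaches production.
# orders_staging and the sqlite3 connection are hypothetical placeholders.
import sqlite3
import sys

CHECKS = [
    ("unique primary key",
     "SELECT COUNT(*) - COUNT(DISTINCT id) FROM orders_staging"),
    ("no negative amounts",
     "SELECT COUNT(*) FROM orders_staging WHERE amount < 0"),
]

def run_gate(conn):
    """Run each check; any nonzero result counts as a violation."""
    failures = []
    for name, query in CHECKS:
        (violations,) = conn.execute(query).fetchone()
        if violations:
            failures.append(f"FAILED {name}: {violations} violating row(s)")
    return failures

# Demo data standing in for the output of a proposed ETL change.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_staging (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders_staging VALUES (?, ?)",
                 [(1, 9.99), (2, -5.00), (2, 12.00)])

failures = run_gate(conn)
print("\n".join(failures) or "all checks passed")
# A nonzero exit code fails the CI job, stopping the incident before it starts.
sys.exit(1 if failures else 0)
```

Wired into the same CI pipeline that reviews the code change itself, a gate like this turns data quality from a cleanup task into a merge requirement.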
To summarize, data practitioners will continue
to be hard to hire in 2022, leading to higher average salaries and to companies demanding more from leaner teams. Those lean data teams will
continue to struggle with manual work as their biggest constraint, especially
regarding data quality, which will lead to greater adoption of tools and
automation to streamline data quality and reliability - an increasingly
important KPI. 2022 will be the year of stopping data issues before they start,
as tooling solutions and platforms help data teams "shift left" for improved
data quality and trust.
##
ABOUT THE AUTHOR
Gleb Mezhanskiy is the founding CEO of Datafold, a data reliability platform that helps data
teams deliver reliable data products faster. He has led data science and
product areas at companies of all stages. As a founding member of data teams at
Lyft and Autodesk and head of product at Phantom Auto, Gleb built some of the
world's largest and most sophisticated data platforms, including essential
tools for data discovery, ETL development, forecasting, and anomaly detection.