Airbyte has released the results of the largest data engineering survey
on the market, which provides insights into the latest trends, tools,
and practices in data engineering, especially the adoption of tools in
the modern data stack.
Its first worldwide State of Data survey
displays results in an interactive format so that anyone can drill
further into the information using filters to see, for example, adoption
patterns by organization size. The survey drew 886 respondents, the
largest number for a survey related to data engineering, and they were
fairly evenly distributed by geography (North America, Europe, and
Asia), company size, and years of work experience. The most common job
title was data engineer at 38%, followed by management positions at 20%
and software engineers at 11%. Analytics engineers, data analysts, and
data scientists each accounted for around 5%.
"In the past year, the data ecosystem has been evolving rapidly, so this
research of the user community is a way to see the signal through the
noise in the modern data stack," said John Lafleur, co-founder and chief
operating officer, Airbyte. "New options are introduced every month, so
this research is a way for us to take a step back and understand what
the community is using and feeling excited about."
Noteworthy findings include the following.
- Nearly half of the respondents were looking to hire for their data
  teams, with consistent results across geographic regions worldwide.
- In terms of compensation, larger companies correlate with higher pay,
  and North America has the highest salaries.
- For the Data Ingestion category of the modern data stack, the clear
  leaders are Airbyte and Fivetran. Airbyte shows double the number of
  people who want to try it. In terms of company size, Airbyte is strong
  in the small/medium-sized segment, with less adoption in the mid-size
  market (500-1,000 employees). However, the enterprise segment (1,000+
  employees) shows a propensity to adopt an open-source, self-hosted
  platform, with Airbyte Open Source being the dominant solution there.
- For Data Transformation, Pandas is the most used tool, while dbt has
  the most "want to try" responses. The gap is even more noticeable in
  the larger organization segments, where both Spark and Pandas are used
  more than dbt, yet dbt still has the most "want to try" responses
  among those users.
- The most used data warehouses are Snowflake and Google BigQuery,
  followed by AWS Redshift and Databricks, with Azure Synapse lagging
  behind. In the larger organization segments, Databricks' popularity is
  nearly on par with that of Snowflake and BigQuery.
- For Data Orchestration, most people are still using self-hosted
  Airflow, especially in the enterprise segment, which may again (as in
  Data Ingestion) indicate a preference for self-hosted deployments
  among larger organizations. Dagster and Prefect show lots of interest,
  and Dagster is coming up the ranks with the highest number of "want to
  try" responses.
- For Business Intelligence, the leaders are Looker and Tableau, but
  newer technologies are close behind and show lots of interest.
- For Data Quality, the leaders are Great Expectations and Monte Carlo,
  with a lack of awareness of other alternatives.
- For Reverse ETL, it was essentially a tie between Hightouch and Census
  as leaders, with the field pretty much open after that.
- For Data Catalogs, three tools led the way in terms of popularity:
  DataHub, Alation, and Amundsen.
To view the full results of the survey, go to https://state-of-data.com.
Here is what some data engineering influencers said about the State of Data survey:
"The data engineering community stands out for its open-mindedness and
collaborative spirit. Every day, I'm impressed by how we've created a
culture of learning and sharing that transcends organizational
boundaries and geographical constraints." Ananth Packkildurai, editor,
Data Engineering Weekly
"Amazing Data Engineering survey! I highly recommend checking out the
insights into the adoption of engineering tools from Data Ingestion,
transformation to reverse ETL and Data Catalogs. That section was my
highlight. Congratulations to Airbyte for leading the Data Ingestion
section." Andreas Kretz, founder of Learn Data Engineering
"I am particularly happy to see the growth of Data Quality tools that
have evolved for good. This signals maturity is coming along. It's not a
shocker to me Airbyte still leading the way for the Data Ingestion
Layer." Ravit Jain, founder & host of The Ravit Show