Virtualization Technology News and Information
Trifacta 2019 Predictions: A Data Visionary Rings in the New Year

Industry executives and experts share their predictions for 2019. Read them in this 11th annual series exclusive.

Contributed by Joe Hellerstein, Co-Founder & CSO at Trifacta

A Data Visionary Rings in the New Year

As we head into 2019, it's more true than ever that all eyes are on data as the differentiator for innovative companies. Your organization is either maximizing the potential of its data or falling behind. It's also true that the data landscape is shifting rapidly as machine learning takes hold, serverless technologies gain popularity, and the cloud becomes more and more of a foregone conclusion.

As we put 2018 behind us, below are six emerging trends in data science and engineering that will shape how data-driven companies operate in 2019 and into the future.

1. Machine learning potential will run into unpleasant realities without high-quality data.

Investors and the tech press are all abuzz about machine learning (ML), but those neural nets and regression models are only as good as the training data from which they learn. High-quality datasets (typically the bigger the better) yield more accurate models. As ML initiatives scale, we expect that the pain of cleansing and preparing high-quality data for ML models will become more apparent in 2019. Data preparation is still widely regarded as the biggest bottleneck in any data project, which means that data scientists often spend more time preparing data than actually building and tuning machine learning systems. For ML to make an impact at scale, organizations will first need to accelerate their data preparation processes.

2. DataOps will be the new DevOps.

As organizations have shifted toward self-service, data analysts now have the right tools to wrangle and analyze their own data instead of endlessly iterating with IT. Once this shift occurs, the question becomes: how do you make such operations scalable, efficient, and repeatable? Enter DataOps. As an adaptation of the software development methodology DevOps, DataOps refers to the tools, methodology, and organizational structures that businesses must adopt to improve the velocity, quality, and reliability of analytics. Data engineers fill the key roles powering DataOps and, as these practices become commonplace, will become critical resources. In fact, 73% of organizations polled said they planned to invest in DataOps this year. Just as DevOps engineers are highly sought after today, we predict that data engineers will be in the near future.

3. The cloud transition remains both unstoppable and partial.

The move toward cloud has been on for a while, and we saw proof of this in 2018, but the migration wasn't anywhere near as fast or as total as predicted. Transitioning to cloud platforms can require a costly, time-consuming architectural overhaul, which can't be done in one fell swoop. The journey toward cloud is far from over for many organizations, and we predict that in 2019 we'll continue to see a lot more progress. Still, some data and processes will remain on-premises for many organizations for the foreseeable future, largely due to regulatory concerns. The final destination for many organizations will be a hybrid cloud approach.

4. Data lakes aren't going anywhere.

The merger between Hortonworks and Cloudera prompted a lot of chatter about an end to data lakes. This prognosis will prove to be off-target, especially as the cloud migration continues. In fact, storage offerings like AWS S3 make data lakes easier to maintain and use. In 2019, we predict that the data lake will continue to be a sound architectural strategy for many organizations.

5. Autoscaling serverless solutions will become increasingly common.

Data is everywhere, and even small businesses and individuals want to roll up their sleeves and wrangle datasets alongside the Fortune 500. One size doesn't fit all, however, which means serverless, pay-as-you-go solutions for DataOps will become a hot commodity for fledgling companies that are uninterested in setting up their own DataOps infrastructure right away. Larger companies will seek out technologies whose costs scale automatically, allowing for surges at usage peaks and lower maintenance-level fees during idle periods. Above all else, convenience and flexibility will be key selection factors, regardless of company size.

6. Self-service technologies without governance will hit their limits.

As self-service solutions grow and adoption is no longer the primary metric of success, organizations will increasingly question whether these solutions are efficient, scalable, and secure. Without governance in place, IT organizations in particular will feel increasing pressure as the technologies to maintain and the processes to schedule multiply unchecked. Heightened DataOps practices will offer new guidance on self-service technologies, and we predict that in 2019, self-service products without governance will hit their limits.

There are undoubtedly surprises waiting for us in the year ahead, but these six trends should help you to build a framework for anticipating, interpreting, and even leveraging whatever data challenges come your way in 2019.


About the Author


Joe Hellerstein is Trifacta's Chief Strategy Officer and Co-founder, and Jim Gray Chair of Computer Science at UC Berkeley. His career in research and industry has focused on data-centric systems and the way they drive computing. Fortune Magazine included him in its list of the 50 smartest people in technology, and MIT's Technology Review magazine included his work on its TR10 list of the 10 technologies "most likely to change our world."

Published Thursday, January 24, 2019 7:31 AM by David Marshall