Waterline Data 2019 Predictions: Big Data in the Enterprise

Industry executives and experts share their predictions for 2019.  Read them in this 11th annual series exclusive.

Contributed by Alex Gorelik, founder and CTO, Waterline Data

2019 Predictions for Big Data in the Enterprise

If 2018 was about anything, it was about preparing big data for its next big phase. After several years of being stalled, stymied and even charged with being one big bust, big data is about to live up to its hype. Many of the CDOs and other data professionals I've spoken with in recent months agree we're on the cusp of something truly "big". Their organizations have continued to make big investments in data, learning from their mistakes, applying those lessons, and adopting the technologies that are allowing them to blow past many of their remaining obstacles. It's this new optimism, investment and innovation that's bringing about the actual execution everyone's been expecting for some time now, and it's the same trifecta of opportunity behind my top five big data predictions for 2019:

#5: Now Proven, AI and ML Will Dig Deeper into the Enterprise

In 2018, we watched as the time-consuming, costly and labor-intensive manual processes that had been holding up big data initiatives within organizations began to melt away. Automation, AI and ML, now proven not just in terms of speed but also accuracy, are being applied to more and more business functions. This fits into a general trend of moving away from hard-coding business processes and operations into software (and adjusting people and physical operations to match predefined, rigid business processes) and toward dynamically adapting business processes and operations to physical realities and historical learnings. For example, universities are measuring historical admission and acceptance trends to determine who is likely to accept admission and how much scholarships would affect their decision. Alternative credit risk analysis is being performed to determine the creditworthiness of first-time or low-income borrowers. Customer churn predictions are being gleaned from sentiment analysis of social media. Key to all these applications is the ability to create good, stable models, and the key to building good, stable models is being able to find the right data and create the right features. In 2019, AI and ML will play a big role in finding and understanding the data needed to build those models.
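
To make the feature angle concrete, here is a minimal, hypothetical sketch in Python of the churn example: a handful of made-up per-customer features, including an average social-media sentiment score, feeding a simple scikit-learn classifier. The data, feature names and model choice are all illustrative assumptions, not any vendor's method.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical per-customer features: average sentiment of recent posts,
# support tickets filed and months since last activity.
df = pd.DataFrame({
    "avg_sentiment":   [0.8, -0.6, 0.3, -0.9, 0.5, -0.2],
    "support_tickets": [0, 5, 1, 7, 0, 3],
    "months_inactive": [1, 6, 2, 8, 1, 4],
    "churned":         [0, 1, 0, 1, 0, 1],
})

X, y = df.drop(columns="churned"), df["churned"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=42)

# The model is only as good as the features; finding and understanding
# the right input data is the hard part the article describes.
model = LogisticRegression().fit(X_train, y_train)
print("churn probabilities:", model.predict_proba(X_test)[:, 1])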

#4: Say Hello to Hybrid Environments

Last year, I predicted that broad adoption of the cloud would finally force object stores to be hardened and properly governed, and that the new standards would require data governance that is cloud, location and platform agnostic. In 2019, you will see more organizations that are now comfortable with the cloud growing a hybrid, heterogeneous data estate that includes multiple fit-for-purpose big data, relational and NoSQL data stores, both on-premises and in the cloud. With a hybrid model in place, applications that work best on the public cloud can reside there; those that need to remain on-premises can do so. While this seems like it would create greater complexity, in 2019 you will see more and more solutions that abstract this complexity through location and compute transparency. From file systems like MapR's data fabric, which creates a single namespace, to AIOps, which addresses complexity in virtual data centers, end users will be increasingly shielded from the complexity of hybrid architectures while getting the full benefits of the fit-for-purpose, elastic solutions they offer.
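
As one illustration of location transparency, the open-source fsspec library lets the same read logic address on-premises files and cloud object stores through a uniform URL scheme. This is a minimal sketch under that assumption (the paths and bucket name are hypothetical, and reading from S3 additionally requires the s3fs package); it is not MapR's data fabric API.

import fsspec
import pandas as pd

# The same read logic works whether a dataset lives on-premises or in a
# cloud object store; only the URL scheme changes. Paths are hypothetical.
for url in ("file:///data/warehouse/customers.csv",          # on-premises
            "s3://example-bucket/warehouse/customers.csv"):  # public cloud
    with fsspec.open(url, mode="r") as f:
        df = pd.read_csv(f)
        print(url, "->", len(df), "rows")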

#3: It's the Data Lake's Great Return

While organizations have traditionally focused on the mechanics of creating and hydrating data lakes, frequently creating data swamps instead, 2019 will see a renewed focus on data lake adoption. This is very similar to what we experienced with data warehousing, where the initial generation of data warehouses was often misguided and lacked adoption, but taught organizations what was really required to create value and achieve broad adoption. I believe we are at the same stage with data lakes, and in 2019 the focus will turn from the mechanics of the data lake to making the data in the lakes findable, usable and governed at scale and in an automated manner, powered by the new spate of AI-driven data catalogs and governance solutions. Even new data lakes will get rolled out in a much more deliberate manner, with clear initial use cases and well-defined usage and governance policies. We will also see more data lakes being built in or migrated to the cloud to take advantage of managed infrastructure, elastic storage and compute, and rich ecosystems, as more organizations begin adopting Virtual Data Lakes that span multiple systems.
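
What might "findable, usable and governed" metadata look like in practice? Here is an illustrative sketch of a catalog entry as a plain Python data structure; the fields (owner, tags, lineage, PII flag, quality score) are my assumptions about the minimum a catalog might track, not any product's actual schema.

from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    path: str                                    # where the dataset lives
    owner: str                                   # accountable data steward
    tags: list = field(default_factory=list)     # business terms for search
    lineage: list = field(default_factory=list)  # upstream sources
    contains_pii: bool = False                   # drives governance policy
    quality_score: float = 0.0                   # automated profiling, 0-1

entry = CatalogEntry(
    path="s3://lake/raw/transactions/2019/",
    owner="finance-data-team",
    tags=["transactions", "revenue"],
    lineage=["oltp.orders", "oltp.payments"],
    contains_pii=True,
    quality_score=0.92,
)
print(entry)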

#2: Big Data Becomes Little Data

No, organizations won't be dumping all their stockpiles of data, but they will shed some of it in limited scope. Greater visibility into the data they have will bring opportunities to rationalize and consolidate, yielding significant savings in storage costs and even more accurate analytics, now that organizations know which data is corrupted and can be jettisoned. But "becoming little" also speaks to large volumes of data that used to choke the organization becoming manageable enough to put to use, thanks to the automation of key processes like cataloging.
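
One simple form of rationalization is finding byte-identical copies of the same file scattered across a data estate. The sketch below groups files by content hash under a hypothetical lake root; real consolidation tooling would go much further (near-duplicates, overlapping tables), but the idea is the same.

import hashlib
from collections import defaultdict
from pathlib import Path

def sha256(path, chunk_size=1 << 20):
    # Hash a file in chunks so large files don't exhaust memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            h.update(block)
    return h.hexdigest()

# Group files by content hash; any group with more than one member is a
# set of byte-identical copies that can be consolidated.
by_digest = defaultdict(list)
for p in Path("/data/lake").rglob("*.csv"):   # hypothetical lake root
    by_digest[sha256(p)].append(p)

for digest, paths in by_digest.items():
    if len(paths) > 1:
        print("duplicate set:", [str(p) for p in paths])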

#1: Explainability Will Emerge as a Key AI Requirement

As more and more business (and government) is run using AI and ML algorithms, there will be more focus on transparency and explainability. Why was a mortgage denied? Can a bank prove that none of the legally protected attributes (like race, gender and so forth) were used to make the decision or to train the model that made the decision? Finding the appropriate data sets and documenting their lineage and quality is the first step toward such transparency and explainability. If we do not know where data came from or what it means, we will not be able to explain the model or ensure its proper and legal operation.
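
As a minimal illustration of explaining a single decision, a linear model lets you read off a rough per-feature contribution for an individual applicant. The sketch below uses hypothetical credit features and scikit-learn's logistic regression; production explainability would layer documented lineage and more sophisticated attribution methods on top of something like this.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical credit features and outcomes (1 = approved, 0 = denied);
# no legally protected attributes appear as inputs.
features = ["income_k", "debt_ratio", "years_employed"]
X = np.array([[55, 0.4, 3], [80, 0.2, 10], [30, 0.7, 1], [95, 0.1, 12]],
             dtype=float)
y = np.array([1, 1, 0, 1])

model = LogisticRegression().fit(X, y)

applicant = np.array([[40, 0.6, 2]], dtype=float)
# For a linear model, coefficient * feature value gives a rough
# per-feature contribution toward approval or denial.
contributions = model.coef_[0] * applicant[0]
for name, c in sorted(zip(features, contributions), key=lambda t: t[1]):
    print(f"{name}: {c:+.3f}")
print("approval probability:", model.predict_proba(applicant)[0, 1])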


About the Author 


Alex Gorelik, Founder and Chief Technical Officer

Waterline is Alex's third startup. Prior to Waterline Data, Alex served as General Manager of Informatica's Data Quality Business Unit, driving Marketing, Product Management and R&D for an $80M business. Alex joined Informatica from IBM, where he was an IBM Distinguished Engineer for the Information Integration team. IBM acquired Alex's second startup, Exeros, where he was founder, CTO and VP of Engineering.

Previously, Alex was co-founder, CTO and VP of Engineering at Acta Technology which was subsequently acquired by Business Objects. Prior to founding Acta, Alex managed development of Replication Server at Sybase and worked on Sybase's strategy for enterprise application integration (EAI).

Alex holds a B.S. in Computer Science from Columbia University School of Engineering and an M.S. in Computer Science from Stanford University.
Published Tuesday, January 29, 2019 7:31 AM by David Marshall