
Industry executives and experts share their predictions for 2019. Read them in this 11th annual VMblog.com series exclusive.
Contributed by Alex Gorelik, founder and CTO, Waterline Data
2019 Predictions for Big Data in the Enterprise
If 2018 was about anything, it was
about preparing big data for its next big phase. After several years of being
stalled, stymied and even charged with being one big bust, big data is about to
live up to its hype. Many of the CDOs and other data professionals I've spoken
with in recent months agree we're on the cusp of something truly "big". These
are the same organizations that have continued to make big investments in data,
learning from their mistakes, applying those lessons, and adopting the
technologies that are allowing them to blow past many of their remaining
obstacles. It's this new optimism, investment and innovation that's bringing
about the actual execution everyone's been expecting for some time now, and
it's the same trifecta of opportunity that's behind my top five big data
predictions for 2019:
#5: Now Proven, AI and ML Will
Dig Deeper into the Enterprise
In 2018, we watched as the time,
cost, and labor-intensive manual processes that have been holding up the big
data initiatives within organizations began to melt away. Automation, AI and
ML, now proven not just in terms of speed but also accuracy, are being applied
to more and more business functions. This fits into a general trend: away from
hard-coding business processes and operations into software, where people and
physical operations must adjust to predefined, rigid processes, and toward
dynamically adapting business processes and operations to physical realities
and historical learnings. For example, universities are analyzing historical
admission and acceptance trends to determine who is likely to accept an offer
and how much scholarships would affect their decision. Alternative credit risk
analysis is being performed to determine the creditworthiness of first-time or
low-income borrowers. Customer
churn predictions are being gleaned from sentiment analysis of social media.
Key to all these applications is the ability to create good, stable models, and
the key to building good, stable models is being able to find the right data and
create the right features. In 2019, AI and ML will play a big role in finding
and understanding the data needed to build those models.
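To make the model-building step above concrete, here is a minimal sketch of the kind of churn model the article describes, trained on hypothetical sentiment-derived features. The file name, column names and feature choices are purely illustrative assumptions, not anyone's production pipeline; the point is that the model is only as good as the data and features you can find for it.

```python
# Minimal sketch: churn prediction from sentiment-derived features.
# File name, columns and features are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Assume a table where each row is a customer with engineered features,
# e.g. average sentiment of their social posts and support tickets.
df = pd.read_csv("customer_features.csv")  # hypothetical dataset
features = ["avg_social_sentiment", "ticket_sentiment",
            "tenure_months", "monthly_spend"]

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["churned"], test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on held-out data; a stable model should also hold up
# when retrained on later time periods.
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```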
#4: Say Hello to Hybrid
Environments
Last year, I predicted that broad
adoption of the cloud would finally force object stores to be hardened and
properly governed, and that the new standards would require data governance
that's cloud, location and platform agnostic. In 2019, you will see more organizations
that are now comfortable with the cloud growing a hybrid, heterogeneous data
estate that includes multiple fit-for-purpose big data, relational and NoSQL
data stores, both on-premises and in the cloud. With a hybrid model in place,
applications that work best on the public cloud can reside there. Those that
need to remain on-premises can do so. While this seems like it would create
greater complexity, in 2019, you will see more and more solutions that abstract
this complexity through location and compute transparency. From file systems
like MapR's data fabric that create a single namespace to AIOps, which
addresses complexity in virtual data centers, end users will be increasingly
shielded from the complexity of hybrid architectures while getting the full
benefits of the fit-for-purpose, elastic solutions those architectures offer.
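As a rough illustration of the location transparency mentioned above, the sketch below runs the same analysis whether the data sits on an on-premises file system or in cloud object storage; only the URI changes. The paths and bucket name are invented, and the cloud path assumes pandas with an s3fs-style filesystem backend installed.

```python
# Illustrative sketch: identical analysis code against on-premises or
# cloud storage; only the path changes. Paths and bucket are hypothetical.
import pandas as pd

LOCAL_PATH = "/data/lake/sales/2018.parquet"            # on-premises copy
CLOUD_PATH = "s3://example-bucket/sales/2018.parquet"   # cloud copy

def monthly_revenue(path: str) -> pd.Series:
    """Same computation regardless of where the data physically resides."""
    df = pd.read_parquet(path)
    return df.groupby("month")["revenue"].sum()

# The caller picks the location; the logic does not change.
print(monthly_revenue(LOCAL_PATH))
# print(monthly_revenue(CLOUD_PATH))  # identical call against cloud storage
```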
#3: It's the Data Lake's Great
Return
While organizations have traditionally focused on the mechanics of creating and
hydrating data lakes, frequently creating data swamps instead, 2019 will see a
renewed focus on data lake adoption. This is very similar to what we experienced
with data warehousing, where the initial generation of data warehouses was often
misguided and lacked adoption but taught organizations what was really required
to create value and achieve broad adoption. I believe we are at
the same stage with data lakes, and in 2019 the focus will turn from the
mechanics of the data lake to making the data in the lakes findable, usable and
governed at scale, in an automated manner, powered by the new spate of
AI-driven data catalogs and governance solutions. Even new data lakes will get
rolled out in a much more deliberate manner with clear initial use cases and
usage and governance policies. We will also see more data lakes being built or
migrated to the cloud to take advantage of managed infrastructure, elastic
storage and compute, and rich ecosystems, as more organizations begin adopting
Virtual Data Lakes that span multiple systems.
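One small, hedged example of what automated cataloging can mean in practice: profiling each column of a dataset so a catalog can describe what the data contains without anyone inspecting it by hand. The file name is hypothetical, and this is only a toy version of what AI-driven catalog products automate at scale.

```python
# Toy sketch of automated data profiling, the kind of metadata an
# AI-driven catalog collects at scale. File name is hypothetical.
import pandas as pd

df = pd.read_parquet("lake/customers.parquet")  # hypothetical dataset

profile = pd.DataFrame({
    "dtype": df.dtypes.astype(str),            # inferred type per column
    "null_rate": df.isna().mean().round(3),    # fraction of missing values
    "distinct_values": df.nunique(),           # cardinality per column
})
print(profile)

# A real catalog would persist this profile, tag likely business terms
# (e.g. columns that look like emails or account numbers), and keep it
# current as the underlying data changes.
```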
#2: Big Data Becomes Little Data
No, organizations won't be dumping all of their stockpiled data, but in a
limited scope they will. With greater visibility into the data they have will
come opportunities to rationalize and consolidate, yielding significant savings
in storage costs and even more accurate analytics, now that organizations know
which data is corrupted and can be jettisoned. But "becoming little" also
speaks to large volumes of data
that used to choke the organization now becoming manageable enough to put to
use, thanks to the automation of key processes like cataloging.
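As a very small sketch of the rationalization idea, the snippet below flags byte-identical files in a data directory by content hash so redundant copies can be reviewed and retired. The root path is hypothetical, and real consolidation would also have to compare schemas and overlapping records, not just exact duplicates.

```python
# Minimal sketch: find byte-identical duplicate files under a data root
# so redundant copies can be reviewed and retired. Paths are hypothetical.
import hashlib
from collections import defaultdict
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file's contents in 1 MB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

duplicates = defaultdict(list)
for p in Path("/data/lake").rglob("*"):  # hypothetical data root
    if p.is_file():
        duplicates[sha256_of(p)].append(p)

for digest, paths in duplicates.items():
    if len(paths) > 1:
        print(f"{len(paths)} copies with hash {digest[:12]}...: {paths}")
```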
#1: Explainability Will Emerge
as a Key AI Requirement
As more and more business
(and government) is run using AI and ML algorithms, there will be more focus on
transparency and explainability. Why was a mortgage denied? Can a bank prove
that none of the legally protected attributes (like race, gender and so forth)
were used to make the decision or to train the model that made it? Finding
the appropriate data sets and documenting their lineage and quality is the
first step to such transparency and explainability. If we do not know where
data came from or what it means, we will not be able to explain the model or
ensure its proper and legal operation.
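As a minimal, illustrative sketch of the model-level side of this, the snippet below checks that protected attributes are excluded from the feature set and reports which features actually drove a credit model's predictions using permutation importance. The file name, column names and model choice are assumptions made for the example; real regulatory explainability also requires documented data lineage, not just feature importances.

```python
# Illustrative sketch of basic explainability checks for a credit model.
# File name, columns and model are hypothetical placeholders; features
# are assumed to be numeric for simplicity.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

df = pd.read_csv("loan_applications.csv")   # hypothetical dataset
protected = {"race", "gender", "age"}       # attributes that must not be features
features = [c for c in df.columns if c not in protected | {"approved"}]
# Document the invariant: no protected attribute is in the feature set.
assert not protected & set(features), "protected attributes leaked into features"

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["approved"], test_size=0.2, random_state=0
)
model = GradientBoostingClassifier().fit(X_train, y_train)

# Which features actually drive the decisions?
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, score in sorted(zip(features, result.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {score:.4f}")
```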
##
About the Author
Alex Gorelik, Founder and Chief Technical Officer
Waterline
is Alex's third startup. Prior to Waterline Data, Alex served as General
Manager of Informatica's Data Quality Business Unit, driving Marketing, Product
Management and R&D for an $80M business. Alex joined Informatica from IBM,
where he was an IBM Distinguished Engineer for the Information Integration
team. IBM acquired Alex's second startup, Exeros, where he was founder, CTO and
VP of Engineering.
Previously,
Alex was co-founder, CTO and VP of Engineering at Acta Technology, which was
subsequently acquired by Business Objects. Prior to founding Acta, Alex managed
development of Replication Server at Sybase and worked on Sybase's strategy for
enterprise application integration (EAI).
Alex holds a B.S. in Computer Science from
Columbia University School of Engineering and an M.S. in Computer Science from
Stanford University.