Virtualization and Cloud executives share their predictions for 2016. Read them in this 8th Annual VMblog.com series exclusive.
Contributed by Matt Bencke, CEO, Spare5
Big Data or Big Problem? If You’re Not Google or Facebook, You’re F*cked
"Without data it's very hard to be intelligent." - Duncan Anderson, CTO of IBM Watson Europe.
We are living in the land grab era of "Big Data." Whether you're running an e-commerce business, an online directory, a healthcare provider, or a financial institution, you know that your ability to ingest, process, synthesize, and act on an ever-growing supply of data may very well determine your success or failure. The problem is, unlike land, data is effectively infinite. The average enterprise is seeing the data relevant to it grow geometrically. Ironically, a disproportionate share of the value theoretically stored in these data is opaque to large-scale computing systems - it's in photos, video, and audio, and it's often highly personal, social, and context-specific.
So who are the "white hats" in this overwhelming struggle?
Cognitive computing, artificial / augmented intelligence, and machine learning
are riding to the rescue. The good news is that AI technologies, approaches,
and services are improving significantly. The bad news is that the resulting
models' ability to reason, emulate, and predict human responses in ways that
actually help your business is limited by - you guessed it - the quality of the
human-derived training data.
To train, test, and tune any AI, companies need human insights that are domain-specific and of high quality. For example, a computer doesn't know on its own that an insurance claim is valid (or even what an insurance claim is). "Quality" is a dangerously ambiguous term. In this field, quality is really a ratio: confidence in ground truth over cost (a function of dollars, effort, and time). A data scientist needs to know how many people with specific traits agree, thereby establishing "truth." The more complex, subjective, and unstructured the puzzle, the harder truth is to determine. In theory, given infinite money, resources, and time, we can ask dozens of qualified medical accountants whether they agree that a given medical claim is valid and reimbursable at 95% for a given patient. But in practice, most engineers need to answer millions, or even billions, of such questions daily - economically and reliably.
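The agreement idea above can be sketched as a toy majority vote: given several annotators' labels for one item, treat the majority label as provisional ground truth and the agreement rate as a confidence score. This is a minimal illustration, not Spare5's actual method; the function name and the confidence threshold are assumptions for the sake of the example.

```python
from collections import Counter

def majority_truth(labels, min_confidence=0.8):
    """Toy ground-truth estimate from redundant human labels.

    labels: the labels given to ONE item, one per annotator.
    Returns (label, confidence), where confidence is the fraction of
    annotators who agreed on the most common label. The label is only
    trusted (non-None) when agreement clears min_confidence.
    """
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    confidence = votes / len(labels)
    return (label if confidence >= min_confidence else None), confidence

# Five reviewers judge whether a claim is valid; 4 of 5 agree:
print(majority_truth(["valid", "valid", "valid", "valid", "invalid"]))
# prints ("valid", 0.8)

# A 3-2 split falls below the threshold, so no label is trusted:
print(majority_truth(["valid", "valid", "invalid", "invalid", "valid"]))
# prints (None, 0.6)
```

The cost side of the quality ratio shows up here too: higher confidence means paying more qualified people to label the same item, which is exactly the trade-off that becomes untenable at billions of questions per day.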
Here's the thing: More than a billion people are regularly
providing exactly that kind of domain-specific training data for Google and
Facebook every day. We all teach them, with our searches, posts, comments,
tags, emails, and reactions. This gives Google and Facebook an almost
insurmountable advantage in a winner-take-all flywheel that is further
accelerated by their embarrassment of engineering riches. In short, if you're
not Google or Facebook, you're screwed. (Okay, there are other major players
who are similarly advantaged. Amazon has reams of data about our buying,
viewing, and even listening habits. And let's not count out Microsoft, nor
forget WeChat or Baidu. Still, the amount of quality data that Google and
Facebook are receiving daily is simply unparalleled.)
As we look to 2016, it's clear that companies need to stay competitive with their data. Those that do will win. Those that don't find a solution will be buried. There is a shortage of quality, domain-specific insights at scale. This is where big data management comes into the picture.
But contrary to popular belief, machine learning is not the silver bullet. The
limiting factor for your success in understanding your data is the
domain-specific training data you need to create useful machine-based models.
So how can companies other than Google and Facebook make
sense of their data in 2016?
Break it down
It's easy to find yourself overwhelmed with big data
initiatives, not knowing where to start or what to look for. A wise approach is
to break it all into smaller, more manageable chunks. At Spare5, we break
complex problems down into digestible tasks in order to provide quality
insights at scale. You need to crawl before you can run, so start small to
understand how to analyze results and implement new strategies. An economy is
only as efficient as its currency is small and fluid. Spare5 is revolutionizing
the creation of high-quality, domain-specific training data by reducing the
currency of human insight to spare moments provided by targeted members of our
curated community. Speaking of community...
Utilize a community
You're not going to build a community to rival that of
Google and Facebook overnight. However, a targeted community willing to provide
insights can help solve complex business challenges. "Targeted community" is
the key here, though - a "crowd" isn't so helpful. There are a number of community-based
resources out there to consider, so spend time understanding what benefits each
provides and what makes the most sense for your business.
Train your machines. Well.
Machine learning is only as smart as the training provided by humans; machines are (at best) limited by the quality of the human insights training them. The best results come from a marriage of machines and humans. Seek out sources of high-quality, domain-specific training data - it's critical to effective, truly beneficial machine learning.
Don't let your big data become a big problem in 2016. Get in
front of it and you'll be reaping the benefits by this time next year.
##