Big Data has never been more central to our lives, and advanced analytics technology is enabling value to be extracted from this resource, with results achieved in some challenging areas such as research into COVID-19. So where is analytics heading next, and what kinds of solutions will enable it? To find out, VMblog spoke with Manuvir Das, head of Enterprise Computing at
NVIDIA.
VMblog: What is happening with the ‘datasphere’, and where is data going?
Manuvir Das: It's
no secret that data is growing exponentially; it has been for decades and
is expected to continue accelerating in the coming years. Autonomous vehicles are hitting the roads,
and more robots and devices are being deployed in hospitals and factories.
Companies are re-architecting their businesses around AI, and the arrival
of fast, low-latency 5G networks will create countless new data
centers close to the edge.
Given the billions of dollars invested in all
of these systems, it's imperative for enterprises to gain faster insights from
their data. Accelerated computing platforms play a key role in creating the
composable data center infrastructure needed to deliver the speed, security and
efficiency that will be essential in the next decade of computing.
This shift is already having an important
impact. Consider drug discovery, which has been critical
to responding to the COVID-19 crisis. To help save lives and economies around
the world, the healthcare industry has sped up processes, creating solutions in
record time. It's taken less than a year to bring vaccines and treatments to
patients, processes that often took a decade or more in the past.
As businesses deploy more data-driven
services, it will become critical for companies to add compute capabilities at
the edge, close to where the data is created.
This will not only create more data growth, but will also be critical
for intelligent retail services like
checkout-free shopping, or
managing fleets of
robotaxis. We'll see shifts in IT
infrastructure to support these new opportunities, with technologies like DPUs
being added to speed applications and boost security.
VMblog: What
has changed with data science tools that can make an impact on
businesses/research now?
Das: Data science jobs have been at the top of
LinkedIn's careers rankings for a few years now, inspiring more and more people
to learn the skills needed to meet the demand for this work. All of this
interest has created a boom for data science, yet enterprises are still only
just beginning to operationalize their data science and AI pipelines.
NVIDIA has been investing in helping data
scientists work more efficiently with GPU-accelerated software for the
end-to-end data science and AI pipeline. This year, we helped bring GPU acceleration to Apache Spark 3.0,
the world's leading data analytics platform. We're working with Cloudera to
accelerate the Cloudera Data Platform with
the RAPIDS suite of GPU-accelerated data science software libraries. BlazingSQL
is also using RAPIDS in its high-performance SQL engine to help SQL experts
analyze their databases at lightning speed.
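As a sketch of how RAPIDS fits into a pipeline: the cuDF library exposes a pandas-like DataFrame API, so typical ETL code ports with little change. The snippet below is a minimal illustration (not from NVIDIA's documentation); it falls back to pandas when cuDF and a GPU are not available, and shows the kind of aggregation RAPIDS would run on the GPU:

```python
# Minimal sketch of a RAPIDS-style ETL step. cuDF mirrors the pandas API,
# so the same code can run on GPU (cudf) or CPU (pandas).
try:
    import cudf as xdf      # GPU path: requires an NVIDIA GPU and a RAPIDS install
except ImportError:
    import pandas as xdf    # CPU fallback, used here purely for illustration

# Toy transaction data
df = xdf.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "sales": [10.0, 20.0, 5.0, 15.0, 25.0],
})

# A typical ETL aggregation: total sales per store
totals = df.groupby("store").sales.sum()
print(float(totals["A"]), float(totals["B"]))  # 30.0 45.0
```

Because the API surface matches pandas, teams can prototype on a laptop and move the same code to GPU-backed infrastructure with minimal changes.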
Additionally, we're collaborating with VMware to enable businesses to run both modern AI
workloads and existing applications on their infrastructure through NVIDIA NGC software. We're teaming
up to bring the pre-trained models, Helm charts and other AI software available
in the NGC catalog to VMware vSphere, VMware Cloud Foundation and VMware Tanzu.
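To illustrate what consuming NGC software on a Kubernetes platform such as Tanzu can look like, here is a hedged sketch of pulling a Helm chart from NVIDIA's NGC Helm repository. The chart name, release name and version below are placeholders for illustration; consult the NGC catalog for the actual charts available:

```shell
# Add NVIDIA's NGC Helm repository and refresh the local index
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

# Install a GPU-enabled service from the catalog into the cluster.
# "example-chart" and the version are hypothetical placeholders.
helm install my-ai-service nvidia/example-chart --version 1.0.0
```

The point of the integration is that the same catalog artifacts can be deployed onto vSphere- and Tanzu-managed clusters without repackaging.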
Turning
data into insight through data science, and then putting that insight into
production in an AI model is a journey with many steps. By intelligently
accelerating these steps with our partners, we'll be able to help enterprises
put their data to work to transform their industries.
VMblog: Focusing on the role of supercomputers and Big Data, what is the
significance of NVIDIA's RAPIDS open source APIs?
Das: RAPIDS GPU-accelerated software speeds up the
data science pipeline, bringing accelerated computing to the most popular data
science libraries and frameworks. This means that companies can not only gain
faster insights from their data, but also boost their efficiency and speed
time-to-solution.
With more than 70% of data work estimated to
be spent on ETL, the faster data can be prepared, the faster customers can gain
insights. This is why we developed the RAPIDS Accelerator for Apache
Spark, which combines the power of the RAPIDS cuDF
library and the scale of the Spark distributed computing framework.
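As context for how the RAPIDS Accelerator is switched on in practice, the sketch below shows an illustrative spark-submit invocation. The jar filenames, script name and resource amounts are placeholders rather than exact values from any specific release:

```shell
# Illustrative sketch: submitting a Spark 3.0 ETL job with the
# RAPIDS Accelerator plugin enabled. Paths and sizes are placeholders.
spark-submit \
  --jars rapids-4-spark.jar,cudf.jar \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=0.25 \
  my_etl_job.py
```

Because acceleration is enabled through the plugin configuration, existing Spark SQL and DataFrame jobs can run on GPUs without code changes.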
The openness of RAPIDS also speeds innovation
at HPC scale. For example, the Oak Ridge National Laboratory is
using BlazingSQL with RAPIDS to accelerate drug discovery research using NVIDIA
GPUs on Summit, America's fastest supercomputer. ORNL researchers found that
the GPU-accelerated
open-source platform helped them process enormous datasets extremely quickly
with scalable SQL queries.
It's also
important to note the breakthrough efficiency that NVIDIA products are bringing
to supercomputers. An NVIDIA DGX A100 SuperPOD won the top spot in the recent Green500 rankings, and in fact, four out of the
five most efficient supercomputing systems on the Green500 list used NVIDIA
technology. As data continues to grow, so will computing, which is why it is so
essential for our supercomputers to be as efficient as possible.
VMblog: Considering Big Data, AI and edge processing, can you give some
real-world examples?
Das: GPU-accelerated big data, AI and edge
computing are helping businesses across every industry capture more market
share and create new opportunities.
In marketing and professional services, Adobe was among the first to put big data to work
with GPU-accelerated Spark 3.0 on Adobe Experience Cloud. The Adobe team
achieved more than 7x faster performance with NVIDIA-accelerated Spark 3.0
compared with running Spark on CPUs, while also saving more than 90% on
computation costs.
Financial leader Capital One is integrating AI and
machine learning into its customer-facing applications such as fraud monitoring
and detection, call center operations and customer experience. From operations
to services, AI is helping Capital One transform the business of finance.
At the edge, global supply chain solutions
provider KION Group is using the NVIDIA EGX
AI platform to develop AI applications for its intelligent warehouse systems,
increasing the throughput and efficiency in its more than 6,000 retail
distribution centers.
VMblog: Looking to the future, are the many types of IoT endpoints an important
part of the big data movement?
Das: In his
GTC Fall 2020 keynote,
NVIDIA CEO Jensen Huang explained how the internet of human beings is rapidly
becoming the Internet of Things. Trillions of devices, sensors and robots will
be AI-enabled to offer a broad range of services.
Soon, the kind of computing that used to only
be possible in a data center will be available everywhere: in stores, on
telephone poles, even in parking lots. As Jensen noted, the missing piece of
IoT is AI. Together, edge computing and powerful AI software are essential to
pioneering new breakthroughs.
In healthcare, the AI IoT will be able to help
patients ask questions about their care. It will help factory workers stay
safe, and help us check out faster in stores. On a micro scale, the AI IoT will
deliver our lunches; at the macro level, it will boost efficiency in
transportation with fleets of autonomous vehicles.
While these innovations might sound futuristic, companies are already
pioneering these new products and services, working with NVIDIA software and
platforms to bring their ideas to life. We expect AI IoT products and services
to accelerate as projects come to market and inspire even more companies to
leverage AI to better serve their customers.
##