Industry executives and experts share their predictions for 2018. Read them in this 10th annual VMblog.com series exclusive.
Contributed by Mathew Lodge, SVP of Products and Marketing, Anaconda
The Beginning of the End of Data Gravity
I expect data science and machine learning applications to become a big focus of cloud-native application architectures built on technologies like Docker and Kubernetes. Three trends are driving that.
Firstly, big data architectures like Hadoop and Spark solve the distributed database problem well, but they hold as an article of faith that moving compute closer to the data is important for performance. They also assume your code is written in Java or another JVM-based language like Scala.
The big problem? Data science, predictive analytics and ML don't happen in JVM-based languages. They happen in Python, R and, to a lesser extent, C/C++.
Secondly, today's data center networks have 1000x the bandwidth at a lower total cost than in 2005, when Hadoop was first conceived, meaning that data locality matters far less than it once did.
Lastly, the major players, including AWS, Microsoft, Google, IBM, Red Hat and Docker, are all lined up behind Kubernetes. Containers and Kubernetes make great language-agnostic distributed computing clusters: it's just as easy to deploy Python as it is Java.
Put all three pieces together and the days of deploying Java code to Hadoop and Spark data lakes for data science and ML are numbered. The cloud-native vendors have realized you can deploy your Python and R data science apps on a Kubernetes-managed container cluster and just access the data lake over your modern network. It's a great opportunity for the cloud computing vendors.
##
About the Author
Mathew has well over 20 years' diverse experience in cloud computing and product leadership. Prior to joining Anaconda, he served as Chief Operating Officer at Weaveworks, the container and microservices networking and management start-up; and previously as Vice President in VMware's Cloud Services group. At VMware he was co-founder of what became its vCloud Air IaaS service.
Early in his career, Mathew built compilers and distributed systems for projects like the International Space Station, helped connect six countries to the Internet for the first time, and managed a $630m router product line at Cisco. At start-up CPlane he attempted to do SDN 10 years too early. Prior to VMware, Mathew was Senior Director at Symantec in its $1Bn+ information management group.