Virtualization Technology News and Information
ArangoDB 2020 Predictions: Graph++ Will Take Center Stage

VMblog Predictions 2020 

Industry executives and experts share their predictions for 2020.  Read them in this 12th annual VMblog.com series exclusive.

By Jörg Schad, Head of Engineering and Machine Learning, ArangoDB

Graph++ Will Take Center Stage

In 2019, we saw a huge increase in the use of graph databases for processing and storage, and we anticipate strong growth in graph use-cases and a build-up of the overall graph ecosystem in 2020.

One specific area where this will become evident is the intersection of graph-related developments and machine learning.

Following are a few of the specific use-cases:

  • Knowledge Graphs: Knowledge graphs have been a powerful tool for representing knowledge as relationships between different entities. Combined with machine learning, we can learn or extract new knowledge from a knowledge graph (and, for example, grow the knowledge graph itself).
  • Graph Neural Networks: The new stars in machine learning, deep neural networks, basically expect vectors as input, while graphs are expressed as nodes and edges. Much current research and many industry use cases are trending toward neural networks built for dealing with graphs, similar to how we developed neural networks for natural language and voice.
  • AI-based DB and Multi-Model DBs: While we typically associate machine learning with frameworks such as TensorFlow, PyTorch or MXNet, data scientists dedicate most of their time to prepping data. A database, especially a graph or multi-model database supporting graph queries, document queries, and text retrieval, can be a very powerful tool here.
  • Metadata and Production Grade ML Infrastructure: Machine learning is moving more and more into production scenarios, where metadata is as important as good training data. As such, metadata representing a multi-stage machine learning pipeline can be naturally modeled as a graph connecting different documents of metadata.
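To make the Graph Neural Networks point above more concrete, here is a minimal sketch in plain Python of how a graph given as nodes and edges can be turned into per-node vectors. It performs one round of neighbor averaging, which is the aggregation idea behind many graph neural network layers. The toy graph, the feature values, and the function names are assumptions made for illustration; a real layer would also apply learned weights and a nonlinearity, which are omitted here.

```python
# Hypothetical toy graph: node names, edges, and feature values are illustrative only.
edges = [("a", "b"), ("a", "c"), ("c", "d")]
features = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
    "d": [0.0, 0.0],
}


def build_neighbors(edges):
    """Turn an undirected edge list into a node -> set-of-neighbors mapping."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return neighbors


def aggregate(features, neighbors):
    """One round of mean aggregation over each node and its direct neighbors.

    The result is one fixed-length vector per node, the kind of input a
    downstream neural network layer expects.
    """
    updated = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in sorted(neighbors.get(node, ()))]
        updated[node] = [sum(column) / len(group) for column in zip(*group)]
    return updated


neighbors = build_neighbors(edges)
embeddings = aggregate(features, neighbors)
```

After this step, each node's vector also carries information about its neighborhood, so nodes with similar surroundings end up with similar vectors. Repeating the step lets information travel across more hops.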

With new legislation such as GDPR and CCPA (the latter coming into effect on Jan 1, 2020), the need to enforce and audit data privacy will increase in 2020. Since graphs are well suited to both of those requirements, their use for data privacy will grow next year.

In addition, as graph use-cases and graph DB deployments grow, operational complexity will increase. This will continue to drive a trend toward simplifying distributed, scalable deployments and reducing the operational knowledge and effort required. Other trends that we believe will emerge include:

  • Kubernetes will become a default abstraction layer for distributed infrastructure. Having started out as a container orchestration system, Kubernetes has grown into a default infrastructure abstraction, including for distributed databases. By putting everything on one platform across on-prem and different cloud vendors, with operational knowledge codified into Kubernetes operators, Kubernetes greatly simplifies operations.
  • Managed services will help simplify operations. Managed services (which, in turn, might be based on Kubernetes) simplify operations even further. While we see the topic of serverless managed databases (where a user requests not n database instances but functional requirements such as n gigabytes and y updates per second) as interesting, we feel it is early, and we expect to see more production-ready systems in 2021.
  • Multi-model database adoption will continue to take off in 2020. While graph databases are very powerful in expressing relations, the graph data model is not effective at representing unstructured entities such as person-related data, which is better suited to a document data model. With the growing number of Graph++ use-cases in 2020, we will also see a trend toward augmenting the pure graph data model with other data models such as document or key-value.
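As a rough illustration of combining the two data models, the sketch below keeps person-related data as free-form documents and relationships as graph edges, then answers one question that needs both. It is plain Python rather than any database's query language, and the records, the "knows" relation, and the function names are all hypothetical.

```python
# Hypothetical data: keys, fields, and the "knows" relation are illustrative only.
# Document model: each person is a free-form record; fields may differ per record.
persons = {
    "alice": {"name": "Alice", "city": "Berlin", "skills": ["graphs", "ml"]},
    "bob": {"name": "Bob", "city": "Paris"},
    "carol": {"name": "Carol", "city": "Berlin", "skills": ["ops"]},
}

# Graph model: directed edges between document keys.
knows = [("alice", "bob"), ("bob", "carol")]


def reachable(start, edges, max_depth):
    """Breadth-first traversal: keys within max_depth hops of start."""
    frontier, seen = {start}, {start}
    for _ in range(max_depth):
        frontier = {dst for src, dst in edges if src in frontier} - seen
        seen |= frontier
    return seen - {start}


def contacts_in_city(start, city, max_depth=2):
    """Graph traversal first, then a document-style filter on the results."""
    keys = reachable(start, knows, max_depth)
    return sorted(key for key in keys if persons[key].get("city") == city)


result = contacts_in_city("alice", "Berlin")
```

The traversal uses only the edges, while the filter uses only the document fields. A multi-model database would typically let both parts be expressed in a single query over the same data.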

##

About the Author

Jörg Schad 

Jörg Schad is head of engineering and machine learning at ArangoDB. Previously, he built machine learning pipelines in healthcare, distributed systems at Mesosphere, and in-memory databases. He holds a Ph.D. from Saarland University for research on distributed databases and distributed data analytics.
Published Wednesday, January 22, 2020 7:19 AM by David Marshall