Virtualization Technology News and Information
VMblog's Expert Interviews: Imanis Data Talks Data Management, Machine Learning and More


Just this week, Imanis Data, a machine learning data management company, announced record customer adoption that was in large part driven by enterprises' need to protect and manage Hadoop and NoSQL data.  The company's momentum in 2018 has already included a major round of funding, an expansion of their executive team, and significant product enhancements.  To dig in deeper, VMblog spoke with Peter Smails, the company's chief marketing officer. 

VMblog:  To kick things off, can you start by telling readers about Imanis Data?

Peter Smails:  Imanis Data is a machine learning powered data management company.  Our Imanis Data Management Platform enables customers to harness the full value of their data by delivering solutions that protect their data, as well as orchestrate and automate all of their data management tasks. 

Unlike any other solution, from day one Imanis Data has focused on delivering machine learning enabled data management for distributed databases.  We are not saddled with legacy technology, which lets us begin with the end in mind and design the next generation of data management.  Imanis Data is a single solution for all Big Data and NoSQL data management.  We offer unparalleled backup and restore performance.  Our solution runs equally well on-premises, in the cloud, or both.  Finally, we have a multi-year head start on machine learning data management.

Imanis Data has been named a Gartner Cool Vendor as well as a CRN Emerging Vendor.  Imanis Data's customers include leading Fortune 500 businesses in the retail, financial services and technology industries, among others.  We are a private company backed by Canaan Partners, Intel, Onset Ventures, and Wipro Ventures.

VMblog:  What is the state of the data management market?

Smails:  You've got your traditional application world, where on-prem environments typically still run monolithic applications; that's the old world. In the new world, everybody's moving to microservices-based applications for IoT, for customer 360, for customer analytics, you get the picture. They're building these new modern applications, and they're building them not on traditional RDBMSs but on microservices-based architectures built on top of Hadoop or on NoSQL databases. Those applications, as they go mainstream and into production environments, require data management. They require backup and recovery. They require disaster recovery. They require archiving, and so on. They require the whole plethora of data management capabilities. Nobody's addressing all the needs of that market. Enter Imanis Data.

VMblog:  What are the data management use cases that Imanis supports?

Smails:  Our vision is to become the de facto standard for enterprise data management for modern distributed databases by delivering solutions that enable organizations to:

  • Protect - all of their Big Data and NoSQL application data regardless of location to ensure they can recover from data loss and downtime caused by natural or man-made events.
  • Orchestrate - reduce cost and harness the power of hybrid cloud infrastructure by copying, moving, or migrating data to the appropriate location based upon use-case, e.g. test/dev (with masking and sampling), archiving, analytics, compliance, and DR.
  • Automate - enhance employee productivity by automating data management processes, and create business value by leveraging machine-learning to ensure compliance, reduce cost, enable insight, and protect against cyber-attacks.

Our platform is built upon a massively scalable distributed architecture, and we scale in multiple ways. First, we scale in terms of computational power, so we're built for Big Data by definition. Second, we scale very well from a storage-efficiency standpoint, so we can store very large volumes of data, which is a requirement. Third, we scale across use cases: we support use cases throughout the data lifecycle.

The one that gets all the attention is obviously backup and recovery, because you have to protect your data. But if you look at it from a lifecycle standpoint, our number-two use case is test/dev. A lot of organizations want to spin up subsets of their data when they're building new apps, because they're supporting things like CI/CD. We help customers automate and orchestrate the test/dev process by supporting things like sampling and masking. For example, I may have a one-petabyte dataset, but I'm not going to do test/dev against all of that; I want to spin up 10 percent of it, and I want to do some masking of personal (PII) data.

In addition to backup and recovery and test/dev, we do disaster recovery. Some customers, particularly those in the Big Data space, may for now say, "Well, I have replicas, so some of this data isn't permanent, it's transient, but I do care about DR." So DR is a key use case. We also do archiving. If you think of data throughout the lifecycle, we support all of those stages. 
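The sampling-and-masking flow Smails describes can be sketched roughly as follows. This is a hypothetical illustration in Python, not Imanis Data's actual implementation; the function names, the 10 percent sampling fraction, and the SHA-256 masking scheme are all assumptions for the sake of the example.

```python
import hashlib
import random

def mask_pii(record, pii_fields):
    # Replace PII values with a short one-way hash so test data
    # stays structurally realistic but carries no personal data.
    masked = dict(record)
    for field in pii_fields:
        if field in masked:
            digest = hashlib.sha256(str(masked[field]).encode()).hexdigest()
            masked[field] = digest[:12]
    return masked

def sample_and_mask(dataset, fraction, pii_fields, seed=42):
    # Take a reproducible fraction of the records, then mask PII
    # in each sampled record before handing it to test/dev.
    rng = random.Random(seed)
    sampled = [r for r in dataset if rng.random() < fraction]
    return [mask_pii(r, pii_fields) for r in sampled]

# Example: spin up a ~10 percent test/dev subset with emails masked.
prod = [{"id": i, "email": f"user{i}@example.com"} for i in range(1000)]
testdev = sample_and_mask(prod, 0.10, ["email"])
```

At petabyte scale the sampling and hashing would of course run inside the distributed platform rather than in a single process, but the shape of the operation (sample a fraction, then irreversibly mask the sensitive fields) is the same.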

VMblog:  How exactly are you using machine learning?

Smails:  What's truly unique about Imanis, in addition to everything I just mentioned, is that we're the only data management platform that is machine learning-based. Machine learning gets a lot of attention as a sort of futuristic technology, but we're actually delivering machine learning enabled capabilities today.

VMblog:  What value does this machine learning inside your product provide to an enterprise data administrator?

Smails:  Very specifically, the capability we're delivering today is called ThreatSense, which helps you protect your data against ransomware attacks.  With no user intervention whatsoever, ThreatSense analyzes your backups as they occur. In doing so it learns what a normal pattern looks like across some 100+ different metrics, establishing a baseline. That's number one. Number two, ThreatSense constantly watches for anything that knocks things off that baseline and creates an anomaly; when it detects one, it notifies the administrators that something may have happened and that they should look into it. So the value, very specifically, is around ransomware. Typically, one of the ways you're going to detect ransomware is that you'll see an anomaly in your backup set, because your dataset will change materially. Knowing when the anomaly occurred allows you to go back to the most recent backup prior to its occurrence and helps you determine exactly which data is missing. 
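The baseline-then-anomaly pattern Smails describes can be sketched as a toy detector in Python. This is an assumption-laden illustration, not how ThreatSense actually works: the metric names, the per-metric mean/standard-deviation baseline, and the three-sigma threshold are all invented for the example (the real product tracks 100+ metrics).

```python
import statistics

def build_baseline(history):
    # Learn the mean and standard deviation of each backup metric
    # from past runs; together these describe the "normal" pattern.
    baseline = {}
    for metric in history[0]:
        values = [run[metric] for run in history]
        baseline[metric] = (statistics.mean(values), statistics.stdev(values))
    return baseline

def anomalous_metrics(run, baseline, threshold=3.0):
    # Flag any metric in the latest backup run that deviates from
    # its baseline by more than `threshold` standard deviations.
    flagged = []
    for metric, value in run.items():
        mean, stdev = baseline[metric]
        if stdev > 0 and abs(value - mean) / stdev > threshold:
            flagged.append(metric)
    return flagged

# 30 historical backup runs with mild natural variation.
history = [{"changed_files": 1000 + i % 5, "bytes_deleted": 10 + i % 3}
           for i in range(30)]
baseline = build_baseline(history)

# A ransomware-style run: a massive spike in changed files.
suspect = {"changed_files": 900_000, "bytes_deleted": 11}
flagged = anomalous_metrics(suspect, baseline)  # -> ["changed_files"]
```

The point of the baseline is exactly what the interview says: once "normal" is learned automatically, a materially changed backup set stands out without anyone having to define rules by hand, and the timestamp of the first flagged run tells you which earlier backup to restore from.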

VMblog:  Where is ML taking your product?

Smails:  Our vision is that machine learning is the future of data management.  When data volumes are small, decisions about when to back up data to meet business-mandated RPO and RTO requirements can reliably be made by human judgment.  Additionally, data anomalies such as data deletion can be easily detected, since the environment is fairly simple; the margin of error resulting from human decisions is small and can be managed.  However, when dealing with large volumes of data in complex distributed database environments, human decision-making becomes extremely difficult and error prone.  Factoring in the interactions between various systems is next to impossible in Big Data environments. 

VMblog:  What in your opinion will separate the winners from the losers?

Smails:  What will separate winners and losers is architecture.  While backup today gets all the attention, the problem that needs to be solved ultimately is helping customers manage their data throughout the data lifecycle - test/dev, backup, DR, archiving, migration, and data mobility in general.  To address that requires an architecture that is based upon data management, not backup.  In other words, there's a big difference between being a data management company that supports backup use-cases and a backup company trying to evolve beyond backup.  That's a difficult if not impossible task.  We believe we are uniquely positioned to become the de facto standard in enterprise data management.

VMblog:  As more data gets persisted out at the edge, what are the challenges in terms of protecting that data (backup and restore, deduplication, and so forth), and to what extent is Imanis Data addressing those kinds of more distributed data management requirements going forward?  Do customers want more of that?  More of an edge-cloud environment?  Or is that way too far in the future?

Smails:  It's not that the edge is that far in the future. Rather, the big problem right now, from an enterprise mainstreaming standpoint, is getting the core house in order as you move from a traditional four-wall data center to a hybrid cloud environment. How do I get the core data lake sorted out? You also have to consider the analytics: where is the data going to persist, and where do you do some of that computational work? You get all this information out at the edge. Does all of it end up going into the data lake? Do you move the storage to where the lake is, or do you start pushing some of the lake function out to the edge? It's a complicated discussion.

Today, where we see the edge happening is in the cloud: you get to the edge through the cloud. That's why we run on AWS, GCP, and Azure.


Published Thursday, July 19, 2018 7:32 AM by David Marshall