VMblog's Expert Interviews: Tarun Thakur Explains Datos IO RecoverX


Datos IO emerged from stealth back in September 2015 and hasn't slowed down since. At launch, the company announced what it called the industry's first recovery platform for next-generation scale-out databases. Earlier this month, it announced the availability of its new RecoverX product. To find out more about this latest product and to get an update on this startup, I reached out to the company's co-founder and CEO, Tarun Thakur.

VMblog: Last week you announced the availability of Datos IO's new RecoverX product, which is based on your proprietary Consistent Orchestrated Distributed Recovery (CODR) technology. What is unique about this architecture that IT and database administrators need to know?

Tarun Thakur:  RecoverX is built on our proprietary Consistent Orchestrated Distributed Recovery (CODR) architecture, a seminal data protection architecture that no longer depends on media servers and transfers data in parallel to and from file-based and object-based secondary storage. CODR delivers cluster-consistent backups that are highly space-efficient yet stored in native formats, along with application-ready, repair-free recovery at scale. With CODR, Datos IO RecoverX provides scalable versioning that enables enterprises to protect both Apache Cassandra and MongoDB at any interval and granularity. RecoverX provides one-click recovery in minutes (versus the hours required by existing, manually scripted solutions) for operational recovery and test/dev use, as well as semantic de-duplication that can save up to 70% on secondary storage costs.
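RecoverX's semantic de-duplication is proprietary, and its internals are not described in this interview. One plausible source of a saving in this range is easy to see, though: an Apache Cassandra cluster with a replication factor of 3 stores every row on three nodes, so a naive node-by-node backup copies each row three times, and keeping only one copy removes about two-thirds of the data. The Python sketch below illustrates that arithmetic under stated assumptions. The toy cluster, the row format, and the `dedup_replicas` helper are all hypothetical, and a real backup tool works on on-disk data files rather than individual strings.

```python
import hashlib


def dedup_replicas(cluster):
    """Keep one copy of each distinct record across all nodes.

    Hypothetical helper for illustration only; it is not RecoverX's method,
    and real backup tools operate on on-disk files, not strings.
    """
    unique = {}
    for records in cluster.values():
        for record in records:
            digest = hashlib.sha256(record.encode("utf-8")).hexdigest()
            unique.setdefault(digest, record)  # first copy wins; replica copies are skipped
    return list(unique.values())


# Assumed toy cluster: replication factor 3, so every row lives on all three nodes.
rows = [f"user:{i}|name=u{i}" for i in range(100)]
cluster = {"node-a": rows, "node-b": rows, "node-c": rows}

raw_bytes = sum(len(r) for records in cluster.values() for r in records)
kept_bytes = sum(len(r) for r in dedup_replicas(cluster))
print(f"storage saved: {1 - kept_bytes / raw_bytes:.1%}")  # storage saved: 66.7%
```

With a replication factor of 3, the saving from replica removal alone is 1 - 1/3, or about 66.7%; a different replication factor changes that figure.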

VMblog: It's no secret that traditional backup and recovery solutions are increasingly incapable of supporting the distributed data architectures being adopted by more and more businesses. But what exactly is changing the data landscape, and why are we only now starting to warm to the idea of an "evolved" recovery solution?

Thakur:  In the past few months we've talked with numerous analysts and industry experts on this topic, and it pretty much boils down to the notion that enterprises are rapidly adopting 3rd Platform applications to drive operational intelligence and business innovation. Enterprise growth, innovation and, occasionally, even competition require that today's companies adopt new, high-value, data-centric applications such as content-management systems, real-time intrusion detection, customer analytics, internet-of-things, and digital advertising. Most of these applications are deployed either on-premise on open-source distributed databases such as Apache Cassandra, MongoDB, and Apache HBase, or on cloud-native databases such as Amazon DynamoDB, Amazon Aurora, Google Bigtable, and others.

However, this fundamental shift raises critical issues in the lifecycle of data management. Traditional backup and recovery products were originally designed for strongly consistent databases, tape-based storage media, and legacy database architectures in on-premise deployments. That leaves a critical gap: modern database architectures (key-value stores, column-family stores, cloud SQL, et al.) lack the next generation of distributed recovery solutions they need underneath them. It is this gap that ultimately limits enterprise adoption and innovation.

VMblog: What use cases are most likely to benefit from RecoverX?

Thakur:  Before I answer this question, I think it's important to understand who this product is suited for. In architecting and building this product, we kept three main audiences in mind: 1) First are the application architects and line-of-business owners, who are our buyers and who are building next-generation 3rd Platform applications on next-generation databases such as NoSQL, big data, and cloud databases. 2) The second group is database architects and database administrators (DBAs), who are the influencers within an organization, responsible for ensuring that next-generation databases offer the same kind of enterprise-grade data management and recovery capabilities found in traditional relational databases. 3) The final group is DevOps and IT operations teams, who are the end users. This group has emerged in recent years from the growing community around application and orchestration deployment frameworks.

So when it came to the use cases most likely to benefit from RecoverX, we tailored the product to fit these groups. Here is where we are seeing early success:

  1. Operational Recovery: Scalable backups at any granularity and at any point in time.
  2. Test/Dev for Continuous Development: Reliable recovery of versioned data to a database cluster with an alternate topology, so that application developers can keep testing against the most recent production data and deploy their application changes in near real time.
  3. Workload Migrations: Single-click, fast recovery for migrations and database upgrades.

VMblog:  How does RecoverX fit into a hybrid NoSQL and traditional database infrastructure model?

Thakur:  To clarify, a hybrid NoSQL and traditional database infrastructure model applies to polyglot application environments. Specifically, enterprises have 3rd Platform applications that rely on multiple data stores, such as a media and entertainment application that stores its images in a MongoDB database and keeps its indexes, including semantic metadata, in SQL Server. For such polyglot applications, enterprises want a single source of truth (a point-in-time backup) that spans both relational (SQL) and non-relational (NoSQL) databases. RecoverX today supports applications that rely on Apache Cassandra and MongoDB databases. That said, we continue to add support for newer data stores, and versioning and recovery for relational databases is part of our roadmap.

VMblog: With NoSQL's known ability to support Big Data's volume, variety and velocity, and its rapid gains among enterprises, as your April 2016 survey revealed, what essential elements (or key questions) do IT administrative teams need to consider? How can organizations make certain the introduction of NoSQL to their infrastructure aligns with their specific current and future business needs?


  • I would say, first and foremost, analyze the features that the database offers versus the application requirements. Ensure that you pick a database that aligns with application requirements in terms of data modeling capabilities, scalability, I/O performance, security features, etc. Beyond that, think about how you will manage the database and protect the data once critical applications start onboarding onto it.
  • Start small and be realistic about getting over the learning curve. Make sure that you clearly understand the capabilities that a particular database offers and the risk that you are holding. For example, several customers don't know that most NoSQL databases do not offer point-in-time backup and recovery, and that native replication does not protect them against data loss from logical corruption and operational failures.
  • As you are evaluating NoSQL databases, proactively engage with different vendors (such as Datos IO) who have considerable expertise in managing data and can help protect your database if failures happen.
  • Finally, decide whether you would deploy the database in-house, in a public cloud or maybe take a hybrid approach. When selecting your preferred vendors, make sure that they have the capabilities to support your deployments both on-premise and in the public cloud.
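The point about native replication is worth making concrete. Replication copies every write, including a bad one, to all replicas, so it guards against losing a node but not against a logical error such as buggy application code overwriting a row. The toy Python model below is a hypothetical sketch: the `ReplicatedTable` class is invented for illustration and does not model any real database's replication protocol. It shows a bad write reaching every replica, while a point-in-time snapshot taken earlier is the only copy that still holds the correct value.

```python
import copy


class ReplicatedTable:
    """Hypothetical toy table that applies every write to all replicas."""

    def __init__(self, replicas=3):
        self.replicas = [dict() for _ in range(replicas)]

    def write(self, key, value):
        # Replication faithfully copies the write to every replica,
        # whether the value is correct or not.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key, replica=0):
        return self.replicas[replica].get(key)


table = ReplicatedTable()
table.write("order:42", {"status": "shipped"})

# Point-in-time backup taken before anything goes wrong.
snapshot = copy.deepcopy(table.replicas[0])

# A logical error: buggy application code overwrites the row on every replica.
table.write("order:42", None)
print([r["order:42"] for r in table.replicas])  # [None, None, None]

# The replicas cannot help here; restore the row from the snapshot instead.
table.write("order:42", snapshot["order:42"])
print(table.read("order:42"))  # {'status': 'shipped'}
```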

VMblog: Historically, what have been the biggest inhibitors - and enablers - of NoSQL adoption, and where does RecoverX fit in?

Thakur:  The biggest enabler of any new technology, including NoSQL and cloud databases, is a strong ecosystem of products and technologies that allow enterprises to onboard that technology with confidence. For instance, if we take the example of an Apache Cassandra database, enterprises require tooling to deploy the database, tune it, operate it, upgrade it and, most importantly, protect it from failures such as logical errors. Unless that tooling and educational awareness are available, enterprises will not be able to onboard or scale their business-critical applications on next-generation NoSQL databases. Companies like DataStax and MongoDB are developing operational management solutions for these NoSQL databases. However, there is still a big gap as far as backup and recovery goes.

The best (and biggest) example here is Amazon Web Services. When AWS set out to offer database-as-a-service to enterprises, it understood that offering the database itself was not enough; it also needed to build and provide database services, such as cloud-native backups, that are a must-have for enterprises, and it delivers them through its Amazon RDS offering. We also conducted primary market research, and the results show two major points: 1) the majority of enterprise customers are adopting NoSQL databases at a faster pace (2x and above) than expected, and 2) these enterprise customers believe that adoption of NoSQL databases could be accelerated if they had enterprise-grade backup and recovery solutions.

That is where Datos IO RecoverX fits in: as a next-generation data protection solution for 3rd Platform applications (analytics, SaaS, and IoT) at scale. We purpose-built Datos IO RecoverX to solve the backup and recovery challenges of next-generation databases such as NoSQL, cloud databases, and open-source databases, and architected it from the ground up to be scalable, flexible, and operationally easy for everyone (DevOps, cloud admins, et al.) to use. Additionally, we wanted to provide, and succeeded in providing, significant capital cost savings by reducing secondary storage consumption.


Once again, thank you to Tarun Thakur, co-founder and CEO of Datos IO, for taking time out to speak with 

Published Monday, June 20, 2016 6:27 AM by David Marshall