Virtualization Technology News and Information
VMblog's Expert Interviews: Tarun Thakur Talks of His Versioned Vision for Distributed and Cloud Databases


Back in September 2015, Datos IO emerged from stealth, and announced what it called the industry's first recovery platform for next-generation scale-out databases. I recently had the chance to catch up with the company's co-founder and CEO, Tarun Thakur, so I could learn more about the company, its vision of the market and more about their technology.

VMblog:  Let's jump right in.  Why do you believe traditional backup and recovery solutions are becoming increasingly incapable of supporting business needs?

Tarun Thakur: The fundamentals of the database are changing dramatically and rapidly. Take, for example, the next-generation applications (real-time analytics, on-demand e-commerce, internet of things (IoT), fraud detection, etc.) that today's businesses are gravitating to, all of which run on distributed or cloud databases managed by DevOps teams, aka the data-centric infrastructure. These distributed databases differ fundamentally from traditional databases in that they:

  • Are de-facto standards for cloud, social, and mobile-based applications
  • Are scale-out (eventual consistency) vs. scale-up (strong consistency) in nature
  • Offer flexible data models (schema), for ease of application development
  • Provide cross-data-center replication
  • Run on elastic infrastructure that grows or shrinks per application or user growth

As this data-centric infrastructure changes, so too does the entire landscape of recovery requirements: "new" business needs to maximize the value of data (driven by CIOs and CTOs), "new" product requirements (driven by the eventual-consistency and local-storage model), "new" users of recovery products (DevOps teams and DBAs), and "new" cloud-native deployment models.

As a result, traditional backup products are a misfit; they are purpose-built for scale-up databases, not scale-out ones. Legacy backup products are based on centralized media-server architectures, assume shared storage models, and serve storage and backup admins; they are at complete odds with the requirements and users of the modern data-centric infrastructure described above. They are incapable of providing even basic operational recovery for next-generation applications and distributed databases.

VMblog:  What precisely is changing in the enterprise datacenter? And what backup and recovery challenges exist today, versus years past?

Thakur: Everything from applications to storage infrastructure is changing in the enterprise datacenter. To innovate, enterprises are building and adopting next-generation applications (real-time analytics, on-demand e-commerce, internet of things (IoT), fraud detection, etc.) that have high data ingestion rates, operate in near real-time, and are built to scale easily. This fundamental change at the application layer is driving changes in the database layer. The next-generation applications are built on distributed databases such as Apache Cassandra, MongoDB, Apache HBase, etc., and cloud databases such as Amazon DynamoDB, Amazon RDS, etc., and are deployed on public or private clouds. Further, next-generation databases are usually deployed on distributed storage (direct-attached local storage) rather than traditional SAN or NAS.

These scale-out databases do offer capabilities such as cross data-center replication, but these features address availability requirements only and do not provide point-in-time versioning and recovery, so enterprises cannot go back and fix operational errors. In fact, if errors are introduced, the databases' redundant-node replication can lead to almost immediate corruption across all nodes of the enterprise's data center.
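The distinction can be sketched in a few lines of Python. This is a hypothetical, in-memory illustration (not Datos IO's implementation): replication alone copies every write, including an accidental delete, to all nodes, whereas a point-in-time version lets you roll the data back to a known-good state.

```python
from copy import deepcopy

class VersionedStore:
    """Illustrative in-memory store with point-in-time versions."""

    def __init__(self):
        self.data = {}        # live state, as replicated to every node
        self.versions = []    # point-in-time snapshots

    def snapshot(self):
        """Capture a point-in-time version; return its version id."""
        self.versions.append(deepcopy(self.data))
        return len(self.versions) - 1

    def write(self, key, value):
        self.data[key] = value          # replication copies this everywhere

    def delete(self, key):
        self.data.pop(key, None)        # an accidental delete replicates too

    def restore(self, version_id):
        """Roll the live state back to a known-good version."""
        self.data = deepcopy(self.versions[version_id])

store = VersionedStore()
store.write("order:1", {"status": "paid"})
v = store.snapshot()            # point-in-time version of good state
store.delete("order:1")         # operational error, propagated to all replicas
store.restore(v)                # recovery: roll back to the saved version
print(store.data["order:1"])    # {'status': 'paid'}
```

Without the `snapshot`/`restore` pair, the delete would simply be the new replicated truth on every node, which is exactly the gap described above.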

What's at stake is the lack of viable data protection and recovery solutions for next-generation databases. Until now, no solution has existed that allows corrupted data to be removed, replayed, and propagated with minimal downtime to customer-facing applications. In this new data-centric infrastructure era, the entire landscape of recovery requirements has changed: "recovery" products need to be reinvented from the ground up for next-generation databases that are distributed, scale-out, and cloud-native in nature.

VMblog: What will it mean for the enterprise as databases and applications become increasingly distributed?

Thakur: As enterprises adopt next-generation applications and distributed databases, they will need robust data management, and specifically data recovery tooling, to support the mission-critical nature of their core applications. This spans deployment, operational management, data management, data protection, and recovery products. Enterprises need to be sure that all the data produced by their mission-critical applications can be managed and recovered over the lifecycle of that data. This is the fundamental gap of the day, and therefore an opportunity to help enterprises advance their infrastructure for next-generation applications and use cases.

VMblog:  If scale-out databases are the future, and traditional disaster recovery systems are becoming a thing of the past, what are your thoughts on enterprise-wide deployments of non-relational databases such as Cassandra, MongoDB and other distributed systems?

Thakur: Scale-out databases such as Apache Cassandra, Apache HBase, and MongoDB, and cloud databases such as Amazon DynamoDB, Amazon RDS, etc., are now the norm. Enterprises are beginning to scale their adoption for core enterprise applications, and these databases have native replication capabilities that provide availability in case of hardware failures. However, enterprises also require point-in-time recovery solutions to protect against data loss from operational errors such as accidental deletes, corruption, etc. Without robust data protection and recovery products, the adoption of scale-out databases for mission-critical applications will be inhibited.

VMblog: If you're not providing traditional backup, and you're not providing traditional disaster recovery, how do you describe your solution? How does Datos IO work and what does it do differently?

Thakur: Datos IO has built an industry-first distributed versioning and recovery product for next-generation distributed and cloud databases. We empower application architects, database admins, and DevOps teams with a reliable and scalable recovery product for enterprise use cases such as point-in-time recovery, test/dev instance creation, etc. Datos IO provides this via application-consistent versioning in native formats, repair-free and application-aware recovery, and semantic deduplication, all of which ensure massive storage savings and efficiency.
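The storage-savings claim behind semantic deduplication can be illustrated with a minimal sketch. This is a hypothetical example, not Datos IO's actual engine: a scale-out database keeps N replica copies of each record, but a deduplicating backup store only needs one copy per distinct payload, keyed by a content hash.

```python
import hashlib
import json

def dedup_versions(replica_copies):
    """Store each unique record payload once, keyed by content hash.

    Illustrative semantic deduplication: identical records from
    different replicas hash to the same digest, so the backing
    store keeps a single chunk plus a per-record manifest entry.
    """
    chunks = {}      # content hash -> payload, stored once
    manifest = []    # hash each replica record maps to
    for record in replica_copies:
        # Canonical serialization so equal records hash identically.
        payload = json.dumps(record, sort_keys=True).encode()
        digest = hashlib.sha256(payload).hexdigest()
        chunks.setdefault(digest, payload)
        manifest.append(digest)
    return chunks, manifest

# Three replicas each hold an identical copy of the same record.
copies = [{"id": 1, "status": "paid"}] * 3
chunks, manifest = dedup_versions(copies)
print(len(copies), "replica copies ->", len(chunks), "stored chunk")
```

With a replication factor of three, a naive backup would triple the stored bytes; the content-hash manifest keeps the storage cost at one copy per distinct record.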

VMblog:  In a nutshell: why now? Can you give us a perspective on why scale-out data is on the verge of having such a major impact on the IT landscape?

Thakur: Today's enterprise data center is fast transforming into a data-centric world, one that supports the social, mobile, and cloud environments in which modern consumers live. Large-volume, high-ingestion, real-time data from various sources is becoming mainstream for enterprise applications. These include new web-scale and distributed applications designed to help the enterprise gain deeper business insights from data and support new businesses investing in the Internet of Things (IoT), on-demand commerce, digital advertising, fraud detection, real-time analytics, and much more. And as a result, distributed and cloud databases such as Cassandra, MongoDB, Amazon DynamoDB, and Apache HBase have emerged to become new de-facto standards for these next-generation applications.

To innovate and compete in today's economy, enterprises are being driven to adopt the next-generation applications and business models described earlier. But while new scale-out databases permit the rapid development of these applications, they do so at a cost: adoption is rising, yet the full potential of these applications is limited, if not stalled, by the increased risk of data loss. Enterprises desperately need recovery products specifically suited to a data-centric world driven by next-generation applications and scale-out databases.

VMblog:  What else can we expect to see coming from Datos IO?

Thakur: Currently, we are providing an early-access product and intend to release our first product in early 2016. We are onboarding early adopters and taking their guidance on the journey to making our product generally available. The future holds exciting opportunities for us as we innovate and extend our product to support cloud databases (relational and non-relational) and enable value-added features around search, "reify" (runnable applications), analytics, and support for composite applications.


Once again, thank you to Tarun Thakur, co-founder and CEO of Datos IO, for taking time out to speak with VMblog.

Published Monday, January 11, 2016 6:32 AM by David Marshall