Datos IO emerged from stealth in September 2015, and the company hasn't slowed down since. At launch, it announced what it called the industry's first recovery platform for next-generation scale-out databases. Earlier this month, it announced the availability of a new product, RecoverX. To find out more about this latest product and to get an update on this startup, I reached out to the company's co-founder and CEO, Tarun Thakur.
VMblog: Last week you announced the availability of Datos IO's new RecoverX product, which is based on your proprietary Consistent Orchestrated Distributed Recovery (CODR) technology. What is unique about this architecture that IT and database administrators need to know?
Tarun Thakur: RecoverX is built on our proprietary Consistent Orchestrated Distributed Recovery (CODR) architecture, a seminal data protection architecture that no longer depends on media servers and that transfers data in parallel to and from file-based and object-based secondary storage. CODR delivers cluster-consistent backups that are highly space-efficient yet available in native formats, along with application-ready, repair-free recovery at scale. With CODR, Datos IO RecoverX provides scalable versioning that enables enterprises to protect both Apache Cassandra and MongoDB at any interval and granularity. RecoverX provides one-click recovery in minutes (vs. the hours required by existing manually scripted solutions) for operational recovery and test/dev use, as well as semantic de-duplication that can save up to 70% on secondary storage costs.
VMblog: It's no secret that traditional backup and recovery solutions are increasingly unable to support the distributed data architectures that more and more businesses are adopting. But what exactly is changing the data landscape, and why are we only now starting to warm to the idea of an "evolved" recovery solution?
Thakur: In the past few months we've talked with numerous analysts and industry experts on this topic, and it pretty much boils down to this: enterprises are rapidly adopting 3rd Platform applications to drive operational intelligence and business innovation. Enterprise growth, innovation and, occasionally, even competition require that today's companies adopt new, high-value, data-centric applications such as content management systems, real-time intrusion detection, customer analytics, the Internet of Things, and digital advertising. Most of these applications are deployed either on-premises on open-source distributed databases such as Apache Cassandra, MongoDB, and Apache HBase, or on cloud-native databases such as Amazon DynamoDB, Amazon Aurora, Google Bigtable, and others.
However, this fundamental shift raises critical issues in the lifecycle of data management. Traditional backup and recovery products were originally designed for strongly consistent databases, tape-based storage media, and legacy database architectures in on-premises deployments. That leaves a critical gap in distributed recovery for modern database architectures (key-value stores, column-family stores, cloud SQL, et al.), and it is this gap that ultimately limits enterprise adoption and innovation.
VMblog: What use cases are most likely to benefit from RecoverX?
Thakur: Before I answer this question, I think it's important to understand who this product is suited for. In architecting and building this product, we kept three main audiences in mind: 1) First are the application architects and line-of-business owners, who are our buyers and who are building next-generation 3rd Platform applications on next-generation databases such as NoSQL, big data, and cloud databases. 2) The second group is database architects and database administrators (DBAs), who are the influencers within an organization, responsible for ensuring that next-generation databases offer the same kind of enterprise-grade data management and recovery capabilities found in traditional relational databases. 3) The final group is DevOps and IT operations teams, who are the end users. This group has emerged in recent years from the growing community around application deployment and orchestration frameworks.
So when it came to the use cases most likely to benefit from RecoverX, we tailored the product to fit these groups. Here is where we are seeing early success:
- Operational Recovery: Scalable backups at any granularity and at any point in time.
- Test/Dev for Continuous Development: Reliable recovery of versioned data to an alternate database cluster with a different topology, so that application developers can continue to test against the most recent production data and deploy their application changes in near real time.
- Workload Migrations: Single-click, fast recovery for migrations and database upgrades.
VMblog: How does RecoverX fit into a
hybrid NoSQL and traditional database infrastructure model?
Thakur: To clarify, a hybrid NoSQL and traditional database infrastructure model applies to polyglot application environments. Specifically, enterprises have 3rd Platform applications that rely on multiple data stores; for example, a media and entertainment application might store its images in a MongoDB database and keep the application's indexes, including semantics, in SQL Server. For such polyglot applications, enterprises want a single state of truth (a point-in-time backup) that spans both relational (SQL) and non-relational (NoSQL) databases. RecoverX today supports applications that rely on Apache Cassandra and MongoDB databases. That said, we are continuing to add support for newer data stores, and we are actively working to support versioning and recovery for relational databases as part of our roadmap.
VMblog: With NoSQL's known ability to support Big Data's volume, variety and velocity, and its rapid gains among enterprises (as your April 2016 survey revealed), what essential elements (or key questions) do IT administrative teams need to consider? How can organizations make certain that introducing NoSQL into their infrastructure aligns with their specific current and future business needs?
Thakur:
- I would say, first and foremost, analyze the features that the database offers against the application requirements. Ensure that you pick a database that aligns with application requirements in terms of data modeling capabilities, scalability, I/O performance, security features, etc. Not only that, think about how you will manage the database and protect the data once critical applications start onboarding onto that database.
- Start small and be realistic about getting over the learning curve. Make sure that you clearly understand the capabilities a particular database offers and the risks you are taking on. For example, several customers don't know that most NoSQL databases do not offer point-in-time backup and recovery, and that native replication does not protect them against data loss from logical corruption and operational failures.
- As you evaluate NoSQL databases, proactively engage with different vendors (such as Datos IO) who have considerable expertise in managing data and can help protect your database if failures happen.
- Finally, decide whether you will deploy the database in-house, in a public cloud, or with a hybrid approach. When selecting your preferred vendors, make sure that they can support your deployments both on-premises and in the public cloud.
VMblog: Historically, what have been the biggest inhibitors - and enablers - of
NoSQL adoption, and where does RecoverX fit in?
Thakur: The biggest enabler for any new technology, including NoSQL and cloud databases, is a strong ecosystem of products and technologies that allows enterprises to onboard that technology with confidence. For instance, take an Apache Cassandra database: enterprises require tooling to deploy the database, tune it, operate it, upgrade it and, most importantly, protect it from failures such as logical errors. Unless that tooling and educational awareness are available, enterprises will not be able to onboard or scale their business-critical applications on next-generation NoSQL databases. Companies like DataStax and MongoDB are developing operational management solutions for these NoSQL databases. However, there is still a big gap as far as backup and recovery goes.
The best (and biggest) example here is Amazon Web Services: when it set out to offer database-as-a-service to enterprises, it understood that the database alone was not enough, and that it also needed to build and provide database services such as cloud-native backups, which are a must-have for enterprises, through its Amazon RDS capability. Furthermore, we performed some primary market research, and the results show two major points: 1) the majority of enterprise customers are adopting NoSQL databases at a faster pace than expected (2x and above), and 2) these enterprise customers believe that adoption of NoSQL databases could be accelerated if enterprise-grade backup and recovery solutions were available.
That is where Datos IO RecoverX fits in: as a next-generation data protection solution built for the scale of 3rd Platform applications (analytics, SaaS, and IoT). We purpose-built Datos IO RecoverX to solve the backup and recovery challenges of next-generation databases such as NoSQL, cloud, and open-source databases, and architected it from the ground up to be scalable, flexible, and operationally easy for everyone to use (DevOps teams, cloud admins, et al.). Additionally, we set out to provide, and succeeded in providing, significant capital cost savings by reducing secondary storage consumption.
##
Once again, thank you to Tarun Thakur, co-founder and CEO of Datos IO, for taking time out to speak with VMblog.com.