Varada announced a new
capability of its flagship platform designed to support text analytics
workloads and help data teams deliver faster
time-to-insights on exabytes of string-based data. Varada's solution
for interactive text analytics, integrated with the popular open source search
engine Apache Lucene, works directly on the customer's data lake and serves SQL
data consumers out of the box. As a result, data teams can achieve maximum
performance without moving, duplicating or modeling data.
Most text analytics
solutions are deployed as a bolt-on addition to existing data analytics stacks,
which presents problems for agility, cost, time-to-market and scaling. Varada's
addition of Lucene support within its solution delivers an integrated stack that
performs and scales to exabytes of data on data lakes, making richer
business insights possible.
Today's announcement means
that Varada's technology can give companies actionable business insights by
leveraging 10 times more data and delivering results up to 100 times faster.
Varada's text analytics feature is easily deployed in the organization's own
environment, so the data is not duplicated and never leaves that environment. In addition, it
incorporates all data from any source without modeling, which means data teams get
"zero time to market" with results that are both thorough and precise. Varada's dynamic and
adaptive indexing technology
enables text analytics workloads to run with near-zero response latency,
especially on latency-sensitive queries.
"Text analytics has been
evolving from on-premises solutions to cloud-based solutions," said Eran
Vanounou, CEO at Varada. "These approaches were innovative when introduced, but
they have become complex and expensive, especially given the wide range of
analytics platforms and stacks. At Varada, we're introducing the next era in
text analytics with a solution that runs directly on top of the customer's data
lake and alongside other analytics workloads. For the first time, users can
deploy a text analytics solution without having to move data to expensive
systems and complex, proprietary data schemas."
Text Analytics Challenges Are Best Addressed on the Data Lake
As the volume of data and
text analytics applications grows exponentially, data teams are increasingly
challenged to optimize cost and performance. Large-scale text analytics
requires custom optimizations for SQL LIKE '%text%' predicates and regular
expressions, which often pushes teams toward separate data silos that specialize in text.
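To illustrate why such predicates are costly without a suitable index, here is a minimal Python sketch that translates a SQL LIKE pattern into a regular expression and applies it with a full scan. The function names and sample rows are hypothetical, matching is assumed to be case-sensitive, and ESCAPE clauses are ignored; it does not represent how Varada or Lucene evaluates these queries.

```python
import re

def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into a regular expression.

    '%' matches any run of characters and '_' matches exactly one.
    ESCAPE handling is deliberately left out of this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def scan(rows, pattern):
    """Full scan: every row is tested, which is what an unindexed LIKE must do."""
    rx = like_to_regex(pattern)
    return [row for row in rows if rx.fullmatch(row)]

# Hypothetical sample data.
rows = ["error: disk full", "warning: disk at 90%", "login ok"]
matches = scan(rows, "%disk%")
```

Because the leading '%' allows a match to start anywhere in the string, a scan like this touches every row, and that per-row cost is what text-oriented indexes aim to avoid.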
"More often than not,
organizations use complex and high-end text analytics solutions for simple SQL
text search, such as 'prefix', 'suffix' and 'contains' functions,"
explains Ori Reshef, Varada's vice president of products. "There is no need to
build and maintain a standalone text analytics solution that will over-index
each string and comes with a hefty price tag on both license and maintenance.
An example here would be n-grams. With Varada, which integrates Lucene index
within our data lake query acceleration engine, we are using minimal indexing
to get the job done."
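For readers unfamiliar with n-grams, the following Python sketch shows a generic character trigram index answering a "contains" query. It is an illustration of the general technique only, with hypothetical names and sample data, and is not Varada's or Lucene's implementation. The posting count at the end hints at why indexing every n-gram of every string can be heavy.

```python
from collections import defaultdict

def ngrams(text: str, n: int = 3):
    """Return the set of character n-grams in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def build_ngram_index(rows, n=3):
    """Map each n-gram to the ids of the rows that contain it."""
    index = defaultdict(set)
    for row_id, text in enumerate(rows):
        for gram in ngrams(text, n):
            index[gram].add(row_id)
    return index

def contains(rows, index, needle, n=3):
    """Answer a 'contains' query: intersect posting lists, then verify."""
    if len(needle) < n:
        # Needle is shorter than one n-gram, so fall back to a scan.
        return [i for i, text in enumerate(rows) if needle in text]
    grams = ngrams(needle, n)
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Candidates share all n-grams but may not hold them contiguously.
    return sorted(i for i in candidates if needle in rows[i])

# Hypothetical sample data.
rows = ["error: disk full", "warning: disk at 90%", "login ok"]
index = build_ngram_index(rows)
hits = contains(rows, index, "disk")

# Number of (n-gram, row) entries the index stores for just three short rows.
postings = sum(len(ids) for ids in index.values())
```

Even for three short strings the index holds many more entries than there are rows, which is the over-indexing cost that a more selective indexing strategy tries to avoid.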
Varada's Adaptive Indexing Technology
Varada's adaptive and
autonomous indexing technology leverages machine learning capabilities to
dynamically accelerate queries to meet evolving business requirements. Varada
indexes data directly from the data lake across any columns. Based on the data
type, structure, and distribution of data, Varada automatically creates an
optimal index from a set of indexing algorithms, including text-optimized search
indexes (based on Apache Lucene) as well as bitmap, dictionary and tree indexes.
Indexes also adapt to changes in data over time, which is critical for
effective analytics and anomaly detection across vast datasets.
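As a rough illustration of choosing an index from a column's type and distribution, here is a hypothetical Python heuristic. The statistics, thresholds and index names are assumptions made for this sketch; Varada's actual selection logic is not described in this announcement.

```python
from dataclasses import dataclass

@dataclass
class ColumnStats:
    dtype: str                # "string" or "number" in this sketch
    row_count: int
    distinct_count: int
    avg_length: float = 0.0   # average string length; unused for numbers

def choose_index(stats: ColumnStats) -> str:
    """Pick an index family from simple column statistics.

    All thresholds below are hypothetical and chosen only for illustration.
    """
    cardinality = stats.distinct_count / max(stats.row_count, 1)
    if stats.dtype == "string":
        if stats.avg_length > 32:
            return "text"        # long free text: a Lucene-style text index
        if cardinality < 0.01:
            return "bitmap"      # few distinct values: one bitmap per value
        return "dictionary"      # short strings with many distinct values
    if cardinality < 0.01:
        return "bitmap"
    return "tree"                # high-cardinality numbers: range-friendly tree

# Hypothetical columns.
log_message = ColumnStats("string", 1_000_000, 900_000, avg_length=120.0)
country = ColumnStats("string", 1_000_000, 50, avg_length=6.0)
order_id = ColumnStats("number", 1_000_000, 1_000_000)
```

Re-running such a decision as the statistics drift is one simple way to picture indexes adapting to changes in the data over time.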
Varada's smart engine
detects bottlenecks automatically and adjusts the cluster and acceleration
techniques to ensure business requirements are met within the allocated budget. Key
features include:
- Works atop the
customer data lake, enabling access to new data as it becomes available
- Works directly on raw
behavior data, without any need to model data to improve performance; any
new data can be analyzed immediately with zero time-to-insights, resulting
in fast results without losing the full dimensionality of the data
- Continuously monitors
queries to identify which data is used and how it's being used by
workloads; this critical observability is then leveraged to dynamically
and automatically accelerate text analytics workloads with adaptive
indexing and caching of data or intermediate results
- Completely decoupled
from the storage layer and can easily scale to serve fluctuating demand
- Provides data teams
full control to prioritize analytics projects, define budgets and
performance requirements
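The idea of monitoring queries and accelerating the most-used data within a budget can be pictured with the short Python sketch below. The query log format, cost units and greedy policy are all assumptions made for illustration and are not taken from Varada's product.

```python
from collections import Counter

def plan_acceleration(query_log, index_cost, budget):
    """Greedily index the most frequently filtered columns that fit the budget.

    query_log:  list of dicts, each with a "filter_columns" list (hypothetical format)
    index_cost: estimated cost of indexing each column, in arbitrary units
    budget:     total cost that may be spent
    """
    usage = Counter(col for query in query_log for col in query["filter_columns"])
    chosen, spent = [], 0
    for col, _count in usage.most_common():
        if col not in index_cost:
            continue  # no cost estimate, so skip rather than guess
        if spent + index_cost[col] <= budget:
            chosen.append(col)
            spent += index_cost[col]
    return chosen

# Hypothetical workload: "message" is filtered most often.
query_log = [
    {"filter_columns": ["message", "user_id"]},
    {"filter_columns": ["message"]},
    {"filter_columns": ["message", "country"]},
    {"filter_columns": ["user_id"]},
]
index_cost = {"message": 5, "user_id": 4, "country": 1}
plan = plan_acceleration(query_log, index_cost, budget=6)
```

With a budget of 6, the sketch indexes "message" first, skips "user_id" because it no longer fits, and then adds the cheaper "country" column.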