Ahead of SC24 (Supercomputing 2024), VMblog spoke with Krishna Subramanian, COO and Co-founder of Komprise, about how organizations are tackling petabyte-scale unstructured data management challenges.
As enterprises increasingly adopt high-performance computing (HPC) and AI initiatives, they face mounting storage costs and data management complexity. Komprise, making its booth debut at SC24 (#414), offers solutions that not only promise 70% storage cost reductions but also help organizations optimize their data for AI workflows through intelligent tiering, tagging, and lifecycle management.
In this exclusive Q&A, Subramanian discusses how Komprise's platform helps enterprises understand and manage their data across storage silos while preparing for the AI-driven future.
VMblog:
If you were giving an SC24 attendee a quick overview of the company, what would
you say? How would you describe the company?
Krishna Subramanian: Komprise is a platform for independent, unstructured
data management. We help enterprises with petabyte-scale data volumes understand
their data across storage silos. This insight allows IT users to create
policies to manage it optimally, such as tiering cold data to cheaper storage, moving PII data to secure storage or curating just the right data sets for AI. Komprise delivers continuous analysis so you can understand your organization's data: how much you have, how fast it's growing and which data is most important based on access patterns. You can always ensure that your data is in the right place at the right time. We frequently save our customers 70% or more on annual data storage,
backup and ransomware data
protection costs as a result.
VMblog:
Your company is sponsoring this year's SC24 event. Can you talk about what that
sponsorship looks like?
Subramanian: While
we have attended the event in the past with partners, this is the first year
we'll have a booth (#414). We are lining up meetings with customers, partners
and prospects at the show and we'll be demonstrating our new Directory Explorer and Smart
Data Workflow Manager at the booth
as well as giving away some fun prizes.
VMblog:
What kind of message will an attendee hear from you this year? What will they
take back to help sell their management team and decision makers? Explain your
technology.
Subramanian: Expanding
the horizons of high-performance computing (HPC) is the theme of SC24. We love
this because it speaks to the fact that HPC creates many opportunities for innovation,
but it also creates tons of unstructured data. The data deluge is so big that
you cannot look at a single storage or backup or cloud vendor to solve it.
Also, as HPC is maturing and becoming core to many enterprise organizations, IT
organizations are moving from DIY, open-source tools to commercial data
management software.
Our
message to attendees who oversee data storage is simple: if you have a lot of unstructured
data, you need a solution like Komprise. The reason is that storage vendors
cannot solve the higher-level problems IT organizations face: how to reduce the high cost of unstructured data and ensure that data is optimized across its lifecycle using the best storage for the use case at hand. Komprise is best positioned to help enterprises manage all their unstructured data across storage, whether it's in your data center or in the cloud.
It
starts with visibility and analytics. Komprise rapidly analyzes and reports on
all your file and object data no matter where it lives. Our Global File Index, which provides deep analytics across
billions of files, allows users to search, tag and create curated data sets.
Komprise Elastic Data Migration gives customers the fastest, most reliable migrations for both SMB and NFS data; it migrates large data volumes 25x to 27x faster than common tools, with built-in features such as automatic retries and checksums to prevent failures and data loss.
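To illustrate the general technique behind checksums and automatic retries (this is a generic sketch, not Komprise's implementation):

```python
import hashlib
import shutil
import time
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large files aren't loaded into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def copy_with_verification(src: Path, dst: Path, retries: int = 3) -> str:
    """Copy src to dst, verify the copy by checksum, retry on mismatch or I/O error."""
    for attempt in range(1, retries + 1):
        try:
            shutil.copy2(src, dst)
            digest = sha256_of(src)
            if digest == sha256_of(dst):
                return digest  # verified copy
            raise IOError("checksum mismatch")
        except (IOError, OSError):
            if attempt == retries:
                raise
            time.sleep(0.1 * attempt)  # simple backoff before retrying
```

Production migration tools layer the same two ideas, verification and bounded retry, over parallel transfers.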
Smart Data Workflow Manager is another key component. It provides a point-and-click UI wizard that helps IT users set up an AI data workflow: searching for the right data set, configuring and tuning the AI service, defining the tags and how frequently the workflow should run, and monitoring projects from a single view.
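Conceptually, such a workflow bundles a data query, an AI action, output tags and a schedule. A minimal sketch of that shape in Python (all field names here are illustrative assumptions, not the Komprise API):

```python
from dataclasses import dataclass, field

@dataclass
class DataWorkflow:
    """Illustrative stand-in for an automated AI data workflow definition."""
    name: str
    query: str                  # which files to select, e.g. a tag or extension filter
    ai_service: str             # which AI/processing service receives the matches
    output_tags: list = field(default_factory=list)  # tags written back as results
    schedule: str = "daily"     # how often the workflow re-runs

    def describe(self) -> str:
        return (f"{self.name}: send files matching {self.query!r} "
                f"to {self.ai_service}, tag results {self.output_tags}, "
                f"run {self.schedule}")

# Hypothetical example: scan documents for PII and tag the hits.
wf = DataWorkflow(
    name="pii-scan",
    query="*.docx",
    ai_service="pii-detector",
    output_tags=["contains_pii"],
)
```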
VMblog:
Thinking about your company's technologies, what are the types of problems you
solve for an SC24 attendee?
Subramanian: We're
lucky in that our solution solves multiple pressing issues:
- Reducing storage costs by 70-80% across vendors: We accomplish
this through analysis in our Global File Index which shows data growth rates,
amount of data in storage, and time of last access so you can model plans to
save. For instance, you can see potential savings of moving "cold" data that is
one year or older and rarely accessed into secondary storage. You could also model
cost savings from moving on-premises data to a cloud object storage tier. Komprise
has patented Transparent Move Technology (TMT) that tiers data across hybrid storage
while maintaining native access to the tiered data both from the original
location and from the cloud.
- Tagging data for improved
classification and segmentation. Metadata
enrichment is increasingly valuable as unstructured data volumes grow into
multiple petabytes in organizations. By adding tags to data, indicating file
contents, location or project, data becomes more searchable. This is useful for
quickly identifying sensitive data types, such as those containing PII, or
tracking down specific data sets for use in AI and ML projects, among other use
cases.
- Twenty-five times faster, lower-risk
migrations. Increasingly,
especially in hybrid cloud environments, data is on the move. Organizations are
looking to adopt new storage that is more efficient or applicable to certain
use cases such as AI. Yet large-scale data migrations are often painful,
complex and may not deliver the expected ROI. Komprise has a proven process to
analyze your environment and data prior to migration to ensure that you are
moving just the right data to the right storage. The assessment also prevents
breakdowns that often happen during migration related to networking, security
or other infrastructure configurations. Komprise Elastic Data Migration is
significantly faster than many common tools and has built-in features for reliability
and ease of use, such as retaining all file permissions after a migration.
- Data lifecycle management. One-size-fits-all storage is no longer
viable in today's world because of the size of data. Komprise helps
organizations analyze their environments and execute data movement as it ages.
You can automate policies to tier data from hot to warm to cold data tiers
according to parameters that you set. Because of our patented TMT technology,
you can access your data at any tier later, without expensive rehydration to
the original storage. And you always have native access to your data wherever
Komprise moves it - which is beneficial especially for cloud-based AI and ML
programs.
- Added ransomware protection. By reducing the data stored on your expensive NAS through cold data tiering to immutable object storage in the cloud, you are
reducing the footprint of your attack surface while also delivering enormous
cost savings. Unlike storage tiering solutions that lock the data into
their file format and are incompatible with ransomware protection solutions
like tamperproof snapshots, Komprise technology is transparent and fully
compatible with ransomware protection and backup solutions.
- Preparing data for AI. Getting unstructured data ready for safe use in AI tools is one of the biggest challenges, if not the biggest, in executing AI. Komprise offers a Google-like search across disparate data silos. You can tag your data via UI and API to enrich the metadata, making it more usable in AI. As
covered earlier, Smart Data Workflow Manager is the foundation for creating automated
AI data workflows that enrich data and curate the right data sets for the right
tools.
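The lifecycle-management item above boils down to classifying data by access age and moving it between tiers. A minimal sketch of an age-based tiering policy (the thresholds are illustrative assumptions; real policies use parameters the organization sets):

```python
from datetime import datetime, timedelta

# Illustrative age thresholds, assumed for this sketch only.
WARM_AFTER = timedelta(days=90)
COLD_AFTER = timedelta(days=365)

def tier_for(last_access: datetime, now: datetime) -> str:
    """Classify a file into a storage tier by time since last access."""
    age = now - last_access
    if age >= COLD_AFTER:
        return "cold"   # candidate for cheap, immutable object storage
    if age >= WARM_AFTER:
        return "warm"   # secondary, lower-cost storage
    return "hot"        # keep on primary storage

now = datetime(2024, 11, 1)
assert tier_for(datetime(2024, 10, 20), now) == "hot"
assert tier_for(datetime(2024, 6, 1), now) == "warm"
assert tier_for(datetime(2022, 1, 1), now) == "cold"
```

A scheduler that runs this classification continuously, then moves files whose tier changed, is the essence of automated lifecycle management.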
VMblog:
While thinking about your company's technology and solution offerings, can you
give readers a few examples of how your offerings are unique? What are your
differentiators? What sets you apart from the competition?
Subramanian: Komprise
is the only standards-based storage-agnostic data management software that is
used by the who's who of the Fortune 5000 to manage petabytes of data at scale.
This is for three key reasons:
- Scale-out with no central bottlenecks or agents: Komprise has been designed from the ground up to handle the massive scale of unstructured data. It has a lightweight, distributed, scale-out architecture with optimized protocol handling and no agents or stubs. Komprise has developed adaptive algorithms that maximize parallelism and performance while not impeding active data access. Benchmarks show Komprise is 27x faster for NFS and 25x faster for SMB data movement. Many vendors claim to be distributed, but they are legacy client-server solutions that use agents or proprietary controllers, which limit scale.
- Patented Transparent Move Technology (TMT) extends any vendor namespace to the cloud: Storage vendors offer some ways to move and tier data within their file systems, but these are limited to the technologies they support. Data migration tools can move data, but they do not extend the original namespace, meaning users must look for data in multiple places. Komprise is the only solution with patented Transparent Move Technology to tier data across vendors and architectures transparently and extend the original file namespace without locking data into Komprise or the storage vendor. Customers get non-disruptive mobility with maximal flexibility and savings.
- AI-ready data and Smart Data Workflows: Komprise lets you search and pick just the right data to feed to any AI or processing engine and then systematically record the results as tags. You can create and execute iterative AI workflows without the penalty of waiting months to move petabytes of data from one system to another. Komprise also delivers data governance by tracking data movement into AI, which is critical as AI moves from use by a few data scientists for model training to use by anyone in the enterprise for inferencing.
VMblog:
What major industry challenges or technological bottlenecks do you see your
solutions addressing in the high-performance computing (HPC) landscape,
particularly in relation to emerging AI/ML workloads?
Subramanian: Three
key challenges we address in HPC especially as it relates to AI/ML workloads:
- Systematic data sharing for RAG and inferencing: AI/ML relies on your organization's data, and the better the data you give it, the better your results will be. While it is relatively simple for a data scientist or AI engineer to create a vectorized dataset for training a model, it is much harder to figure out what data to use for inferencing during RAG and how users will share data securely when using AI/ML models. For this, you need a systematic data workflow that helps users search for and pick the relevant data. Next, the system moves the data to the AI process, tracks what was sent and then stores the results so you don't run the AI over and over again. This systematic data workflow execution for RAG and inferencing is what Komprise provides.
- Data governance for AI workflows: As users start sharing organizational data with AI, IT needs a way to audit what was shared, ensure that sensitive information is restricted from AI and create data governance mechanisms. Komprise provides the framework for data orchestration and data governance with AI, especially for augmentation, inferencing and use of AI models.
- Managing the data lifecycle to optimize AI resource consumption: AI compute and storage are expensive. Once data has been processed, an unstructured data management system should move it back out quickly to reduce waste and cost overruns. Komprise manages data movement to and from AI processing engines to reduce the cost of AI.
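The idea of storing results so the AI isn't re-run on the same data can be sketched as a content-keyed cache (the names here are hypothetical, not Komprise code):

```python
import hashlib

class InferenceCache:
    """Cache AI results keyed by content hash, so unchanged data is processed once."""

    def __init__(self, run_model):
        self.run_model = run_model  # the expensive AI call
        self.results = {}           # content hash -> stored result
        self.calls = 0              # how many times the model actually ran

    def process(self, data: bytes):
        key = hashlib.sha256(data).hexdigest()
        if key not in self.results:      # only run the model on new content
            self.calls += 1
            self.results[key] = self.run_model(data)
        return self.results[key]

# Hypothetical toy model: flag records that mention an SSN.
cache = InferenceCache(run_model=lambda d: {"label": "pii" if b"ssn" in d else "clean"})
cache.process(b"name, ssn, dob")
cache.process(b"name, ssn, dob")  # second call is served from the cache
```

Keying on a content hash also gives a natural audit trail: the cache records exactly which data has been sent to the model.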
VMblog:
Data movement and storage continue to be critical challenges in HPC. What
innovations is your company bringing to market to address these bottlenecks,
and what performance improvements can customers expect?
Subramanian: Komprise
is a comprehensive unstructured data management solution that delivers IT
stakeholders insight into their file and object data, the ability to see across
silos, and the ability to create automated policies to tier, copy or migrate
data. Moving petabytes of data from one storage technology to another presents
many issues, with speed and performance being high priority. Komprise Hypertransfer is a
migration technology that solves the challenge of moving many small SMB files;
metrics show a 25x improvement in speed over a WAN. Komprise also allows users
to create custom data workflows across systems using an AI processor, saving
time. See Duquesne Case Study.
VMblog:
As security concerns continue to grow in the HPC space, especially with the
integration of AI workloads, what approaches and technologies is your company developing
to ensure both performance and protection?
Subramanian: While
Komprise does not store customer data, we help our customers manage data risk.
First, we know that unstructured data presents a huge risk for ransomware attacks
because of its size and because it is spread across silos and sometimes hidden.
Komprise offers an additional ransomware defense by identifying and tiering
cold data, which can be 80 percent or more of data, to object locked storage
where hackers can't access or modify it. This dramatically reduces your attack
surface for ransomware. Secondly, sensitive data leakage to AI is a valid and
growing concern. Komprise provides the tools to find, tag and segregate
sensitive data so that users can't send it to AI tools and prompts. Also, Komprise can track what data was moved into an AI system.
##