VMblog recently spoke with Komprise co-founder and president Krishna Subramanian to learn more about the trends taking shape in unstructured data management.
VMblog: The tech world is always changing, from quantum computing to sustainability technology to generative AI. How are the latest developments affecting the jobs of enterprise data storage managers and the way IT teams manage storage?
Krishna Subramanian: The trend toward IT-as-a-Service, coupled with increased interest in AI, is causing enterprise storage teams to look for ways to manage data across storage vendors and to deliver new and improved data services to business users. In the Komprise 2023 State of Unstructured Data Management report, most IT leaders (85%) say that non-IT users should have a role in managing their own data, and 62% have already attained some level of user self-service for unstructured data management.
Data storage professionals will need to collaborate more closely with departments, for example through showback reporting, to cut costs by finding and tiering cold data and eliminating unnecessary duplicates. End users should be able to quickly search for the types of files they need and tell IT what they intend to do with them, so that IT can set policies for data movement, such as to a cloud AI service. Storage experts may also need more business acumen to meet high-stakes enterprise demands for data.
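For readers who want a concrete picture of showback reporting, here is a minimal Python sketch. It is not how Komprise computes showback: the (department, tier, size_bytes) record schema and the per-GiB monthly prices are hypothetical, chosen only to show how capacity can be attributed to departments as a monthly cost.

```python
from collections import defaultdict

GIB = 1024 ** 3

# Hypothetical monthly prices in USD per GiB; real prices vary by vendor and tier.
PRICE_PER_GIB_MONTH = {"nas": 0.10, "object": 0.01}


def showback(records):
    """Sum monthly storage cost per department.

    records: iterable of (department, tier, size_bytes) tuples (hypothetical schema).
    Returns {department: cost_usd} rounded to cents.
    """
    totals = defaultdict(float)
    for department, tier, size_bytes in records:
        totals[department] += size_bytes / GIB * PRICE_PER_GIB_MONTH[tier]
    return {dept: round(cost, 2) for dept, cost in totals.items()}


if __name__ == "__main__":
    usage = [
        ("genomics", "nas", 500 * GIB),
        ("genomics", "object", 2000 * GIB),
        ("finance", "nas", 50 * GIB),
    ]
    for dept, cost in sorted(showback(usage).items()):
        print(f"{dept:<10} ${cost:,.2f}/month")
```

Even a report this simple gives departments a shared number to discuss when deciding what to tier or delete.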
VMblog: GenAI is getting all the headlines right now. What new challenges and strategies does this create for storage and data managers?
Subramanian: If you can't see all your data and understand its key characteristics, it's impossible to make optimal decisions about its access, mobility, protection and storage. Generative AI makes that visibility even more imperative: GenAI introduces new ways to analyze and enrich data, but it also introduces new risks for data privacy, security, ethics, leakage and accuracy. Lowering these risks requires strong data governance programs, and data storage teams have a role to play here. The right tools can help data storage managers track data usage in AI programs, prevent unauthorized data types from leaking into AI tools, and bring insights to the table in concert with their security, privacy and legal counterparts. Employees also need education: they must understand the requirements and risks of using generative AI and how to benefit from it without creating liabilities for their employer and its customers.
VMblog: How hard is it today to put guardrails around the use of GenAI? What strategies do you recommend?
Subramanian: Companies and public sector organizations are already doing this to some extent. Research, including the Komprise survey, shows that most organizations allow employees to use GenAI and that most have outlined some restrictions on data or applications. Yet guardrails are limited by the early, amorphous nature of the technology and by a lack of understanding of how the tools work behind the scenes and what vendors are (or are not) doing to protect organizations and their data. As with shadow IT, it's hard to fully control employee use. It's also hard to know which tools are the safest and most accurate, how to minimize risks, and how to consistently monitor and audit corporate data use. The best place to start is to create and enforce a comprehensive data governance framework that manages the Security, Privacy, Lineage, Ownership, and Governance (SPLOG) of data interactions with AI. We describe this framework in more detail on the Komprise blog.
VMblog: What is missing in terms of new tools or features to help IT enable GenAI without bringing the house down through a data breach, lawsuit, sensitive data leak, etc.?
Subramanian: Given the multifarious threats from generative AI, it's hard to imagine a single governance solution that will fit the bill. Instead, there will be layers of AI security tools. The first would sit at the network layer, preventing AI tools from accessing blocked data and preventing users from sending corporate data to unauthorized AI services. Another layer of protection would sit at the data layer, auditing which data was moved, where, when and by whom, and alerting if PII or other sensitive data is being shared. Finally, there could be a security mechanism at the user layer that warns users when they are engineering prompts with corporate or sensitive data, or that flags prompts that may give away too much corporate context.
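To make the data-layer idea more concrete, here is a minimal Python sketch of an audit record that captures who sent data, where and when, and raises an alert if simple PII patterns appear. It is an illustration under stated assumptions, not a Komprise feature: the two regular expressions (email and US Social Security number formats), the function name `audit_outbound` and the destination `ai.example.com` are all hypothetical, and real PII detection is far more involved.

```python
import re
import sys
from datetime import datetime, timezone

# Hypothetical, deliberately simple patterns. Real PII detection needs many more
# identifier types, validation and context to keep false positives manageable.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def audit_outbound(text, user, destination):
    """Build an audit record (who, where, when) for data about to be sent to an
    AI service, setting alert=True if any PII pattern matches."""
    findings = sorted(name for name, pattern in PII_PATTERNS.items() if pattern.search(text))
    return {
        "user": user,
        "destination": destination,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "pii_found": findings,
        "alert": bool(findings),
    }


if __name__ == "__main__":
    record = audit_outbound(
        "Summarize the complaint from jane.doe@example.com, SSN 123-45-6789",
        user="jdoe",
        destination="ai.example.com",  # hypothetical AI service endpoint
    )
    print(record, file=sys.stdout)
```

In a layered design, a record like this would feed the alerting and audit trail, while the network and user layers handle blocking and prompt feedback.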
VMblog: In your recent survey on unstructured data management, cloud cost optimization came up this year as a higher priority than cloud migration. Can you explain that and how your customers are doing it?
Subramanian: We entered 2023 with enterprise customers seriously reevaluating their cloud spend and the big cloud service providers (CSPs) reporting declining or flattening revenue streams. After years of spending aggressively in the cloud, many enterprise IT organizations were reeling from huge, unexpected bills. Common tactics to avoid cloud waste include leveraging cost savings plans and other pricing promotions offered by the cloud vendors, using commercial spend-monitoring tools, deleting duplicate and orphaned data, and reducing cloud sprawl through automated discovery and corporate policies. An independent unstructured data management solution also helps by giving storage and IT managers a way to view and analyze data assets across all storage and to automate the tiering or migration of data to the most cost-effective storage for current needs. This keeps data from sitting indefinitely on high-priced storage once it's no longer active.
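Finding duplicate files, one of the tactics mentioned above, is commonly done by grouping files by size and then comparing content hashes. The minimal Python sketch below illustrates that general technique under simple assumptions (a locally mounted directory tree, symlinks skipped); it is not a description of how Komprise or any cloud vendor detects duplicates, and `find_duplicates` is a hypothetical helper name.

```python
import hashlib
import os
import sys
from collections import defaultdict


def _sha256(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return sorted groups of file paths under root with identical SHA-256 content hashes."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path) and os.path.isfile(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[_sha256(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for group in find_duplicates(root):
        print(group)
```

Grouping by size first means only files that could possibly match are read and hashed, which matters at petabyte scale.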
VMblog: There's been a lot of hype about cloud file storage. How has this evolved in the last two years, and what role does Komprise play?
Subramanian: The pandemic accelerated cloud infrastructure spending as a lifeline for rapidly resuming normal business operations and rejiggering product and service delivery to customers through online channels. Whether from waste, overprovisioning, high egress fees, lack of demonstrable ROI or failing to select the optimal cloud storage tier, high costs have led organizations to pull back on cloud spending in the last year. In 2023, there's been a sharper focus on cloud cost optimization strategies.
Komprise Intelligent Data Management helps by delivering a variety of metrics and reports on data growth, data usage and overall costs. IT can use those metrics to forecast the savings from switching to different types of storage, and Komprise can automate policies that move data as it ages from top-tier file storage to lower-cost archival cloud object storage.
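The age-based policy and savings forecast described above can be sketched in a few lines of Python. This is a simplified illustration, not Komprise's policy engine: the `FileRecord` type, the 180-day threshold and the per-GiB prices are hypothetical, and the sketch treats "last access time" as the sole measure of how cold a file is.

```python
import time
from dataclasses import dataclass

GIB = 1024 ** 3
SECONDS_PER_DAY = 86_400


@dataclass
class FileRecord:
    path: str
    size_bytes: int
    last_access: float  # seconds since the epoch


def plan_tiering(files, cold_after_days, hot_price, cold_price, now=None):
    """Pick files not accessed in cold_after_days and estimate the monthly savings
    of moving them to cheaper storage. Prices are USD per GiB-month."""
    now = time.time() if now is None else now
    cutoff = now - cold_after_days * SECONDS_PER_DAY
    cold = [f for f in files if f.last_access < cutoff]
    cold_gib = sum(f.size_bytes for f in cold) / GIB
    monthly_savings = round(cold_gib * (hot_price - cold_price), 2)
    return [f.path for f in cold], monthly_savings


if __name__ == "__main__":
    now = time.time()
    inventory = [  # hypothetical inventory
        FileRecord("/projects/2019/scan.raw", 100 * GIB, now - 400 * SECONDS_PER_DAY),
        FileRecord("/projects/2024/draft.docx", 1 * GIB, now - 5 * SECONDS_PER_DAY),
    ]
    paths, savings = plan_tiering(inventory, cold_after_days=180, hot_price=0.10, cold_price=0.01)
    print(paths, f"~${savings:.2f}/month")
```

One design caveat: file systems mounted with options such as `noatime` do not update access times reliably, so a production policy would likely combine several signals rather than rely on access time alone.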
VMblog: Moving data without disruption was identified as a top challenge in the survey. What exactly is a "disruptive" experience in data management, and how does it affect both users and IT?
Subramanian: Traditionally, moving data meant changing user access or otherwise disrupting users and applications. A common problem with traditional archiving is that when data becomes cold and IT moves it (such as to an archive), the data is no longer accessible from its original location. When a user goes back to find it, they must hunt around for it, put in a help request to IT, and meanwhile complain to their boss. The same scenario applies to applications that store file data: an app can break if a file is no longer accessible from its original location. Another issue arises when applications or instruments cannot continue writing data to a file server during a migration cutover. While this downtime is often no more than a few hours, if it occurs at a critical time it can negatively affect customers, operations and even safety, such as in patient care. Komprise helps resolve these issues with our patented Transparent Move Technology, which moves data with no changes to user and application access. In our latest release, we also announced warm cutover capabilities, which eliminate migration downtime in situations that cannot tolerate it.
VMblog: What is the nirvana for self-service data management, another growing trend identified in your survey?
Subramanian: Self-service data management is about letting users understand their data, find what they need and contribute to data management decisions. Komprise Smart Data Workflows is an example of how policy-driven automation can support many different needs: discovering, tagging and moving sensitive data to secure storage; finding and copying data for an audit or legal investigation; merging or deleting data assets after an acquisition; and finding the right data across storage silos to send to a cloud data lake for analysis. Self-service data management benefits IT and departmental users alike: storage managers can more readily meet cost-savings and compliance goals without conflicts, and department managers gain more say in how data is managed to meet business objectives.
VMblog: In your latest release, you introduced Storage Insights to unify data and storage management. Why is this important now?
Subramanian: Storage managers have not had a single console to see detailed usage and capacity data on both storage and data assets. And it's not just about drilling down into different directories and storage vendors, but also about being able to execute plans from the console. This is important now because unstructured data has exploded in recent years, straining IT budgets, adding complexity and increasing security and compliance risks. Storage managers are also increasingly procuring storage from many different vendors, which makes it difficult to spot trends that could save money or to manage capacity, performance and security more effectively for end users. Storage Insights is something our customers have been asking for so that they can work more effectively and productively. But don't take it from us. Industry analyst Steve McDowell recently remarked in Forbes: "Storage Insights is unique in the market in providing a holistic view of an enterprise's unstructured data across cloud boundaries, including data stored on-prem on nearly every storage vendor's solution. That's powerful."
VMblog: Which industries have been the biggest adopters of unstructured data management solutions, and why? How has this changed?
Subramanian: Most industries these days have huge volumes of unstructured data, but some of the most relevant sectors today include healthcare, life sciences, state and local governments, higher education, oil & gas and manufacturing. These organizations store petabytes of data that is now growing exponentially, while retention periods remain long. Unstructured data management solutions are invaluable for data-heavy organizations to regain visibility and control of data assets that are now distributed across data centers at headquarters, satellite offices and the cloud. An independent unstructured data management solution can help customers know what data they have and what it is costing them, no matter where the data lives. It can help optimize storage spending, right-place data into the most appropriate storage, avoid vendor lock-in, and provide insight on data that helps IT better serve its constituents, be they researchers, data scientists, executives, citizens, product developers or marketers.