By Mike Ivanov, DataCore
What's the Difference and Why it Matters
Data is the lifeblood of every modern organization. Our ability to share, store, and use it effectively is crucial to helping businesses grow, improve operational efficiency, keep customers happy, and gain a competitive edge. It's also vital for empowering employees by giving them access to the information they need to get their jobs done. This is especially true with more of us working remotely during the current health crisis.
We all know that data is growing explosively, and organizations have to buy more data storage than ever before. That's a big problem. But every organization also faces another big problem, one that affects everyone, from business leaders to IT professionals to end users, though it affects each group in different ways: not all data is equally valuable.
Data is like cash. We treat, protect, and use the cash in our wallet differently based on its value. We're a lot more careful about how we look after and spend $100 bills than $1 bills. The same is true of data. Not all of it is equally important, and its value changes over time, typically because of the information it contains, how frequently it is accessed, and even its age. Ideally, organizations should have storage platforms built to handle the importance of data in intelligent ways, rather than just storing bits and bytes indiscriminately. That's why data storage providers introduced the concept of "Data Temperature."
To illustrate: there is usually a short burst of frenzied activity around newly created data, but this activity rapidly drops off over time. Typically, 90% of I/O activity takes place in 10% of data storage. It is also true that, for most organizations, only about 20% of all data is being actively used. That leaves 80% of data just sitting there chilling. It might be used once a month, or once a year, or never again. The image below shows how data temperature equates to its value. Hot data is in active use and is the most valuable to the organization. Inactive data is cold and less valuable, but you still have to store it for possible future use, which would make it hot again.
Note that data access need not be the only factor that determines whether data is inactive or cold. For unstructured data, other business requirements can determine when data is deemed inactive, such as the age of the data, the cost of storing it, its protection level, compliance, and so on.
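To make the idea concrete, here is a minimal Python sketch that classifies files as hot, warm, or cold by last-access time. The 30-day and 365-day thresholds, the /data/shared path, and the reliance on access timestamps (which some filesystems disable) are illustrative assumptions, not vendor or industry defaults.

```python
import time
from pathlib import Path

HOT_DAYS = 30     # assumed: accessed within the last month counts as hot
COLD_DAYS = 365   # assumed: untouched for over a year counts as cold

def temperature(path: Path) -> str:
    """Classify a file as hot, warm, or cold by its last access time."""
    # Note: st_atime may be stale on volumes mounted with noatime.
    age_days = (time.time() - path.stat().st_atime) / 86400
    if age_days <= HOT_DAYS:
        return "hot"
    if age_days >= COLD_DAYS:
        return "cold"
    return "warm"

# Survey a (hypothetical) shared directory and count files per temperature.
counts = {"hot": 0, "warm": 0, "cold": 0}
for p in Path("/data/shared").rglob("*"):
    if p.is_file():
        counts[temperature(p)] += 1
print(counts)
```

In practice you would expect the "cold" bucket to dominate, which is exactly the 80/20 pattern described above.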
Let's look at the unstructured data world, where data is more distributed, and at the two popular formats for storing it: file storage and object storage.
What is File Storage?
File storage (aka file-based storage or file-level storage) is the type of data storage where data is stored in a hierarchical file and folder structure. A file is stored as a whole, without breaking the data down into blocks as in block storage. Files can be stored in folders, which can then be placed in other folders in a nested structure. To call a file up again from its storage location, you need its directory path, that is, which folder it is stored in. NAS systems typically use file storage and are comparatively less expensive than block storage.
If you have a computer, you've used a file system. File systems contain documents, presentations, images, all the sorts of resources we move around on our desktop or store in our 'Documents' folder. File systems give us a hierarchical system for organization. It's a similar approach to using a filing cabinet, with the data organized into named directories, folders, subfolders, and files. Applications and users know where everything is based on name and location. File systems are great for simple in-and-out access, provided you know the location of what you're looking for.
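As a small sketch of what that means in practice, the following Python snippet stores and retrieves a document purely by its place in a directory hierarchy; the /mnt/nas-share mount point and the folder layout are hypothetical.

```python
from pathlib import Path

# Hypothetical NAS mount point and department/year folder layout.
root = Path("/mnt/nas-share")
report = root / "finance" / "2020" / "q1-report.txt"

# Writing requires the hierarchy to exist first...
report.parent.mkdir(parents=True, exist_ok=True)
report.write_text("quarterly numbers...")

# ...and reading it back requires knowing that same name and location.
print(report.read_text())
print([p.name for p in report.parent.iterdir()])
```

The key point is that the full path is the address: lose track of where a file lives and you have no other handle on it.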
For file storage beyond the ordinary desktop or laptop, organizations use NAS (Network Attached Storage) solutions and file servers to provide specialized and optimized file-share capabilities across a network. These usually support the NFS and SMB protocols for use in Unix, Linux, and Windows environments.
NAS is typically suited for file and document storage or sharing, as well as access control. But as you know from your own desktop, you're only working on a few files at a time. Most of the files on your hard drive are cool or cold. If that's true on a file server or NAS, the system runs out of storage or performance bogs down, just like on your notebook. In such cases, IT organizations can consider object storage as a means to store cold (or inactive) data.
What is Object Storage?
Object storage (aka object-based storage) is a type of data storage used to handle large volumes of unstructured data, where data is bundled along with metadata tags and a unique identifier. Each of these self-contained object datasets is placed into a flat address space, known as a storage pool. Unlike file storage, object storage does not follow a hierarchical structure. The metadata contains a description of the data, and the unique identifier is used to easily retrieve the object instead of a file name and file path. Cloud-based S3 is a popular object storage option, in addition to on-premises object storage deployments.
Object storage is a more recent approach that doesn't impose a file system on the data. Instead, metadata is used to describe all the details about the underlying data. This can include the name, creation date, location, owner, and much more. Tables are used to make it possible to store, track, and retrieve data based on this metadata.
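To make the contrast with file paths concrete, here is a minimal sketch using the AWS SDK for Python (boto3) against S3-style object storage. The bucket name, object key, and metadata values are invented for illustration, and working credentials are assumed to be configured in the environment.

```python
import boto3

s3 = boto3.client("s3")               # assumes credentials/region are already configured
bucket = "example-archive-bucket"     # hypothetical bucket name
key = "2f1c9a7e-invoice-2020-0415"    # opaque identifier; no folder hierarchy required

# Store an object in a flat namespace: data plus descriptive metadata tags.
s3.put_object(
    Bucket=bucket,
    Key=key,
    Body=b"...invoice contents...",
    Metadata={"owner": "finance", "created": "2020-04-15", "retention": "7y"},
)

# Retrieval is by key alone; the metadata travels with the object.
obj = s3.get_object(Bucket=bucket, Key=key)
print(obj["Metadata"])
print(obj["Body"].read())
```

Notice that nothing here depends on where the object physically lives; the key and metadata are the only handles the application needs.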
This works in the same way as using a valet service at a car parking facility. Imagine millions of cars in an enormous parking lot. The valet provides a parking ticket in exchange for your car and then parks it for you. You don't need to know where it's parked, just that it's safe and will be available when you need it next. It can be retrieved by the valet at any time based on the information (or metadata) on the parking ticket, no matter the size of the parking lot.
Object storage boasts low cost, massive scalability, and global access capabilities. The trade-offs include latency and performance, though these are improving over time. For users who almost never need access to old files and documents, it's almost invisible. But for organizations that need to keep everything for regulatory compliance or legal defense, object storage is essential.
Putting the Right Data in the Right Place at the Right Time
The key takeaway: different data is worth more or less depending on time, users, and importance. That means the most appropriate storage for any specific piece of data depends on how valuable it is right now, the specific needs of the applications or end users consuming it, and its business relevance. And that's nearly impossible for a storage admin to determine day by day. After all, your organization is creating millions of documents every year. Can you imagine a storage admin digging through every document, trying to decide whether it's hot, warm, or cold, or applying different business-relevance conditions manually and deciding which data is placed on which storage device?
The problem is that, up until now, we haven't had a good way to make sure that data, whether on NAS devices or in object stores, is in the right place at the right time, especially since needs change all the time, file and object platforms might come from different vendors or use different toolkits, and manual migration from one to another is a pain.
That's where a modern software-defined storage solution like DataCore's vFilO file and object storage comes in.
- It uses AI/ML-driven auto-placement to move data to the most appropriate storage based on its access temperature. vFilO assesses the temperature of data stored on a storage device and then determines whether to keep the data on a premium NAS device or move it to lower-cost alternatives (such as object stores). vFilO looks not only at the access frequency of the data, but also at other custom criteria based on business relevance that the storage administrator can set, such as the age of the file, location, resiliency, etc. (see the simplified policy sketch after this list). This means you can balance performance, capacity, operational efficiency, and cost factors. High-performance, high-cost storage can be reserved for hot data, while non-critical (or inactive) data can be migrated to low-cost storage or the cloud.
- You can tap into all the available capacity across the organization, unlocking pockets of unused storage you didn't even know you had. That means you can delay expensive upgrades, or avoid them altogether.
- With a global namespace, it's simple to find the data you need when you need it. All file and object data is now accessible from a central console, regardless of which storage device or type it lives on. Using a metadata-driven search-and-find operation, vFilO accelerates the process of locating and accessing data across different types of storage devices (file or object, stored on-premises or in the cloud).
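For illustration only, here is a simplified Python sketch of the kind of placement policy described in the first bullet above. It is not vFilO's actual engine or API; the tier names, thresholds, and fields are assumptions chosen to show how access temperature and business rules can combine into a tiering decision.

```python
from dataclasses import dataclass

@dataclass
class FileInfo:
    days_since_access: int   # access temperature proxy
    age_days: int            # business-relevance criterion: age of file
    compliance_hold: bool    # must be retained, but is rarely read

def choose_tier(f: FileInfo) -> str:
    """Return a hypothetical tier: 'nas-premium', 'object-onprem', or 'object-cloud'."""
    if f.days_since_access <= 30 and not f.compliance_hold:
        return "nas-premium"    # hot data stays on fast, expensive storage
    if f.compliance_hold or f.age_days > 365:
        return "object-cloud"   # cold or retention-only data goes to cheap capacity
    return "object-onprem"      # warm data lands on lower-cost local object storage

# Example decisions for a freshly edited file and an old file under legal hold.
print(choose_tier(FileInfo(days_since_access=2, age_days=10, compliance_hold=False)))
print(choose_tier(FileInfo(days_since_access=400, age_days=900, compliance_hold=True)))
```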
Why are those factors so important to business leaders, IT admins, and users right now?
- Because they enable quick and seamless access to data anytime, from anywhere, helping to drive innovation and gain a competitive advantage.
- Because you can balance and fine-tune performance, capacity, operational efficiency, and cost across your entire storage landscape.
- Because they give you complete visibility and control to adapt to the radically new economic realities and even the new paradigm of a largely remote workforce.
## About the Author
As the Sr. Director of Americas and Product Marketing, Mike Ivanov brings over 25 years of B2B technology marketing experience to DataCore Software. He was most recently the Co-Founder of JourneyLabs, a start-up developing a Customer Journey Management Platform. Prior to that, he held executive marketing and product management roles at Permabit Technology, CommVault Systems, Mimosa Systems, and Veritas Software.