Virtualization Technology News and Information
The Cost and Value of Object Storage

By Matthew Dewey, Technical Director at Quantum

Traditional storage architectures are being overwhelmed by the growth of unstructured data such as scientific and medical research data, satellite imagery and high-res video. Object storage is a promising solution for organizations producing a lot of unstructured data. Object stores have long found a home in the cloud and inside data centers as long-term repositories for high-value data, but with demand for storage capacity growing daily, can organizations reap the benefits of object stores within budget?

Providing Essential Data Retention and Protection

Object stores abstract away the location of an object, enabling higher levels of redundancy. This protects against device failure as well as failures of entire nodes or even data centers. Abstracting away object location also enables object stores to scale to sizes and topologies difficult to achieve with file systems.

An object store user may not know exactly where data is physically stored. What looks like a single object store may be distributed across multiple geographic locations for greater reliability against natural disasters. This level of durability can increase the capacity requirements of the underlying hardware, but with smart erasure coding algorithms it can be achieved using less capacity than by mirroring the data.
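The capacity trade-off between mirroring and erasure coding can be made concrete with a little arithmetic. The sketch below is illustrative only: the article does not name a specific scheme, so the 3-copy mirror and the 8+4 fragment layout are assumed examples, and the function names are hypothetical. It assumes a maximum-distance-separable code (Reed-Solomon is a common one), where any k of the k+m fragments are enough to rebuild an object.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw bytes stored per byte of user data when keeping full copies."""
    return Fraction(copies)


def erasure_overhead(data_fragments: int, parity_fragments: int) -> Fraction:
    """Raw bytes stored per byte of user data with k data + m parity fragments."""
    return Fraction(data_fragments + parity_fragments, data_fragments)


# Assumed example layouts, not taken from the article.
mirror = replication_overhead(3)   # survives the loss of 2 of 3 copies
coded = erasure_overhead(8, 4)     # survives the loss of any 4 of 12 fragments

print(f"3-way mirroring:    {float(mirror):.2f}x raw capacity")
print(f"8+4 erasure coding: {float(coded):.2f}x raw capacity")
```

Under these assumptions the erasure-coded layout tolerates more simultaneous losses than the mirror while consuming half the raw capacity.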

One consideration for many entities exploring object storage is data retention. The retention period for many kinds of data is specified by legal and other compliance constraints. One might expect that data not subject to compliance requirements is likely to be deleted sooner, but some data has value indefinitely. Geological and genetic data are examples of data sets with no expiration date that can represent a major investment.

Tape Can Play a Critical Role Managing Costs

The demand for storage capacity is growing at a compound annual growth rate (CAGR) of more than 20 percent. Long-term repositories must become cheaper and deeper without losing durability. Object stores must lower the overall total cost of ownership (TCO) of storage: not just the cost of the media, but also the associated expenses of owning equipment. Most object stores are hard disk-based for performance and reliability, but the costs of power and physical footprint are significant.
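To see what a growth rate of this size implies, the compound-growth formula can be worked through directly. This is a rough sketch: the 20 percent figure is the lower bound quoted above, while the 1 PB starting point, the 5-year horizon and the function names are assumptions chosen only for illustration.

```python
import math


def projected_capacity(current_pb: float, cagr: float, years: int) -> float:
    """Capacity after compounding annual growth: current * (1 + cagr) ** years."""
    return current_pb * (1 + cagr) ** years


def doubling_time_years(cagr: float) -> float:
    """Years for demand to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + cagr)


# Hypothetical starting point: 1 PB today, growing at exactly 20 percent a year.
print(f"After 5 years: {projected_capacity(1.0, 0.20, 5):.2f} PB")
print(f"Doubling time: {doubling_time_years(0.20):.1f} years")
```

At 20 percent a year, demand roughly doubles in a little under four years, which is why the cost per stored byte has to keep falling.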

Some object stores now use tape to lower TCO, which is ideal for large amounts of data stored for a long time. For sequential I/O, tape outperforms disk for both reading and writing. Tape offers low media costs and, when not being accessed, uses minimal power and cooling. However, the latency to access data on tape is a consideration. Best-practice implementations present tape as a separate tier so that applications can help manage data access.

To realize tape's advantages in an object store it helps to have a thorough knowledge of how to properly manage and treat it. The object store must survive failure modes unique to tape. It must also manage access patterns to reduce tape latencies and wear.

Because tape excels at sequential access, large individual objects perform well. For small objects, a well-implemented object store will group them into larger sequential streams to and from tape. With the right expertise, organizations can implement a tape-based object store for long-term data retention while controlling storage costs.
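One way to picture the grouping of small objects is a packer that appends objects to a buffer and records where each one sits, so the buffer can be written to tape as a single sequential stream. The sketch below is an assumption-laden illustration, not a description of any particular product: `Batch`, `pack_objects`, `read_object` and the 1,000-byte target are hypothetical, and a real system would use far larger batches and also handle tape failure modes.

```python
from dataclasses import dataclass, field


@dataclass
class Batch:
    """One sequential stream plus an index of the objects inside it."""
    entries: list = field(default_factory=list)      # (key, offset, length)
    payload: bytearray = field(default_factory=bytearray)


def pack_objects(objects, target_bytes):
    """Group (key, data) pairs into batches of at least target_bytes each.

    The final batch may be smaller if the input runs out.
    """
    batches = []
    current = Batch()
    for key, data in objects:
        current.entries.append((key, len(current.payload), len(data)))
        current.payload.extend(data)
        if len(current.payload) >= target_bytes:
            batches.append(current)
            current = Batch()
    if current.entries:
        batches.append(current)
    return batches


def read_object(batch, key):
    """Recover one object from a batch using its recorded offset and length."""
    for entry_key, offset, length in batch.entries:
        if entry_key == key:
            return bytes(batch.payload[offset:offset + length])
    raise KeyError(key)


# Hypothetical small objects packed with a 1,000-byte target.
small = [("a", b"x" * 400), ("b", b"y" * 700), ("c", b"z" * 100)]
batches = pack_objects(small, target_bytes=1000)
print(len(batches), [len(b.payload) for b in batches])
```

Keeping the offset index alongside each batch means a single small object can still be located without scanning the whole stream once the tape is positioned.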

Data Cataloging and Management to Elevate Object Stores

Object stores can be petabytes or even exabytes in size, yet individual objects are likely in the range of kilobytes to megabytes: an exabyte-scale object store could potentially hold quadrillions of objects. How can we know what is in the data store? How can we identify and select complete subsets of information in a pool of data this vast? A catalog of the contents of the object store is a mechanism for selecting subsets of the data. Data must be classified as it is added to the object store. The initial classification may include both standard attributes and domain-specific classification. The uses of data and the information extracted from it will change and improve over time, so the classification information must be malleable.
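A catalog with malleable classification can be sketched as an attribute index that accepts new or changed attributes at any time. This is a minimal in-memory illustration under stated assumptions: the `Catalog` class, its methods, and the attribute names (`content_type`, `domain`, `region`) are hypothetical, and a catalog for quadrillions of objects would need a distributed database rather than Python dictionaries.

```python
from collections import defaultdict


class Catalog:
    """Maps object keys to attributes and supports selection by attribute."""

    def __init__(self):
        self._records = {}                 # key -> {attribute: value}
        self._index = defaultdict(set)     # (attribute, value) -> set of keys

    def classify(self, key, **attributes):
        """Add or change attributes; may be called again as understanding improves."""
        record = self._records.setdefault(key, {})
        for attr, value in attributes.items():
            if attr in record:
                # Drop the stale index entry before recording the new value.
                self._index[(attr, record[attr])].discard(key)
            record[attr] = value
            self._index[(attr, value)].add(key)

    def select(self, **criteria):
        """Return the keys whose attributes match every criterion."""
        if not criteria:
            return set(self._records)
        matches = [self._index.get(item, set()) for item in criteria.items()]
        return set.intersection(*matches)


catalog = Catalog()
# Initial classification at ingest time (hypothetical attributes).
catalog.classify("scan-001", content_type="image/tiff", domain="satellite")
catalog.classify("seq-042", content_type="text/plain", domain="genomics")
# Later enrichment and reclassification of an existing object.
catalog.classify("scan-001", region="arctic")
catalog.classify("scan-001", domain="climate")
print(catalog.select(domain="climate", region="arctic"))
```

Because classification is stored separately from the objects themselves, attributes can be revised without rewriting any data on the underlying media.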

What decides which objects go on which storage media, and how is that decision made? How do you select the sets of data required for a task? Proper data management ensures the data is where the user needs it, when it is needed. When the data is not in use, it needs to be where it can be kept safe at the lowest cost. Once an item of interest is identified, the system ensures the data is placed for optimal processing.
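A placement decision of this kind can be expressed as a simple policy function. The sketch below is one possible shape, not a recommended policy: the tier names `"disk"` and `"tape"`, the 90-day idle threshold, and the `pinned_for_task` flag are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ObjectInfo:
    key: str
    last_access: datetime
    pinned_for_task: bool = False   # set when a task has selected this object


def choose_tier(obj: ObjectInfo, now: datetime,
                cold_after: timedelta = timedelta(days=90)) -> str:
    """Return the tier an object should live on under this toy policy."""
    if obj.pinned_for_task:
        return "disk"               # needed soon: keep it where access is fast
    if now - obj.last_access >= cold_after:
        return "tape"               # idle: keep it safe at the lowest cost
    return "disk"


now = datetime(2020, 10, 12)
idle = ObjectInfo("survey-1998", last_access=datetime(2019, 1, 1))
wanted = ObjectInfo("survey-1999", last_access=datetime(2019, 1, 1),
                    pinned_for_task=True)
print(choose_tier(idle, now), choose_tier(wanted, now))
```

In practice the inputs to such a policy would likely come from the catalog, so that selecting a data set for a task is what triggers recall from tape.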

A properly managed and cataloged object store can be a vital tool for affordably preserving unstructured data as a future asset.


About the Author

Matthew Dewey 

Matthew Dewey works within the Technology Group to guide Quantum with forward-looking technology choices.

Published Monday, October 12, 2020 8:02 AM by David Marshall