Automated Data Tiering vs. Automated Data Placement

 

By Vinod Mohan of DataCore Software

Learn how to move data between diverse online storage media and explore the two popular techniques: data tiering and data placement.

Determining which data gets placed on which storage is a herculean challenge storage administrators grapple with daily. Not all storage media are the same: they vary by performance, cost, compliance, deployment model, location, and so on. Nor is all data equally important. Some of it is hot data accessed very frequently, some is accessed only occasionally, and some consists of redundant copies kept for recovery and touched only in the event of disruption or data loss. It is worth highlighting that the importance of data, as well as its temperature, changes over time. For example, warm data stored on fast HDDs could start being accessed frequently by a certain application and may need to be reclassified as hot and moved to faster SSDs.


It is the responsibility of the storage administrator to figure out which data goes where. Given the speed and volume at which data is processed, doing this manually and in real time is impossible. This is where automated data movement comes in.

Data storage management software - whether built into the storage hardware or supplied by third-party solution providers - provides the means to automatically move data to the appropriate storage tier. And this happens fully transparently to the applications and users accessing the data, without any impact on operational continuity.

In this blog we will compare and contrast two techniques - data tiering and data placement - that are similar in principle, but different in the way they work. Let's dive right in.

Automated Data Tiering

Automated data tiering (also known as storage tiering or auto-tiering) is a widely used technique in the block storage world where the software controlling the data movement uses machine learning to track access patterns and understand data temperatures. The science of data tiering distills down to monitoring I/O behavior, determining frequency of use, and then dynamically moving blocks of information to the most suitable class or tier of storage media. Depending on whether its access frequency marks it as hot, warm, or cold, data gets placed on the corresponding storage tier. Typically, the storage administrator defines the storage tiers - tier 1, 2, 3, and so on - and the software does the rest.
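As a rough illustration, the monitor-classify-move loop described above can be sketched in Python. The tier names, access-count thresholds, and `Block` structure below are hypothetical, chosen only to show the shape of the technique, not any product's actual design:

```python
from dataclasses import dataclass

@dataclass
class Block:
    """A block of data with its current tier and observed access frequency."""
    block_id: int
    tier: str = "tier3"       # new blocks land on the capacity tier by default
    access_count: int = 0     # I/O hits observed in the current monitoring window

def classify(access_count: int) -> str:
    """Map an access frequency to a temperature-based tier (thresholds are illustrative)."""
    if access_count >= 100:   # hot: promote to the fastest media (e.g., SSD)
        return "tier1"
    if access_count >= 10:    # warm: mid-range media (e.g., fast HDD)
        return "tier2"
    return "tier3"            # cold: capacity-oriented media (e.g., JBOD)

def retier(blocks: list[Block]) -> list[tuple[int, str, str]]:
    """Move each block to the tier its current temperature calls for."""
    moves = []
    for b in blocks:
        target = classify(b.access_count)
        if target != b.tier:
            moves.append((b.block_id, b.tier, target))  # record promotion/demotion
            b.tier = target
        b.access_count = 0    # reset the counter for the next monitoring window
    return moves
```

Running `retier` periodically captures the dynamic behavior described above: a block that heats up is promoted, and one that cools off is demoted, with no involvement from the application.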

Data tiering can work within a single storage device with different tiers distinguished within itself, or across devices from the same manufacturer or from different manufacturers. Its full potential can be realized when there is no vendor or device constraint, and tiering is performed across any storage system.

Consider an environment with a mix of premium SSD flash arrays, HDD storage systems, and JBODs. You would not want to waste space on the premium flash array on cold data that is rarely accessed, leaving the device constantly hungry for more capacity; that is neither smart nor cost-effective. Data tiering enables automatic data movement so that the high-performing, expensive storage (tier 1) holds the hottest data while the lower tiers (tier 2, 3, and so on) get the warm and cold data.

This movement does not happen only when new data is written to disk. Even as existing data is accessed and its frequency of use changes, the data storage management software recognizes the pattern and moves it to the appropriate storage tier. Data movement happens continuously, automatically, and fully transparently to the application on the front end.

At DataCore, we have incorporated automated data tiering into our block-based software-defined storage solution, SANsymphony, which uses storage virtualization technology to abstract storage capacity from the storage hardware and create virtual pools. Within a storage pool, storage tiers can be characterized, and SANsymphony performs data tiering in real time, letting you take full advantage of the capacity on your performant hardware for storing critical/hot data. SANsymphony promotes the most frequently used blocks to the fastest tier, while the least frequently used blocks get demoted to the slowest tier. This also lets you integrate new technologies into your storage infrastructure seamlessly. For example, if you add storage disks based on 3D XPoint, SANsymphony can bring that storage non-disruptively into its virtual storage pool and make it your tier 1 storage, to which all your hot data automatically gets promoted. SANsymphony's unique value is that it supports data tiering across any make or model of storage hardware and any deployment type (including hyperconverged).

 


How DataCore SANsymphony performs automated data tiering

Automated Data Placement

In the world of unstructured data, where data growth far outpaces that of structured data, file storage is generally the preferred storage medium. IT organizations require the flexibility to move data back and forth between file storage systems such as NAS, file servers, etc. - and also to and from object storage when needed - based on their requirements.

This is possible with automated data placement, which is a variant of automated data tiering, but goes far beyond that in meeting different criteria for data movement. Here, the data storage management software is typically a global file system which resides above the storage layer. Leveraging file virtualization technology, the global file system first gathers the metadata from the data payload stored on various storage systems (file servers, NAS, cloud, etc.). It then assimilates the files, including their metadata information, into its global namespace.

Now, the global file system knows which files are stored where, what type of files they are, when they were created and last accessed, how large they are, which user created them, and so on, as well as the capacity utilization of the storage systems. The information gathered about the data is much richer than in the case of block storage. So, there are many more options for customizing the criteria that govern how data moves between storage media. Frequency of data access (or data temperature) is indeed one of them, but the administrator can create numerous other bespoke policies to regulate data movement. Hence, data placement has greater applicability than data tiering.
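To make the metadata-gathering step concrete, here is a minimal Python sketch of how a catalog for a global namespace might be built by walking one file share. The field names, the `/{share}/{path}` namespace convention, and the scan approach are assumptions for this example, not how any particular global file system is implemented:

```python
from pathlib import Path

def scan_share(mount_point: str, share_name: str) -> dict[str, dict]:
    """Walk one file share and record per-file metadata under a global path."""
    catalog = {}
    for path in Path(mount_point).rglob("*"):
        if not path.is_file():
            continue
        st = path.stat()
        # Prefix the share name so files from many shares coexist in one namespace
        global_path = f"/{share_name}/{path.relative_to(mount_point).as_posix()}"
        catalog[global_path] = {
            "share": share_name,          # which backing system holds the file
            "size": st.st_size,           # bytes
            "created": st.st_ctime,       # creation/change time (platform-dependent)
            "last_accessed": st.st_atime, # drives temperature-based decisions
            "owner_uid": st.st_uid,       # who created/owns the file
        }
    return catalog
```

Merging the catalogs from every scanned share (NAS, file servers, cloud) would yield the single global namespace described above, against which placement criteria can then be evaluated.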

Here are some examples for better understanding:

  • Durability and data protection: Create copies of data stored on a certain share and move it to multiple locations as backup.
  • Performance: Offload data stored on premium NAS devices to slower disks and cheaper storage. This helps free up capacity on your primary storage and minimizes I/O bottlenecks.
  • Compliance: Regulatory compliance policies may require organizations to retain data in a specific location for a given time period before it is moved or deleted. For example, store customer data within a country or within a specific site to meet compliance and security requirements.
  • Offload to object storage: For organizations that are focused on leveraging object storage as a low-cost alternative to file storage, they can use automated data placement and move inactive/cold data to object storage either on-premises or in the cloud.
  • Custom business objectives: Move all snapshot files to the cloud; move all data from the HR department stored on specific storage hardware to secondary storage; when the capacity limit is reached on a specific storage volume, move all new data to another storage volume (this helps balance load across storage systems); and more.
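Policies like the examples above can be sketched as a simple rule engine evaluated against each file's metadata. This Python sketch is purely illustrative: the policy names, the 90-day cold threshold, and the target-storage labels are all assumptions for this example:

```python
import time

COLD_AFTER = 90 * 24 * 3600  # illustrative: seconds of inactivity before a file is "cold"

# Each policy is (description, predicate over file metadata, target storage).
# The first matching policy wins; order therefore encodes priority.
POLICIES = [
    ("snapshots to cloud",
     lambda f: f["path"].endswith(".snap"), "cloud-object"),
    ("cold data to object store",
     lambda f: time.time() - f["last_accessed"] > COLD_AFTER, "on-prem-object"),
    ("keep EU customer data in-region",
     lambda f: f.get("region") == "eu", "eu-nas"),
]

def place(file_meta: dict) -> str:
    """Return the target storage for the first matching policy, else the default."""
    for _desc, predicate, target in POLICIES:
        if predicate(file_meta):
            return target
    return "primary-nas"  # default: leave data on primary file storage
```

Because each rule is just a predicate over metadata, adding a new business objective (compliance region, department, capacity trigger) means appending one entry to the list rather than changing the engine.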

For performing automated data placement across distributed file and object storage, DataCore offers vFilO, a software-defined storage solution that acts as a global file system and governs data movement based on custom policies set by the storage administrator. vFilO uses machine learning to detect patterns as data gets written to storage and then performs data placement based on these policies. Using vFilO allows you to aggregate namespaces across disparate NAS devices and filers into a single global namespace and streamline data mobility as you desire.


Automated data placement between on-premises file storage and cloud object storage

Just like in data tiering, here also the data movement happens dynamically and fully transparently to the application and users in the front end. With the option to move data to the cloud and between different public cloud platforms, vFilO can also support you on your cloud journey and leverage economical options to store data.


Customizing policies and objectives in DataCore vFilO to regulate data movement

Conclusion

While it is common for IT pros to use the terms data tiering and data placement interchangeably, from DataCore's perspective, we treat them as two distinct techniques wherein data tiering is focused on data movement based on data temperatures, and data placement uses custom policies to control data movement based on business requirements (which also includes data temperatures as one of the options). You can check out SANsymphony and/or vFilO based on what your storage environment is made up of (block, file, or object) and what type of data you are dealing with (structured or unstructured).

##

This article originally appeared on DataCore.com: https://www.datacore.com/blog/data-tiering-vs-data-placement/

About the Author

 

Vinod Mohan is a Senior Product Marketing Manager at DataCore Software. He has over a decade of experience in product, technology and solution marketing of IT software and services spanning application performance management, network, systems, virtualization, storage, IT security and IT service management (ITSM). In his current capacity at DataCore, Vinod focuses on communicating the value proposition of software-defined storage to IT teams helping them benefit from infrastructure cost savings, storage efficiency, performance acceleration, and ultimate flexibility for storing and managing data.

Prior to DataCore, Vinod held product marketing positions at eG Innovations and SolarWinds, focusing on IT performance monitoring solutions. An avid technology enthusiast, he is a contributing author to many popular sites including APMdigest, VMblog, Cyber Defense Magazine, Citrix Blog, The Hacker News, NetworkDataPedia, IT Briefcase, IT Pro Portal, and more.
 
Published Wednesday, December 09, 2020 7:42 AM by David Marshall