Virtualization Technology News and Information
Introduction: Resolving the Storage Infrastructure Paradox

In a recent survey, the Taneja Group found that 62% of IT decision-makers identified file data growth and file management as two of their top priorities. File-based data is the unstructured data that sits outside of databases (office documents, presentations, email, reports, records, audio and video content, images) yet is still stored and accessed by systems.

Previously, IT focused on structured data, the core transactional, operational data that sits at the heart of the enterprise. It has become apparent, however, that structured data isn’t the problem now and will be even less of a problem going forward. The problem, as the respondents to Taneja’s survey made clear, is unstructured, file-based data. As it turns out, 85% of all data is stored as unstructured data (Butler Group) and 80% of business is conducted with unstructured information (Gartner Group).

Furthermore, unstructured data, according to Gartner, is growing at an unprecedented rate, doubling every three months. This incessant growth leads to what is referred to as the storage paradox: the need to store increasing amounts of data only complicates the storage infrastructure, adding to its complexity and cost.

Fortunately, unstructured or file-based data is different from structured or block-based data. File data can be handled at a higher level of abstraction, which simplifies how the data is stored, accessed, and managed. By taking advantage of the differences between file and block data, organizations can effectively resolve the storage paradox.

In effect, organizations can store ever greater amounts of file-based data without increasing the complexity of the storage infrastructure or the cost of managing that data, which is the largest cost associated with data storage. Resolving the storage paradox opens up a wealth of new possibilities.

This paper introduces the concept of the intelligent File Area Network (FAN). The FAN is the key to resolving the storage paradox and opening those possibilities. In the pages that follow, this paper explains how the FAN makes it possible to take advantage of them. Specifically, it will:

  • Explain the power of virtualization, namespaces, and metadata
  • Position FAN in the existing storage/systems infrastructure
  • Describe what can be done with FAN today
  • Introduce intelligent FAN-based services
  • Describe the future use of FAN

Some of the pieces are here today and more will be coming.  

The File Area Network (FAN)

The FAN is a file-based approach to storing and managing file-based data as a single, logical pool of data. It provides heterogeneous, intelligent file virtualization. Files are stored, accessed, shared, and managed based on their unique names, as if they were all stored in one place on one device, even though they may actually reside in different places on the network and on different devices. The intelligent virtualization in the FAN makes those differences transparent to the applications, users, and administrators who use and manage the file-based data.

The FAN consists of the following elements:

  • Storage devices, either SAN or NAS
  • File servers, able to manage data at the file level
  • Namespaces, which organize, present, and store file data
  • File management, intelligent software that interacts with the namespace
  • Policy-driven, real-time services, which act on the stored data
  • Client systems, which access the namespaces over the network
  • Network connectivity, typically supporting NFS, CIFS, or other standard protocols

At one level, the FAN resembles the SAN. Both provide a network-accessible, logical pool of shared storage. FANs, however, handle data at the file level, where each file has a unique name and carries business and application context that can be tapped for management purposes and service delivery. In contrast, SANs handle data at the block level; blocks are not sufficiently unique to be globalized and are too low-level to provide business and application context.

Tapping the power of virtualization, namespaces, and metadata

The FAN includes three key elements that allow it to achieve its distinctive results: virtualization, namespaces, and metadata.

  • Virtualization—enables the simplification of the storage infrastructure by masking the underlying complexity of the storage device and the specific location of the data. Storage has been using virtualization for a long time. The FAN uses virtualization to separate the logical view of the data from the specifics of the storage device and its physical location. Virtualization makes it possible to move, access, and manage data logically without regard to its actual physical storage. In the process, virtualization reduces the cost of owning and managing the data and the storage infrastructure.
  • Namespace—provides the ability to organize, present, and store file-based data. It serves the same function as the switching fabric in the SAN, but with one critical distinction: the namespace performs its functions on the logical file-based data as described in the metadata, not on the physical storage device. The namespace, in effect, becomes the heart of the FAN, performing its key functions. There are several kinds of namespaces: non-shared, shared, and global or federated. Each type of namespace supports a different level of sharing. The global or federated namespace is central to achieving a FAN.
  • Metadata—consists of information about the file-based data and its usage. This includes information about the file, where it resides, how to access it, and the type of file it is. Metadata also can include information about when a file was last used, who created it, and who used it. In the future, metadata will even include key words describing the contents of a file. As a higher level of data abstraction, files make it possible to capture information about the context of the data. Through the use of metadata attached to files, intelligent systems can identify and manage the data based on context and business values, such as age of the data, frequency of use, ownership, and ultimately the content or meaning of the data.
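
The separation between a file's logical name and its physical location can be illustrated with a minimal Python sketch. It is not taken from any FAN product: the class names, fields, server names, and paths below are all hypothetical, and a real global namespace would also handle protocols, permissions, and the data movement itself. The sketch only shows that clients keep using one logical name while the mapping behind it changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalLocation:
    """Where a file actually lives (hypothetical fields, for illustration only)."""
    server: str   # file server or NAS device holding the file
    export: str   # share or export on that server
    path: str     # path inside the export


class GlobalNamespace:
    """Toy lookup table from logical file names to physical locations."""

    def __init__(self) -> None:
        self._table: dict[str, PhysicalLocation] = {}

    def publish(self, logical_name: str, location: PhysicalLocation) -> None:
        """Make a file visible under a logical name."""
        self._table[logical_name] = location

    def resolve(self, logical_name: str) -> PhysicalLocation:
        """Return the current physical location for a logical name."""
        return self._table[logical_name]

    def migrate(self, logical_name: str, new_location: PhysicalLocation) -> None:
        """Repoint a logical name; copying the data is out of scope here."""
        if logical_name not in self._table:
            raise KeyError(logical_name)
        self._table[logical_name] = new_location


LOGICAL = "/corp/finance/q3-report.doc"

ns = GlobalNamespace()
ns.publish(LOGICAL, PhysicalLocation("nas01", "/vol/finance", "reports/q3-report.doc"))
before = ns.resolve(LOGICAL)

# The file is moved to another device; the logical name stays the same.
ns.migrate(LOGICAL, PhysicalLocation("nas02", "/vol/archive", "finance/q3-report.doc"))
after = ns.resolve(LOGICAL)
```

In this sketch, applications that open the file by its logical name are unaffected by the move from `nas01` to `nas02`, which is the transparency the namespace bullet describes.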

The intelligence built into the FAN uses virtualization, the capabilities of the namespace, and metadata to deliver the benefits of the FAN. Through the FAN, every node is aware of local and global resources and all other users. By acting on policies, the FAN can apply business-level controls to file-based data. Unlike the SAN, the FAN is able to tap the power of file-based metadata to deliver a level of network-based automation and control not possible with a SAN.
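
A policy acting on file metadata might look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the `FileMetadata` fields, the tier names, the 180-day threshold, and the sample files are hypothetical and do not come from the article or from any specific product. It shows one business-level rule (idle files are assigned to an archive tier) driven purely by metadata.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileMetadata:
    """Hypothetical per-file metadata record."""
    logical_name: str
    owner: str
    last_accessed: datetime
    tier: str = "primary"   # assumed tier names: "primary" or "archive"


def apply_age_policy(files, now, max_idle=timedelta(days=180)):
    """Assign files idle longer than max_idle to the archive tier.

    Returns the logical names of the files whose tier changed.
    """
    moved = []
    for f in files:
        if f.tier == "primary" and now - f.last_accessed > max_idle:
            f.tier = "archive"
            moved.append(f.logical_name)
    return moved


now = datetime(2007, 9, 8)
catalog = [
    FileMetadata("/corp/finance/q3-report.doc", "alice", datetime(2007, 8, 30)),
    FileMetadata("/corp/marketing/launch-2005.ppt", "bob", datetime(2005, 11, 2)),
]

moved = apply_age_policy(catalog, now)
```

Because the rule reads only metadata (last access time and current tier), it can be evaluated without opening the files and without knowing which device holds them. A block-level view of storage has no equivalent notion of owner or last use to act on.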

The entire article is available from Computer Technology Review.

Published Saturday, September 08, 2007 3:56 PM by David Marshall