Virtualization Technology News and Information
eWeek Finds the Answer to What is Data Deduplication?

The eWeek Knowledge Center features IT experts answering questions about the most pertinent enterprise technology issues of the day. This installment features Robert Stevenson, Managing Director for Storage at TheInfoPro.

Q: How does data deduplication work?
A: Data deduplication is based on the fact that in any enterprise where you are storing and backing up data, there is a tremendous amount of content that occurs more than once. It's more efficient to eliminate or deduplicate those occurrences rather than store them in multiple places. Deduplication vendors use a variety of different algorithms. Some use hash algorithms like SHA-1; others do bit-by-bit comparison. But it boils down to examining the blocks of data in a backup stream and replacing duplicated instances with pointers to a unique instance.
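
To make the block-and-pointer idea concrete, here is a minimal Python sketch. It is not any vendor's algorithm: the fixed 4096-byte block size, the in-memory dictionary used as the block store, and the function names `deduplicate` and `restore` are all assumptions chosen for illustration. It hashes each block with SHA-1, keeps one copy of each unique block, and records a pointer (the digest) for every block in the stream.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products differ


def deduplicate(stream: bytes, block_size: int = BLOCK_SIZE):
    """Split a backup stream into blocks and store each unique block once."""
    store = {}     # SHA-1 digest -> block contents (the unique instances)
    pointers = []  # one digest per block, in original stream order
    for offset in range(0, len(stream), block_size):
        block = stream[offset:offset + block_size]
        digest = hashlib.sha1(block).hexdigest()
        if digest not in store:
            store[digest] = block  # first time this content is seen
        pointers.append(digest)    # duplicates become pointers only
    return store, pointers


def restore(store, pointers) -> bytes:
    """Rebuild the original stream by following the pointers."""
    return b"".join(store[digest] for digest in pointers)


# Hypothetical backup stream: five blocks, only two distinct contents.
backup = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE + b"A" * BLOCK_SIZE
store, pointers = deduplicate(backup)
print(len(pointers), "blocks in stream,", len(store), "unique blocks stored")
```

In this toy stream, five blocks are described by pointers while only two blocks are actually stored. A sketch like this trusts the hash alone to identify duplicates; the bit-by-bit comparison mentioned above would instead compare block contents directly.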

Q: What do data deduplication products look like?
A: Typically it's an appliance that can sit either in-band or out-of-band. If it's in-band, then it analyzes and deduplicates the backup stream while it's being sent to backup storage (for example, to a virtual tape library or VTL). If it's out-of-band, it analyzes and rewrites the data after it's been written to the backup device. In either case, the goal is to remove duplicate data while changing as little as possible in your existing infrastructure. All you do is deploy the appliance.

Q: What kind of applications does deduplication work best with?
A: It can work with either file-oriented or block-oriented applications. It really depends on which applications that particular vendor's product is targeting. But you need to keep in mind that it isn't suited for data that's already been compressed or encrypted, because that will reduce the number of pattern matches the deduplication algorithm can detect. Typically you would do encryption after deduplication, not before.

Q: What are the main benefits of deduplication?
A: Well, contrary to what you might think, the most important benefit isn't really saving storage space, but the fact that you need to send less data to backup in the first place. That can save you a lot of time and bandwidth.

Q: Just how much data redundancy can be eliminated with deduplication?
A: It varies tremendously, of course. In the best case, you can get a compression ratio of 20-to-1. In other words, a 20-terabyte backup would be reduced to just one terabyte. About 10% of the data deduplication users we talk to get this kind of ratio. But this is definitely something you need to test for yourself with your own data before you buy a deduplication appliance.
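
The arithmetic behind a ratio like 20-to-1 can be checked with a short Python sketch. The function names `stored_size_tb` and `space_saved_percent` are hypothetical helpers introduced here, not part of any product.

```python
def stored_size_tb(backup_tb: float, ratio: float) -> float:
    """Size actually stored after deduplication at the given ratio."""
    return backup_tb / ratio


def space_saved_percent(ratio: float) -> float:
    """Percentage of the original backup that no longer needs storing."""
    return 100 * (ratio - 1) / ratio


# The best case from the interview: a 20 TB backup at 20-to-1.
print(stored_size_tb(20, 20))   # 1.0 TB stored
print(space_saved_percent(20))  # 95.0 percent saved
```

Note how the savings flatten out as the ratio grows: 10-to-1 already saves 90 percent, so doubling the ratio to 20-to-1 adds only five more percentage points.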

Q: What are some of the vendors of data deduplication gear?
A: Data Domain and Diligent Technologies are two of the leading private independent vendors. EMC acquired a well-known company called Avamar. Network Appliance, Symantec and FalconStor also have solutions.

Read or comment on the original, here.

Published Tuesday, July 24, 2007 6:10 AM by David Marshall