DPTIPS: StoreOnce Back to Basics – Why More Than One VTL?

10-15-2013 07:11 PM - edited 03-10-2016 08:43 AM

For those who are involved with StoreOnce appliance integrations utilizing Virtual Tape Library (VTL) emulation, it is very important to remember the fundamentals documented in the best practices guide. The focus of today’s tip is the rationale behind creating more than one VTL per service set.

 

Three things drive the desirability of multiple VTLs. First, you’re already paying an ingest performance penalty for the inline dedupe. Second, each VTL is its own dedupe domain. Third, you’ll need several parallel streams to realize maximum ingest.

 

Inbound streams are split into chunks of about 4 KB in size. Each chunk is passed through a high-speed hashing algorithm which generates a hexadecimal key statistically unique to that chunk. Each key is compared to a database of keys representing every currently stored chunk of data. If the inbound chunk is a duplicate, the 4 KB of data is thrown away, and a small pointer is put in its place referencing the identical, already-stored chunk. If the inbound chunk is unique, the 4 KB is passed through to the backend store, and its key is added to the database. That’s a lot of processing to achieve at wire speed even with the generous resources found in StoreOnce appliances. What can we do to make up the cost? Keep that question in mind as you consider that ...
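If it helps to picture the mechanics, here is a minimal Python sketch of hash-based inline dedupe. The ~4 KB chunk size comes from the paragraph above; everything else (the SHA-1 hash, the dictionary key database, the list standing in for the backend store) is an illustrative assumption, not the StoreOnce implementation.

    # Illustrative only: a toy model of inline, hash-based deduplication.
    # The real StoreOnce chunking, hashing, and key store are far more sophisticated.
    import hashlib

    CHUNK_SIZE = 4 * 1024       # ~4 KB chunks, as described above
    key_db = {}                 # hash key -> index of the stored chunk
    backend_store = []          # stand-in for the backend disk store

    def ingest(stream_bytes):
        """Chunk the inbound stream, keep unique chunks, replace dupes with pointers."""
        pointers = []
        for offset in range(0, len(stream_bytes), CHUNK_SIZE):
            chunk = stream_bytes[offset:offset + CHUNK_SIZE]
            key = hashlib.sha1(chunk).hexdigest()     # hexadecimal key for this chunk
            if key not in key_db:                     # unique: store it, record its key
                backend_store.append(chunk)
                key_db[key] = len(backend_store) - 1
            pointers.append(key_db[key])              # duplicate or not, keep a pointer
        return pointers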

 

Each VTL maintains its own hash key database, so we treat each VTL as its own “dedupe domain”. With very few exceptions, if you create one large VTL and feed it every sort of data, your dedupe database will grow quite large. As a result, the time required to determine whether any new chunk of data is a duplicate will degrade to an unacceptable level. BUT, what if you “stack the deck” and artificially improve the odds of finding duplicates? And what if, in so doing, you also reduce the size of the dedupe domain? Now combine those data points with the concept that ...
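Continuing the toy model above, “one dedupe domain per VTL” boils down to each VTL owning its own key database. The helper below is purely illustrative; the VTL names are just examples.

    # Illustrative only: one hash key database per VTL ("dedupe domain").
    # A single catch-all VTL means one huge, slow-to-search database; splitting by
    # data type keeps each database smaller and far richer in duplicates.
    dedupe_domains = {}   # VTL name -> that VTL's own key database

    def key_db_for(vtl_name):
        """Return (creating if needed) the key database owned by this VTL."""
        return dedupe_domains.setdefault(vtl_name, {})

    # e.g. key_db_for("EXCH_VTL") and key_db_for("SQL_VTL") never share keys, so
    # Exchange data is only ever compared against other Exchange data.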

 

What the ISV backup app sees as a tape drive we of course know is a Linux process pretending to be a tape drive. As such, each tape emulation process in and of itself is only capable of a finite number of I/O operations per second (IOPS). The good news is that StoreOnce appliances have the resources to support many, many of these tape emulation processes in parallel. The takeaway here is that numerous parallel streams are required to get anywhere near an appliance’s rated ingest. This does come with a caveat, however. You really don’t want more than a dozen tape processes hammering away at any one dedupe database.
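As a quick back-of-the-envelope illustration (the per-stream rate and rated ingest below are made-up numbers for the sketch, not StoreOnce specifications):

    # Illustrative only: why many parallel streams are needed to approach rated ingest,
    # and why each VTL should see no more than about a dozen of them.
    PER_STREAM_MBPS     = 80     # assumed throughput of one tape emulation process
    APPLIANCE_MBPS      = 4000   # assumed appliance rated ingest
    MAX_STREAMS_PER_VTL = 12     # best-practice cap per dedupe domain

    def aggregate_ingest(num_vtls, streams_per_vtl):
        streams = num_vtls * min(streams_per_vtl, MAX_STREAMS_PER_VTL)
        return min(streams * PER_STREAM_MBPS, APPLIANCE_MBPS)

    print(aggregate_ingest(1, 12))   # 960  MB/s -- one VTL, nowhere near rated ingest
    print(aggregate_ingest(4, 12))   # 3840 MB/s -- four VTLs, much closer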

 

Now let’s connect the dots.

 

A new StoreOnce customer has Exchange backups, SQL backups, Fileserver backups, and OS backups. We’re going to create four VTLs:

  1. EXCH_VTL,
  2. SQL_VTL,
  3. FILE_VTL, and
  4. OS_VTL

Each will have no more than twelve (12) virtual tape drives. We’re also going to make sure we have plenty of objects in our backups. Finally, we’re going to make sure our backup destinations are chosen to segregate data by type per VTL. Putting all of our information together, what have we achieved?

  • Our data is split across multiple VTLs, so we have smaller dedupe domains and faster hash comparisons.
  • Chunks of data in each dedupe domain are far more likely to be duplicate because they are of the same type. More dupes = faster ingest = less physical space consumption.
  • We have limited our VTLs to a maximum of 12 simultaneous inbound streams, so the risk of dedupe database thrashing should be minimized.
  • We have enabled a maximum of 48 parallel inbound streams. Hopefully we have enough objects in enough concurrent backup sessions to at least get north of 20 streams at one time.

We are backing up the same data as before, but in a way that maximizes StoreOnce performance in terms of both ingest rate and dedupe efficiency.
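Sketched as a simple layout summary (purely illustrative; the actual VTLs are created through the StoreOnce management interface, and the numbers just restate the plan above):

    # Illustrative summary of the layout above -- not a StoreOnce configuration file.
    vtl_layout = {
        "EXCH_VTL": {"data_type": "Exchange",   "virtual_drives": 12},
        "SQL_VTL":  {"data_type": "SQL",        "virtual_drives": 12},
        "FILE_VTL": {"data_type": "Fileserver", "virtual_drives": 12},
        "OS_VTL":   {"data_type": "OS",         "virtual_drives": 12},
    }

    total_parallel_streams = sum(v["virtual_drives"] for v in vtl_layout.values())
    assert total_parallel_streams == 48   # matches the "maximum of 48 streams" above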

 

There are of course other major factors affecting the success of StoreOnce integration with Data Protector.  Those were covered in the original article and will likely be topics for expanded future discussions. Also, there are a few corner cases where better dedupe is achieved with only one general-use VTL, but those are typically very modestly sized implementations and not frequently encountered.  Still, you should always weigh level of complexity versus empirical results.

 

Y'all come back now, ya hear? :-)

About the Author

Jim Turner

Jim is a multi-disciplined engineering professional with 30 years of electronic and systems experience. For the past 17 years, Jim’s primary focus has been enterprise backup, recovery, and archiving (BURA). As an HPE Master Technologist, he is recognized as a global authority on HPE Data Protector, HPE StoreOnce, and the proper integration of both. Jim's consulting has stretched over 350k miles and 145 unique locations in North America during his 9 years with HP(E). When not traveling, Jim resides in Edmond, OK with his wife, two grown children, and two dogs.

Comments
Jargal
on 11-06-2013 12:10 AM

Hello, 

Thanks for a great article.

I have a few questions related to StoreOnce and client-side deduplication. If I choose to do client-side deduplication, where is the index file going to be held? Is it going to be located locally on the client (which doesn't make much sense, because if you lose the whole client you lose the index file as well), or is it going to be on the StoreOnce appliance, with the comparison done over the wire? And a second thing: if deduplication is set up on the client side, won't deduplication happen twice, since the StoreOnce will try to deduplicate all the data it is getting?

 

Thanks 

on 11-06-2013 08:13 PM

Hi Alex,

 

I appreciate your kind comment and welcome your question.  Client-side dedupe with StoreOnce is achieved by having a Media Agent (MA) on the client.  The dedupe database (hash key store) remains with the VTL on the G3 appliance.  The MA on the client chunks the data and generates a hash for each chunk.  These hashes are forwarded in batches to the appliance which responds with a list of unique chunks that need to be sent over.  No index of any sort remains on the client, and no further dedupe takes place on the appliance.  (Unless you want to count hardware compression.  Each node has a comp/decomp card that squeezes the unique chunks going to the backend store.)
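If a sketch helps, the hash-exchange pattern looks roughly like this (function names, batch size, and hash choice are illustrative assumptions, not Data Protector's actual protocol or API):

    # Illustrative only: the hash-exchange pattern described above.
    import hashlib

    appliance_key_db = set()   # lives with the VTL on the appliance, never on the client

    def appliance_wants(key_batch):
        """Appliance side: report which hashes it has never seen before."""
        return [k for k in key_batch if k not in appliance_key_db]

    def send_chunk(key, chunk):
        """Stand-in for shipping one unique chunk; the appliance records its key."""
        appliance_key_db.add(key)

    def client_side_backup(data, chunk_size=4 * 1024, batch_size=256):
        """Media Agent on the client: chunk, hash, then send only what's missing."""
        chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
        keys = [hashlib.sha1(c).hexdigest() for c in chunks]
        for start in range(0, len(keys), batch_size):
            batch = keys[start:start + batch_size]
            for key in appliance_wants(batch):        # only unique chunks cross the wire
                send_chunk(key, chunks[keys.index(key)])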

 

Best regards,

Jim (Mr_T)
