A question on Native and compressed capacity

SOLVED
so_2
Regular Advisor

Hi All,
This is a very silly question.
Can a DLT drive with a native capacity of 20GB and a compressed capacity of 40GB
back up a data volume of 50GB?

My question arises from a disagreement:
1. A DLT 4000 can back up 20GB without compression and 40GB with compression.

2. Opponents say the DLT can back up even 80GB if we enable compression (they say this is because
the compressed capacity is 40GB).

I can't believe the second claim. But here is a log file from one such backup.

total blocks written to output file /dev/rmt/0m: 117194717

Taking each block as 512 bytes, that is 60003695104 bytes.
That is around 60GB.

device is Quantum DLT4000
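
As a rough check of the arithmetic above, here is a minimal Python sketch. It assumes, as the post does, that each block is 512 bytes, and it assumes the 20GB native figure means decimal gigabytes; the function names are only illustrative.

```python
BLOCK_SIZE = 512            # bytes per block, as assumed in the post
NATIVE_BYTES = 20 * 10**9   # DLT4000 native capacity, assuming decimal GB


def blocks_to_bytes(blocks, block_size=BLOCK_SIZE):
    """Convert a block count from the backup log into bytes."""
    return blocks * block_size


def implied_ratio(data_bytes, native_bytes=NATIVE_BYTES):
    """Compression ratio the drive would need if all the data landed on one tape."""
    return data_bytes / native_bytes


written = blocks_to_bytes(117194717)
print(written)                            # bytes reported as written
print(round(implied_ratio(written), 2))   # ratio relative to native capacity
```

If the log really describes a single tape, the backup would have needed roughly 3:1 compression, which is above the 2:1 marketing figure.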

Please clarify...

Thanks and regards
S.o
5 REPLIES
Peter Godron
Honored Contributor

Re: A question on Native and compressed capacity

Hi,
this depends greatly on the compression ratio that can be achieved. A ratio of 2:1 is normally assumed, but some data can compress as much as 9:1!

Please see:
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=926826
Bill Hassell
Honored Contributor
Solution

Re: A question on Native and compressed capacity

There is only *ONE* meaningful value for any tape drive and that is native capacity. If you get one byte more than native capacity, then that is nice but completely undependable. Start by looking at this chart:

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c00028345

When DDS first came out, it became quite popular for marketing departments to 'enlarge' the capacity of a tape drive by quoting the capacity as twice native (in one case, 8 times larger) because "some collections of data are compressible 2:1". In a few cases, the fine print at the bottom of the ad would essentially say: your mileage may vary.

But the practice continued all the way to actual product numbers, with almost every manufacturer adopting the 2:1 compression value. So a DLT-80 model is a 40 GB tape drive. Now it is very important to understand that the native capacity is all the tape can hold. Compression is accomplished by reducing the number of bytes written to the tape, using special codes for repeating patterns. The DLT-80 stores 40 GB, but when the special patterns are read back, they are expanded into additional bytes, perhaps twice as many or perhaps 10x more.

So how can a DLT-40 store 200 GB or more? Easy. Create files that have massively repeating patterns of data. The prealloc command is the perfect way to demonstrate this behavior. Find an empty filesystem where you have several hundred gigabytes of free space. Now run prealloc to create a 200 GB file (it will take a long time, perhaps 5-10 mins). Then back up that file using the device file for the tape drive that enables compression. Those device files are identified by the lssf command with the code "best density".

You'll find that the backup takes place extraordinarily fast, perhaps 2-3 times faster than expected. This is because the repeating-zeros pattern created by the prealloc command is fed into the tape drive at the maximum speed allowed by the bus, and the electronics reduce this stream to perhaps 1/10th of its original size before writing these codes to the tape. (For the sake of completeness, you can also create a 'sparse' file rather than using prealloc; it will demonstrate the same feature.)

So what is the capacity of a DLT-80? Answer: 40 GB. Can you store more than 40 GB? It depends on the data you are backing up. Binary data such as that found in the /usr directories is almost incompressible due to random patterns. Data files in a large database such as Oracle can be incompressible when full or highly compressible when empty.
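
The contrast between repeating zeros and random-looking data can be sketched in a few lines of Python. This uses zlib purely as a stand-in: the drive's hardware compression is a different algorithm, so the numbers are only illustrative, and the function name is hypothetical.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return original size divided by zlib-compressed size."""
    compressed = zlib.compress(data)
    return len(data) / len(compressed)


# One megabyte of zeros, similar in spirit to a preallocated or sparse file.
zeros = bytes(1_000_000)
# One megabyte of random bytes, standing in for data with no repeating patterns.
random_bytes = os.urandom(1_000_000)

print(f"zeros:  {compression_ratio(zeros):.0f}:1")
print(f"random: {compression_ratio(random_bytes):.2f}:1")
```

The zeros shrink by a very large factor, while the random data does not shrink at all and typically grows slightly.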

So how do you predict the compressibility (and therefore the amount of tape needed) for a given backup? The simple answer is: you can't. Now to be fair, you can run the compress (or pack or compact) command, then add up the new sizes. Then uncompress the files so you can use them. This shouldn't take more than a few hours for 200 GB of data files. Note that you'll have to shut down your database(s) for this test, and the test cannot be run on OS directories like /opt /sbin /usr /var, etc.
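
A read-only variation on that test can be sketched in Python: instead of compressing files in place, it streams each file through zlib and adds up the sizes. This is only an estimate under stated assumptions: zlib is not the drive's algorithm, the function names are hypothetical, and files that change while being read (such as a live database) will give unreliable figures.

```python
import os
import zlib


def estimate_compressed_size(path, chunk_size=1 << 20):
    """Stream one file through zlib and return the compressed byte count."""
    comp = zlib.compressobj()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(comp.compress(chunk))
    total += len(comp.flush())
    return total


def estimate_tree(root):
    """Return (original_bytes, compressed_bytes) for regular files under root."""
    original = 0
    compressed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            try:
                size = os.path.getsize(full)
                packed = estimate_compressed_size(full)
            except OSError:
                continue  # unreadable file: leave it out of both totals
            original += size
            compressed += packed
    return original, compressed
```

Calling `estimate_tree("/some/data/dir")` (a placeholder path) returns both totals; dividing the first by the second gives a rough ratio for that data set.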

In other words, if you are forced into counting bytes for your backups, you'll need to invest in a tape changer and use backup programs that have the ability to change tapes during the backup (*not* tar, cpio, dump, pax, etc).

Lengthy answer, but your DLT4000 tape drive will store 20 GB of data, guaranteed (minus any bad spots on the tape, which are skipped).


Bill Hassell, sysadmin
sajeer_2
Regular Advisor

Re: A question on Native and compressed capacity



As Peter and Bill mentioned, the compression ratio really depends on what data you are backing up. If you can get a 2:1 compression ratio, you can store 40GB of data.


Rgds,
sajeer
Jaclyn Rothe
Trusted Contributor

Re: A question on Native and compressed capacity

DLT IV tapes (the ones used in DLT 4000 drives) are only able to hold 20GB native / 40GB compressed. HOWEVER, DLT IV tapes come in 35/70 and 40/80 capacity as well.

If the data is primarily text, it will compress remarkably well, and it is not uncommon to get 3:1 compression in this instance using hardware compression.

I would not assume this to be normal performance for a DLT 4000 drive in a production environment... honestly, a ratio of 1.4:1 to 1.8:1 is typical for a mixed data set (text, pics etc). As long as you are getting a 1:1 ratio and not lower than that, you should be getting native capacity on the tape.

Here are useful docs to help with compression and capacity questions:
Tape Capacity Differences Prevent Remote Storage from Making Media Copies
http://support.microsoft.com/?kbid=266011

How File System and Disk Configuration Affect Backup Performance
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=lpg50032

Performance Assessment Testing:
http://www.hp.com/support/pat

HP StorageWorks - DLT and Ultrium Hardware vs. Software Compression
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=lpg50030&jumpid=reg_R1002_USEN
Marino Meloni_1
Honored Contributor

Re: A question on Native and compressed capacity


You should know that when you see a device with two numbers, e.g. DLT 20-40 or 4000, DLT 35-70 or 7000, DLT 40-80 or 8000, DLT 110-220, DLT 160-320, DAT 12-24, DAT 20-40, DAT 36-72, LTO 400-800, etc., this always means that:

The native capacity is the first number. Native capacity means what you can physically put on the tape with each sector full of data. This is a physical limitation due to space: a digital tape has a fixed number of clusters, and you cannot increase them.

The compressed capacity is listed second. Compressed capacity depends heavily on the data: if the data allows a compression of 2:1, you will be able to reach double the native capacity; if your data can be compressed more, the compression ratio will be higher and so will the amount of data you can put on the tape.
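
The relationship between native capacity, compression ratio and the amount of data that fits can be written out as a small Python sketch. The function names are hypothetical, and the ratios are simply the example values mentioned in this thread, not measurements.

```python
def effective_capacity_gb(native_gb, ratio):
    """Data that fits on a tape if the data compresses at the given ratio."""
    return native_gb * ratio


def required_ratio(data_gb, native_gb):
    """Compression ratio needed to fit data_gb on a tape of native_gb."""
    return data_gb / native_gb


NATIVE_GB = 20  # DLT 4000 native capacity

for ratio in (1.0, 1.4, 1.8, 2.0, 3.0):
    print(f"{ratio}:1 -> {effective_capacity_gb(NATIVE_GB, ratio):.0f} GB")

# The original question: can 50GB fit on a 20GB native tape?
print(required_ratio(50, NATIVE_GB))  # ratio the data would have to achieve
```

For the 50GB volume in the original question, the data would need to compress at 2.5:1 or better to fit on a single 20GB native tape.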

Remember:

Compression on tape is done by an algorithm, so it works like compression applications (zip, rar, arj...). It does not work like a video recorder, which slows down the tape in order to fit more data and loses quality in doing so. Compression in the drive is applied before the data is written to the tape, so data on the tape looks the same whether it is compressed or not, and no quality is lost.

The 2:1 figure for compressed capacity was settled on as a standard years ago, when data on mainframes allowed very high compression. When selling a device (for example the DLT2000), the name 2000 did not reflect the available capacity because all data was compressible, and you could usually put 40, 60 or 80 GB on such a DLT. So for commercial reasons the name was changed to DLT 20-40, using the second number as a reference for 2:1 compression. Nowadays data is usually already compressed by the applications themselves and, where possible, by the OS, so it is very difficult to reach even 2:1 compression.

Marino