Operating System - OpenVMS

Re: Calculating block & file size of files in multiple savesets

 
Dean McGorrill
Valued Contributor

Re: Calculating block & file size of files in multiple savesets

Kenneth,
if you just used plain BACKUP, then it is not compressed. The saveset is around 10% (plus or minus) larger than the blocks listed by a BACKUP/LIST.
I guess we are still not sure how you compressed the savesets; BACKUP by itself won't do it.
Jan van den Ende
Honored Contributor
Solution

Re: Calculating block & file size of files in multiple savesets

Kenneth,

the answer is much closer than you think!

>>>"$ back/list yoursaveset.bck/save

will give you the total blocks used."

I am looking for the uncompressed block size.
<<<

What that will give you is NOT the number of blocks in the saveset, but the blocks READ in creating the saveset, i.e., the number of blocks you will get upon restore.
(Well, DO allow for cluster-size rounding up. So, add approximately 1/2 * (number-of-files-in-saveset) * target-volume cluster size.)
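As a rough worked example of that estimate (the block count, file count, and device name below are made-up values; substitute the figures from your own BACKUP/LIST output and your actual target disk):

$ listed_blocks = 100000000                       ! blocks reported by BACKUP/LIST
$ file_count = 300000                             ! files in the saveset
$ cluster = f$getdvi( "DKA100:", "CLUSTER" )      ! target volume cluster size
$ estimate = listed_blocks + (file_count * cluster) / 2
$ write sys$output "Estimated blocks needed on restore: ''estimate'"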

hth

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Kenneth Toler
Frequent Advisor

Re: Calculating block & file size of files in multiple savesets


>>>"$ back/list yoursaveset.bck/save

will give you the total blocks used."

<<<

Based on the back/list command above, this could take a while to complete the listing for an entire saveset. This is especially true in my case, where a single saveset can contain as many as 300,000 to 500,000 files.

Is there a quick way to extract the line that contains the number of blocks and files for the entire saveset?


>>>So, add approx 1/2 * (number-of-files-in-saveset) * target-volume cluster size.<<<

Finally, how do I determine the target volume cluster size?
Robert Brooks_1
Honored Contributor

Re: Calculating block & file size of files in multiple savesets

Finally, how do I determine the target volume cluster size?

--

cluster_size = f$getdvi( devnam, "CLUSTER" )

where devnam is the name of the mounted target device
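For instance (DKA100: is just a placeholder here; use whatever device your restore target is actually mounted as):

$ cluster_size = f$getdvi( "DKA100:", "CLUSTER" )
$ show symbol cluster_size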

-- Rob
Robert Gezelter
Honored Contributor

Re: Calculating block & file size of files in multiple savesets

Kenneth,

The problem as posed has several hazards:

- There may be files that were marked NOBACKUP in the saveset (and thus not saved) that WILL occupy space when restored
- If the saveset is stored on a sequential device (tape or simulated tape), then there is no way to determine the length of the saveset without reading through the entire set
- The "breakage" factor relating to the disk cluster size and the BACKUP record size

There are probably a few cases that I missed in the above.

The bottom line is that without parsing the output of a BACKUP/LIST of the saveset, I doubt that it is possible to come up with a truly reliable number.

It is important to note that hardware compression is below the user's visibility in this case. The NOBACKUP files, however, are effectively an optimization within the saveset and do remain an issue for estimating size.

One option is to do the restore to a scratch volume that has far more free space than the normal volumes; the operation can then be staged onto the actual destination.

As usual, the depth of the response is limited by the details of the target environment.

I hope that the above is helpful.

- Bob Gezelter, http://www.rlgsc.com
Jon Pinkley
Honored Contributor

Re: Calculating block & file size of files in multiple savesets

Kenneth,

This is the fourth question about what appears to be the same problem.

PKZIP for VMS vs. backup/log http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1114625
Total Number of Files and Blocks inside savesets http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1121642
Need to speed up expansion of very large savesets http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1122470
Calculating block & file size of files in multiple savesets http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1133933

Can you please provide a few more details about the actual problem you are trying to solve? We are answering the specific questions you ask, but the answers don't seem to be solving your real problem.

This appears to be a data transfer issue, not an archival issue.

Reading between the lines, it seems you have a process that creates many files (300,000 - 500,000 individual files containing a total of more than 50 GB of data) on an ongoing basis. This data is delivered to another party periodically. The apparent problem is that the "customer" is complaining about how long it takes to get the data into a form that they can process.

Once the customer "unloads" the data to their disk, what do they do with it? After they process it, do they delete it to make room for the next set of data? Specifically, do they process it multiple times, or do they only need to process it once in its raw form? For example, if they are reading the data and loading it into another database, then once they have processed it they no longer need the original data.

The reason I ask is that if they are processing the data only once, and it is possible for you to change your data-collection procedure, you will be able to provide the data in a form that is usable within a very short time from the customer's point of view.

If they are only processing the data once, and you don't need a copy of the original data, you can create the data on a removable disk that you deliver to them once the disk is "full". If you had two disks, you could exchange them (double buffering), but if you need to have a drive available for collected data at all times, you will need to get the previous disk back before your primary disk fills; i.e., you may need more than two drives. In a previous thread I suggested the use of an LD container file as the "drive", but you seemed reluctant to use LDDRIVER.

The modified procedure to transfer data would be:

At your site:

1. Prepare collection/transfer disk. (Connect, Initialize, Mount)
2. Store data to disk until disk nearly full.
3. Remove disk, send to data consumer.
4. Go to step 1.

At customer site:

1. Ready input disk (Connect, Mount)
2. Process data
3. Remove disk, send to data provider.
4. Goto step 1.

The customer can be processing one set of data while you are collecting/generating the next.

Note that in this scenario no backups/restores are done; it is just mount and go. If you do need to keep a copy of the data, you will need to back up the drive before it is sent, or use HBVS (host-based volume shadowing) to another drive (which can be an LD device) so the data is copied to two places as it is saved.

The disk can be either a removable SCSI disk or an LD container file; the procedure is essentially the same. The key is that you are providing them with a disk that has the files in a usable state, without the need to do an unload (i.e. a restore of a backup saveset or an unzip of a zip file).
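To make the LD variant concrete, here is a rough sketch of the commands involved (the container file name, size, unit number, and volume label are made-up, LDDRIVER must already be installed, and the exact qualifiers may differ with your LD version):

At your site:

$ ld create dka100:[xfer]exchange.dsk /size=100000000   ! ~50 GB container file
$ ld connect dka100:[xfer]exchange.dsk lda1:            ! present it as device LDA1:
$ initialize lda1: xferdata
$ mount lda1: xferdata
$ ! ... generate/copy the files directly onto LDA1: ...
$ dismount lda1:
$ ld disconnect lda1:
$ ! copy or ship EXCHANGE.DSK to the customer

At the customer site:

$ ld connect dkb200:[incoming]exchange.dsk lda1:
$ mount lda1: xferdata
$ ! ... process the files in place ...
$ dismount lda1:
$ ld disconnect lda1: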

Answers to your specific questions:

The only way to get an accurate estimate of the output size of an arbitrary saveset is what is reported by BACKUP/LIST. However, this data does not change, and you can create a listing file at the time of the initial backup (just include /LIST=file in the BACKUP command that creates the saveset). Once the saveset is created, the time-consuming process of listing the contents does not need to be done again. Deliver the listing file along with the backup saveset (assuming you are not going to use my proposed solution, in which case the listing isn't needed).
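For example, something along these lines (the file specifications are placeholders; if I remember the listing format correctly, it ends with a summary line of the form "Total of n files, m blocks", which SEARCH can then pull out without re-reading the saveset):

$ backup dka100:[data...]*.*;* dkb200:[xfer]data.bck/save_set/group_size=0 -
    /list=dkb200:[xfer]data.lis
$ search dkb200:[xfer]data.lis "Total of"

That also answers your earlier question about quickly extracting the totals line from a very large listing.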

In your case, as long as you do not have files that are marked /NOBACKUP, you are not using data compression, you are creating the backup savesets on disk, and you specify /GROUP_SIZE=0 (no redundancy), the size of the saveset will be a good approximation of the size of the restored data when restoring to a disk with a cluster size of 1. But you would want to have somewhat more space available than the size of the saveset, as you would not want to run out of space during a restore. This problem can be avoided by just exchanging drives.

Jon
it depends