
Re: Backup takes too long (9 hours)

 
SOLVED
Craig A Berry
Honored Contributor

Re: Backup takes too long (9 hours)

Since you appear to be backing up a large number of relatively small files, be sure to review the system parameter CHANNELCNT and the FILLM quota on the account running the backup. On an ancient AlphaServer 2100, I reduced a 10-hour backup by about an hour by bumping CHANNELCNT from the default 256 up to 2048. Make sure FILLM stays somewhat below CHANNELCNT.
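
For illustration, something along these lines (a rough sketch; the account name is made up, and CHANNELCNT changes belong in MODPARAMS.DAT so AUTOGEN preserves them across upgrades):

    $ MCR SYSGEN SHOW CHANNELCNT            ! current value; static, needs a reboot to change
    $ ! add to SYS$SYSTEM:MODPARAMS.DAT, then run AUTOGEN and reboot:
    $ !     CHANNELCNT = 2048
    $ SET DEFAULT SYS$SYSTEM
    $ MCR AUTHORIZE MODIFY BACKUP_ACCT /FILLM=1500   ! keep FILLM below CHANNELCNT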

As I understand it, BACKUP is aggressively multi-threaded and being able to have many files open at once allows it to keep the output device busy and perhaps do some optimizations with I/O size and such. Others can no doubt explain the whys much better.
Uwe Zessin
Honored Contributor

Re: Backup takes too long (9 hours)

Erm. I would not bump up just one or two parameters or quotas. Somewhere in the documentation there are formulas for how the quotas should be calculated against each other. With older versions of BACKUP I was able to create corrupted save-sets, and only a /CRC/VERIFY helped me detect it!
Craig A Berry
Honored Contributor

Re: Backup takes too long (9 hours)

Uwe, no one suggested he bump one parameter or quota without considering others. He said he's already tuned account quotas, and he has had various suggestions in this thread for quotas to look at. I see no reason why he shouldn't also consider the maximum number of open files as a potential limitation, since recent versions of BACKUP are explicitly designed to take advantage of large numbers of open files.
Hein van den Heuvel
Honored Contributor
Solution

Re: Backup takes too long (9 hours)

Let me add my 2 cents. [I cheated a little, because I have also seen parts of the HP-internal discussion on this subject.]

First, I suspect BACKUP is doing a reasonable job here. "Reasonable" because of the large number of (small) files.

We (or at least I) would like to think BACKUP is limited by bandwidth, MB/sec... and mostly it is. But in this particular case it is limited by IO/sec. Let me prove that.

There are 6M files in 36,000MB, so the average file size is 6KB. (It'll be a little less because the disk is not 100% full.)
So right there you *know* that the IOs are not large enough to max out MB/sec.

Now the disk is minimally fragmented. But still, BACKUP is going to do 2 IOs per file.
To find the files to back up, it walks the directory tree and starts reading file headers from INDEXF.SYS. I believe this is done unsorted; I should verify that. It then allocates buffers for the extents found, until all buffers are spoken for or FILLM is reached. It then sorts the LBNs corresponding to those buffers and issues a bunch of IOs in order. Btw, all this time no output is happening! In this case that's going to be just 1 data IO per file, but together with the header IO that makes 2 IOs, times 6M files = 12M IOs.
Over 9 hours that works out to 12,000,000 / (9 * 3600) = about 370 IO/sec.

Now those IOs are sorted, but within each bundle they still go all over the disk, with no pattern that a disk / controller (read-ahead) cache can possibly exploit. So for a single logical disk, even one served by shadowed pairs of mirrors, that is just about the max you can expect. (10,000 rpm = 166 revs/second, i.e. 6ms per revolution. Give each disk a little over 1/2 revolution to seek (4ms) and on average 1/2 revolution to find the data (3ms) for each IO, and you are looking at no more than about 150 IO/sec per spindle. Add to that the on-off cycle of backup IO, and you are looking at the numbers observed!)
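
Spelling the arithmetic out (DCL integer division, so results are rounded down):

    $ achieved = 12000000 / (9 * 3600)    ! 2 IOs/file * 6M files over 9 hours
    $ WRITE SYS$OUTPUT "achieved: ''achieved' IO/sec"        ! 370
    $ ceiling = 1000 / (4 + 3)            ! 4ms seek + 3ms half-rev at 10,000 rpm
    $ WRITE SYS$OUTPUT "per-spindle ceiling: ''ceiling' IO/sec"  ! 142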

In summary: BACKUP is working as expected under the circumstances.

To get better timings you need more spindles active.

- With the existing storage, you may want to try smaller backup quotas to get those IOs going more quickly.

- You might want to consider multiple concurrent backup streams on the same disk from different directory structures, to keep the queues filled better (see the sketch after this list). Yes, that will create more random IO, but unless you magically have files sorted by name and LBN, the IOs will be rather random anyway.

- Can you get files allocated with LBN order similar to name order? I doubt it, but if the files had been created as 00000001, 00000002, ... the backup would be faster :-).

- While in general I discourage storage partitions, they might help here. Consider recreating the devices not as simple mirrors, but as (36GB) partitions of larger multi-disk mirror sets. This would make more spindles available during the backup and would restrict the physical head movement dramatically. Of course, great care might be needed at other (non-backup) usage times, since partitions would now live on the same disks, increasing head movement!

- You may need a different way to store these objects! Could, with the right subroutine package, those 6M small 'files' become records in an indexed file, with the key being the filename? Bucket size 63? Use extension records or external files when going over 30KB?
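
To sketch the multi-stream idea from above (queue, tree, and device names are all invented; each stream wants its own output device):

    $ ! stream 1 and stream 2 walk disjoint directory trees concurrently
    $ SUBMIT/QUEUE=SYS$BATCH/PARAMETERS=("DKA100:[DOCS.A...]","MKA500:") BACKUP_STREAM.COM
    $ SUBMIT/QUEUE=SYS$BATCH/PARAMETERS=("DKA100:[DOCS.B...]","MKA600:") BACKUP_STREAM.COM
    $ ! where BACKUP_STREAM.COM essentially does:
    $ !     $ BACKUP/LOG 'P1'*.*;* 'P2'STREAM.BCK/REWIND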


Hope this helps some,
Kind regards,
Hein.


Wim Van den Wyngaert
Honored Contributor

Re: Backup takes too long (9 hours)

Hein,

Hats off.
Without the cheating this would have been worth 20 points; I hope you get 10.
Of course, the density can still be a second reason.

Wim
Uwe Zessin
Honored Contributor

Re: Backup takes too long (9 hours)

Craig, then I have misunderstood you. Sorry.

Thanks for the detailed analysis, Hein. It is another nice example that the tape is not always the limiting factor. There are still many people who believe that a disk is fast (it is) and a tape is just slow.
Hein van den Heuvel
Honored Contributor

Re: Backup takes too long (9 hours)



I verified the header reading.
It is currently strictly synchronous.
So the IO pattern in this case will be:
- some directory IOs
- a large (FILLM) series of synchronous 512-byte IOs to INDEXF.SYS
- a bulk of a similarly large number of async 6KB read IOs over the whole disk
- async output starting when the first data reads from the bulk finish
- repeat until no more files

I'm afraid that, given this, more spindles will offer only a marginal overall improvement: they do little for the INDEXF.SYS IOs (where most of the time is spent waiting) and only speed up the data IOs significantly.

My recommendation is to consider using physical backups. At 20MB/sec that will be done in half an hour or so (eating double the tape, perhaps, depending on compression). To restore: roll out to a scratch disk, select what you need from that disk, walk away from that disk.
Yeah, I know, possibly easier said than done (where 'done' means full integration into operational procedures?).
But possibly also easier done than said! (Where 'done' means the actual one-hour backup job versus days of discussion around it ;-)
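
For example (device names invented; the input disk must be mounted /FOREIGN, so the file system has to be quiesced first):

    $ DISMOUNT DKA100:
    $ MOUNT/FOREIGN DKA100:
    $ MOUNT/FOREIGN MKA500:
    $ BACKUP/PHYSICAL DKA100: MKA500:PHYS.BCK/REWIND   ! 36,000MB @ 20MB/sec = ~30 minutes
    $ DISMOUNT MKA500: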

Hein.
Willem Grooters
Honored Contributor

Re: Backup takes too long (9 hours)

Hey there, Piet.

First: I happen to know the application that creates the files quite well (I'm working on it at present), so I can suggest another approach, if that is possible.
These files are named sequentially, though not necessarily contiguously (there are holes in the naming), and are located in directories named in a likewise manner.
If sequencing the files would mean a faster backup, it could well be a good idea to back up to disk first, to get them 'lined up' by name (both directories and files), and back them up from there. That would not be a problem because most of the files are 'historical' (but they are required to be available for years to come).

Willem
Willem Grooters
OpenVMS Developer & System Manager
Hein van den Heuvel
Honored Contributor

Re: Backup takes too long (9 hours)

>>> That would not be a problem because most of the files are 'historical' (but they are required to be available for years to come).

So perhaps the 'old files' in the old directories are not only not likely to change, but perhaps not even allowed to change? If that is the case, why not come up with a directory naming convention such that you only need to back up recent files? Or use BACKUP/SINCE=xxx/FAST (this will scan INDEXF.SYS relatively quickly to build a bitmap of candidate files, avoiding a single-file header IO just to read the date fields).
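
For example (the date and all names are invented):

    $ ! /FAST scans INDEXF.SYS sequentially instead of reading headers one by one
    $ BACKUP/FAST/SINCE="1-JAN-2005"/LOG -
          DKA100:[DOCS...]*.*;* MKA500:INCR.BCK/REWIND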
Using StorageWorks partitions you may even want to create dedicated sub-disks per archive period (a year? quarter? month?). Create such a disk with a backup once the period is closed and the size is known (use a cluster size of 1!). Create a search list over the periods to make access transparent.

so many solutions... so little time...

Hein.
Willem Grooters
Honored Contributor

Re: Backup takes too long (9 hours)

Hein,
Your suggestion is already implemented - by design.
Piet,
Just another thought: if you have some spare room on one disk (say 3GB), consider LDdriver (on the Freeware CD). Create a container file of that size, initialize and mount it, and BACKUP/IMAGE the documents onto this (logical) drive. Next, dismount it and back up the whole container at once. That means: just one file...
Getting it back would require disk space (and LDdriver).
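
A rough sketch of the idea, assuming a recent LD kit (the exact LD commands and qualifiers vary between versions, all names here are invented, and the copy is sketched as a tree copy rather than /IMAGE since the container is smaller than the source disk):

    $ LD CREATE DKA200:[STAGE]DOCS.DSK /SIZE=6000000   ! ~3GB container (blocks)
    $ LD CONNECT DKA200:[STAGE]DOCS.DSK LDA1:
    $ INITIALIZE LDA1: DOCS
    $ MOUNT/SYSTEM LDA1: DOCS
    $ BACKUP/LOG DKA100:[DOCS...]*.*;* LDA1:[*...]     ! line the tree up on the logical disk
    $ DISMOUNT LDA1:
    $ LD DISCONNECT LDA1:
    $ BACKUP DKA200:[STAGE]DOCS.DSK MKA500:DOCS.BCK/REWIND   ! one file to tape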
Willem Grooters
OpenVMS Developer & System Manager
Guinaudeau
Frequent Advisor

Re: Backup takes too long (9 hours)

Hein,

You wrote:

> the IO pattern in this case will be:
> - some directory IOs
> - a large (FILLM) series of synchronous 512-byte IOs to INDEXF.SYS
> - a bulk of a similarly large number of async 6KB read IOs over the whole disk
> - async output starting when the first data reads from the bulk finish
> - repeat until no more files

I am interested in understanding BACKUP's behaviour better for my own case too, with a mix, during the backup, of large contiguous files (> 1GB) and relatively small files (< 10KB), possibly on the same drive.

I observed on our systems the simultaneous outstanding IOs of some BACKUP runs, and they are clearly due to the async bulk of reads. When looking at SHOW PROC/QUOTA or within SDA, I often saw one or no outstanding IO charged against DIOLM, especially during a backup of the large contiguous files.

You wrote about a bulk of "async 6KB read IOs".
Is this 6KB a minimum value for small files? And in the case of backing up large files, will each IO (read, and write if /BLOCK=65535) be 64K, or...?

I am also considering your advice about physical backup. Our procedures for tape currently use BACKUP/IMAGE, and we keep the most recent BACKUP/IMAGE online on a drive. For a restore, either the data is online, or the procedure is to restore a complete drive's contents from the tape backup to a scratch disk, after having chosen the date. Then one picks up files/directories as needed. This procedure is very rarely used and has the great advantage of simplicity when explaining it to a customer (with insufficient training, a selective restore could take longer in elapsed time).

Questions:

In some sense I would compare it with UNIX "dd"? But "dd" gives no real assurance: what about the safety (e.g. CRC) of BACKUP/PHYSICAL?

Does a physical restore need a drive with exactly the same geometry as the original one, or is it OK when the restore drive has at least as many logical blocks as the original? What about the cluster size, for example?

We often use BACKUP/IMAGE/IGNORE=INTERLOCK for a system-mounted and active drive, e.g. to back up the system disk: presumably one should not use a physical backup on a non-quiescent drive?

The same question for shadow sets: we dismount a shadow member for the backup while the application continues to run (it should not be stopped for every backup).

Although the application is running at dismount time, or the system drive is the active one, the backup has proved consistent enough for a restore (BACKUP may complete with an error; our procedures tolerate a few errors like %BACKUP-E-OPENIN).

yours

louis

Note 1: I did not open this thread, but I enjoyed it, and I thank the participants here for their competent and helpful answers!

Note 2: I can only confirm the earlier recommendation to look at a block size of 32K, or better 64K: we experienced very strange "medium error" failures at a customer site, repeated many times, until we discovered that the BACKUP default block size of 8K was producing the trouble! A very silly error, indeed. We now always run BACKUP with /BLOCKSIZE=65535 (at least with Alpha and DLT8000 it is OK; I don't know where it would not behave properly).
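
For reference, the spelled-out form we use (the documented qualifier name is /BLOCK_SIZE; device names invented):

    $ MOUNT/FOREIGN MKA500:
    $ BACKUP/IMAGE/BLOCK_SIZE=65535/LOG DKA100: MKA500:FULL.BCK/REWIND
    $ ! the maximum accepted value is 65535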