Re: Diagnosing a performance bottleneck in BACKUP/LIST

Jeremy Begg · ‎04-04-2011

Hi,

This one has got me very puzzled.

I've inherited responsibility for an AlphaServer with this configuration:

AlphaServer DS20 500MHz (single CPU)
1GB RAM
Mylex DAC960 backplane RAID
TZ88 tape
OpenVMS V7.2-1

The system volume is a RAID-1 set containing two 9GB drives. There are four other logical volumes held on a RAID-5 set on the same controller. The disks are spread over three SCSI busses.

The problem I've been asked to investigate is why the evening tape backup takes so long.

I've determined that the backup to tape takes 4-5 hours, which is acceptable. The job then uses this command to create a listing of the tape contents:

$ backup/list=$1$dra1:[kits.backup_list]backup_list.lis tape:*.*

and that command takes up to eight hours to run!

I've used MONITOR to examine I/O and found this:

I/O Request Queue Length CUR AVE MIN MAX

$1$DRA1: (ORFF) ORFF_1 8559.00 8557.52 8559.00 8559.00

which is very suspicious, to say the least. Not just that it's very high, also that the CUR, MIN and MAX figures are identical but the AVE is less. And they don't change. If I use SDA to examine the device it says the I/O request queue is empty.

The MONITOR display for disk I/O rate is much more sensible; the rate varies from 0 to 100 or so for this disk (and average is under 20); the total AVE across all disks is under 50.

So I'm at a loss to understand why it takes twice as long to read the tape as it did to write it; the system is not heavily loaded by user activity.

What else can I look at here? I know the disks are badly fragmented but I wouldn't have thought that would make a big difference to the time it takes to write a saveset listing.

Thanks,
Jeremy Begg

Shriniketan Bhagwat · ‎04-04-2011

Hi.

In this case, listing the tape saveset should not take 8 hours in my view. How old the tape is? Are you rewinding the tape before listing the saveset? If yes, what is the command and how much time it takes just to rewind the tape? Is the saveset written in the beginning of the tape or end of the tape? What is the capacity of the tape?

Regards,
Ketan

Shriniketan Bhagwat · ‎04-04-2011

Hi,

One more thing, this is outside the scope of BACKUP, how much time does it take to mount the tape and do dir on it?

Regards,
Ketan

Jeremy Begg · ‎04-04-2011

Hi Ketan,

The procedure uses multiple BACKUP commands to write the tape then uses DISMOUNT/NOUNLOAD followed by MOUNT/FOREIGN to rewind the tape. Using SET MAGTAPE/REWIND might be slightly faster but not enough to be significant. The backup listing's file creation time is within a few minutes of the last disk backup command completing.

I'll try to organise a DIR listing of the tape to see how long that takes. However it might have to wait until the weekend because if it takes more than a few hours it will interfere with the next backup job.

Thanks,
Jeremy Begg

Bob Blunt · ‎04-04-2011

Jeremy, lets talk configuration. How is the TZ88 connected to the system? Does it share a shelf with the disk drives or does it have it's own SCSI controller? Does it generate any errors either during the BACKUP or the pass to list the tape? Is the same account used to list that is used to write the tape? What are the account quotas?

Have you observed the tape during the listing process? Does it seem like there are a lot of pauses where the drive does nothing? Have you checked the TZ88 while listing with $ SHOW DEVICE and $ SHOW DEV/FULL?

Try mounting the tape and performing the BACKUP and adding the following qualifiers to both (if they're not already present)

$ mount/for/media_format=compression/cache=tape_data

$ backup/blah/blah input: output:/media=compression/block=61440

The "Schooner" (DLT) drives work their best when you write using a blocking factor that is a multiple of 4096. So not only can you get a scooch more data on there with the larger blocking factor you write more efficiently when using a multiple of 4096. I would expect it to read better too.

Just some semi random thoughts that might help. You're also somewhat dependent on the account quotas for BACKUP performance on that OpenVMS version so those might be giving you grief too although I'll grant that it shouldn't make that much difference when listing multiple savesets. Do you have a "special" BACKUP account or are you just using something setup for general use?

bob

Volker Halle · ‎04-04-2011

Jeremy,

check the fragmentation of the listing file (DUMP/HEAD/BL=COUNT=0) and maybe use MONI FILE,FCP to see how much the XQP is active.

How big is the listing file ?

Try using SET RMS/EXTEND=65535 for increasing the default extend size, if fragmentation could be a problem.

Volker.

Jeremy Begg · ‎04-04-2011

Hi,

The tape is on its own SCSI controller. (You can't put a tape drive on a DAC960, AFAIK.) There's only one tape error and it's been '1' for several days.

I am several thousand KM from the tape drive so no chance of observing it unfortunately.

The account used for backup is the SYSTEM account and its quotas seem pretty generous. Like I said, the time taken to write the backup is acceptable -- it's the time taken to list the backup tape which is the problem.

The backup procedure was set to use /BLOCK=65535 which resulted in the backup tapes being written with a block size of 65024 (the value I would have put, had I written this procedure). I'll try /BLOCK=61440 tonight and see what happens.

The listing file is 28436 blocks and has hundreds of extents so the disks are badly fragmented (or at least that one is) but I would have expected that to impact the write time more than the listing time. I tried setting the default RMS extension to 2000 last night but it didn't make any difference (the listing file still has hundreds of extents).

Regards,
Jeremy Begg

Volker Halle · ‎04-04-2011

Jeremy,

to rule out a performance bottleneck during READING the tape, try the following:

$ BACKUP/LIST=NLA0: tape:*.*

Backup will still have to read and process all the blocks from the tape and it will do the IOs to the 'listing file', only that they now complete in ZERO time.

Regarding fragmentation: I just cut the backup operation time in half for a backup to a saveset on disk by simply using SET RMS/EXTEND=65535 instead of the default. So don't underestimate the effect of writing to a fragmented disk/file. This also may be preventing the tape from streaming !

Volker.

Shriniketan Bhagwat · ‎04-04-2011

Hi,

Does the error count value of the tape drive getting increased when any operation or BACKUP is performed on the tape device? Please check the status of the tape drive when listing the saveset from SHOW DEV command. Are there any errors or events related to hardware logged in the ERRLOG.SYS while listing the saveset on tape?

Regards,
Ketan

Hein van den Heuvel · ‎04-05-2011

You gotta think backup fails to keep the tape streaming. That'll kill performance. But why would that happen? Because it did not turn around quickly enough to issue the next read. But why? Busy with CPU stuff?... check T4. Waiting for list file IO? Unlikely, that's probably with RMS $PUT to a file with WBH (write behind) which is async.

I should try when I get a moment. Might even start with Jur's MDdriver first, just to see (Trace!) the kind of IOs happening.

Rather then look at Queue depth, I'd sooner like to see IO/sec. But why not have it all and analyze T4 output during the listing window.
Be sure to have T4 running before any next try!

I would also be interested in sample os ANALYZE/SYSTEM PROCIO$SDA output for the list process for a few samples. Just tot check.
While there I would use SDA> SHOW PROC/CHAN and SHOW PROC/RMS=(FAB,RAB,BDBSUM) to see how IO is done

>>> The listing file is 28436 blocks and has hundreds of extents so the disks are badly fragmented

That's not good, but does not explain.
8 hours = 20,000+ second, and even at 1000 extends that is 1 extent every 20 seconds. I think your system can handle that.
Still, might as well try output to NL: or better still to a RAM drive?

$ MCR SYSMAN IO CONNECT MDA1/NOADAPTER/DRIVER=SYS$MDDRIVER
$ INIT /SIZE=200000 MDA1 RAM ! 100 MB
$ MOUN /SYST MDA1 RAM RAM
$ CREA /DIRE RAM:[TEMP]/PROT=(WORLD:RWED)

>> I tried setting the default RMS extension to 2000 last night but it didn't make any difference (the listing file still has hundreds of extents).

The larger default extend will reduce the number of times the system is asked to grow the file and increase the chance it will grab a big chunck, but if there is no competing allocations, then ultimately the same free space will satisfy the request.
You should be able to witness this with DIR/SIZE=ALL for the list file. With the large extent the allocation should 'jump' only a few times and stay put most the time.

hth,
Hein

Hoff · ‎04-05-2011

My laptop runs faster than this AlphaServer, and I'm backing up and shuffling disks and files over WiFi faster, too; probably eight minutes to transfer a four-gigabyte file via WiFi between the laptop and a server.

As for the vintage of this gear, I paid US$1300 for a used AlphaServer DS20e dual several years ago (with a bus full of SCSI controllers), and less than half that for an Itanium box, and I've received DLTs faster than this one - for free.

Just about everything you've listed here is a dozen years old.

The Mylex is slow, RAID-5 is slow (and known to expose itself to catastrophic double spindle failures during its recovery processing), the SCSI bus here is slow, the 9 GB drives are slow, the version of VMS is slow, and, well, you're in a target-rich environment for slow.

In terms of raw performance for archival processing, BACKUP (with proper process quotas, etc) was getting 90% of the theoretical bandwidth of the slowest component between the source and the data.

Yes, "old and slow" is a theme in this reply.

Here's the various HP process quota recommendations for BACKUP usernames, and it's typically the proportions that are key, not the absolute values of any of the quotas:

http://labs.hoffmanlabs.com/node/49

Prior to its wholesale replacement with newer gear, I'd verify the quotas, and would also ensure compression/compaction is enabled, and I'd also try enabling fast skip on the tape drive ddcu: device:

$ set magtape/fast_skip=always ddcu:

And (failing a wholesale server swap) I'd look to get to faster SCSI devices all around.

Jan van den Ende · ‎04-05-2011

Jeremy,

May or may not be of help here, but is there any reason to NOT combining the BACKUP itself with the generation of the listing?

$ BACKUP /LIST=

... only make sure is NOT on !!!! :-)

fwiw

Proost.

Have one on me.

jpe

Don't rust yours pelled jacker to fine doll missed aches.

GuentherF · ‎04-05-2011

My guess is that the 4-5 hour backup is done from the system disk with RAID-1 and the listing is going to a data disk with RAID-5.

If so there is a write penalty with RAID-5. I don't know how RAID-5 is implemented on this controller it might not have enough onboard cache memory to compensate for the write penalty.

Also, BACKUP code uses the default parameters for RMS buffer size and number of buffers. Try with:

$ SET RMS/BLOCK_COUNT=127/BUFFER_COUNT=127

...before the BACKUP/LIST command. This would definitely make a better use of the RAID-5 on the output disk because BACKUP code uses write-behind (asynchronuous) and now uses 127 instead of the default 2 buffers and a buffer size (I/O size to disk) of 127 block instead of the default 32 (or whatever DCL-SHOW RMS shows).

/Guenther

Art Wiens · ‎04-05-2011

Just swap the works with a Proliant running CharonAXP and move on. There's no valid reason to keep 9GB drives in a "production" environment anymore (at least just emulate them if they _really_ need to be 9GB).

I'm sure the ROI would be about a year if your actually paying maintenance on this stuff.

Cheers,
Art

Jeremy Begg · ‎04-05-2011

Like I said, I inherited responsibility for this stuff and I know it's antiquated but it's what I've got. There will be no money for any kind of hardware or software upgrade. I'm told the system only has to run until the end of the year but if it looks like being longer than that I'll see what if I can talk them into a better tape drive.

Last night's backup wrote the tape but then failed during the listing phase with a parity error so I'm going to recommend that if they've got any money at all the new owners buy some new tapes and clean the drive too. (This was the first time I've seen it log errors during backup.)

I'll try increating the RMS block- and multibuffer counts to see if that helps. I'll also try forcing the tape drive to fastskip=always but I'm not sure how that will really help because the BACKUP/LIST command needs to read the entire file before moving to the next anyway.

Jan, I had thought about adding /LIST to the backup commands which write the tape and it might be worth trying. The risk is that it will blow out the time required to write the tape, which is unacceptable. Another job for the weekend!

More news later ...

Jeremy Begg

Robert Gezelter · ‎04-05-2011

Jeremy,

Two thoughts:

- I agree with Volker's suggestion to run a test with the listing going to NLA0:. That will remove all fragmentation and file extension processing from the equation.

- In a somewhat related experiment, I would increase the RMS buffering and blocking factors significantly. This may require resource quota expansion. For experimental purposes, I might very well try very large increments.

Expanding the quotas will reduce the impact of XQP operations acting as blocks on tape processing. The tape will likely process at speed, with the output results backing up in buffers. Note that I am not in my office at the moment, so my ability to experiment on my systems is limited. The preceding presumes that BACKUP is using normal RMS to process the listing file (I KNOW that it uses normal RMS to write/read the save set itself).

- Bob Gezelter, http://www.rlgsc.com

Shriniketan Bhagwat · ‎04-05-2011

Hi,

>> Last night's backup wrote the tape but then failed during the listing phase with a parity error

Parity errors are usually caused by tape errors, or by problems with the tape drive or related I/O hardware. Please check the online help on parity. $ help/message parity. Some times a cleaning of the tape drive may resolve such issues.

Regards,
Ketan

GuentherF · ‎04-05-2011

Ketan,

a parity error on a SCSI tape drive is typically related to a SCSI bus problem. Either a missing terminator or, an illegal bus/cable length or, a bad cable/connector.

Problems with media are reported as DRVERR. Only the errorlog entry would show the real SCSI error.

The SCSI tape driver (MKDRIVER) has a long mapping table to squeeze the tons of SCSI errors into a few OpenVMS SS$_... status values.

/Guenther

Hein van den Heuvel · ‎04-05-2011

I disagree with any and all suggestion about NL:, RMS buffering and fragmentation, and to RAID or not to RAID for the list file. (including my own RAMdisk suggesion)

Folks, the raw numbers just are not there.

>>> listing file is 28436 blocks and has hundreds of extents
>>> that command takes up to eight hours to run!

So even if backup did an IO, and an extend for each block then it would have a full second to do so each time.
Any disk, any fragmentation can do this 10 times per second, if not 200 times.

I tried with on my PC with FreeAXP and the LM driver as tape. Relevant log attached.

You can see how backup does NOT use RMS to make the tape-save set

You can see how backup uses basic RMS $PUT with 2 default (32 block=16KB=0x4000) buffers and write behind.
You can see normal IO counts: 1 IO for 32 blocks of list file... so we are talking less than 1,000 IOs in 8 hours.
Waddayathink... could that be a bottleneck? NO.

Now running this emulated AS 400, I _was_ using 100% CPU time%.

Jeremy... was there significant CPU time, enough to stop the tape from streaming?

Average could be well than 100%, but if each tape block took more time to process then to read, then backup might not post the next IO fast enough? Maybe it does not double-buffer on list (versus restore)

Now that we heard about hardware errors... maybe there was some error correction / retry going on taking 'hours' ?

fwiw,
Hein

GuentherF · ‎04-06-2011

Hein, you pointed to an obvious mystery: ca. 900 I/Os per hour to disk!?

No matter what OpenVMS and BACKUP code would be doing there is a severe performance bottleneck. Question is: Where? Is it the tape drive read side or the disk write side?

A listing to the NL device would at least tell whther or not it is the disk side. And then go from there.

/Guenther

abrsvc · ‎04-06-2011

Wouldn't volume settings like /data_check and /highwater make a significant difference?

What do the disk characteristics look like?

Dan

David Jones_21 · ‎04-06-2011

My only guess would be it had something to do with caching. Write behind is easier than read ahead.

I'm looking for marbles all day long.

Art Wiens · ‎04-06-2011

"I'm told the system only has to run until the end of the year but if it looks like being longer than that I'll see what if I can talk them into a better tape drive."

And we're on year 8 of the bold 3-5 year plan to replace all our VMS systems. They haven't accomplished much.

Always the same tough questions have to be asked of the business ... how much money is lost per hour/day/week when this 10+ year old production hardware goes south?

Cheers,
Art

p.s. don't forget to test an actual restore. A backup listing should give you some confidence that there's something valid written on that piece of plastic but ... ;-)

Paul Jerrom · ‎04-06-2011

Isn't it amazing how many people answer different questions to the one being asked?

So if I understand right, it is taking 4-5 hours to read data from disk and write it to tape. It is taking double that time to read from the tape and write to the disk.

A similar backup style on one of my sites shows the backup time to be about double the listing time, so your times seem out.

Presumably then, it's an issue of either reading from the tape or writing to the disk. If the former, then maybe someone local can listen to the drive and see if it is constantly moving, or whether it is going back and forth. Might be old knackered tapes (although I'd expect an increase in error count). If the latter, maybe the Mylex has caching disabled or is waiting for writes to complete to the disk before continuing? Any chance one of the R5 disks has failed? This would cause big overheads in calculating values from parity data. Not sure if a failed R5 disk on a Mylex flags anything to VMS. Maybe set RMS extent to 200k, if that is what it will use?
[BTW, this customer wouldn't happen to be in Auckland, would it?]
Have fun,

PJ

Have fun,

Peejay
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If it can't be done with a VT220, who needs it?

Jeremy Begg · ‎04-06-2011

Hi Paul,

No surprised at all :-)

This system is located in Christchurch, it got moved from one building to another after the earthquake there rendered the first building unsafe. As it happens, the people in the "new" building are the ones who actually use the system.

I think your comment about a failed disk in the RAID5 set might be the solution. I've re-checked OPERATOR.LOG for SWXCR messages and yes, there is a failed drive. (I remember bringing this to their attention before the earthquake and thought it had been fixed, but now I've found out it wasn't. Some people don't deserve to have computers.)

Next step: get HP to replace the failed disk(s).

Thanks,
Jeremy

Categories

Company

Local Language

Forums

Discussions

Forums

Discussions

Forums

Discussions

Forums

Discussions

Forums

Discussions

Discussions

Forums

Forums

Discussions

Forums

Discussions

Forums

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Community

Resources

Other HPE Sites

Discussions

Forums

Blogs

Re: Diagnosing a performance bottleneck in BACKUP/LIST

Diagnosing a performance bottleneck in BACKUP/LIST