Diagnosing a performance bottleneck in BACKUP/LIST
04-05-2011 06:00 AM
Re: Diagnosing a performance bottleneck in BACKUP/LIST
As for the vintage of this gear, I paid US$1300 for a used AlphaServer DS20e dual several years ago (with a bus full of SCSI controllers), and less than half that for an Itanium box, and I've received DLTs faster than this one - for free.
Just about everything you've listed here is a dozen years old.
The Mylex is slow, RAID-5 is slow (and known to expose itself to catastrophic double spindle failures during its recovery processing), the SCSI bus here is slow, the 9 GB drives are slow, the version of VMS is slow, and, well, you're in a target-rich environment for slow.
In terms of raw performance for archival processing, BACKUP (with proper process quotas, etc.) was getting 90% of the theoretical bandwidth of the slowest component between the source and the target.
Yes, "old and slow" is a theme in this reply.
Here are the various HP process quota recommendations for BACKUP usernames. It's typically the proportions that are key, not the absolute values of any of the quotas:
http://labs.hoffmanlabs.com/node/49
Prior to its wholesale replacement with newer gear, I'd verify the quotas, and would also ensure compression/compaction is enabled, and I'd also try enabling fast skip on the tape drive ddcu: device:
$ set magtape/fast_skip=always ddcu:
And (failing a wholesale server swap) I'd look to get to faster SCSI devices all around.
04-05-2011 07:33 AM
Re: Diagnosing a performance bottleneck in BACKUP/LIST
This may or may not be of help here, but is there any reason NOT to combine the BACKUP itself with the generation of the listing?
$ BACKUP
... only make sure
fwiw
Proost.
Have one on me.
jpe
04-05-2011 08:43 AM
Re: Diagnosing a performance bottleneck in BACKUP/LIST
If so, there is a write penalty with RAID-5. I don't know how RAID-5 is implemented on this controller; it might not have enough onboard cache memory to compensate for the write penalty.
Also, BACKUP code uses the default parameters for RMS buffer size and number of buffers. Try with:
$ SET RMS/BLOCK_COUNT=127/BUFFER_COUNT=127
...before the BACKUP/LIST command. This would definitely make better use of the RAID-5 on the output disk, because the BACKUP code uses write-behind (asynchronous) and would now use 127 buffers instead of the default 2, and a buffer size (I/O size to disk) of 127 blocks instead of the default 32 (or whatever the DCL command SHOW RMS shows).
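As a rough check of what those settings mean in bytes, here is a small Python sketch. It assumes the standard 512-byte OpenVMS disk block and takes the defaults of 2 buffers and 32 blocks from the figures quoted above; the function name is only illustrative, and your system's actual defaults may differ.

```python
# Back-of-envelope arithmetic only; nothing here talks to RMS.
BLOCK_BYTES = 512  # assumption: standard 512-byte disk block


def rms_buffering(block_count, buffer_count):
    """Return (bytes per I/O, total bytes of buffering) for the given settings."""
    io_bytes = block_count * BLOCK_BYTES
    return io_bytes, io_bytes * buffer_count


# Defaults as quoted in the post (2 buffers of 32 blocks).
default_io, default_total = rms_buffering(block_count=32, buffer_count=2)
# Values from the SET RMS command (127 buffers of 127 blocks).
tuned_io, tuned_total = rms_buffering(block_count=127, buffer_count=127)

print(f"default: {default_io} bytes per I/O, {default_total} bytes buffered")
print(f"tuned:   {tuned_io} bytes per I/O, {tuned_total} bytes buffered")
```

So the change moves each disk write from 16 KB to roughly 63.5 KB, and the total write-behind buffering from 32 KB to a little under 8 MB, which may in turn need larger process quotas.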
/Guenther
04-05-2011 05:40 PM
Re: Diagnosing a performance bottleneck in BACKUP/LIST
I'm sure the ROI would be about a year if you're actually paying maintenance on this stuff.
Cheers,
Art
04-05-2011 06:22 PM
Re: Diagnosing a performance bottleneck in BACKUP/LIST
Last night's backup wrote the tape but then failed during the listing phase with a parity error, so I'm going to recommend that, if they've got any money at all, the new owners buy some new tapes and clean the drive too. (This was the first time I've seen it log errors during backup.)
I'll try increasing the RMS block and multibuffer counts to see if that helps. I'll also try forcing the tape drive to fast_skip=always, but I'm not sure how that will really help, because the BACKUP/LIST command needs to read the entire file before moving to the next anyway.
Jan, I had thought about adding /LIST to the backup commands which write the tape and it might be worth trying. The risk is that it will blow out the time required to write the tape, which is unacceptable. Another job for the weekend!
More news later ...
Jeremy Begg
04-05-2011 07:03 PM
Re: Diagnosing a performance bottleneck in BACKUP/LIST
Two thoughts:
- I agree with Volker's suggestion to run a test with the listing going to NLA0:. That will remove all fragmentation and file extension processing from the equation.
- In a somewhat related experiment, I would increase the RMS buffering and blocking factors significantly. This may require resource quota expansion. For experimental purposes, I might very well try very large increments.
Expanding the quotas will reduce the impact of XQP operations acting as blocks on tape processing. The tape will likely process at speed, with the output results backing up in buffers. Note that I am not in my office at the moment, so my ability to experiment on my systems is limited. The preceding presumes that BACKUP is using normal RMS to process the listing file (I KNOW that it uses normal RMS to write/read the save set itself).
- Bob Gezelter, http://www.rlgsc.com
04-05-2011 08:07 PM
Re: Diagnosing a performance bottleneck in BACKUP/LIST
>> Last night's backup wrote the tape but then failed during the listing phase with a parity error
Parity errors are usually caused by tape errors, or by problems with the tape drive or related I/O hardware. Please check the online help on parity: $ help/message parity. Sometimes cleaning the tape drive resolves such issues.
Regards,
Ketan
04-05-2011 08:46 PM
Re: Diagnosing a performance bottleneck in BACKUP/LIST
A parity error on a SCSI tape drive is typically related to a SCSI bus problem: a missing terminator, an illegal bus/cable length, or a bad cable/connector.
Problems with media are reported as DRVERR. Only the errorlog entry would show the real SCSI error.
The SCSI tape driver (MKDRIVER) has a long mapping table to squeeze the tons of SCSI errors into a few OpenVMS SS$_... status values.
/Guenther
04-05-2011 09:42 PM
Re: Diagnosing a performance bottleneck in BACKUP/LIST
I disagree with any and all suggestions about NL:, RMS buffering and fragmentation, and to RAID or not to RAID for the list file (including my own RAMdisk suggestion).
Folks, the raw numbers just are not there.
>>> listing file is 28436 blocks and has hundreds of extents
>>> that command takes up to eight hours to run!
So even if BACKUP did an IO and an extend for each block, it would have a full second to do so each time.
Any disk, with any fragmentation, can do this 10 times per second, if not 200 times.
I tried this on my PC with FreeAXP and the LM driver as tape. Relevant log attached.
You can see how BACKUP does NOT use RMS to make the tape save set.
You can see how backup uses basic RMS $PUT with 2 default (32 block=16KB=0x4000) buffers and write behind.
You can see normal IO counts: 1 IO for 32 blocks of list file... so we are talking less than 1,000 IOs in 8 hours.
Waddayathink... could that be a bottleneck? NO.
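For anyone who wants to check the raw numbers, here is the same arithmetic as a short Python sketch. The inputs are the figures quoted above (28436 blocks, eight hours, 32 blocks per IO); the variable names are only illustrative.

```python
import math

LISTING_BLOCKS = 28436   # listing file size quoted in the thread
ELAPSED_HOURS = 8        # "up to eight hours"
BLOCKS_PER_IO = 32       # default RMS buffer size seen in the log

elapsed_s = ELAPSED_HOURS * 3600

# Worst case: one IO (and one extend) per listing block.
seconds_per_block = elapsed_s / LISTING_BLOCKS

# Observed case: one IO per 32-block buffer.
io_count = math.ceil(LISTING_BLOCKS / BLOCKS_PER_IO)
seconds_per_io = elapsed_s / io_count

print(f"{seconds_per_block:.2f} s available per block")
print(f"{io_count} IOs, {seconds_per_io:.1f} s available per IO")
```

That works out to about one second per block in the worst case, and 889 IOs with roughly half a minute available for each, so the listing file writes cannot account for the elapsed time.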
Now, running this emulated AS 400, I _was_ using 100% CPU time.
Jeremy... was there significant CPU time, enough to stop the tape from streaming?
The average could be well below 100%, but if each tape block took more time to process than to read, then BACKUP might not post the next IO fast enough. Maybe it does not double-buffer on list (versus restore)?
Now that we have heard about hardware errors... maybe there was some error correction / retry going on, taking 'hours'?
fwiw,
Hein
04-06-2011 07:01 AM
Re: Diagnosing a performance bottleneck in BACKUP/LIST
No matter what the OpenVMS and BACKUP code is doing, there is a severe performance bottleneck. The question is: where? Is it the tape drive read side or the disk write side?
A listing to the NL device would at least tell whether or not it is the disk side. And then go from there.
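The reasoning behind that test can be sketched in a few lines of Python. This is only an illustration: the function name, the 2x threshold and the example timings are all hypothetical, not measurements from this system.

```python
def likely_bottleneck(disk_listing_s, null_listing_s, speedup_threshold=2.0):
    """Guess the slow side from two timed BACKUP/LIST runs of the same tape.

    disk_listing_s: elapsed seconds with the listing written to disk.
    null_listing_s: elapsed seconds with the listing written to the NL device.
    speedup_threshold: hypothetical cut-off; how much faster the NL run must
        be before the disk write side is blamed.
    """
    if disk_listing_s <= 0 or null_listing_s <= 0:
        raise ValueError("elapsed times must be positive")
    if disk_listing_s / null_listing_s >= speedup_threshold:
        # Removing the disk writes made the run much faster.
        return "disk write side"
    # The run is still slow with no disk writes at all.
    return "tape read side"


# Hypothetical timings, for illustration only.
print(likely_bottleneck(disk_listing_s=28800, null_listing_s=27000))
print(likely_bottleneck(disk_listing_s=28800, null_listing_s=3600))
```

If the NL run takes nearly as long as the disk run, the listing file is not the problem and attention moves to the tape drive, the SCSI bus and CPU time.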
/Guenther