Extremely slow backup to LTO Tape after replacement of hot plug (shadowed) internal SCSI disk
08-06-2014 03:37 PM - edited 08-06-2014 04:28 PM
Hello all,
On 28 July at 8 am, I hot-swapped a failed SCSI disk for a "new" one sent by HP. This disk is a member of a shadow set called DSA3:
Immediately, the error count on every drive on PKA0: increased by one. I reset the error counts and then did a show device d/rebuild:
DSA1:         Yes
DSA2:         Yes
DSA3:         No
$3$DKA0:      No
$3$DKA400:    No
So I performed a set volume/rebuild on DSA1: and DSA2:
I then walked away, happy with my achievements...
A nightly backup is performed on the DS25 (EMU1) ONLY to the MKC500: device.
Normally, backups to the LTO take just two hours, but the next day and on all subsequent days (I had been away), the backup took upwards of 8 hours.
Magtape $3$MKC500: (EMU1), device type HP Ultrium 2-SCSI, is online, allocated,
mounted foreign, record-oriented device, file-oriented device, available to
cluster, error logging is enabled, controller supports compaction (compaction
enabled), device supports fastskip (per_io).

Error count                    0    Operations completed          381412841
Owner process        "BATCH_966"    Owner UIC               [COMCEN,BACKUP]
Owner process ID        2CC84AF3    Dev Prot            S:RWPL,O:RWPL,G:R,W
Reference count                4    Default buffer size               32256
Volume label             "AUG7 "    Relative volume no.                   0
Record size                    0    Transaction count                     1
Mount status             Process    Mount count                           1
ACP process name              ""
Density                  default    Format                        Normal-11
Allocation class               3

Volume status: odd parity.
I say normally because the space used and the file makeup on the disk(s) have not changed dramatically in months, so I have a normal average of around 2.45 hours. A daily backup runs from midnight to approx 2:45 am when all is working correctly.
I must add, the sequence of backup is thus:
DSA2:, DSA1:, DSA3: and then $3$DKA400
Background:
OpenVMS 7.3-2 clustered system.
EMU1 (DS25) is the primary server, EMU2 (DS20) is the disaster server.
DS25 (alloclass 3) with internal SCSI shadowed to a remote (approx 1 kilometre) DS20 (alloclass 4). This is a fibre link via EWA <-> fibre converters.
3 sets of shadowed disks, DSA1:, DSA2: and DSA3: all comprised of 1 disk on DS25 and another on the DS20.
On the DS25, output of show device d:
DSA1:                 Mounted            0  DATA1        136.50GB   124   2
DSA2:                 Mounted            0  DATA2        206.39GB     6   2
DSA3:                 Mounted            0  DATA3         17.70GB  1192   2
$3$DKA0:    (EMU1)    Mounted            0  ALPHASYS      23.48GB  1713   1
$3$DKA100:  (EMU1)    ShadowSetMember    0  (member of DSA1:)
$3$DKA200:  (EMU1)    ShadowSetMember    0  (member of DSA2:)
$3$DKA300:  (EMU1)    ShadowSetMember    0  (member of DSA3:)
$3$DKA400:  (EMU1)    Mounted            0  ADMIN         28.20GB    35   1
$3$DQA0:    (EMU1)    Online             0
$3$DQA1:    (EMU1)    Offline            1
$3$DQB0:    (EMU1)    Offline            1
$3$DQB1:    (EMU1)    Offline            1
$4$DKB0:    (EMU2)    Mounted            0  (remote mount)                1
$4$DKB100:  (EMU2)    ShadowSetMember    0  (member of DSA1:)
$4$DKB200:  (EMU2)    ShadowSetMember    0  (member of DSA2:)
$4$DKB300:  (EMU2)    ShadowSetMember    0  (member of DSA3:)
$4$DKB400:  (EMU2)    Mounted            0  (remote mount)                1
$4$DQA0:    (EMU2)    Online             0
This system has been running for years without any glitches, with the only thing done in recent history being the replacement of a failed disk on the DS25 for the DSA3: shadow set.
I attach the analyze/system sho dev dsa1 and dsa2 if that helps.
I also have T4 data for periods before, after and during the exchange of disk. In summation, it shows a continual LOW Direct IO during the backup period over the 8 to 10 hours rather than the high rates normally seen for 2 hours.
Thank you for your assistance,
Mark
(I wasn't going to try my luck adding more attachments, so anything else like T4 I can append to another reply/post)
08-07-2014 04:21 AM - edited 08-07-2014 04:23 AM
Re: Extremely slow backup to LTO Tape after replacement of hot plug (shadowed) internal SCSI disk
Mark,
what's slowing down the backup now ?
- slower writes to tape ?
- slower reads from disks ?
- slower intersite link ?
If you look at the T4 data for the disks (during backup), do you now see larger than usual queues '[MON.DISK]Qln' on the local or remote mbrs ? Or larger Response times?
What I do to test read throughput from disks is:
$ SPAWN/NOWAIT BACKUP/PHY/NOCRC/GROUP=0 disk: NLA0:x.x/SAVE
then
$ MONI DISK
and look at the Dirio-Rate.
You could do this for the shadowsets and also for individually mounted disks from the local and remote site.
Then try a BACKUP/PHYSICAL from a local disk to tape.
Any resource problems (nonpaged pool) on the node running backup ?
Volker.
08-07-2014 05:23 PM - edited 08-07-2014 06:35 PM
Volker,
"what's slowing down the backup now ? - slower writes to tape ? - slower reads from disks ? - slower intersite link ?"
My friend, if I knew that, the problem would be solved.
We back up from the main server (DS25) so I assume (correctly?) that all disk access is local if there are served disks in the shadow set that are local, which there are.
Therefore, I have only T4 data at hand for the DS25.
"If you look at the T4 data for the disks (during backup), do you now see larger than usual queues '[MON.DISK]Qln' on the local or remote mbrs ? Or larger Response times?"
The mon.disk qlen is abnormally high for the duration of 10 hours the backup is running. I will attach the 26th July T4 (before the disk replacement) and the T4 after the disk replacement, in this case I've picked 5-aug-2014.
You can see for yourself the stark difference. The backup normally runs from midnight till about 3am, as it did on the 26th July. Compare that to the backup on the 5th August, where the Qlen is high from midnight until 10:30am when it finally finishes (or I think it was killed off because of impact on the system - I have to find that out).
The disk oprate is normally a spike for 2-3 hours, but now it's a steady sprawl over 10 hours.
"What I do to the test read throughput from disks is:
$ SPAWN/NOWAIT BACKUP/PHY/NOCRC/GROUP=0 disk: NLA0:x.x/SAVE
then
$ MONI DISK
and look at the Dirio-Rate. You could do this for the shadowsets and also for individually mounted disks from the local and remote site."
I will get back to you after running this (I need to wait for the backup to finish before clobbering the system more...).
"Then try a BACKUP/PHYSICAL from a local disk to tape. Any resource problems (nonpaged pool) on the node running backup ?"
It would be odd (far too eerily coincidental) that a change in disk would correspond with that.
Anyway, sysgen reports:
Parameter Name        Current    Default       Min.    Max.  Unit        Dynamic
--------------        -------    -------    -------  ------  ----        -------
NPAGEDYN             39993344    1048576     163840      -1  Bytes
NPAGEVIR            156827648    8388608     163840      -1  Bytes
NPAG_BAP_MIN            40960          0          0      -1  Bytes
NPAG_BAP_MAX           131072          0          0      -1  Bytes
NPAG_BAP_MIN_PA             0          0          0      -1  Mbytes
NPAG_BAP_MAX_PA          1024         -1          0      -1  Mbytes
NPAG_RING_SIZE           2048       2048          0      -1  Entries
NPAGECALC                   0          1          0       2  Coded-valu
NPAGERAD                    0          0          0      -1  Bytes
NPAG_INTERVAL              30         30          0      -1  Seconds     D
NPAG_GENTLE                85         85          0     100  Percent     D
NPAG_AGGRESSIVE            50         50          0     100  Percent     D
SYSGEN>
Backup is running now so here tis:
...$COMMON:COMMON.ADMIN> SHOW MEMORY/POOL/FULL
System Memory Resources on 8-AUG-2014 10:20:22.33

Nonpaged Dynamic Memory (Lists + Variable)
Current Size (MB)              38.14    Current Size (Pagelets)          78112
Initial Size (MB)              38.14    Initial Size (Pagelets)          78112
Maximum Size (MB)             149.56    Maximum Size (Pagelets)         306304
Free Space (MB)                17.96    Space in Use (MB)                20.17
Largest Var Block (MB)         10.11    Smallest Var Block (bytes)          64
Number of Free Blocks           9941    Free Blocks LEQU 64 bytes         2017
Free Blocks on Lookasides       5123    Lookaside Space (MB)              1.85

Bus Addressable Memory (Lists + Variable)
Current Size (KB)             128.00    Current Size (Pagelets)            256
Initial Size (KB)             128.00    Initial Size (Pagelets)            256
Free Space (KB)               110.87    Space in Use (KB)                17.12
Largest Var Block (KB)        104.00    Smallest Var Block (KB)           6.87
Number of Free Blocks              2    Free Blocks LEQU 64 bytes            0
Free Blocks on Lookasides          0    Lookaside Space (bytes)              0

Paged Dynamic Memory
Current Size (MB)               7.68    Current Size (Pagelets)          15744
Free Space (MB)                 4.17    Space in Use (MB)                 3.50
Largest Var Block (MB)          4.12    Smallest Var Block (bytes)          16
Number of Free Blocks            227    Free Blocks LEQU 64 bytes          198

Lock Manager Dynamic Memory
Current Size (MB)              56.89    Current Size (Pages)              7283
Free Space (MB)                50.73    Hits                          38995840
Space in Use (MB)               6.16    Misses                               0
Number of Empty Pages           6228    Expansions                        7283
Number of Free Packets        207799    Packet Size (bytes)                256

Write Bitmap (WBM) Memory Summary
Local bitmap count:                0    Local bitmap memory usage (bytes)   0.00
Master bitmap count:               0    Master bitmap memory usage (bytes)  0.00
...$COMMON:COMMON.ADMIN>
Some timings (this is just indicative - other things impact it, I know):
DSA1: 144GB used, takes 5:25 hours to backup (lots of small files, large files (300 MB each) as well)
DSA2: 73GB used, takes 4:51 hours to backup (lots of small user files on this disk)
DSA3: 16GB used, takes 16 minutes - system shared software - UAF/security/rightslist/queues etc.
DKA400: 5GB used - ODS-5 (all the rest of the disks are ODS-2) - Oracle Client Disk. Takes 49 minutes to backup. Non-shadowed. Lots of small files and other detritus Oracle places there.
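To put those timings on one scale, here is a rough sketch that turns each "GB used / elapsed time" pair above into an effective rate. It assumes 1 GB = 1000 MB and that the whole used space is read once, which is only approximately true; the variable and function names are illustrative.

```python
# Effective backup rate per volume, from the timings quoted above.
# Assumption: 1 GB = 1000 MB and the used space is read exactly once.
timings = {
    "DSA1": (144, 5 * 3600 + 25 * 60),   # (GB used, elapsed seconds)
    "DSA2": (73, 4 * 3600 + 51 * 60),
    "DSA3": (16, 16 * 60),
    "DKA400": (5, 49 * 60),
}


def mb_per_sec(gigabytes, seconds):
    """Average rate in MB/s for a volume of the given size and duration."""
    return gigabytes * 1000 / seconds


rates = {name: mb_per_sec(gb, secs) for name, (gb, secs) in timings.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} MB/s")
```

On these figures, DSA3 moves data noticeably faster than DSA1 or DSA2, which fits the pattern of the replaced disk's shadow set being the odd one out.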
Thanks for your assistance.
(This forum software is rubbish. I set my included output to Courier new and it removes it!)
(I also attempt to attach the .csv from T4 to which this junk software says:
"The contents of the attachment doesn't match its file type.")
08-08-2014 01:38 AM - edited 08-08-2014 03:04 AM
Mark,
interesting data ;-)
Let's just look at the BACKUP of the first shadowset DSA2: (local $3$DKA200 and remote $4$DKB200):
Saturday, 26-JUL 00:28-01:46, Opcnt 500 to 1500, then about 150 starting at 01:03 (BACKUP/RECORD phase ?)
Qlen about 5, then about 1 starting at 01:03, Response-time 6-8 ms
Tuesday, 5-AUG 00:30-05:54, Opcnt 80-190, then about 150 starting at 05:23 (BACKUP/RECORD phase ?)
Qlen about 7 (all the time), then 1 at 05:23, Response-time 80-120 ms, 6 ms after 05:23
Even the BACKUP of DSA3: on 26-JUL (from the REMOTE mbr $4$DKB300, as $3$DKA300 was not present) seems to have performed well: Opcnt about 1000, Qlen=6, Response-time=5-6 ms
Nearly zero IOs on the remote member, except during the BACKUP/RECORD (?) phase. But this phase shows very similar behaviour BEFORE and AFTER replacement of $3$DKA300:
Could the local SCSI adapter/bus have some problems after hot-swapping $3$DKA300: ?
For disk throughput comparison, I've collected some data from a DS25 (V7.3-1, Symbios 895, HP U320 15K RPM 36.4 GB disk). This SCSI adapter should have a max. throughput of 80 MB/sec in LVD mode. 2-mbr shadowsets similar to your config:
$ BACKUP/PHYS/NOCRC/GROUP=0 DSAx: NLA0:x.x/SAVE achieves about 2234 DIRIO/sec -> 70 MB/sec, constant queue length of about 405 (MONI DISK/ITEM=Q/INT=1)
Same results on the 'remote' DS25 with 10K 36.4 GB disk (locally mounted, not a shadowset mbr). The limiting throughput factor seems to be the Symbios 895 in this case.
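As a back-of-the-envelope aid, the two reference figures above (about 2234 DIRIO/sec and about 70 MB/sec) imply an average transfer size per I/O, which can then be used to convert any other MONITOR I/O rate into an approximate MB/sec. This is only a sketch: it assumes 1 MB = 1000 KB and that the same BACKUP test issues transfers of the same size on other systems, and the function name is made up for illustration.

```python
# Reference figures quoted above for the comparison DS25.
REF_DIRIO_PER_SEC = 2234
REF_MB_PER_SEC = 70

# Implied average transfer size per I/O in KB (assuming 1 MB = 1000 KB).
kb_per_io = REF_MB_PER_SEC * 1000 / REF_DIRIO_PER_SEC


def estimated_mb_per_sec(dirio_per_sec):
    """Approximate MB/s, assuming the same transfer size as the reference run."""
    return dirio_per_sec * kb_per_io / 1000


print(f"{kb_per_io:.1f} KB per I/O")
print(f"{estimated_mb_per_sec(1000):.1f} MB/s at 1000 I/Os per second")
```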
You could perform a BACKUP/IMAGE DSA2: NLA0:x.x/SAVE and if you also get the similar 'bad performance', try to temporarily DISMOUNT/POLICY=MINI $3$DKA200: (the local mbr) and see if throughput increases ! If so, this would confirm the local SCSI adapter/bus as a suspect.
Maybe with even less impact on your system, you can try the above BACKUP test against $3$DKA400: and also against $4$DKB400: (after first mounting it on EMU1).
Update: you may also want to play with the Read Cost parameter. The remote mbr of DSA2: has a read cost of 501. You may try to reduce that and increase the read cost of the local mbr. This way you could force reads from remote, if that member provides better performance.
Volker.
08-08-2014 03:33 AM
Hi Volker,
"Mark,
interesting data ;-)
"
In your element, hey, Volker :-) You just love helping me! :-)
I apologise I didn't give you the DIO figures earlier, but the system's been busy.
Anyway, as per your suggestion, here are the results of SPAWN/NOWAIT BACKUP/PHY/NOCRC/GROUP=0 disk: NLA0:x.x/SAVE
DSA1: Approx 76 max.
DSA2: Approx 77 max.
DSA3: Approx 2767 max.
Snap shot of DSA3:
DSA3: DATA3 2767.66 146.75 0.00 2767.66
DSA3: DATA3 2798.66 372.27 0.00 2798.66
DSA3: DATA3 2762.66 433.01 0.00 2798.66
A marked difference!
So here's a summary after running DSA1:, DSA2:, DSA3: and $3$DKA400:
OpenVMS Monitor Utility
DISK I/O STATISTICS on node EMU1
8-AUG-2014 19:52:06.38

I/O Operation Rate                    CUR       AVE       MIN       MAX
$3$DKA0:    (EMU1)  ALPHASYS         4.00      1.10      0.00     24.66
$3$DKA100:  (EMU1)  DATA1            0.00     36.45      0.00     76.00
$3$DKA200:  (EMU1)  DATA2            0.00      8.28      0.00     76.66
$3$DKA300:  (EMU1)  DATA3            0.66    551.49      0.00   2834.33
$3$DKA400:  (EMU1)  ADMIN           21.33      7.75      0.00     81.00
DSA1:               DATA1            0.00     36.45      0.00     76.00
DSA2:               DATA2            0.00      8.28      0.00     76.66
DSA3:               DATA3            0.66    548.33      0.00   2834.33
$4$DKB100:  (EMU2)  DATA1 R          0.00      0.00      0.00      0.00
$4$DKB200:  (EMU2)  DATA2 R          0.00      0.00      0.00      0.00
$4$DKB300:  (EMU2)  DATA3 R          0.66     11.09      0.00     50.33
"Even the BACKUP of DSA3: on 26-JUL (from the REMOTE mbr $4$DKB300, as $3$DKA300 was not present) seems to have performed well: Opcnt about 1000, Qlen=6, Response-time=5-6 ms
Nearly zero IOs on the remote member, except during the BACKUP/RECORD (?) phase. But this phase shows very similar behaviour BEFORE and AFTER replacement of $3$DKA300:"
Oh yes, I forgot to mention, sorry, that the disk had been dead a week before I had a chance to install the new one, so yes, the backup of 26-jul-2014 would indeed have used $4$dkb300 as the primary source for reading when backing up DSA3:. You have been paying attention. ;-)
You are correct, the backup/record phase is quite intense because these are FULL backups, not incrementals. Again, I forgot to mention this.
"Could the local SCSI adapter/bus have some problems after hot-swapping $3$DKA300: ?"
Ah, thank you for suggesting this. If you are suggesting this after looking at the data, then perhaps I am not as mad as I think. I too believe this must be where the issue lies. See below for more confirmation.
Ok, so I ran the SPAWN/NOWAIT BACKUP/PHY/NOCRC/GROUP=0 disk: NLA0:x.x/SAVE on the remote server, and looky here at the results:
DISK I/O STATISTICS on node EMU2
8-AUG-2014 20:15:45.15

I/O Operation Rate                    CUR       AVE       MIN       MAX
$4$DKB0:    (EMU2)  ALPHASYS1        0.00      0.57      0.00     23.00
$4$DKB100:  (EMU2)  DATA1            0.00    328.45      0.00   1148.00
$4$DKB200:  (EMU2)  DATA2            0.00    195.70      0.00   1152.00
$4$DKB300:  (EMU2)  DATA3            1.00     98.69      0.00    942.00
$4$DKB400:  (EMU2)  ADMIN1           0.00     88.00      0.00   1093.33
DSA1:               DATA1            0.00    328.45      0.00   1148.00
DSA2:               DATA2            0.00    195.70      0.00   1152.00
DSA3:               DATA3            0.66     90.13      0.00    942.00
$3$DKA100:  (EMU1)  DATA1 R          0.00      0.00      0.00      0.00
$3$DKA200:  (EMU1)  DATA2 R          0.00      0.00      0.00      0.00
$3$DKA300:  (EMU1)  DATA3 R          0.33     25.90      0.00    194.33
There's just a little bit of difference... :-)
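For anyone wanting to quantify that difference, a small sketch along these lines compares the MAX column of the two MONITOR listings for each shadow set. The rows are copied from the output above; the parsing is deliberately naive (whitespace split, volume label in the third field, MAX in the last) and the names are illustrative rather than any standard tool.

```python
# Shadow set member rows copied from the two MONITOR DISK listings above.
EMU1_LISTING = """
$3$DKA100: (EMU1) DATA1 0.00 36.45 0.00 76.00
$3$DKA200: (EMU1) DATA2 0.00 8.28 0.00 76.66
$3$DKA300: (EMU1) DATA3 0.66 551.49 0.00 2834.33
"""
EMU2_LISTING = """
$4$DKB100: (EMU2) DATA1 0.00 328.45 0.00 1148.00
$4$DKB200: (EMU2) DATA2 0.00 195.70 0.00 1152.00
$4$DKB300: (EMU2) DATA3 1.00 98.69 0.00 942.00
"""


def max_rates(listing):
    """Map volume label -> MAX I/O rate (the last column of each row)."""
    rates = {}
    for line in listing.strip().splitlines():
        fields = line.split()
        rates[fields[2]] = float(fields[-1])
    return rates


local = max_rates(EMU1_LISTING)
remote = max_rates(EMU2_LISTING)
ratios = {label: remote[label] / local[label] for label in local}

for label, ratio in ratios.items():
    print(f"{label}: EMU2 member peaks at {ratio:.1f}x the EMU1 member")
```

A ratio well above 1 points at the EMU1 side as the slow one for that volume; a ratio below 1 (as for DATA3 here) says the EMU1 member is the faster of the two.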
So, as you suggest, this is a controller issue, almost certainly on PKA
(analyze/system clue scsi/summary gives us this):
OpenVMS (TM) system analyzer

SCSI Summary Configuration:
---------------------------
SPDT      Port   STDT      SCSI-Id  SCDT      SCSI-Lun  Device  UCB       Type    Rev
--------------   --------------     --------------      --------  --------  ------  ----
8218D540  PKD0
821874C0  PKC0   82184040  5        821CC4C0  0         MKC500  821A5800  Ultriu  F63D
82051A40  PKB0
81F53300  PKA0   81F57F00  0        81F58100  0         DKA0    81ED0B80  BF0368  HPB2
                 820D9C80  1        81EC6840  0         DKA100  82171500  BD3008  HPB4
                 82172080  2        82172280  0         DKA200  82171A40  BD3008  HPB4
                 82172F80  3        82173180  0         DKA300  82172700  BF0368  HPB9
                 82173E80  4        82174080  0         DKA400  82173600  BF0368  HPB9
You're a genius Volker. I didn't think to look at the other system to compare its throughput. I think I might add a wrapper around the batch job for the backup to take out the local disk members of DSA1:, DSA2: and DSA3: and hope to speed up the backup. Today the backup ended at 11:30 so by the middle of next week, it might be finishing after midday - not a recipe for happy users (doing a directory command is horrible when this is happening).
I will start a case with HP.
Cheers
Mark
08-08-2014 03:42 AM
Mark,
why does the DSA3: (local $3$DKA300) backup test perform so well ?
Did you see my update regarding 'read cost' ( $ SET DEV/READ_COST=n $3$DKcn:) ? If you can get more throughput this way, it will be safer, as the 2nd members don't need to be dismounted.
Volker.
08-08-2014 03:54 AM
"Mark,
why does the DSA3: (local $3$DKA300) backup test perform so well ?"
Beats me. Perhaps it was an anomaly? I will run it again.
"Did you see my update regarding 'read cost' ( $ SET DEV/READ_COST=n $3$DKcn:) ? If you can get more throughput this way, it will be safer, as the 2nd members don't need to be dismounted."
No, I didn't, sorry.
You know, for the life of me, I knew there was a setting to change the "priority" of the disks in a shadow set and I couldn't think of it. Thanks for reminding me of that.
I agree this would be a safer option, given the reason for the shadow set across multiple sites is for data security.
08-08-2014 04:00 AM
Ran it again on DSA3:
OpenVMS Monitor Utility
DISK I/O STATISTICS on node EMU1
8-AUG-2014 20:59:31.19

I/O Operation Rate                    CUR       AVE       MIN       MAX
$3$DKA0:    (EMU1)  ALPHASYS         0.00      0.15      0.00      1.66
$3$DKA100:  (EMU1)  DATA1            0.00      0.00      0.00      0.00
$3$DKA200:  (EMU1)  DATA2            0.00      0.00      0.00      0.00
$3$DKA300:  (EMU1)  DATA3         2798.66   2390.29   1499.66   2798.66
$3$DKA400:  (EMU1)  ADMIN            0.00      0.79      0.00     12.00
DSA1:               DATA1            0.00      0.00      0.00      0.00
DSA2:               DATA2            0.00      0.00      0.00      0.00
DSA3:               DATA3         2798.66   2390.24   1499.33   2798.66
$4$DKB100:  (EMU2)  DATA1 R          0.00      0.00      0.00      0.00
$4$DKB200:  (EMU2)  DATA2 R          0.00      0.00      0.00      0.00
$4$DKB300:  (EMU2)  DATA3 R         48.33     42.12     27.00     49.33
Same result. It has to be getting the primary data from the remote node.
I will check.
08-08-2014 04:06 AM
Yes,
Analyze/system sho dev dsa3:
Device $4$DKB300 ... Master Member
  Index 0        Status 000000A0 src,valid
  Read Cost 000001F5        Site 00000000
  SM Time Out 120
  UID 1162012C 00000004     UCB 821AC880

Device $3$DKA300
  Index 1        Status 000000A0 src,valid
  Read Cost 00000002        Site 00000000
  SM Time Out 120
  UID 1161012C 00000003     UCB 82172700
That explains the high throughput for DSA3:
08-08-2014 04:21 AM
Mark,
"Same result. It has to be getting the primary data from the remote node."
No, I don't think so: the MONITOR data of EMU1 confirms, that the majority of the reads are from $3$DKA300: (the LOCAL member) and only SOME IOs are from $4$DKB300: (the REMOTE member). And the local disk $3$DKA300: is actually the one you've replaced.
A HIGHER read cost should result in LESS IOs to be sent to that shadowset member. $ SHOW SHADOW also shows the Read Cost values, so no need to invoke SDA - although that is my favorite tool ;-)
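A quick way to read the SDA output quoted earlier in the thread: the Read Cost fields are hexadecimal, so 000001F5 is the 501 mentioned before and 00000002 is 2. The sketch below converts the two values and picks the member with the lower cost. Treating "lowest cost gets the reads" as the whole rule is a simplification assumed here for illustration; it is not a description of the actual shadowing driver logic.

```python
# Read Cost values as SDA printed them for DSA3: on EMU1 (hexadecimal).
read_cost_hex = {"$4$DKB300": "000001F5", "$3$DKA300": "00000002"}

# Convert to decimal so they can be compared with the values SHOW SHADOW reports.
read_cost = {dev: int(value, 16) for dev, value in read_cost_hex.items()}

# Simplifying assumption: the member with the lowest read cost is favoured.
preferred = min(read_cost, key=read_cost.get)

print(read_cost)
print("reads favour", preferred)
```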
Volker.
08-08-2014 04:30 AM - edited 08-08-2014 04:31 AM
"No, I don't think so: the MONITOR data of EMU1 confirms, that the majority of the reads are from $3$DKA300: (the LOCAL member) and only SOME IOs are from $4$DKB300: (the REMOTE member). And the local disk $3$DKA300: is actually the one you've replaced."
I stand corrected, you are right. It's just the master member of the set because it fell into that role when the $3$DKA300 disk died. My apologies.
Could something as mundane as a io autoconfigure help here?
Or, the solution is to replace all the other disks and get wonderful throughput back again... :-)
08-08-2014 04:39 AM - edited 08-08-2014 04:41 AM
Mark,
you can change the Read Cost of the shadowset members on the fly.
Maybe start to experiment on EMU2:
EMU2 $ SET DEV/READ=1 $3$DKA200:
EMU2 $ SET DEV/READ=500 $4$DKB200:
Then try the BACKUP/PHY DSA2: test on EMU2. Most of the reads should now be from the EMU1 member and should be 'slow'. If that works as expected, apply the same setting on EMU1 for DSA1: and DSA2: by reducing the Read Cost for the (remote) EMU2 members and increasing it for the (local) EMU1 members.
To reset the read cost values back to their defaults, use $ SET DEV/READ DSA2:
And no, I don't think a SYSMAN IO AUTO will help here. Probably only a reboot or even power-off/on of EMU1. But maybe you can work around this problem by modifying the Read Cost...
Volker.
08-08-2014 04:52 AM
Volker,
I did as you requested:
SET DEV/READ=1 $3$DKA200:
SET DEV/READ=500 $4$DKB200:
and for good measure:
SET DEV/READ=1 $3$DKA100:
SET DEV/READ=500 $4$DKB100:
Results (as you expected):
OpenVMS Monitor Utility
DISK I/O STATISTICS on node EMU2
8-AUG-2014 21:50:17.29

I/O Operation Rate                    CUR       AVE       MIN       MAX
$4$DKB0:    (EMU2)  ALPHASYS1        0.00      0.12      0.00      5.00
$4$DKB100:  (EMU2)  DATA1            0.00      0.00      0.00      0.00
$4$DKB200:  (EMU2)  DATA2            0.00      0.00      0.00      0.00
$4$DKB300:  (EMU2)  DATA3            0.33      1.73      0.00      7.00
$4$DKB400:  (EMU2)  ADMIN1           0.00      0.00      0.00      0.00
DSA1:               DATA1           59.00     37.16      0.00     76.00
DSA2:               DATA2            0.00     18.73      0.00     76.66
DSA3:               DATA3            0.33      1.17      0.00      3.66
$3$DKA100:  (EMU1)  DATA1 R         59.00     37.15      0.00     76.00
$3$DKA200:  (EMU1)  DATA2 R          0.00     18.73      0.00     76.66
$3$DKA300:  (EMU1)  DATA3 R          0.33      0.98      0.00      2.33
Curiosity got the better of me:
SET DEV/READ=1 $3$DKA300:
SET DEV/READ=500 $4$DKB300:
Output:
OpenVMS Monitor Utility
DISK I/O STATISTICS on node EMU2
8-AUG-2014 21:51:47.31

I/O Operation Rate                    CUR       AVE       MIN       MAX
$4$DKB0:    (EMU2)  ALPHASYS1        0.00      0.15      0.00      5.00
$4$DKB100:  (EMU2)  DATA1            0.00      0.00      0.00      0.00
$4$DKB200:  (EMU2)  DATA2            0.00      0.00      0.00      0.00
$4$DKB300:  (EMU2)  DATA3           32.33      5.32      0.00     34.66
$4$DKB400:  (EMU2)  ADMIN1           0.00      0.00      0.00      0.00
DSA1:               DATA1            0.00     24.77      0.00     76.00
DSA2:               DATA2            0.00     12.48      0.00     76.66
DSA3:               DATA3         2340.66    289.70      0.00   2361.00
$3$DKA100:  (EMU1)  DATA1 R          0.00     24.77      0.00     76.00
$3$DKA200:  (EMU1)  DATA2 R          0.00     12.48      0.00     76.66
$3$DKA300:  (EMU1)  DATA3 R       2341.00    289.57      0.00   2361.00
It's like the new disk has stolen all the bandwidth... :-(
08-08-2014 04:57 AM
Mark,
"It's like the new disk has stolen all the bandwidth... :-("
Or all the lower SCSI IDs have some problem ? How about the performance of $3$DKA400: ?
The experiments with changing Read Cost have shown, that this would be a valid, quick and safe workaround to decrease your backup runtime, until this problem can really be solved.
Volker.
08-08-2014 05:14 AM - edited 08-08-2014 05:18 AM
Volker,
"Or all the lower SCSI IDs have some problem ? How about the performance of $3$DKA400: ?"
Nope. I recall I tested it. It came up as 81. A tad better than 76 (I'm an optimist).
OpenVMS Monitor Utility
DISK I/O STATISTICS on node EMU1
8-AUG-2014 19:52:06.38

I/O Operation Rate                    CUR       AVE       MIN       MAX
$3$DKA0:    (EMU1)  ALPHASYS         4.00      1.10      0.00     24.66
$3$DKA100:  (EMU1)  DATA1            0.00     36.45      0.00     76.00
$3$DKA200:  (EMU1)  DATA2            0.00      8.28      0.00     76.66
$3$DKA300:  (EMU1)  DATA3            0.66    551.49      0.00   2834.33
$3$DKA400:  (EMU1)  ADMIN           21.33      7.75      0.00     81.00
DSA1:               DATA1            0.00     36.45      0.00     76.00
DSA2:               DATA2            0.00      8.28      0.00     76.66
DSA3:               DATA3            0.66    548.33      0.00   2834.33
$4$DKB100:  (EMU2)  DATA1 R          0.00      0.00      0.00      0.00
$4$DKB200:  (EMU2)  DATA2 R          0.00      0.00      0.00      0.00
$4$DKB300:  (EMU2)  DATA3 R          0.66     11.09      0.00     50.33
I really appreciate all your help, Volker. Thank you.
I will use your suggestion of setting the read cost appropriately to preference the remote disks. That is a great solution/workaround.
(If the moderators and yourself, Volker, don't mind, I will leave this as unsolved and await the resolution of the case from HP to hopefully resolve this conundrum).
Cheers,
Mark.
Just a note for anyone reading this, the command to reset the read cost is:
SET DEVICE/READ=1 DSA1:
You may specify any read cost because the O/S ignores it and resets the read costs for each device in the shadow set to its default.
08-08-2014 05:23 AM
Mark,
now if ALL disks on that SCSI bus EXCEPT the newly replaced one perform badly after the replacement, you might also have to ask yourself: is the replacement disk $3$DKA300 itself causing the problems ? Is that disk of the SAME type as the previous one ?
Volker.
08-08-2014 06:06 PM - edited 08-08-2014 06:12 PM
Hi Volker,
"now if ALL disks on that SCSI bus EXCEPT the newly replaced one perform badly after the replacement, you might also have to ask yourself: is the replacement disk $3$DKA300 itself causing the problems ?"
08-09-2014 06:09 PM - edited 08-09-2014 06:13 PM
>I attempt to attach the .csv from T4 to which this junk software says:
>"The contents of the attachment doesn't match its file type."
I also get that attachment error and Excel likes that file fine. Your .zip solution is a good workaround.
Forum problems should be reported here:
http://h30499.www3.hp.com/t5/Community-Feedback-Suggestions/bd-p/community-feedback-suggestions
>If ... Volker, don't mind, I will leave this as unsolved ...
You can still assign Kudos to each helpful answer by clicking on the Kudos star.
08-12-2014 07:04 PM
What specific sort of disk is it? SHOW DEVICE /FULL $3$DKA300: will usually give some sort of an identity for the device, otherwise there are some SCSI-level tools to gather information.
With V7.3-2, you have access to dissimilar device shadowing (DDS) and dynamic volume expansion (DVE), two features that are worth enabling when you next have some scheduled downtime or can otherwise dismount the shadowsets briefly.
36 GB disks are ancient, too. I've been scrounging new-old-stock 146 GB SCSI disk drives for ~US$40 for a while now. Amazon has available HP 72 GB 15K SCSI disks in what is likely the appropriate "Universal" disk sled for that AlphaServer DS25 for ~US$15, with free shipping. Several different 146 GB 15K HP drives on offer for ~$37 to $40 over there, too.
I'd look to patch to current V7.3-2 (or to upgrade), enable DDS/DVE on the disks, and start a migration to those 72 GB or 146 GB drives, and then let HP know you've upgraded your storage for purposes of the support contract.
08-14-2014 09:54 PM - edited 08-14-2014 09:54 PM
Solution: The situation was finally resolved.
I placed a support call and, after some procrastination by the people at HP who do such diagnosis, they came to the conclusion that the tape drive controller needed replacing. What?!?!?!
This was because they saw errors logged in ERRLOG.SYS at the time I hot-swapped the drive. This was DESPITE the fact that I told them, and showed them, the evidence I assembled above indicating that all BUT the new disk were performing badly.
An HP engineer came to the site bearing a replacement SCSI controller for the tape drive.
Before he replaced the card he said the hot-swap was undoubtedly the problem. Other systems with RAID controllers we have are more forgiving of hot-swaps. Even though the hardware supports hot-swap, it seems OpenVMS doesn't necessarily behave well afterwards (I am paraphrasing the engineer).
So, a reboot solved the problem.
Here is the final output from running BACKUP/PHY/NOCRC/GROUP=0 disk: NLA0:x.x/SAVE on all disks in the system after a system reboot:
OpenVMS Monitor Utility
DISK I/O STATISTICS on node EMU1
14-AUG-2014 00:03:54.70

I/O Operation Rate                    CUR       AVE       MIN       MAX
$3$DKA0:    (EMU1)  ALPHASYS      2403.66    272.33      0.00   2404.00
$3$DKA100:  (EMU1)  DATA1            0.00    429.30      0.00   2519.33
$3$DKA200:  (EMU1)  DATA2            0.00    297.22      0.00   2526.33
$3$DKA300:  (EMU1)  DATA3           34.00    341.23      0.00   2674.33
$3$DKA400:  (EMU1)  ADMIN            0.00    428.48      0.00   3497.00
DSA1:               DATA1            0.00    429.30      0.00   2519.33
DSA2:               DATA2            0.00    297.22      0.00   2526.33
DSA3:               DATA3           34.00    341.14      0.00   2674.33
$4$DKB100:  (EMU2)  DATA1 R          0.00      0.15      0.00     12.00
$4$DKB200:  (EMU2)  DATA2 R          0.00      0.28      0.00      5.00
$4$DKB300:  (EMU2)  DATA3 R         34.00     24.63      0.00     50.33
So, basically, a disk can be hot-swapped into the DS25E (using the on-board motherboard controller) but a reboot needs to be scheduled for everything to get back to normality.
A partial solution until the reboot can be scheduled, to reduce the slowness of a backup to tape, is to change the read_cost of the disks that make up the shadow set to bias reads towards the remote server.
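For anyone wanting to wrap the backup job with that workaround, here is a rough sketch that just builds the DCL command lines to run on EMU1 before and after the backup. The costs 1 and 500 and the SET DEVICE/READ_COST form are the ones used earlier in this thread; the reset form (naming the virtual unit, with a value that is ignored) is as noted above. The Python names are illustrative only, and only DSA1: and DSA2: are listed because DSA3:'s local member was performing well.

```python
# Shadow set members as seen from EMU1 (device names from this thread).
SHADOW_SETS = {
    "DSA1:": {"local": "$3$DKA100:", "remote": "$4$DKB100:"},
    "DSA2:": {"local": "$3$DKA200:", "remote": "$4$DKB200:"},
}


def bias_to_remote(sets, low=1, high=500):
    """Commands that make the remote member cheap and the local member costly."""
    cmds = []
    for members in sets.values():
        cmds.append(f"$ SET DEVICE/READ_COST={low} {members['remote']}")
        cmds.append(f"$ SET DEVICE/READ_COST={high} {members['local']}")
    return cmds


def reset_to_default(sets):
    """Per the thread: naming the virtual unit resets every member's read cost."""
    return [f"$ SET DEVICE/READ_COST=1 {unit}" for unit in sets]


print("\n".join(bias_to_remote(SHADOW_SETS)))
print("\n".join(reset_to_default(SHADOW_SETS)))
```

The first group would go at the top of the backup procedure and the second at the end, so that normal read behaviour returns once the backup has finished.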
Thank you to Volker for his invaluable help and partial solution/interim work-around.