Operating System - OpenVMS

Re: Extremely slow backup to LTO Tape after replacement of hot plug (shadowed) internal SCSI disk

 
MarkOfAus
Valued Contributor

Extremely slow backup to LTO Tape after replacement of hot plug (shadowed) internal SCSI disk

Hello all,

 

 

On the 28th of July at 8am, I hot-swapped a SCSI disk that had failed for a "new" one sent by HP. This disk is a member of a shadow set called DSA3:

 

Immediately, the error count on all the drives on PKA0: increased by one. I reset the error counts, then did a show device d/rebuild:

 

DSA1:                   Yes
DSA2:                   Yes
DSA3:                   No
$3$DKA0:                No
$3$DKA400:              No

 

So I performed a set volume/rebuild on DSA1: and DSA2:
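(For the record, these were just the standard commands, something along these lines:)

$ SHOW DEVICE D/REBUILD
$ SET VOLUME/REBUILD DSA1:
$ SET VOLUME/REBUILD DSA2: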

 

I then walked away, happy with my achievements...

 

A nightly backup is performed on the DS25 (EMU1) ONLY to the MKC500: device.

 

Normally, backups to the LTO take just two hours, but from the next day onwards (I had been away), the backup has been taking upwards of 8 hours.

 

Magtape $3$MKC500: (EMU1), device type HP Ultrium 2-SCSI, is online, allocated,
    mounted foreign, record-oriented device, file-oriented device, available to
    cluster, error logging is enabled, controller supports compaction
    (compaction enabled), device supports fastskip (per_io).

    Error count                    0    Operations completed          381412841
    Owner process        "BATCH_966"    Owner UIC               [COMCEN,BACKUP]
    Owner process ID        2CC84AF3    Dev Prot            S:RWPL,O:RWPL,G:R,W
    Reference count                4    Default buffer size               32256

    Volume label            "AUG7  "    Relative volume no.                   0
    Record size                    0    Transaction count                     1
    Mount status             Process    Mount count                           1
    ACP process name              ""
    Density                  default    Format                        Normal-11
    Allocation class               3

  Volume status:  odd parity.

 

I say normally in that the space and file makeup on the disks has not changed dramatically in months, so the normal run is around 2:45 hours. The daily backup runs from midnight to approx. 2:45 am when all is working correctly.

 

I must add, the sequence of backup is thus:

 

DSA2:, DSA1:, DSA3: and then $3$DKA400
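(The nightly job does full image saves in that order, roughly along these lines - I'm paraphrasing rather than pasting the actual command procedure, so the save-set names and exact qualifiers are guesses:)

$ MOUNT/FOREIGN $3$MKC500:
$ BACKUP/IMAGE/RECORD DSA2: $3$MKC500:DSA2.BCK/SAVE_SET
$ BACKUP/IMAGE/RECORD DSA1: $3$MKC500:DSA1.BCK/SAVE_SET
$ BACKUP/IMAGE/RECORD DSA3: $3$MKC500:DSA3.BCK/SAVE_SET
$ BACKUP/IMAGE/RECORD $3$DKA400: $3$MKC500:DKA400.BCK/SAVE_SET
$ DISMOUNT $3$MKC500: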

 

 

Background:

 

OpenVMS 7.3-2 clustered system.

EMU1 (DS25) is the primary server, EMU2 (DS20) is the disaster server.

 

DS25 (alloclass 3) with internal SCSI shadowed to a remote (approx 1 kilometre) DS20 (alloclass 4). This is a fibre link via EWA <-> fibre converters.

 

Three shadow sets, DSA1:, DSA2: and DSA3:, each comprising one disk on the DS25 and one on the DS20.

 

On the DS25, output of show device d:

DSA1:                   Mounted              0  DATA1         136.50GB   124   2
DSA2:                   Mounted              0  DATA2         206.39GB     6   2
DSA3:                   Mounted              0  DATA3          17.70GB  1192   2
$3$DKA0:        (EMU1)  Mounted              0  ALPHASYS       23.48GB  1713   1
$3$DKA100:      (EMU1)  ShadowSetMember      0  (member of DSA1:)
$3$DKA200:      (EMU1)  ShadowSetMember      0  (member of DSA2:)
$3$DKA300:      (EMU1)  ShadowSetMember      0  (member of DSA3:)
$3$DKA400:      (EMU1)  Mounted              0  ADMIN          28.20GB    35   1
$3$DQA0:        (EMU1)  Online               0
$3$DQA1:        (EMU1)  Offline              1
$3$DQB0:        (EMU1)  Offline              1
$3$DQB1:        (EMU1)  Offline              1
$4$DKB0:        (EMU2)  Mounted              0  (remote mount)                 1
$4$DKB100:      (EMU2)  ShadowSetMember      0  (member of DSA1:)
$4$DKB200:      (EMU2)  ShadowSetMember      0  (member of DSA2:)
$4$DKB300:      (EMU2)  ShadowSetMember      0  (member of DSA3:)
$4$DKB400:      (EMU2)  Mounted              0  (remote mount)                 1
$4$DQA0:        (EMU2)  Online               0

 

This system has been running for years without any glitches; the only recent change has been the replacement of the failed DSA3: member disk on the DS25.

 

I attach the ANALYZE/SYSTEM SHOW DEVICE output for DSA1 and DSA2, if that helps.

 

I also have T4 data for periods before, during and after the exchange of the disk. In summary, it shows a continual LOW direct I/O rate during the backup period, spread over 8 to 10 hours, rather than the high rates normally seen for 2 hours.

 

Thank you for your assistance,

Mark

 

(I wasn't going to try my luck adding more attachments, so anything else like T4 I can append to another reply/post)

Volker Halle
Honored Contributor

Re: Extremely slow backup to LTO Tape after replacement of hot plug (shadowed) internal SCSI disk

Mark,

 

what's slowing down the backup now ?

 

- slower writes to tape ?

- slower reads from disks ?

- slower intersite link ?

 

If you look at the T4 data for the disks (during backup), do you now see larger than usual queues '[MON.DISK]Qln' on the local or remote mbrs ? Or larger Response times?

 

What I do to test the read throughput from disks is:

 

$ SPAWN/NOWAIT BACKUP/PHY/NOCRC/GROUP=0 disk: NLA0:x.x/SAVE then $ MONI DISK and look at the Dirio-Rate

 

You could do this for the shadowsets and also for individually mounted disks from the local and remote site.

 

Then try a BACKUP/PHYSICAL from a local disk to tape.
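To make that last test concrete (TEST.BCK is just a placeholder save-set name, and the tape must be mounted foreign, as it already is for your nightly job):

$ MOUNT/FOREIGN $3$MKC500:
$ SPAWN/NOWAIT BACKUP/PHYSICAL/NOCRC/GROUP=0 $3$DKA400: $3$MKC500:TEST.BCK/SAVE_SET
$ MONITOR DISK/INTERVAL=1    ! watch the I/O rate (add /ITEM=QUEUE_LENGTH for the queues)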

 

Any resource problems (nonpaged pool) on the node running backup ?

 

Volker.

MarkOfAus
Valued Contributor

Re: Extremely slow backup to LTO Tape after replacement of hot plug (shadowed) internal SCSI disk

Volker,

 

what's slowing down the backup now ?

 

- slower writes to tape ?

- slower reads from disks ?

- slower intersite link ?

 

My friend, if I knew that, the problem would be solved. :-)

 

 

We back up from the main server (DS25), so I assume (correctly?) that all disk access is local when the shadow set has local members, which these do.

 

Therefore, I have only T4 data at hand for the DS25.

 

If you look at the T4 data for the disks (during backup), do you now see larger than usual queues '[MON.DISK]Qln' on the local or remote mbrs ? Or larger Response times?

 

The MON.DISK Qlen is abnormally high for the whole 10 hours the backup is running. I will attach the T4 data from the 26th of July (before the disk replacement) and the T4 data from after the replacement; for the latter I've picked 5-Aug-2014.

 

You can see for yourself the stark difference. The backup normally runs from midnight till about 3am, as it did on the 26th July. Compare that to the backup on the 5th August, where the Qlen is high from midnight until 10:30am when it finally finishes (or I think it was killed off because of impact on the system - I have to find that out).

 

The disk oprate is normally a spike for 2-3 hours, but now it's a steady sprawl over 10 hours.

 

What I do to test the read throughput from disks is:

 

$ SPAWN/NOWAIT BACKUP/PHY/NOCRC/GROUP=0 disk: NLA0:x.x/SAVE then $ MONI DISK and look at the Dirio-Rate

 

You could do this for the shadowsets and also for individually mounted disks from the local and remote site.

 I will get back to you after running this (I need to wait for the backup to finish before clobbering the system more...).

 

Then try a BACKUP/PHYSICAL from a local disk to tape.

 

Any resource problems (nonpaged pool) on the node running backup ?

It would be odd (far too eerily coincidental) for a disk replacement to coincide with a resource problem like that.

Anyway, sysgen reports:

 

Parameter Name           Current    Default     Min.      Max.     Unit  Dynamic
--------------           -------    -------    -------   -------   ----  -------
NPAGEDYN                 39993344    1048576    163840         -1 Bytes
NPAGEVIR                156827648    8388608    163840         -1 Bytes
NPAG_BAP_MIN                40960          0         0         -1 Bytes
NPAG_BAP_MAX               131072          0         0         -1 Bytes
NPAG_BAP_MIN_PA                 0          0         0         -1 Mbytes
NPAG_BAP_MAX_PA              1024         -1         0         -1 Mbytes
NPAG_RING_SIZE               2048       2048         0         -1 Entries
NPAGECALC                       0          1         0          2 Coded-valu
NPAGERAD                        0          0         0         -1 Bytes
NPAG_INTERVAL                  30         30         0         -1 Seconds    D
NPAG_GENTLE                    85         85         0        100 Percent    D
NPAG_AGGRESSIVE                50         50         0        100 Percent    D
SYSGEN>

 

The backup is running now, so here 'tis:

 

...$COMMON:COMMON.ADMIN> SHOW MEMORY/POOL/FULL
              System Memory Resources on  8-AUG-2014 10:20:22.33

Nonpaged Dynamic Memory      (Lists + Variable)
  Current Size (MB)                38.14   Current Size (Pagelets)     78112
  Initial Size (MB)                38.14   Initial Size (Pagelets)     78112
  Maximum Size (MB)               149.56   Maximum Size (Pagelets)    306304
  Free Space (MB)                  17.96   Space in Use (MB)           20.17
  Largest Var Block (MB)           10.11   Smallest Var Block (bytes)     64
  Number of Free Blocks             9941   Free Blocks LEQU 64 bytes    2017
  Free Blocks on Lookasides         5123   Lookaside Space (MB)         1.85

Bus Addressable Memory       (Lists + Variable)
  Current Size (KB)               128.00   Current Size (Pagelets)       256
  Initial Size (KB)               128.00   Initial Size (Pagelets)       256
  Free Space (KB)                 110.87   Space in Use (KB)           17.12
  Largest Var Block (KB)          104.00   Smallest Var Block (KB)      6.87
  Number of Free Blocks                2   Free Blocks LEQU 64 bytes       0
  Free Blocks on Lookasides            0   Lookaside Space (bytes)         0

Paged Dynamic Memory
  Current Size (MB)                 7.68   Current Size (Pagelets)     15744
  Free Space (MB)                   4.17   Space in Use (MB)            3.50
  Largest Var Block (MB)            4.12   Smallest Var Block (bytes)     16
  Number of Free Blocks              227   Free Blocks LEQU 64 bytes     198

Lock Manager Dynamic Memory
  Current Size (MB)                56.89   Current Size (Pages)         7283
  Free Space (MB)                  50.73   Hits                     38995840
  Space in Use (MB)                 6.16   Misses                          0
  Number of Empty Pages             6228   Expansions                   7283
  Number of Free Packets          207799   Packet Size (bytes)           256

Write Bitmap (WBM) Memory Summary
  Local bitmap count:     0     Local bitmap memory usage (bytes)       0.00
  Master bitmap count:    0     Master bitmap memory usage (bytes)      0.00
...$COMMON:COMMON.ADMIN>

 

 

Some timings (this is just indicative - other things impact it, I know):

 

DSA1: 144GB used, takes 5:25 hours to back up (lots of small files, and large files (300 MB each) as well)

DSA2: 73GB used, takes 4:51 hours to back up (lots of small user files on this disk)

DSA3: 16GB used, takes 16 minutes - system shared software - UAF/security/rightslist/queues etc.

DKA400: 5GB used - ODS-5 (all the rest of the disks are ODS-2) - Oracle Client disk. Takes 49 minutes to back up. Non-shadowed. Lots of small files and other detritus Oracle places there.

 

Thanks for your assistance.

 

(This forum software is rubbish. I set my included output to Courier new and it removes it!)

(I also attempt to attach the .csv from T4 to which this junk software says:

"The contents of the attachment doesn't match its file type.")

 

Volker Halle
Honored Contributor

Re: Extremely slow backup to LTO Tape after replacement of hot plug (shadowed) internal SCSI disk

Mark,

 

interesting data ;-)

 

Let's just look at the BACKUP of the first shadowset DSA2: (local $3$DKA200 and remote $4$DKB200):

 

Saturday, 26-JUL 00:28-01:46: Opcnt 500 to 1500, then about 150 starting at 01:03 (BACKUP/RECORD phase ?); Qlen about 5, then about 1 starting at 01:03; Response-time 6-8 ms

Tuesday, 5-AUG 00:30-05:54: Opcnt 80-190, then about 150 starting at 05:23 (BACKUP/RECORD phase ?); Qlen about 7 (all the time), then 1 at 05:23; Response-time 80-120 ms, 6 ms after 05:23

 

Even the BACKUP of DSA3: on 26-JUL (from the REMOTE mbr $4$DKB300, as $3$DKA300 was not present) seems to have performed well: Opcnt about 1000, Qlen=6, Response-time=5-6 ms

 

Nearly zero IOs on the remote member, except during the BACKUP/RECORD (?) phase. But this phase shows very similar behaviour BEFORE and AFTER replacement of $3$DKA300:

 

Could the local SCSI adapter/bus have some problems after hot-swapping $3$DKA300: ?

 

 

For disk throughput comparison, I've collected some data from a DS25 (V7.3-1, Symbios 895, HP U320 15K RPM 36.4 GB disk). This SCSI adapter should have a max. throughput of 80 MB/sec in LVD mode. 2-mbr shadowsets similar to your config:

 

$ BACKUP/PHYS/NOCRC/GROUP=0 DSAx: NLA0: x.x/SAVE achieves about 2234 DIRIO/sec -> 70 MB/sec, constant queue length of about 405 (MONI DISK/ITEM=Q/INT=1)

 

Same results on the 'remote' DS25 with 10K 36.4 GB disk (locally mounted, not a shadowset mbr). The limiting throughput factor seems to be the Symbios 895 in this case.

 

 

You could perform a BACKUP/IMAGE DSA2: NLA0:x.x/SAVE and, if you also see similarly 'bad' performance, try to temporarily DISMOUNT/POLICY=MINI $3$DKA200: (the local mbr) and see if throughput increases! If so, this would confirm the local SCSI adapter/bus as a suspect.
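In DCL terms, roughly (the label DATA2 is from your SHOW DEVICE output; if minicopy is not set up on your cluster, a plain DISMOUNT works too, but re-adding the member then costs a full copy):

$ BACKUP/IMAGE DSA2: NLA0:x.x/SAVE             ! may need /IGNORE=INTERLOCK for open files
$ DISMOUNT/POLICY=MINICOPY $3$DKA200:          ! drop the local (suspect) member
$ BACKUP/IMAGE DSA2: NLA0:x.x/SAVE             ! repeat the test - reads now come from EMU2
$ MOUNT/SYSTEM DSA2:/SHADOW=$3$DKA200: DATA2   ! add the member back afterwards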

 

Maybe with even less impact on your system, you can try the above BACKUP test against $3$DKA400: and also against $4$DKB400: (after first mounting it on EMU1).

 

Update: you may also want to play with the Read Cost parameter. The remote mbr of DSA2: has a read cost of 501. You may try to reduce that and increase the read cost of the local mbr. This way you could force reads from remote, if that member provides better performance.
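For example (the values are only illustrative - note the current costs with SHOW SHADOW first, so they can be put back afterwards):

$ SHOW SHADOW DSA2
$ SET DEVICE/READ_COST=500 $3$DKA200:   ! make the local (suspect) member expensive to read
$ SET DEVICE/READ_COST=2 $4$DKB200:     ! prefer the remote member for reads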

 

Volker.

 

MarkOfAus
Valued Contributor

Re: Extremely slow backup to LTO Tape after replacement of hot plug (shadowed) internal SCSI disk

Hi Volker,

 

"Mark,

 

interesting data ;-)

"

In your element, hey, Volker :-) You just love helping me! :-)

 

I apologise I didn't give you the DIO figures earlier, but the system's been busy.

 

Anyway, as per your suggestion, here are the results of SPAWN/NOWAIT BACKUP/PHY/NOCRC/GROUP=0 disk: NLA0:x.x/SAVE

 

DSA1: Approx 76 max.

DSA2: Approx 77 max.

DSA3: Approx 2767 max.

Snapshot of DSA3:

DSA3:                  DATA3           2767.66     146.75       0.00    2767.66
DSA3:                  DATA3           2798.66     372.27       0.00    2798.66

DSA3:                  DATA3           2762.66     433.01       0.00    2798.66

A marked difference!

 

So here's a summary after running DSA1:, DSA2:, DSA3: and $3$DKA400:

 

                            OpenVMS Monitor Utility
                              DISK I/O STATISTICS
                                 on node EMU1
                             8-AUG-2014 19:52:06.38

I/O Operation Rate                         CUR        AVE        MIN        MAX

$3$DKA0:      (EMU1)   ALPHASYS           4.00       1.10       0.00      24.66
$3$DKA100:    (EMU1)   DATA1              0.00      36.45       0.00      76.00
$3$DKA200:    (EMU1)   DATA2              0.00       8.28       0.00      76.66
$3$DKA300:    (EMU1)   DATA3              0.66     551.49       0.00    2834.33
$3$DKA400:    (EMU1)   ADMIN             21.33       7.75       0.00      81.00
DSA1:                  DATA1              0.00      36.45       0.00      76.00
DSA2:                  DATA2              0.00       8.28       0.00      76.66
DSA3:                  DATA3              0.66     548.33       0.00    2834.33
$4$DKB100:    (EMU2)   DATA1        R     0.00       0.00       0.00       0.00
$4$DKB200:    (EMU2)   DATA2        R     0.00       0.00       0.00       0.00
$4$DKB300:    (EMU2)   DATA3        R     0.66      11.09       0.00      50.33

 

"Even the BACKUP of DSA3: on 26-JUL (from the REMOTE mbr $4$DKB300, as $3$DKA300 was not present) seems to have performed well: Opcnt about 1000, Qlen=6, Response-time=5-6 ms

 

Nearly zero IOs on the remote member, except during the BACKUP/RECORD (?) phase. But this phase shows very similar behaviour BEFORE and AFTER replacement of $3$DKA300:"

 

Oh yes, sorry, I forgot to mention that the disk had been dead for a week before I had a chance to install the new one. So yes, the backup of 26-Jul-2014 would indeed have used $4$DKB300 as the primary source for reading when backing up DSA3:. You have been paying attention.  ;-)

 

You are correct, the backup/record phase is quite intense because these are FULL backups, not incrementals. Again, I forgot to mention this.

 

"Could the local SCSI adapter/bus have some problems after hot-swapping $3$DKA300: ?"

 

Ah, thank you for suggesting this. If you are suggesting this after looking at the data, then perhaps I am not as mad as I think. I too believe this must be where the issue lies. See below for more confirmation.

 

Ok, so I ran the SPAWN/NOWAIT BACKUP/PHY/NOCRC/GROUP=0 disk: NLA0:x.x/SAVE on the remote server, and looky here at the results:

 

                              DISK I/O STATISTICS
                                 on node EMU2
                             8-AUG-2014 20:15:45.15

I/O Operation Rate                         CUR        AVE        MIN        MAX

$4$DKB0:      (EMU2)   ALPHASYS1          0.00       0.57       0.00      23.00
$4$DKB100:    (EMU2)   DATA1              0.00     328.45       0.00    1148.00
$4$DKB200:    (EMU2)   DATA2              0.00     195.70       0.00    1152.00
$4$DKB300:    (EMU2)   DATA3              1.00      98.69       0.00     942.00
$4$DKB400:    (EMU2)   ADMIN1             0.00      88.00       0.00    1093.33
DSA1:                  DATA1              0.00     328.45       0.00    1148.00
DSA2:                  DATA2              0.00     195.70       0.00    1152.00
DSA3:                  DATA3              0.66      90.13       0.00     942.00
$3$DKA100:    (EMU1)   DATA1        R     0.00       0.00       0.00       0.00
$3$DKA200:    (EMU1)   DATA2        R     0.00       0.00       0.00       0.00
$3$DKA300:    (EMU1)   DATA3        R     0.33      25.90       0.00     194.33

 There's just a little bit of difference... :-)

 

So, as you suggest, this is a controller issue, almost certainly on PKA0:.

 

(analyze/system clue scsi /summary gives us this: )

OpenVMS (TM) system analyzer


SCSI Summary Configuration:
---------------------------
SPDT      Port  STDT   SCSI-Id  SCDT  SCSI-Lun  Device    UCB       Type    Rev
--------------  --------------  --------------  --------  --------  ------  ----
8218D540  PKD0

821874C0  PKC0
                82184040     5
                                821CC4C0     0  MKC500    821A5800  Ultriu  F63D

82051A40  PKB0

81F53300  PKA0
                81F57F00     0
                                81F58100     0  DKA0      81ED0B80  BF0368  HPB2
                820D9C80     1
                                81EC6840     0  DKA100    82171500  BD3008  HPB4
                82172080     2
                                82172280     0  DKA200    82171A40  BD3008  HPB4
                82172F80     3
                                82173180     0  DKA300    82172700  BF0368  HPB9
                82173E80     4
                                82174080     0  DKA400    82173600  BF0368  HPB9

 

You're a genius, Volker. I didn't think to look at the other system to compare its throughput. I think I might add a wrapper around the backup batch job to take the local members of DSA1:, DSA2: and DSA3: out, in the hope of speeding up the backup. Today the backup ended at 11:30, so by the middle of next week it might be finishing after midday - not a recipe for happy users (doing a directory command is horrible while this is happening).
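For the wrapper, I'm thinking of something roughly like this (NIGHTLY_BACKUP.COM is a made-up name for the existing job, the labels are from the SHOW DEVICE output above, and it assumes minicopy bitmaps are available):

$ DISMOUNT/POLICY=MINICOPY $3$DKA100:
$ DISMOUNT/POLICY=MINICOPY $3$DKA200:
$ DISMOUNT/POLICY=MINICOPY $3$DKA300:
$ @NIGHTLY_BACKUP.COM                          ! hypothetical name for the existing backup job
$ MOUNT/SYSTEM DSA1:/SHADOW=$3$DKA100: DATA1   ! add the local members back afterwards
$ MOUNT/SYSTEM DSA2:/SHADOW=$3$DKA200: DATA2
$ MOUNT/SYSTEM DSA3:/SHADOW=$3$DKA300: DATA3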

 

I will start a case with HP.

 

Cheers

Mark

Volker Halle
Honored Contributor

Re: Extremely slow backup to LTO Tape after replacement of hot plug (shadowed) internal SCSI disk

Mark,

 

why does the DSA3: (local $3$DKA300) backup test perform so well ?

 

Did you see my update regarding 'read cost'  ( $ SET DEV/READ_COST=n $3$DKcn:) ? If you can get more throughput this way, it will be safer, as the 2nd members don't need to be dismounted.

 

Volker.

MarkOfAus
Valued Contributor

Re: Extremely slow backup to LTO Tape after replacement of hot plug (shadowed) internal SCSI disk

"Mark,

 

why does the DSA3: (local $3$DKA300) backup test perform so well ?"

 

Beats me.  Perhaps it was an anomaly? I will run it again.

 

"Did you see my update regarding 'read cost'  ( $ SET DEV/READ_COST=n $3$DKcn:) ? If you can get more throughput this way, it will be safer, as the 2nd members don't need to be dismounted."

 

No, I didn't, sorry.

 

You know, for the life of me, I knew there was a setting to change the "priority" of the disks in a shadow set and I couldn't think of it. Thanks for reminding me of that.

 

I agree this would be a safer option, given that the whole reason for having the shadow set across multiple sites is data security.
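So rather than dismounting members, the backup job could simply bracket the saves with something like this (the costs are placeholders - I'd record the values SHOW SHADOW reports first and put them back afterwards; NIGHTLY_BACKUP.COM is again a made-up name):

$ SET DEVICE/READ_COST=500 $3$DKA100:   ! steer reads away from the local members
$ SET DEVICE/READ_COST=500 $3$DKA200:
$ SET DEVICE/READ_COST=500 $3$DKA300:
$ @NIGHTLY_BACKUP.COM
$ ! afterwards, set the read costs back to the values SHOW SHADOW reported beforehand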

 

MarkOfAus
Valued Contributor

Re: Extremely slow backup to LTO Tape after replacement of hot plug (shadowed) internal SCSI disk

Ran it again on DSA3:

 

                            OpenVMS Monitor Utility
                              DISK I/O STATISTICS
                                 on node EMU1
                             8-AUG-2014 20:59:31.19

I/O Operation Rate                         CUR        AVE        MIN        MAX

$3$DKA0:      (EMU1)   ALPHASYS           0.00       0.15       0.00       1.66
$3$DKA100:    (EMU1)   DATA1              0.00       0.00       0.00       0.00
$3$DKA200:    (EMU1)   DATA2              0.00       0.00       0.00       0.00
$3$DKA300:    (EMU1)   DATA3           2798.66    2390.29    1499.66    2798.66
$3$DKA400:    (EMU1)   ADMIN              0.00       0.79       0.00      12.00
DSA1:                  DATA1              0.00       0.00       0.00       0.00
DSA2:                  DATA2              0.00       0.00       0.00       0.00
DSA3:                  DATA3           2798.66    2390.24    1499.33    2798.66
$4$DKB100:    (EMU2)   DATA1        R     0.00       0.00       0.00       0.00
$4$DKB200:    (EMU2)   DATA2        R     0.00       0.00       0.00       0.00
$4$DKB300:    (EMU2)   DATA3        R    48.33      42.12      27.00      49.33

 Same result. It has to be getting the primary data from the remote node.

 

I will check.

MarkOfAus
Valued Contributor

Re: Extremely slow backup to LTO Tape after replacement of hot plug (shadowed) internal SCSI disk

Yes,

 

Analyze/system sho dev dsa3:

 

Device $4$DKB300        ...  Master Member
  Index 0 Status    000000A0 src,valid
  Read Cost 000001F5    Site 00000000   SM Time Out 120 UID 1162012C 00000004
                        UCB  821AC880
Device $3$DKA300
  Index 1 Status    000000A0 src,valid
  Read Cost 00000002    Site 00000000   SM Time Out 120 UID 1161012C 00000003
                        UCB  82172700

That explains the high throughput for DSA3:

 

Volker Halle
Honored Contributor

Re: Extremely slow backup to LTO Tape after replacement of hot plug (shadowed) internal SCSI disk

Mark,

 

Same result. It has to be getting the primary data from the remote node.

 

No, I don't think so: the MONITOR data from EMU1 confirms that the majority of the reads are from $3$DKA300: (the LOCAL member) and only SOME IOs are from $4$DKB300: (the REMOTE member). And the local disk $3$DKA300: is actually the one you've replaced.

 

A HIGHER read cost should result in FEWER IOs being sent to that shadowset member. $ SHOW SHADOW also shows the Read Cost values, so no need to invoke SDA - although that is my favorite tool ;-)

 

Volker.