
Re: Degraded backup and Oracle dump performance after 11.23 to 11.31 upgrade

Which Fibre Channel HBAs are in the new rx3600?

We had similar issues with the AH400A cards using the fcd driver; they were fixed in version 10.03.01 of that driver.
chris huys_4
Honored Contributor

Re: Degraded backup and Oracle dump performance after 11.23 to 11.31 upgrade

Hi Martin,

Small correction: on HP-UX 11.31, instead of sar -d 1 100, execute

# sar -LdR 1 100

On HP-UX 11.23 it stays sar -d 1 100.
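
For reference, what the extra flags buy you (my reading of the 11.31 sar(1M) manpage, so worth verifying locally):

# sar -LdR 1 100    <- 11.31: adds a row per lunpath and splits the rate into separate r/s and w/s columns
# sar -d 1 100      <- 11.23: legacy per-device report only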

Greetz,
Chris
Martin Roende
Advisor

Re: Degraded backup and Oracle dump performance after 11.23 to 11.31 upgrade

Chris ..
The numbers collected here were taken while the nightly backup was still running.

That means lots of reads from /exp (disk182).

regards Martin Rønde Andersen
chris huys_4
Honored Contributor

Re: Degraded backup and Oracle dump performance after 11.23 to 11.31 upgrade

Hi Martin,

Can you attach the file as a .txt file? I'm having problems getting IE to "open" the .Z file.

Greetz,
Chris
Martin Roende
Advisor

Re: Degraded backup and Oracle dump performance after 11.23 to 11.31 upgrade

sar_out.txt in attachment ..

Regards Martin R. A.
chris huys_4
Honored Contributor

Re: Degraded backup and Oracle dump performance after 11.23 to 11.31 upgrade

Hi Martin,

This looks like the disk_to_tape backup.

09:03:22 disk3_lunpath82 34.34 0.50 0 706 90376 0.00 0.48
[..]
disk182_lunpath7 50.51 0.50 213 0 22562 0.00 2.59
[..]
disk182_lunpath33 62.63 0.50 213 0 22651 0.00 3.16
[..]
disk182_lunpath89 51.52 0.50 213 0 22343 0.00 2.61
[..]
disk182_lunpath108 50.51 0.50 213 0 22570 0.00 2.58
[..]
disk182 100.00 0.50 853 0 90125 0.00 2.73

[..]

The tape is "presented" by disk3_lunpath82, and the disk by the four "disk182" lunpaths, i.e. disk182_lunpath7, disk182_lunpath33, disk182_lunpath89 and disk182_lunpath108.

During the measured sar period, disk182 varies between 27776 blocks/sec and 156451 blocks/sec.

When the vg04 LUN delivers only 27776 blocks/sec to tape, representing 262 read IOs, the other volume group LUNs are requesting 1464 read IOs and 523 write IOs.

When the vg04 LUN delivers 156451 blocks/sec, or 1475 read IOs, the other volume group LUNs are requesting only 91 read IOs and 158 write IOs.

Summing the read/write IOs at the "worst for the backup" point (27776 blocks/sec): 262 + 1464 + 523 = 2249 IOs.

Summing the read/write IOs at the "best for the backup" point: 1475 + 91 + 158 = 1724 IOs.
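
If you want to pull those per-interval numbers out of the attached file yourself, a quick sketch (it assumes blks/s stays the third-from-last column, as in the paste above; the trailing space in the pattern keeps the disk182_lunpath rows out):

# awk '/disk182 /{print $(NF-2)}' sar_out.txt | sort -n | sed -n '1p;$p'

That prints the lowest and highest blks/s seen for the aggregate disk182 device over the measured period.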

Two things I would do:

1. Move all the LUNs that are not yet under 11.31 native multipathing control under it, i.e. the disks that still show up with their cXtYdZ device files in the sar output (see the sketch after point 2) ..

Average c1t0d2 9.35 0.50 22 54 1402 0.00 1.27
Average c1t1d3 0.46 0.50 1 1 53 0.00 2.03
Average c1t1d4 5.64 0.50 9 25 577 0.00 1.68
Average c7t2d3 0.53 0.50 1 1 45 0.00 2.74
Average c5t0d2 9.32 0.50 22 54 1326 0.00 1.26
Average c5t1d3 0.49 0.50 1 1 46 0.00 3.06
Average c3t2d3 0.57 0.50 1 1 57 0.00 2.60
Average c10t0d2 9.61 0.50 22 54 1333 0.00 1.31
Average c10t1d3 0.44 0.50 1 1 53 0.00 2.29
Average c14t1d3 0.45 0.50 1 1 52 0.00 2.10
Average c14t1d4 5.60 0.50 9 25 640 0.00 1.70
Average c16t2d3 0.54 0.50 1 1 49 0.00 2.62
Average c1t0d1 2.52 0.50 21 19 617 0.00 0.62
Average c10t0d1 2.38 0.50 21 19 658 0.00 0.59
Average c14t0d1 2.27 0.50 21 19 591 0.00 0.55
Average c14t0d2 9.26 0.50 22 55 1391 0.00 1.25
Average c5t0d1 2.48 0.50 21 19 656 0.00 0.60
Average c10t1d4 5.36 0.50 9 25 601 0.00 1.62
Average c5t1d4 5.21 0.50 8 25 572 0.00 1.56
Average c12t2d3 0.61 0.50 1 2 59 0.00 2.52
Average c1t0d6 0.06 0.50 0 0 0 0.00 19.63
Average c5t0d6 0.06 0.50 0 0 0 0.00 21.17
Average c10t0d6 0.06 0.50 0 0 0 0.00 20.82
Average c14t0d6 0.06 0.50 0 0 0 0.00 18.29
Average c1t1d0 0.01 0.50 0 0 0 0.00 0.50
Average c5t2d2 0.01 0.50 0 0 0 0.00 0.31
Average c14t2d2 0.01 0.50 0 0 0 0.00 0.25
Average c5t2d0 0.01 0.50 0 0 0 0.00 0.20

2. Double up max_q_depth for disk182: scsimgr set_attr -D /dev/rdisk/disk182 -a max_q_depth=16. Let's see if doubling the IO bandwidth of that particular LUN gets the EVA to favour it more (again, see the sketch below) ...
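
For point 1, a rough sketch of the legacy-to-agile move (vgXX, the map file and the disk numbers are placeholders, not taken from your config; the vgexport/vgimport needs the volume group offline, so plan downtime):

# ioscan -m dsf /dev/rdsk/c1t0d2    <- shows the persistent DSF behind a legacy device
# vgchange -a n /dev/vgXX
# vgexport -v -m /tmp/vgXX.map /dev/vgXX
# vgimport -v -m /tmp/vgXX.map /dev/vgXX /dev/disk/diskNN /dev/disk/diskMM
# vgchange -a y /dev/vgXX

(If vgimport complains about a missing /dev/vgXX/group file, recreate it with mknod first.)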
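
And for point 2, note that the set_attr change alone does not survive a reboot; a sketch for checking and persisting it (see scsimgr(1M)):

# scsimgr get_attr -D /dev/rdisk/disk182 -a max_q_depth      <- current value, typically 8 by default
# scsimgr set_attr -D /dev/rdisk/disk182 -a max_q_depth=16   <- double it for this LUN only
# scsimgr save_attr -D /dev/rdisk/disk182 -a max_q_depth=16  <- persist it across reboots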

If the above doesn't work, give the output of
# scsimgr lun_map /dev/rtape/
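
(The same command works against the disk device, e.g. # scsimgr lun_map /dev/rdisk/disk182 should list the four lunpaths shown above.)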

Also I would still like to see the original 11.23 sar output. ;)

Greetz,
Chris
Martin Roende
Advisor

Re: Degraded backup and Oracle dump performance after 11.23 to 11.31 upgrade

Thanks Chris ..
I will get the change planned and execute your recommendations.

I do not have sar output from the 11.23 installation, and it is no longer running.

As I mentioned, all I have is the collected glance output. I have attached one of the files collected from the 11.23 installation in production (but there are no detailed disk statistics).
Oracle dump time was 18:40 and backup time 00:00.

regards Martin
Martin Roende
Advisor

Re: Degraded backup and Oracle dump performance after 11.23 to 11.31 upgrade

Chris ..
After "scsimgr set_attr -D /dev/rdisk/disk182 -a max_q_depth=16" I have made another sar collection, attached here as new_sar_out.txt.

regards MRA
Martin Roende
Advisor

Re: Degraded backup and Oracle dump performance after 11.23 to 11.31 upgrade

This looks very good, Chris, thank you!
Seems like throughput has doubled.

mraMac:Dokumentation mra$ grep Average *sar_out.txt | grep disk182
09:01:44 lunpath %busy avque r/s w/s blks/s avwait avserv
%age num num num num msec msec
device %busy avque r/s w/s blks/s avwait avserv

new_sar_out.txt:Average disk182_lunpath7 17.04 0.50 0 22 40065 0.00 7.86
new_sar_out.txt:Average disk182_lunpath33 17.10 0.50 0 22 40226 0.00 7.89
new_sar_out.txt:Average disk182_lunpath89 17.01 0.50 0 22 39992 0.00 7.85
new_sar_out.txt:Average disk182_lunpath108 17.01 0.50 0 22 40037 0.00 7.85
new_sar_out.txt:Average disk182 68.15 0.50 0 87 160320 0.00 7.86
old_sar_out.txt:Average disk182_lunpath7 48.97 0.50 252 0 26789 0.00 2.11
old_sar_out.txt:Average disk182_lunpath33 48.93 0.50 252 0 26837 0.00 2.10
old_sar_out.txt:Average disk182_lunpath89 44.12 0.50 252 0 26622 0.00 1.88
old_sar_out.txt:Average disk182_lunpath108 44.60 0.50 252 0 26672 0.00 1.91
old_sar_out.txt:Average disk182 100.00 0.50 1009 0 106919 0.00 2.00

chris huys_4
Honored Contributor

Re: Degraded backup and Oracle dump performance after 11.23 to 11.31 upgrade

Hi Martin,

I suppose this should put at least the before-midnight part of the backup back in the "11.23" neighbourhood. ;)

The reason I asked for max_q_depth to be increased was the constant 100% busy of the disk182 disk.

device %busy avque r/s w/s blks/s avwait avserv
disk182 100.00 0.50 853 0 90125 0.00 2.73

This mostly means that, with the current max_q_depth (i.e. IO bandwidth), the EVA disk array was giving maximal throughput.

However, to check whether that really is the maximal throughput the EVA can be pushed to deliver, max_q_depth needs to be increased until at least %busy goes below 100%.

And as was shown here, the maximal throughput had indeed not been reached: with the increased max_q_depth, not only did %busy drop steeply below 100% (to about 68%), but the IO throughput also shot up by a fair percentage, from an average of 106919 to 160320 blks/s, roughly 50% more.

Oh yes, also read this thread: https://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1431801. It has an interesting whitepaper about max_q_depth on 11.31 in Duncan's reply.

Greetz,
Chris