Operating System - HP-UX

David Child_1
Honored Contributor

High avserv and avwait, low busy and avque

I am trying to work through some I/O performance issues on an L-class server. I haven't worked with this server much until recently, so I don't have a feel for how the box normally performs, but it seemed sluggish to me. I took a quick look at the basics with vmstat/sar/glance. vmstat showed some blocked processes and sar showed ~20% wio. I ran 'sar -d 10 5' and here is a snippet of the output:

$ sar -d 10 5

HP-UX pdsys03 B.11.11 U 9000/800 05/06/05

19:15:34   device   %busy   avque   r+w/s  blks/s  avwait  avserv
19:15:44   c1t2d0    5.17    0.50       8      59  401.26   10.86
           c2t2d0    5.37    0.50       8      59  145.87  266.12
           c1t0d0    0.80    0.50       1      12  390.68   11.37
           c2t0d0    0.60    0.50       1       8  100.41  302.80

c1t2d0 is the primary boot device and c2t2d0 is its mirror. I noticed the high avserv time on the mirror and the high avwait on both the primary and the mirror. Neither drive appears to be very busy, and neither is showing any queue.
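
One way to sanity-check these figures is the utilization law: for a device that services one request at a time, %busy should roughly equal the I/O rate multiplied by the average service time. The Python sketch below is not part of the original post; it simply applies that law to the four sar rows quoted above. The helper name implied_busy_pct is made up for illustration, r+w/s is rounded by sar, and disks that overlap queued commands will not follow the law exactly, so treat it as a rough consistency check only.

```python
# Utilization-law check on the 'sar -d' sample quoted in this post.
# Each tuple is (device, reported %busy, r+w/s, avserv in ms).
samples = [
    ("c1t2d0", 5.17, 8, 10.86),
    ("c2t2d0", 5.37, 8, 266.12),
    ("c1t0d0", 0.80, 1, 11.37),
    ("c2t0d0", 0.60, 1, 302.80),
]


def implied_busy_pct(ios_per_sec, avserv_ms):
    """Percent of each second spent servicing I/O, assuming one request at a time."""
    return ios_per_sec * (avserv_ms / 1000.0) * 100.0


for device, busy, rate, avserv in samples:
    implied = implied_busy_pct(rate, avserv)
    print(f"{device}: reported {busy:.2f}% busy, implied {implied:.1f}% busy")
```

For the c1 disks the implied figure is in the same ballpark as the reported %busy. For the c2 disks it is many times higher, so under that one-request-at-a-time assumption the reported avserv and %busy for c2 cannot both be accurate.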

In glance it does show c2t2d0 as 100% utilized (c1t2d0 is ~14%). I figured that since they are mirrored the utilization should be about the same.

Idx   Device        Util      Qlen   KB/Sec           Logl IO   Phys IO
  1   0/0/1/1.2.0    13/ 14    0.0      56.3/ 147.2   na/ na    15.1/ 26.4
  2   0/0/2/0.2.0   100/100    0.0      47.5/  60.1   na/ na    11.1/ 13.3


I figured the difference could be attributed to reads, so I used 'dd' to read 50000 8k blocks from one of the logical volumes. All the reads appear to come from c1t2d0. Here is the glance output while running the 'dd' on lvol9:

Idx   Device        Util      Qlen   KB/Sec           Logl IO   Phys IO
--------------------------------------------------------------------------------
  1   0/0/1/1.2.0    76/ 18    0.0   16507.3/ 778.4   na/ na    2068/105.1
  2   0/0/2/0.2.0   100/100    0.0     172.3/  77.6   na/ na    24.8/ 15.4

I did also notice c2t0d0 displaying the same characteristics as c2t2d0. That drive is used in a separate volume group.

Any ideas or suggestions?

Thanks,
David
6 REPLIES
Steven E. Protter
Exalted Contributor

Re: High avserv and avwait, low busy and avque

I would suspect the mirror disk is a lot slower than the primary. That's probably not possible unless the hardware is out of whack.

Is there anything unbalanced that's only on the slow drive, like swap, or is everything mirrored evenly?

If everything is mirrored evenly, I would suspect a hardware issue and run diagnostics with cstm, mstm or xstm.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
David Child_1
Honored Contributor

Re: High avserv and avwait, low busy and avque

Steven,

Thanks for the quick reply. I suspected the same thing so I verified that both drives had the same logical volumes, layout order, etc. They were identical in that respect. I also verified they were the exact same drive model and firmware. SCSI settings (queue_depth, etc) were the same as well.

I had the drive replaced this morning, but it did not fix the problem. I had not mentioned that originally as more of a sanity check (i.e. my sanity) to see if a bad drive was a logical conclusion.

Since c2t0d0 is also showing the same type of avserv time I am beginning to wonder if there is something with the controller. Am I way off base on that?

Thanks again,
David
Steven E. Protter
Exalted Contributor

Re: High avserv and avwait, low busy and avque

A bad drive is the logical conclusion.

If the disk is exclusively a mirror disk then I would suggest this:

pvdisplay -v /dev/dsk/c2t2d0 | more

Look for stale extents.

If there are a lot, the resync process will be what's slowing down I/O on the disk.

I would next look at the controller card or core i/o card, because there could be something wrong there.

There are hardware tests in cstm et al for that sort of issue.

Ideally, I'd spend some keyboard time on the system and poke around.

SEP
Tim D Fulford
Honored Contributor

Re: High avserv and avwait, low busy and avque

Hmm, this doesn't look good...

1 - everything on c2 has very slow avserv. This should be about 10ms, not 250+
2 - both c1 & c2 have appalling avwait... again, this should be about 5ms, not 350 & 100ms
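
A rough way to apply these rules of thumb to a saved 'sar -d' sample is sketched below in Python. It is an illustration only and not part of the original reply: the function names are invented, the limits are set at twice the 10ms and 5ms figures to leave some slack (an assumption, not a recommendation), and the parser only handles the column layout shown in this thread.

```python
# Sample copied from the 'sar -d 10 5' output earlier in this thread.
SAR_SAMPLE = """\
19:15:34 device %busy avque r+w/s blks/s avwait avserv
19:15:44 c1t2d0 5.17 0.50 8 59 401.26 10.86
c2t2d0 5.37 0.50 8 59 145.87 266.12
c1t0d0 0.80 0.50 1 12 390.68 11.37
c2t0d0 0.60 0.50 1 8 100.41 302.80
"""

# Assumed limits: twice the ~10ms avserv and ~5ms avwait rules of thumb.
AVSERV_LIMIT_MS = 20.0
AVWAIT_LIMIT_MS = 10.0


def parse_sar_d(text):
    """Return {device: (avwait_ms, avserv_ms)} from sar -d style rows."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 8:      # first row of an interval starts with a timestamp
            fields = fields[1:]
        if len(fields) != 7:      # blank or unrelated line
            continue
        try:
            avwait, avserv = float(fields[5]), float(fields[6])
        except ValueError:        # column header row, not data
            continue
        rows[fields[0]] = (avwait, avserv)
    return rows


def flag_slow(rows):
    """Return {device: [metric names over their limit]} for slow devices."""
    flagged = {}
    for device, (avwait, avserv) in rows.items():
        reasons = []
        if avwait > AVWAIT_LIMIT_MS:
            reasons.append("avwait")
        if avserv > AVSERV_LIMIT_MS:
            reasons.append("avserv")
        if reasons:
            flagged[device] = reasons
    return flagged


for device, reasons in flag_slow(parse_sar_d(SAR_SAMPLE)).items():
    print(device, "exceeds the limit for", ", ".join(reasons))
```

With the sample above, all four disks are flagged for avwait and only the two c2 disks are flagged for avserv, which matches points 1 and 2.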

I'm suspicious of device/controller c2. I THINK you are mirroring in both cases between c1 and c2, e.g.
c1t2d0 <--> c2t2d0
c1t0d0 <--> c2t0d0

So if c2 is having problems then LVM may well be doing a lot of system IO making sure the disks are in sync. In my experience even a small amount of system IO will reduce the performance of the disk pair (though sys IO is essential in making sure the mirror is in sync).

As far as the relative utilisations of primary & mirror disks go, LVM is not the best at evening these out. At low IO rates/disk utilisations you will see all reads, writes & system IO on the primary disk and only writes on the mirror. As disk utilisation goes up some reads do start to hit the mirror disk, but generally the primary disk is running harder than the mirror.

My guess is that devices on 0/0/2/0 are having problems. Check syslog and dmesg to see if there are any reported SCSI errors.

On the performance front, if you have MeasureWare, try looking at a few days of data by doing an extract
/opt/perf/bin/extract -xp -d -r -b -e
xfrdDISK.asc will contain the data for each disk

The last piece of advice I can give, and you would need to be a little brave here, is the following. Both options entail risk, and you might want to run them as a series of tests rather than commit to them in your live environment...
o Unmirror the disks and use only the c1 disks. This may well improve your performance, and then you will know c2 is slowing things down.
o Slightly different, and you would need to be equally brave here: turn off MWC in LVM (leave mirroring on, but stop Mirror Write Consistency). This will almost eliminate the sys IO associated with checking that each extent is correctly mirrored. Effectively the writes are split & fired off to the two disks & no re-checking is done. This is "risky" because when you turn it back on again you will not know which disk/extent is correct!
lvchange -M n /dev/vg??/lvol??
unmount lvol & re-mount it

To switch it back on
lvchange -M y /dev/vg??/lvol??
unmount lvol & re-mount it

Regards

Tim

David Child_1
Honored Contributor

Re: High avserv and avwait, low busy and avque

Steven; pvdisplay comes up clean (all extents current). I looked around in stm and there wasn't much for checking/exercising the controller. I wasn't sure what exercising the disk itself would do, but it indicates the disk must not be busy so I would need to wait for some (rare) downtime.

Tim; I agree on points 1 & 2 and that's why I've been a bit concerned. You are correct in that c1t2d0 <-> c2t2d0 & c1t0d0 <-> c2t0d0.

When running a 'dd' against one of the logical volumes in either vg00 or vg01, all reads do come from the c1 disks. Obviously checking writes is not as easy.

There are no errors in any logs. I did extract MeasureWare data and it corresponds to what I've been seeing. One thing I did notice after putting this data into a chart was that whenever the physical I/O rate goes up, avserv goes down (e.g. PhysIO=15, avserv=100; PhysIO=30, avserv=30).

I did remove both c2 disks from the volume groups and the numbers didn't really change (on any of the disks). I didn't try turning MWC off.

Thanks for all your input.
David Child_1
Honored Contributor

Re: High avserv and avwait, low busy and avque

I did verify that I could read all four disks (c1t[02]d0 & c2t[02]d0) using 'dd' with equal speed. I tried various block sizes and each time the disks matched fairly closely. Since all disks had high avwait and/or avserv times, I then compared my 'dd' reads with another server that is basically the same. This other server shows 'normal' (e.g. sub-10ms avwait/avserv) times. Amazingly enough, the high-wait server actually ran a bit faster.

Ever since I had replaced the disk the system has been performing fine so I am going to wait until the system is rebooted to see how it looks.

Thanks again,
David