Operating System - HP-UX
Andrew Scott_3
Regular Advisor

Excessive wio, 30 second delay on disk operations

One of my machines is having a peculiar problem.

It's running HP-UX 11.11 and has an Oracle database on it.

I'm seeing very high WIO numbers:
13:56:28  %usr  %sys  %wio  %idle
13:56:33     0     0    19     81
13:56:38     1     1    49     50
13:56:43     0     0    50     50
13:56:48     0     0    50     50
13:56:53     0     0    50     50
13:56:58     2     1    49     48
13:57:03     0     0    50     50
13:57:08     1     0    49     49
13:57:13     0     0    21     78
13:57:18     0     0     5     94
13:57:23     3     2    47     48
13:57:28     1     0    30     69
13:57:33     4     1     0     95
13:57:38     0     0     5     94
13:57:43     1     0    36     63
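To put a number on "very high", here is a small sketch that parses rows shaped like the sar output above and reports the mean %wio and how many intervals were at or above a threshold. It assumes the rows came from something like `sar -u` with a 5-second interval (the exact invocation isn't given in the post), and the 40% threshold is an arbitrary choice for illustration.

```python
# Rows copied from the sar output in this post (header included to show it is skipped).
SAR_ROWS = """\
13:56:28 %usr %sys %wio %idle
13:56:33 0 0 19 81
13:56:38 1 1 49 50
13:56:43 0 0 50 50
13:56:48 0 0 50 50
13:56:53 0 0 50 50
13:56:58 2 1 49 48
13:57:03 0 0 50 50
13:57:08 1 0 49 49
13:57:13 0 0 21 78
13:57:18 0 0 5 94
13:57:23 3 2 47 48
13:57:28 1 0 30 69
13:57:33 4 1 0 95
13:57:38 0 0 5 94
13:57:43 1 0 36 63
"""


def parse_sar(text):
    """Return (time, usr, sys, wio, idle) tuples from sar -u style rows."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Keep only "HH:MM:SS n n n n" rows; skips blank lines and "Average" rows.
        if len(fields) != 5 or ":" not in fields[0]:
            continue
        try:
            usr, sys_, wio, idle = (int(f) for f in fields[1:])
        except ValueError:
            continue  # header row such as "13:56:28 %usr %sys %wio %idle"
        rows.append((fields[0], usr, sys_, wio, idle))
    return rows


def summarize_wio(rows, threshold=40):
    """Return (mean %wio, timestamps whose %wio is >= threshold)."""
    wio = [r[3] for r in rows]
    mean = sum(wio) / len(wio)
    busy = [r[0] for r in rows if r[3] >= threshold]
    return mean, busy


if __name__ == "__main__":
    rows = parse_sar(SAR_ROWS)
    mean, busy = summarize_wio(rows)
    print(f"samples={len(rows)} mean %wio={mean:.1f}")  # samples=15 mean %wio=34.0
    print(f"intervals with %wio >= 40: {len(busy)}")     # intervals with %wio >= 40: 8
```

With the CPUs otherwise almost idle, a mean %wio of 34 means processes are mostly waiting on I/O rather than doing work.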


Also, the first load of anything that has to come off disk, such as running "su" for the first time in a while, takes 30 seconds. Once the 30 seconds elapse, it runs.

I'm guessing this mysterious 30-second delay when reading from disk is what's causing the high %wio, but how do I figure out what is causing the delay?

Thanks!
Steven E. Protter
Exalted Contributor

Shalom,

I'd investigate the layout of the storage Oracle is running on.

It's pretty easy to cause this problem by putting too much write-intensive data on a RAID 5 disk, or by mixing OS striping with hardware RAID.
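The RAID 5 point comes down to write amplification. A rough sketch, using the commonly quoted rule-of-thumb write penalties for small random writes (these penalties and the 200 reads/s plus 300 writes/s workload are illustrative assumptions, not measurements from this system):

```python
# Rule-of-thumb back-end disk I/Os per front-end write (small random writes).
WRITE_PENALTY = {
    "RAID 0": 1,     # write the data block only
    "RAID 1/10": 2,  # write both mirror copies
    "RAID 5": 4,     # read data + read parity, then write data + write parity
}


def backend_iops(read_iops, write_iops, level):
    """Disk I/Os per second the array must absorb for a front-end workload."""
    return read_iops + write_iops * WRITE_PENALTY[level]


if __name__ == "__main__":
    # Hypothetical workload: 200 reads/s and 300 writes/s.
    for level in WRITE_PENALTY:
        print(f"{level}: {backend_iops(200, 300, level)} back-end IOPS")
```

For that hypothetical workload RAID 5 needs 1400 back-end I/Os per second versus 800 for mirroring, which is why write-heavy Oracle files on RAID 5 can saturate the disks and drive up %wio.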

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Andrew Scott_3
Regular Advisor

Well, I think I've got it, almost.
This is a VERY low-use system; it's basically a room heater, really.

But I've got SCSI read errors now:
Feb 21 14:47:08 ofdtora1 vmunix: SCSI: Read error -- dev: b 31 0x020000, errno: 126, resid: 1024,
Feb 21 14:47:08 ofdtora1 vmunix: blkno: 8, sectno: 16, offset: 8192, bcount: 1024.
Feb 21 14:47:08 ofdtora1 vmunix: LVM: VG 64 0x000000: PVLink 31 0x020000 Failed! The PV is not accessible.
Feb 21 14:47:08 ofdtora1 vmunix: LVM: VG 64 0x000000: PVLink 31 0x020000 Recovered.


A PV seems to be disappearing and reappearing.
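One way to confirm that the PV is flapping, rather than failing once, is to count the Failed/Recovered messages and SCSI read errors per device over the whole syslog. A minimal sketch, assuming the message formats match the lines quoted above; the regular expressions are written against those samples only and may need adjusting for other message variants.

```python
import re
from collections import Counter

# Sample lines copied from the syslog excerpt in this post.
SAMPLE_LOG = """\
Feb 21 14:47:08 ofdtora1 vmunix: SCSI: Read error -- dev: b 31 0x020000, errno: 126, resid: 1024,
Feb 21 14:47:08 ofdtora1 vmunix: blkno: 8, sectno: 16, offset: 8192, bcount: 1024.
Feb 21 14:47:08 ofdtora1 vmunix: LVM: VG 64 0x000000: PVLink 31 0x020000 Failed! The PV is not accessible.
Feb 21 14:47:08 ofdtora1 vmunix: LVM: VG 64 0x000000: PVLink 31 0x020000 Recovered.
"""

# "PVLink <major> <minor> Failed" or "PVLink <major> <minor> Recovered".
PVLINK_RE = re.compile(r"PVLink (\d+) (0x[0-9a-fA-F]+) (Failed|Recovered)")
# "SCSI: Read error -- dev: b <major> <minor>" (assumes b or c device type).
SCSI_ERR_RE = re.compile(r"SCSI: Read error -- dev: [bc] (\d+) (0x[0-9a-fA-F]+)")


def scan(lines):
    """Count PVLink state changes and SCSI read errors per (major, minor)."""
    events = Counter()
    for line in lines:
        m = PVLINK_RE.search(line)
        if m:
            major, minor, state = m.groups()
            events[(major, minor, state)] += 1
            continue
        m = SCSI_ERR_RE.search(line)
        if m:
            events[(m.group(1), m.group(2), "ReadError")] += 1
    return events


if __name__ == "__main__":
    for (major, minor, what), n in sorted(scan(SAMPLE_LOG.splitlines()).items()):
        print(f"{major} {minor} {what}: {n}")
```

On the real system you would pass an open file instead of the sample, e.g. `scan(open("/var/adm/syslog/syslog.log"))`; many Failed/Recovered pairs on the same minor number point at a single sick disk.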

How do I interpret the hex numbers and track down which device is failing? vgdisplay tells me everything is synced and available.

Andrew Scott_3
Regular Advisor

Okay, I've figured it out I think:

LVM: VG 64 0x000000: PVLink 31 0x020000 Failed! The PV is not accessible.

In the LVM message, 64 is the major number and 0x000000 is the minor number of the volume group's group file, so this is vg00.

In "PVLink 31 0x020000", 0x020000 is the minor number of the physical disk's device file, which on my system corresponds to c2t0d0.

Am I reading that right?
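The decoding described in this post can be sketched as below. It assumes the legacy (pre-11i v3) HP-UX disk minor number layout 0xIITLxx (card instance, SCSI target, LUN, driver option bits) and the usual convention that a VG group file's minor number is 0xNN0000 for vgNN; the VG number-to-name mapping is only a convention set when the group file was created, so check it against `ls -l /dev/*/group`.

```python
def decode_disk_minor(minor):
    """Decode a legacy HP-UX disk minor number (0xIITLxx) into cItTdL form."""
    value = int(minor, 16)
    instance = (value >> 16) & 0xFF  # bits 23-16: interface card instance
    target = (value >> 12) & 0xF     # bits 15-12: SCSI target
    lun = (value >> 8) & 0xF         # bits 11-8: LUN
    return f"c{instance}t{target}d{lun}"


def vg_number(minor):
    """VG number from a group-file minor (0xNN0000); by convention this is vgNN."""
    return (int(minor, 16) >> 16) & 0xFF


if __name__ == "__main__":
    print("VG 64 0x000000 ->", f"vg{vg_number('0x000000'):02d}")  # vg00
    print("PV 31 0x020000 ->", decode_disk_minor("0x020000"))     # c2t0d0
```

A listing such as `ls -l /dev/dsk/c2t0d0` should show the same major and minor pair, which is a quick cross-check before pulling a disk.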


A. Clay Stephenson
Acclaimed Contributor

... and the 30 seconds is the default I/O timeout, which can be changed per physical volume with the pvchange command (pvchange -t).
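If a first-time command like "su" stalls for almost exactly the I/O timeout, that is a strong hint that an I/O to a disk is timing out before being retried. A minimal sketch for timing a command and flagging runs that reach the timeout; the 30-second value is taken from this thread, and the 10% slack is an arbitrary assumption.

```python
import subprocess
import sys
import time

# Assumption: the 30-second default I/O timeout described in this thread.
IO_TIMEOUT_S = 30.0


def timed_run(cmd):
    """Run cmd, discard its output, and return wall-clock seconds elapsed."""
    start = time.monotonic()
    subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False)
    return time.monotonic() - start


def looks_like_io_timeout(elapsed, timeout=IO_TIMEOUT_S, slack=0.1):
    """True if elapsed reaches at least (1 - slack) of the timeout."""
    return elapsed >= timeout * (1 - slack)


if __name__ == "__main__":
    cmd = sys.argv[1:] or ["ls", "/"]
    elapsed = timed_run(cmd)
    verdict = "consistent with an I/O timeout" if looks_like_io_timeout(elapsed) else "normal"
    print(f"{' '.join(cmd)}: {elapsed:.1f}s ({verdict})")
```

Running the same rarely used command twice is a useful comparison: once the binary is in the buffer cache, the second run should be fast even if the disk is still bad.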
If it ain't broke, I can fix that.
Andrew Scott_3
Regular Advisor

Excellent. I've swapped the bad disk out and re-mirrored, and I'm sitting pretty. Thanks.

Andrew Scott_3
Regular Advisor

Bad disk identified and replaced, system back to normal.