Operating System - HP-UX

SCSI Read and Write Errors

 
SOLVED
Aziz Zouagui
Frequent Advisor

SCSI Read and Write Errors

 
12 REPLIES
Ken Hubnik_2
Honored Contributor

Re: SCSI Read and Write Errors

Make sure you have all the latest SCSI patches. We had the same problem, and it turned out to be the I/O backplane on the server.
Steven E. Protter
Exalted Contributor

Re: SCSI Read and Write Errors

You have a bad disk or controller.

Back your system up and call hardware support.

lbolt messages almost always point to a failing disk or controller.

Hopefully your system is mirrored.
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Carlos Ruffin
Advisor

Re: SCSI Read and Write Errors

Aziz,

You are having a problem with the following device file:
/dev/dsk/c6t0d4. I had this problem before with a disk, and the tech who replaced it told me how he pinpointed the problem.

He used the numbers on the lbolt line, specifically the dev value (dev: 1f060400). The leading 1f is the major number; the digits after it correspond to the c, t, d of the device: 06 is the controller (c6), 0 is the target (t0) and 4 is the LUN (d4).

Now that you know the device, you can run some diags on it, or at least copy the data if it is not mirrored.
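
For anyone who wants to script that decoding, here is a minimal sketch. It assumes the legacy HP-UX dev_t layout for SCSI disks (8-bit major, 8-bit controller instance, 4-bit target, 4-bit LUN, 8 option bits), which matches the 1f060400 and c6t0d4 pair in this thread. decode_dev is a name made up for this example, not a system command.

```shell
# decode_dev HEXDEV
# Turn the hex dev value from an lbolt line into a legacy device file name.
# Assumed layout (high to low bits): major(8) instance(8) target(4) lun(4) options(8)
decode_dev() {
    dd_dev=$(( 0x$1 ))                    # parse the hex string as a number
    dd_c=$(( (dd_dev >> 16) & 0xff ))     # controller instance -> cN
    dd_t=$(( (dd_dev >> 12) & 0xf ))      # SCSI target         -> tN
    dd_d=$(( (dd_dev >> 8) & 0xf ))       # LUN                 -> dN
    echo "/dev/dsk/c${dd_c}t${dd_t}d${dd_d}"
}

# Example call with the value from this thread:
#   decode_dev 1f060400
```

Running it against each dev value in the syslog gives a quick list of which device files are actually affected.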

Hope this helps!!
Aziz Zouagui
Frequent Advisor

Re: SCSI Read and Write Errors

I thank everyone who has replied.

It is not only one device that shows up in the syslog; it is 3 or 4. That's why we're not suspecting the drives.

I have one question: on the RAID, is there a physical disk number listed? I am assuming that the devices shown in the syslog are not actual physical drives but LUNs, each defined across multiple disks.

The Mod 12H Raid unit is not showing any problems with the disks...

Jeff Schussele
Honored Contributor
Solution

Re: SCSI Read and Write Errors

Hi Aziz,

Do a
lvdisplay /dev/vg_name/lv_name
on the affected LVs.

If these lbolts occur during periods of heavy I/O it's possible that the IO Timeout values on the LVs are too low.
We've had these before & increasing the timeout to 180 seconds made them go away.

But if you're getting these during periods of low disk I/O, and on multiple disks, then I suspect an HBA or SCSI bus problem.
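
To check the current value on several LVs at once, something like the sketch below may help. It assumes lvdisplay prints a line of the form "IO Timeout (Seconds)" followed by either "default" or a number, with the value as the last field on the line. io_timeout is a hypothetical helper name, not part of LVM.

```shell
# io_timeout: read lvdisplay output on stdin and print the value found on the
# "IO Timeout (Seconds)" line ("default" or a number of seconds).
# Assumption: the value is the last whitespace-separated field on that line.
io_timeout() {
    awk '/IO Timeout \(Seconds\)/ { print $NF }'
}

# Typical use on the server (not run here):
#   lvdisplay /dev/vg_name/lv_name | io_timeout
```

If the helper prints nothing, the output format on your release differs from what is assumed here, so check the raw lvdisplay output by hand.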

Rgds
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Aziz Zouagui
Frequent Advisor

Re: SCSI Read and Write Errors


This is definitely occurring when there is I/O on the LVs.
Any time a lot of data is being written to disk, this message pops up in the syslog.

Where do you set the timeout for LVs?

This is one of the best boards I have been on. Thank you for all the replies; please keep them coming.

steven Burgess_2
Honored Contributor

Re: SCSI Read and Write Errors

Hi Aziz

You can set the timeout for the disks with the pvchange -t command

man pvchange

for more info

HTH

Steve
take your time and think things through
John Poff
Honored Contributor

Re: SCSI Read and Write Errors

You can set the timeout on the PVs by using 'pvchange -t 180 /dev/dsk/c6t0d4' or you can set it on the LV by 'lvchange -t 180 /dev/yourvg/lvolname'. I think Jeff was talking about setting it on the LV.
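
If there are several LVs to change, a small wrapper can save typing. This is only a sketch: set_lv_timeout and the APPLY variable are made up for this example, and the only real command it relies on is the lvchange -t form quoted above. By default it just prints the commands so they can be reviewed first.

```shell
# set_lv_timeout SECONDS LV_PATH...
# Print one "lvchange -t" command per LV; run them only when APPLY=1 is set.
set_lv_timeout() {
    slt_secs=$1
    shift
    for slt_lv in "$@"; do
        if [ "${APPLY:-0}" = "1" ]; then
            lvchange -t "$slt_secs" "$slt_lv"          # really change the LV
        else
            echo "lvchange -t $slt_secs $slt_lv"       # dry run: show the command only
        fi
    done
}

# Example (dry run):
#   set_lv_timeout 180 /dev/yourvg/lvolname
```

Once the printed commands look right, repeat the call with APPLY=1 in the environment to apply them.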

JP
steven Burgess_2
Honored Contributor

Re: SCSI Read and Write Errors

Hi Aziz

Sorry, it's lvchange you require.

Have a look at this doc

http://www4.itrc.hp.com/service/cki/docDisplay.do?docLocale=en_US&admit=-938907319+1034802655390+28353475&docId=200000062952613

HTH

Steve
take your time and think things through
Jeff Schussele
Honored Contributor

Re: SCSI Read and Write Errors

Hi All,

Actually you *can* change the PV value and leave the LV at default (if it's at default), because when the LV is at default the PV value becomes the effective timeout.

If both PV and LV are at default, then the actual I/O timeout comes from the driver.

So I'd set the LV to approx 180 (seconds) as that seemed to stop our lbolts under load.
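
The fallback order described above can be written down as a tiny sketch. It assumes that an explicit LV value takes precedence over the PV value, which is how the description reads but is not spelled out; effective_timeout is a hypothetical helper, and the driver's own default is not given in this thread, so it is left as a label rather than a number.

```shell
# effective_timeout LV_VALUE PV_VALUE
# Each argument is "default" or a number of seconds, as shown by lvdisplay/pvdisplay.
# Assumed order: LV value if set, else PV value, else the driver's own default.
effective_timeout() {
    if [ "$1" != "default" ]; then
        echo "$1"                 # LV has an explicit timeout
    elif [ "$2" != "default" ]; then
        echo "$2"                 # LV at default: the PV value applies
    else
        echo "driver default"     # both at default: the driver decides
    fi
}

# Example:
#   effective_timeout default 180
```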

Rgds,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Aziz Zouagui
Frequent Advisor

Re: SCSI Read and Write Errors

Jeff,

Did the timeout increase have any impact on performance?

Thanks
Jeff Schussele
Honored Contributor

Re: SCSI Read and Write Errors

No, not really, because we were already paying the extra time penalty to reset the bus & re-request the read or write.

But at the same time you have to seriously investigate just why you're bottlenecking so badly at the disks.

There can be many reasons for this - including:
1) LVM setup i.e. how the LV is constructed (striped, mirrored, extent size, raw, cooked, etc.)
2) Speed of the disk & subsystem as a whole.
3) Kernel parameters dealing with disks i.e. buffering
4) Thrashing the disks by swapping (page outs/ins) to the SAME disks holding the data OR by running the binaries from the same device that holds the data.
5) The application itself & how it's accessing the disks & whether it's caching efficiently - if at all.

And there are yet more things that might need to be investigated. But you should start investigating just why you're constrained by the disk subsystem.

Good Hunting,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!