
SCSI Reset/ lbolt error

 
Sharon Bi
Frequent Advisor

SCSI Reset/ lbolt error

 
Patrick Wessel
Honored Contributor

Re: SCSI Reset/ lbolt error

Take a look here:
http://forums.itrc.hp.com/cm/QuestionAnswer/1,1150,0x9b677e990647d4118fee0090279cd0f9,00.html
There is no good troubleshooting with bad data
Ramesh Donti
Frequent Advisor

Re: SCSI Reset/ lbolt error

Hi,
It looks like you have a problem on your SCSI channel. It might be caused by the SCSI adapter or a cable.
Always Keep Smiling
Gadi
Advisor

Re: SCSI Reset/ lbolt error

Hi Sharon,

You probably have a SCSI disk problem.

I need the output of ioscan -kfnC disk
to determine which disk it is.

You can look at the /etc/dmesg output too.
Patrick Wessel
Honored Contributor

Re: SCSI Reset/ lbolt error

The trouble comes from "bus 1". You can determine the hardware address of the bus by
ioscan -fC ext_bus

Check all disks on that bus: are they present, and do their logs show any errors?
There is no good troubleshooting with bad data
Shannon Petry
Honored Contributor

Re: SCSI Reset/ lbolt error

9999 out of 10000 times, this is caused by a failing disk, not by a controller or a cable as someone suggested.
You may also notice that the system hangs during this period if it is a mounted disk with any activity.
I.e., if this is your root disk, users will not be able to authenticate until the disk reconnects.
If my memory serves me correctly, the line "scb->cdb: 28 00 00 06 1c 70 00 00 10 00" indicates that the disk on controller 0, with address 1, is the culprit. (Sorry if I misinterpreted the disk address, which is highly probable, so please read the next section!)
You should also note that you have messages about PVs being dropped, then returned. PV0 would correspond to the first physical volume, PV1 would be the second, etc.
If you see the errors about the PV dropping, a disk is going bad, and it should be addressed immediately.
Best regards!
Shannon.
Microsoft. When do you want a virus today?
Patrick Wessel
Honored Contributor

Re: SCSI Reset/ lbolt error

I disagree with Shannon. A failing disk is an option, but not at 99.9%. The controller is the most stable part in the chain; disk, cable and terminator are the weakest. These three parts have an equal chance of failing. And there is still the possibility of a bad driver or heavy I/O load. The first step should be the installation of the current SCSI patch.
The line starting with scb->cdb is the Command Descriptor Block, or in simple words, the SCSI command. I don't see much point in figuring out which command was executed.
Just in case you are interested, the next device that wanted to access the bus was c1t15d0 (bp->b_dev: f101f000). If you want to blame someone for the bus hang, don't start with this disk: it was able to request the bus, so it could not be the one that was blocking it.
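For anyone who wants to repeat that lookup, here is a minimal Python sketch of how the device name can be read out of the b_dev value. It assumes the low 24 bits are the minor number laid out as 0xIITL00 (bus instance, target, LUN); the function name is made up for illustration, and the layout should be checked against your own device files (ll /dev/dsk) before relying on it.

```python
def minor_to_device(dev_hex: str) -> str:
    """Turn a b_dev value such as 'f101f000' into a cXtYdZ name.

    Assumption: the low 24 bits are the minor number, laid out as
    0xIITL00 (II = bus instance, T = target, L = LUN).
    """
    dev = int(dev_hex, 16)
    minor = dev & 0xFFFFFF            # drop the upper (major) byte
    instance = (minor >> 16) & 0xFF   # bus instance, the "c" number
    target = (minor >> 12) & 0xF      # SCSI target ID, the "t" number
    lun = (minor >> 8) & 0xF          # LUN, the "d" number
    return f"c{instance}t{target}d{lun}"


print(minor_to_device("f101f000"))    # c1t15d0
```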
There is no good troubleshooting with bad data
Stefan Farrelly
Honored Contributor

Re: SCSI Reset/ lbolt error


If this error occurs every day then you do indeed have a problem which needs fixing. Now, how to determine where the problem most likely is:

1. Make sure the Diagnostic bundle (OnlineDiag) is installed. If it is not, install it from the SupportPlus CD or download it from the ITRC; once the next errors occur you will be able to get a lot more information about where the problem lies.
2. Run xstm, go to Tools -> Utility -> Run, then select the utility LOGTOOL. From the new window select the raw log, scroll to the bottom of the current log and look for the date/timestamps that match your errors in syslog. You should get information that tells you exactly which device is causing the problem.
3. From my experience, if logtool shows only "Entry Type: I/O Error" then you have a controller problem; replace the SCSI controller first. If you get different entry types such as SCSI error or SCSI reset, then replace the disk first. Logtool should identify the device in question with its "Device Path" entries, e.g. 10/0/14.0. Since the disk drive is the only component in your path with moving parts, it should be considered the most likely suspect.

Now, if you don't want to trawl through these logtool logs yourself, print out the recent ones and fax/email them to someone at the ITRC once you've logged a call, and HP will have an experienced engineer look at them for you and decide which device (controller/disk) should be replaced first.

In my experience, if no one has touched the machine recently and the cables are nice and tightly screwed in, then you should consider a cable/terminator problem last of all. But once the HP engineer comes in to replace your controller and/or disk, he should check that the cables are nice and tight anyway. If your server is on HP hardware support you're well within your rights to demand (nicely) that an HP engineer comes in and replaces whatever part he thinks is causing the problem. Go for it!
I'm from Palmerston North, New Zealand, but somehow ended up in London...
Michael Lampi
Trusted Contributor

Re: SCSI Reset/ lbolt error

The CDB indicates a read command (28) was being performed. The disk block start address is 061c70 (hexadecimal), and 16 blocks (8192 bytes) were being read.
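The decoding above can be sketched in a few lines of Python. This assumes the standard 10-byte READ(10) layout (opcode in byte 0, big-endian block address in bytes 2-5, transfer length in bytes 7-8) and a 512-byte block size; the function name and the block_size default are chosen here for illustration only.

```python
def decode_read10(cdb_hex: str, block_size: int = 512) -> dict:
    """Decode a 10-byte READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # starting block address
    blocks = int.from_bytes(cdb[7:9], "big")  # number of blocks to read
    return {
        "opcode": cdb[0],
        "lba": lba,
        "blocks": blocks,
        "bytes": blocks * block_size,         # assumes 512-byte blocks
    }


info = decode_read10("28 00 00 06 1c 70 00 00 10 00")
print(hex(info["lba"]), info["blocks"], info["bytes"])  # 0x61c70 16 8192
```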

Disk drives are configured to undergo a large number of retries if they fail to correctly read a block. It is possible that the time it takes to complete the retry cycles needed to recover the data is causing the SCSI driver to time out.

Sounds like a bad spot has been located on your drive, particularly if this same address appears in more than one message.
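One rough way to check for a repeating address is to pull every "scb->cdb:" line out of syslog and count the block addresses. The Python sketch below assumes each message carries the CDB in the same form as the line quoted in this thread; the function name is made up, and the second CDB in the sample is invented purely to show the counting.

```python
import re
from collections import Counter

# Matches "scb->cdb:" followed by ten hex bytes, as in the quoted message.
CDB_RE = re.compile(r"scb->cdb:\s*((?:[0-9a-fA-F]{2}\s*){10})")


def repeated_read_addresses(lines):
    """Return {block_address: count} for READ(10) addresses seen more than once."""
    counts = Counter()
    for line in lines:
        match = CDB_RE.search(line)
        if not match:
            continue
        cdb = bytes.fromhex("".join(match.group(1).split()))
        if cdb[0] == 0x28:  # READ(10) only
            counts[int.from_bytes(cdb[2:6], "big")] += 1
    return {lba: n for lba, n in counts.items() if n > 1}


# Hypothetical sample: the first CDB is from this thread, the second is invented.
sample = [
    "scb->cdb: 28 00 00 06 1c 70 00 00 10 00",
    "scb->cdb: 28 00 00 00 10 00 00 00 08 00",
    "scb->cdb: 28 00 00 06 1c 70 00 00 10 00",
]
for lba, n in repeated_read_addresses(sample).items():
    print(hex(lba), n)  # 0x61c70 2
```

In practice you would pass the lines of /var/adm/syslog/syslog.log instead of the sample list.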
A journey of 1000 steps ends in a mile.