
Re: SCSI question

 
Christopher McCray_1
Honored Contributor

SCSI question

I was doing my morning checks when I encountered a block of SCSI-related messages in my syslog.log file, which I have included in an attachment. I am assuming that this is only informational and that calling my FE is not necessary, but I was wondering if anyone can give me a quick "down and dirty" on what the codes mean, or maybe point me to a helpful link, just to satisfy my need to know. Thanks in advance.

 

 

P.S. This thread has been moved from Disk to HP-UX > sysadmin. - Hp Forum Moderator

It wasn't me!!!!
4 REPLIES
paul courry
Honored Contributor

Re: SCSI question

This is my standard question..............

Do you have a DLT (which model?), was it in use, and do you have anything else on that SCSI bus?
Vincent Fleming
Honored Contributor

Re: SCSI question

Here's some general information for you; my HPUX is getting better every day, but I can't decode some of the info in the messages...

The messages are telling you that the system is resetting a SCSI bus. This is typically caused by an errant device, usually one that did not respond within the timeout period. The system will reset the bus (and thereby the device) and try again.

You're OK as long as it succeeds before giving up and failing the I/O. If you have no failed I/O messages, then you're OK for now.

This kind of behavior can be caused in many ways. A SCSI bus that is not properly terminated will do this; so will a failing disk.

This stuff:
scb->cdb: 12 00 00 00 80 00

is a hex dump of the SCSI command descriptor block (commonly known as a CDB). The first byte indicates the command type. I think 12 is a read, but would have to look it up.

SCSI: Resetting SCSI -- lbolt: 33878436, bus: 0

I think the 33878436 is the minor number of the offending device... I forget. The bus:0 should be obvious.

Now, what to do about it... try to identify which device is causing the problem... Have you moved any SCSI cables recently? Is this part of a cluster? Which cable is bus:0 (internal or external?) Check that the terminators are all seated well, and that all cables are plugged in tightly.

Good luck!
No matter where you go, there you are.
Erik Tong
Advisor

Re: SCSI question

I can't say what is wrong as there is just not enough information.

Comments on some of the data displayed:
The CDB command "12" (hex value) is an inquiry command (not a read). This command is typically used to get general information about the device on discovery. It can be used in a lot of ways though, so it tells us nothing about why the host is sending an inquiry command to the device. Unfortunately the command does not tell us which device is being talked to, because that is determined in a different part of the SCSI protocol.
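
For anyone who wants to pick the bytes apart, here is a minimal sketch that decodes a 6-byte CDB like the one in the log. It assumes the standard INQUIRY layout (byte 0 opcode, bit 0 of byte 1 is EVPD, byte 2 page code, bytes 3-4 allocation length, where byte 3 is reserved and zero in SCSI-2). The function name `decode_cdb6` is made up for illustration and is not an HP-UX tool.

```shell
# Decode a 6-byte SCSI CDB given as six hex bytes, e.g. the values printed
# after "scb->cdb:" in syslog. Only INQUIRY (opcode 0x12) is decoded in
# detail; anything else is just reported by opcode.
decode_cdb6() {
    if [ "$#" -ne 6 ]; then
        echo "usage: decode_cdb6 b0 b1 b2 b3 b4 b5" >&2
        return 1
    fi
    opcode=$(( 0x$1 ))
    case "$opcode" in
        18)  # 0x12 = INQUIRY
            evpd=$(( 0x$2 & 1 ))                  # 1 = vital product data page requested
            page=$(( 0x$3 ))                      # page code, meaningful only when evpd=1
            alloc=$(( (0x$4 << 8) | 0x$5 ))       # bytes the host is prepared to receive
            echo "opcode=0x$1 INQUIRY evpd=$evpd page=$page alloc_len=$alloc"
            ;;
        *)
            echo "opcode=0x$1 (not decoded by this sketch)"
            ;;
    esac
}

decode_cdb6 12 00 00 00 80 00
```

For the CDB in the log this reports a plain (non-VPD) inquiry asking for up to 128 bytes of data, which is consistent with an ordinary device discovery request.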

The lbolt is a time stamp internal to the kernel. It does not tell us anything about the device.
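
If you want a rough feel for what the lbolt value means, here is a small sketch that turns it into an elapsed time. It assumes lbolt counts clock ticks since boot and that the tick rate is 100 per second; both are assumptions on my part, so pass your own rate as the second argument if your kernel differs. `lbolt_to_uptime` is a hypothetical helper name.

```shell
# Convert an lbolt tick count into days/hours/minutes/seconds.
# ASSUMPTION: lbolt is ticks since boot, at 100 ticks per second by default.
lbolt_to_uptime() {
    ticks=$1
    hz=${2:-100}
    secs=$(( ticks / hz ))
    echo "$(( secs / 86400 ))d $(( secs % 86400 / 3600 ))h $(( secs % 3600 / 60 ))m $(( secs % 60 ))s"
}

lbolt_to_uptime 33878436
```

Under those assumptions the value in the log works out to a bit under four days after boot, which you can compare against the syslog time stamps and the output of uptime.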

Bus 0 may say something, but if you have multiple SCSI HBAs, they all are bus 0. There are cases with Fibre Channel devices where virtual buses are used, so that can narrow down which set of devices to look at.

Is there a set of devices connected to an HBA that seems to be operating slowly? If so, that's the set of devices to look at.

Does an ioscan seem to hang up anywhere? Are there any "NO_HW"s in the ioscan? I don't know if this technique is valid, but it has provided me with some information points:
1) run ioscan from one telnet window. This ioscan will request information from each device.
2) run an "ioscan -fk" from a different window.
3) make note of what is in the "CLAIMED" state and what is in "SCAN". The "k" option tells the ioscan command to look at the kernel variables rather than go to the device to get the information. Devices in the "SCAN" state are still waiting for the ioscan in step (1) to return.
4) repeat the "ioscan -fk" a few more times, 10 or so seconds apart. If it takes a (relatively) long time to complete a device, that is a candidate for research. You may get more of the syslog messages at this point.
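
The four steps above can be sketched as a small script so you do not need two windows. This is only a sketch: the function names `pending_devices` and `watch_ioscan` are made up, the filter simply looks for a column equal to SCAN or NO_HW in the "ioscan -fk" listing, and the defaults (6 rounds, 10 seconds apart) are arbitrary.

```shell
# Print only the lines of "ioscan -fk" style output (read from stdin)
# that have a field equal to SCAN or NO_HW.
pending_devices() {
    awk '{ for (i = 1; i <= NF; i++) if ($i == "SCAN" || $i == "NO_HW") { print; next } }'
}

# Steps 1-4 in one place. Nothing runs until you call watch_ioscan yourself.
watch_ioscan() {
    rounds=${1:-6}
    delay=${2:-10}
    ioscan >/dev/null 2>&1 &            # step 1: full scan that talks to every device
    i=0
    while [ "$i" -lt "$rounds" ]; do
        date
        ioscan -fk | pending_devices    # steps 2-3: kernel view only, no device I/O
        sleep "$delay"                  # step 4: repeat about 10 seconds apart
        i=$(( i + 1 ))
    done
    wait                                # let the background scan finish
}
```

Run it as root with `watch_ioscan`, or `watch_ioscan 12 5` for more frequent samples. A device that keeps showing up in SCAN from one round to the next is the one to look at.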
Angus Crome
Honored Contributor

Re: SCSI question

Two things. If there are external SCSI cables involved, try wiggling them and the terminators slightly. If you can consistently get lbolt errors that way, that cable (or terminator) is the problem and will need to be reseated or replaced. If that does not work, try a dd of each device file,
"dd if=/dev/dsk/cXtYdZ of=/dev/null bs=1024 count=100000", to exercise each of the disks.
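
To run that dd against several disks in one go, something like the following sketch would do. The function name `exercise_disks` is hypothetical; it only reads from each device you name and writes nothing to it.

```shell
# Read the first 100000 KB of each named device file with the dd command
# above and report any device whose read fails.
exercise_disks() {
    failed=0
    for dev in "$@"; do
        echo "reading $dev"
        if ! dd if="$dev" of=/dev/null bs=1024 count=100000 2>/dev/null; then
            echo "FAILED: $dev"
            failed=$(( failed + 1 ))
        fi
    done
    [ "$failed" -eq 0 ]    # non-zero exit status if any read failed
}
```

Call it with the device files you care about, for example `exercise_disks /dev/dsk/c*t*d*`, and watch syslog.log in another window while it runs.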

If neither of these causes repeated lbolt errors, then you may be missing a SCSI patch. You should then go get the newest General Release and HW/Crit patch sets and apply at least those pertaining to SCSI.

The only other real possibility is a flaky SCSI controller, and that is usually hard to prove without getting HP support involved.
There are 10 types of people in the world, those who understand binary and those who don't - Author Unknown