
SOLVED
MikeL_4
Super Advisor

SCSI Errors ???

I am receiving SCSI error messages:

scb->cdb 12 00 00 00 80 00
SCSI: Resetting SCSI -- lbolt: 175951085, bus: 10
SCSI: Reset detected -- lbolt: 175951085, bus: 10

How can I trace this back to the hardware path/device causing the problem?
9 REPLIES
Pete Randall
Outstanding Contributor

Re: SCSI Errors ???

See Chris Moore's answer (among others) in this thread:

http://forums.itrc.hp.com/cm/QuestionAnswer/1,,0x9d086d96588ad4118fef0090279cd0f9,00.html


Pete
Ravi_8
Honored Contributor

Re: SCSI Errors ???

Hi,


Look under /dev (maybe using lssf) until you find a node that matches.

SCSI resets can happen if a device on the bus was power-cycled, if a device initiated a reset, or if a cable was removed. They can also be triggered by faulty hardware or by a bus that is not properly terminated. A few resets are nothing to worry about, since the SCSI protocol is expected to handle these events, but if you are seeing more than a few, start looking.

If the OS is 11i, these patches may help you:
1) SCSI IO Subsystem Cumulative (PHKL_23666)
2) SCSI IO Cumulative (PHKL_29039)
never give up
Michael Steele_2
Honored Contributor
Solution

Re: SCSI Errors ???

Bus 10 is the issue, so that's going to be a disk controller or another HBA, i.e., 'c10'.

ioscan -fnkC ext_bus

If it is a Fibre Channel adapter, use 'fcmsutil' to check for errors.

ioscan -fnkC fc

fcmsutil /dev/td0 (then /dev/td1, /dev/td2, etc.)

Use logtool to see accumulated errors:

STM > TOOLS > UTILITY > RUN > LOGTOOL > FILE > VIEW > RAW SUMMARY.

Note the first and last dates of the transactions and calculate the difference. If the difference is short, like 4 hours, this is important to note. Now read down the report of hardware addresses and look at the integer numbers in parentheses. Anything over 150 in this 4-hour period should be called in to HP for replacement.
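The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function name, the input dictionary and the hardware paths in the usage note are hypothetical, and you would have to copy the dates and counts out of the logtool raw summary by hand.

```python
from datetime import datetime, timedelta


def flag_noisy_paths(first_seen, last_seen, counts,
                     window=timedelta(hours=4), limit=150):
    """Return hardware paths whose error count exceeds `limit`.

    first_seen / last_seen: first and last transaction dates from the summary.
    counts: hypothetical mapping of hardware address -> number in parentheses.
    The 4-hour window and the 150 threshold come from the advice above.
    """
    span = last_seen - first_seen
    if span > window:
        # The log covers a longer period, so the rule of thumb
        # does not apply directly.
        return []
    return sorted(path for path, n in counts.items() if n > limit)
```

For example, with a summary spanning 08:00 to 11:30 on the same day and made-up counts of 212 for one path and 17 for another, only the first path is returned.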
Support Fatherhood - Stop Family Law
Zeev Schultz
Honored Contributor

Re: SCSI Errors ???

1) ioscan -fnCext_bus (look for bus 10)
2) In syslog.log, look for "dev 0xXXYYZZFF", where XX is the major number (1f for sdisk), YY is the bus, and ZZ is the SCSI target+LUN pair.
For example, 0x1f011000 stands for sdisk (disk device), bus 01, target 1, LUN 0. Run ioscan for bus 01 and find, for example, that c2t1d0 is the culprit. (Note that the 2 in c2 is the controller instance number in the system and NOT the bus number, which can be found with ioscan -fnCext_bus.)
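As a rough illustration, the field layout described above can be split out with a few lines of Python. The function name is made up, and the split of ZZ into a high nibble (target) and a low nibble (LUN) is an assumption inferred from the 0x1f011000 example; the meaning of the last byte is not described here, so it is returned undecoded.

```python
def decode_dev(dev):
    """Split a syslog 'dev 0xXXYYZZFF' value into the fields described above."""
    value = int(dev, 16)  # accepts strings with or without the 0x prefix
    return {
        "major": (value >> 24) & 0xFF,   # XX, e.g. 0x1f for sdisk
        "bus": (value >> 16) & 0xFF,     # YY
        "target": (value >> 12) & 0xF,   # high nibble of ZZ (assumed)
        "lun": (value >> 8) & 0xF,       # low nibble of ZZ (assumed)
        "low_byte": value & 0xFF,        # FF, meaning not covered here
    }


fields = decode_dev("0x1f011000")
print(hex(fields["major"]), fields["bus"], fields["target"], fields["lun"])
```

The result still has to be matched against the ioscan output by hand to get the actual device file.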

Zeev
So computers don't think yet. At least not chess computers. - Seymour Cray
Ron Lawson_1
Trusted Contributor

Re: SCSI Errors ???

Looking for some feedback...

I've noticed several queries about how to decode these SCSI syslog messages.

For the next generation of SCSI, we will be including some additional information in the syslog messages. For example, you might see the following message if an IO times out:

SCSI Ultra320 1/0/14/0/1 instance 11: IO Type: SCSI IO. IO timed out - Target ID: 10, LUN ID: 0 CDB - 2F 00 00 00 00 00 00 FF FF 00

Does this make it easier to find the device causing the problem? Is there any additional information you'd like to see?

Ron Lawson
HP
Bill Hassell
Honored Contributor

Re: SCSI Errors ???

The additional (decoded) information is VERY nice. However, a decoder script would also be nice. The lbolt, power fail, etc. messages have lots of numbers but no useful data without the official HP Decoder Ring (tm). The hex numbers may or may not be useful, but nothing is available outside HP to make any sense of them (e.g., CDB - 2F 00 00 00 00 00 00 FF FF 00). And yes, the script will have to be constantly updated as long as the message formats keep changing. Maybe the lab can standardize on format tags in front of the numbers?
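The CDB bytes themselves are standard SCSI rather than HP-specific, so part of such a decoder can be sketched without inside knowledge. The Python below is a minimal sketch, not an HP tool: the function name is made up, the opcode table covers only a handful of common commands, and the logical block address and length are decoded only for the 10-byte READ, WRITE and VERIFY commands.

```python
# A few well-known SCSI opcodes; the table is deliberately incomplete.
OPCODES = {
    0x00: "TEST UNIT READY",
    0x03: "REQUEST SENSE",
    0x08: "READ(6)",
    0x0A: "WRITE(6)",
    0x12: "INQUIRY",
    0x1A: "MODE SENSE(6)",
    0x25: "READ CAPACITY(10)",
    0x28: "READ(10)",
    0x2A: "WRITE(10)",
    0x2F: "VERIFY(10)",
}

# 10-byte commands with the LBA in bytes 2-5 and the length in bytes 7-8.
LBA_10 = {0x28, 0x2A, 0x2F}


def describe_cdb(text):
    """Name the command in a CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(text)
    opcode = cdb[0]
    info = {"opcode": opcode, "name": OPCODES.get(opcode, "unknown")}
    if opcode in LBA_10 and len(cdb) == 10:
        info["lba"] = int.from_bytes(cdb[2:6], "big")
        info["length"] = int.from_bytes(cdb[7:9], "big")
    return info


print(describe_cdb("2F 00 00 00 00 00 00 FF FF 00"))
```

For the CDB quoted above this reports a VERIFY(10) starting at block 0 with a length of 0xFFFF blocks.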


Bill Hassell, sysadmin
Eugeny Brychkov
Honored Contributor

Re: SCSI Errors ???

Command 12h is one of the mandatory SCSI commands: the INQUIRY command. The device failed or delayed its response to an inquiry, and I would recommend proceeding with device troubleshooting and testing ASAP. To do so, first identify the device, check its state with ioscan and diskinfo, then check the surface with a read command.
However, if this device is not an ordinary SCSI disk but, for example, an FC disk array, these symptoms can be caused by an incorrect SAN configuration.
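To show how the "scb->cdb 12 00 00 00 80 00" line from the original question breaks down, here is a small Python sketch of the 6-byte INQUIRY layout. The function name is made up; the allocation length is read from bytes 3-4 as in later SCSI revisions, which gives the same value as the older single-byte field whenever byte 3 is zero.

```python
def decode_inquiry(text):
    """Decode a 6-byte INQUIRY CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(text)
    if len(cdb) != 6 or cdb[0] != 0x12:
        raise ValueError("not a 6-byte INQUIRY CDB")
    return {
        "evpd": bool(cdb[1] & 0x01),   # set when a vital product data page is requested
        "page_code": cdb[2],           # only meaningful when evpd is set
        "allocation_length": int.from_bytes(cdb[3:5], "big"),
        "control": cdb[5],
    }


print(decode_inquiry("12 00 00 00 80 00"))
```

In other words, the logged command is a plain standard INQUIRY asking for up to 128 bytes of data, which is the kind of request a driver sends when probing or re-probing a device.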
Eugeny
Eugeny Brychkov
Honored Contributor

Re: SCSI Errors ???

Clarification: by "read command" I meant
dd if=/dev/rdsk/cXtYdZ of=/dev/null bs=4096k
Eugeny
Zeev Schultz
Honored Contributor

Re: SCSI Errors ???

Well, it would be nice to know the different fields of the SCSI dump...
Isn't it up to the chip manufacturers (like NCR, Digital, etc.) to decide exactly what is shown in the syslog?
So computers don't think yet. At least not chess computers. - Seymour Cray