Re: SCSI lbolt errors

Robert Milne
Frequent Advisor

SCSI lbolt errors

Hi there, hope you can help.

For a period yesterday we saw SCSI lbolt errors in both our syslog files and dmesg. They have stopped now. I haven't posted all of the error output, as there is too much of it (an excerpt is below), but hopefully there is enough here to help identify what is going on. My main concern is correctly identifying the device causing the error. I suspect the tape drive, since a backup was running at the time.

The devices specifically mentioned in the errors are cb046000 (as bp->b_dev) and cd046040 (as dev).

Now, to translate these devices:
ioscan shows a SCSI ext_bus at instance 4 with a tape drive at hardware path 0/2/0/0.6.0, device /dev/rmt/c4t6d0, and device files with minor numbers 0x046000 for 0m and 0x046040 for 0mn. The backup, using the no-rewind device, was running at the time.

Is this the device referred to in the error (I'm not an expert), or could it be something else?

The messages have stopped now and did not recur during another backup, but what is the likely cause? Electronics, the SCSI bus, the tape drive itself, media, dirty heads, something else?

Please, any help is appreciated!
PS: the system is an L1000A (rp5400) running 11i.

Thanks again,
Rob.


Jul 18 12:40:04 krone vmunix: SCSI: Request Timeout; Abort -- lbolt: 810133487, dev: cd046040, io_id: 471a23f
Jul 18 12:40:37 krone vmunix: SCSI: First party detected bus hang -- lbolt: 810136787, bus: 4
Jul 18 12:40:37 krone vmunix: lbp->state: 5060
Jul 18 12:40:37 krone vmunix: lbp->offset: 80
Jul 18 12:40:37 krone vmunix: lbp->uPhysScript: f97ef000
Jul 18 12:40:37 krone vmunix: From most recent interrupt:
Jul 18 12:40:37 krone vmunix: ISTAT: 09, SIST0: 00, SIST1: 00, DSTAT: 84, DSPS: 00000001
Jul 18 12:40:37 krone vmunix: lsp: 0000000000000000
Jul 18 12:40:37 krone vmunix: lbp->owner: 000000004e6a7500
Jul 18 12:40:37 krone vmunix: bp->b_dev: cb046000
"For every pleasure there's a tax."
A. Clay Stephenson
Acclaimed Contributor

Re: SCSI lbolt errors

The first two hex digits identify the major device number: 0xCB = 203 (decimal) and 0xCD = 205 (decimal). Run lsdev and determine the character major device associated with those values. I suspect that 203 is sctl (SCSI pass-thru) and 205 is stape (SCSI tape), but your major device numbers may be different. The next hex digit pair (04) is the controller instance number. Run ioscan -fn; the ext_bus entry with instance 4 is the controller, and it is listed above the devices on that bus. The next hex digit (6) is the SCSI target ID. The next hex digit (0) is the LUN. The last two hex digits (00 in cb046000) are device-driver dependent.
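The field breakdown above can be expressed as a small decoder. This is a minimal Python sketch, assuming the layout described in this reply (major number in the top byte, then the ext_bus instance, target ID, LUN and a driver-dependent byte); decode_dev and cxtxdx are hypothetical helper names, not HP-UX tools.

```python
from typing import NamedTuple


class ScsiDev(NamedTuple):
    major: int     # driver major number (top byte)
    instance: int  # ext_bus (controller) instance number
    target: int    # SCSI target ID
    lun: int       # logical unit number
    options: int   # driver-dependent low byte


def decode_dev(value: str) -> ScsiDev:
    """Split a dev value such as 'cd046040' into its fields.

    ASSUMPTION: 32-bit layout MM II T L OO (hex), as described in the reply.
    """
    dev = int(value, 16)  # also accepts an optional '0x' prefix
    return ScsiDev(
        major=(dev >> 24) & 0xFF,
        instance=(dev >> 16) & 0xFF,
        target=(dev >> 12) & 0xF,
        lun=(dev >> 8) & 0xF,
        options=dev & 0xFF,
    )


def cxtxdx(d: ScsiDev) -> str:
    """Format the decoded fields as a cXtYdZ device name."""
    return f"c{d.instance}t{d.target}d{d.lun}"


for raw in ("cb046000", "cd046040"):
    d = decode_dev(raw)
    print(raw, d.major, cxtxdx(d), hex(d.options))
# cb046000 203 c4t6d0 0x0
# cd046040 205 c4t6d0 0x40
```

Both values decode to c4t6d0; they differ only in the major number (203 vs 205) and the low byte (0x40, which the original post associates with the no-rewind 0mn device file).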

In any event, cb046000 is sctl c4t6d0 and cd046040 is stape (/dev/rmt/c4t6d0, minor 0x046040, the 0mn no-rewind device file). Both of these should appear in ioscan -fn output.
If it ain't broke, I can fix that.
vinod_25
Valued Contributor

Re: SCSI lbolt errors

Hi Rob,

These errors can be due to:
1. The IO timeout setting for the physical disk being set too low

2. A problem with the SCSI connector for that device

3. Not having the latest SCSI patches installed

To identify the device:
1. Look at the error messages, in particular the following line:

Jul 18 12:40:04 krone vmunix: SCSI: Request Timeout; Abort -- lbolt: 810133487, dev: cd046040, io_id: 471a23f

2. Use the following command to convert the major number (the first two hex digits of the dev value) to decimal:

echo "0xcd=D" | adb

This prints the value in decimal (0xcd is 205).

3. Run lsdev to look up the driver for that major number, for example:

# lsdev -b 31

Character Block Driver Class

The Driver column confirms the driver used by the device.

4. Find the device file with the corresponding minor number.
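Steps 2 and 3 can be sketched in Python: take the top byte of the dev value as the major number (what the adb command prints in decimal), then look it up in lsdev output. This is a minimal sketch under stated assumptions: parse_lsdev and driver_for are hypothetical helpers, lsdev is assumed to print whitespace-separated Character, Block, Driver and Class columns, and the sample table is illustrative only (it uses the 203/sctl and 205/stape values suggested earlier in the thread, which may differ on your system).

```python
from typing import Dict, Optional


def parse_lsdev(text: str) -> Dict[int, str]:
    """Map character major numbers to driver names from lsdev-style output.

    ASSUMPTION: columns are Character, Block, Driver, Class.
    """
    drivers: Dict[int, str] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and blank lines; data rows start with a number.
        if len(fields) >= 3 and fields[0].isdigit():
            drivers[int(fields[0])] = fields[2]
    return drivers


def driver_for(dev_hex: str, lsdev_text: str) -> Optional[str]:
    """Return the driver owning the major number of a dev value, if listed."""
    # Same result as `echo "0xcd=D" | adb`: the top byte, in decimal.
    # ASSUMPTION: 32-bit dev value with the major number in the top byte.
    major = int(dev_hex, 16) >> 24
    return parse_lsdev(lsdev_text).get(major)


# HYPOTHETICAL lsdev output, for illustration only; use your own system's output.
SAMPLE = """\
    Character     Block       Driver          Class
       203         -1         sctl            ctl
       205         -1         stape           tape
"""

print(driver_for("cd046040", SAMPLE))  # stape
print(driver_for("cb046000", SAMPLE))  # sctl
```

Looking up by the character major column matches the advice above to check the character major device; a value whose major is not in the table returns None rather than guessing.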


Hope you are now able to find the problematic device and trace the cause, as mentioned above.

Cheers !!!

Vinod K
Devender Khatana
Honored Contributor

Re: SCSI lbolt errors

Hi,

--> The messages have stopped now and did not recur during another backup, but what is the likely cause? Electronics, the SCSI bus, the tape drive itself, media, dirty heads, something else?


Both of the above responses focused on converting the numeric device value to its hardware path, so I will address your other questions.

The error indicates a timeout, which could have many causes depending on the configuration. It looks primarily like an intermittent cable issue or a loose connection. As a first step, I would reseat the cables and terminators on both the drive and host sides, and closely monitor the next few backups if the device in question turns out to be the tape drive.

It could be a bad disk as well; disks with media problems also generate these errors. Does your syslog.log contain any EMS notifications about this? If not, enable EMS monitoring for the device listed here so you get better notice next time. The notice will also be recorded in the EMS event log file, /var/opt/resmon/log/event.log.

HTH,
Devender

Impossible itself says "I'm possible"
Mahesh Kumar Malik
Honored Contributor

Re: SCSI lbolt errors

Hi Robert

If the tape drive shares a SCSI bus with disks, SCSI timeouts can occur. It is strongly recommended to put the tape drive on a separate, dedicated SCSI channel.

Regards
Mahesh
Cheryl Griffin
Honored Contributor

Re: SCSI lbolt errors

These are tape devices, according to the device files (/dev/rmt/...), but you did not mention the drive model.

The "first party detected bus hang" message indicates that the system was waiting for the tape drive to respond, but it didn't. These messages are usually followed by a message about resetting the SCSI bus.
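The two lbolt values in the posted log can be used to check this timeline. A minimal Python sketch, assuming lbolt counts clock ticks since boot at HZ=100 (the usual HP-UX tick rate, labeled as an assumption in the code); lbolt_events is a hypothetical helper. It finds 3300 ticks between the Request Timeout/Abort and the bus hang detection, which matches the 33 seconds between the syslog timestamps.

```python
import re
from datetime import datetime

HZ = 100  # ASSUMPTION: HP-UX clock ticks per second

# Syslog timestamp at the start of the line, then the first "lbolt: N" on it.
LBOLT_RE = re.compile(r"^(\w{3} +\d+ [\d:]{8}) .*?lbolt: (\d+)")


def lbolt_events(lines):
    """Return (timestamp, lbolt) pairs for syslog lines that carry an lbolt value."""
    events = []
    for line in lines:
        m = LBOLT_RE.search(line)
        if m:
            # Syslog omits the year; a placeholder leap year is used because
            # only differences between timestamps matter here.
            ts = datetime.strptime("2000 " + m.group(1), "%Y %b %d %H:%M:%S")
            events.append((ts, int(m.group(2))))
    return events


LOG = [
    "Jul 18 12:40:04 krone vmunix: SCSI: Request Timeout; Abort -- lbolt: 810133487, dev: cd046040, io_id: 471a23f",
    "Jul 18 12:40:37 krone vmunix: SCSI: First party detected bus hang -- lbolt: 810136787, bus: 4",
]

(t0, l0), (t1, l1) = lbolt_events(LOG)
ticks = l1 - l0
print(ticks, ticks / HZ, (t1 - t0).total_seconds())  # 3300 33.0 33.0
```

The agreement between the tick count and the wall-clock gap is consistent with the host waiting on the drive for about half a minute after the abort before declaring the bus hung.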

Consider how often these messages occur. Check that the hardware is properly cabled, connected and terminated. Other possible causes are tape drive firmware issues or unsupported hardware.

If you can post further details about the tape drive model, we can assist you further.
"Downtime is a Crime."