lbolts in syslog issued for bus 4....but whats on bus 4 ?

Hi there,

We have a K460 with an attached Jamaica JBOD and we have noticed plenty of lbolt messages appearing in the syslog file (see enclosed).

I've run an ioscan and all disks appear CLAIMED and fine (see enclosed).

The messages refer repeatedly to "bus 4", either as the offending device or as the bus the offending device is attached to.

Can anyone confirm what is on "bus 4" and give me some pointers on what to investigate next, please?

thanks in advance,
Sean
6 REPLIES
Matti_Kurkela
Honored Contributor
Solution

Re: lbolts in syslog issued for bus 4....but whats on bus 4 ?

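The "lbolt" messages are SCSI bus reset messages (lbolt itself is just the kernel's tick counter, logged with each message), so "bus 4" refers to SCSI bus number 4, i.e. SCSI controller instance 4. If the bus contains disks (as it seems to), the affected disks are of the form /dev/[r]dsk/c4t*d*.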

The affected devices are identified as "1f04e000" and "1f04f000". The first two digits are the major device number: 1f is hexadecimal for 31. "lsdev -e 31" would tell you the name of the device driver that uses that major number, confirming that these are disk devices.

The next two digits are the SCSI bus number, the c part of the disk device name. 04 = /dev/[r]dsk/c4*.

The next single hex digit identifies the device ID on the SCSI bus: e=14, f=15.
The digit after that is the LUN number, which is always 0 on Jamaica disks. The last two digits hold driver option bits and are not part of the device address.

So, the messages seem to indicate that disks /dev/dsk/c4t14d0 and /dev/dsk/c4t15d0 are about to fail.
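The decoding steps above can be sketched in Python. This is only an illustration: the function name `decode_sdisk_dev` is made up here, and the layout it assumes (two hex digits of major number, two of bus number, one of SCSI ID, one of LUN) is the one for disk devices with major number 0x1f. Treating only the first of the three trailing digits as the LUN is an assumption; for these two devices all three are zero, so the result is the same either way. Other drivers may lay out their minor numbers differently.

```python
def decode_sdisk_dev(dev_hex):
    """Split an 8-hex-digit device ID into the fields described above.

    Assumed layout for the disk driver (major 0x1f): MM BB T L xx
    """
    value = int(dev_hex, 16)
    major = (value >> 24) & 0xFF   # first two hex digits: major number
    bus = (value >> 16) & 0xFF     # next two hex digits: the "c" number
    target = (value >> 12) & 0xF   # next hex digit: the "t" number (SCSI ID)
    lun = (value >> 8) & 0xF       # next hex digit: the "d" number (LUN)
    return {
        "major": major,
        "device": "/dev/dsk/c%dt%dd%d" % (bus, target, lun),
    }


for dev in ("1f04e000", "1f04f000"):
    info = decode_sdisk_dev(dev)
    print(dev, "-> major", info["major"], info["device"])
```

For the two IDs in the log this gives major 31 with /dev/dsk/c4t14d0 and /dev/dsk/c4t15d0.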

The ioscan listing gives the hardware paths of these disks as 10/0.14.0 and 10/0.15.0. Following the path back towards the root brings us to the SCSI controller, at hardware path 10/0.
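As a small illustration of "following the path back towards the root", the sketch below splits a disk's hardware path into the controller path, SCSI ID, and LUN. The helper name `split_hw_path` is hypothetical, and it assumes the last two dot-separated fields are always the SCSI ID and the LUN, as they are in the paths quoted here.

```python
def split_hw_path(path):
    """Split e.g. '10/0.14.0' into (controller path, SCSI ID, LUN)."""
    # Everything before the last two dots is the controller's own path.
    controller, target, lun = path.rsplit(".", 2)
    return controller, int(target), int(lun)


for hw_path in ("10/0.14.0", "10/0.15.0"):
    print(hw_path, "->", split_hw_path(hw_path))
```

Both disks resolve to the controller at 10/0, with SCSI IDs 14 and 15 and LUN 0.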

If I remember correctly, these hardware paths are documented on the server's exterior... but now it's time to open the K-class Service Manual:

http://ftp.parisc-linux.org/docs/platforms/A2375-90004.pdf

What you want is the Path Addressing table on page 30 (Table 2-2).

It identifies the path 10/0 as "Core I/O card FW DIFF SCSI connector". The internal disks have smaller SCSI ID numbers, so our problem disks are probably external. It's time to use the old Eyeball Mark I: follow the HVD SCSI cable plugged into the Core I/O and find the SCSI disks on that cable with SCSI IDs set to 14 and 15.

The SCSI IDs on each slot of a Jamaica box are set using DIP switches. The switch pattern maps directly to the binary representation of the SCSI ID, i.e. 14 is ON-ON-ON-OFF and 15 is ON-ON-ON-ON.
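A quick way to work out the switch pattern for any ID is to print its binary form. The sketch below assumes four switches, ON meaning a 1 bit, and the most significant bit listed first, which matches the two patterns given above. The function name `dip_pattern` is made up for this example, and the physical order of the switches should still be checked against the labels on the enclosure.

```python
def dip_pattern(scsi_id, switches=4):
    """Return the ON/OFF pattern for a SCSI ID, most significant bit first."""
    if not 0 <= scsi_id < 2 ** switches:
        raise ValueError("SCSI ID does not fit in %d switches" % switches)
    bits = format(scsi_id, "0%db" % switches)  # e.g. 14 -> "1110"
    return "-".join("ON" if bit == "1" else "OFF" for bit in bits)


print(dip_pattern(14))
print(dip_pattern(15))
```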

MK
SoorajCleris
Honored Contributor

Re: lbolts in syslog issued for bus 4....but whats on bus 4 ?

That's a superb explanation, MK.
"UNIX is basically a simple operating system, but you have to be a genius to understand the simplicity" - Dennis Ritchie
Steven E. Protter
Exalted Contributor

Re: lbolts in syslog issued for bus 4....but whats on bus 4 ?

Shalom,

LVM: vg[0]: pvnum=2 (dev_t=0x1f04e000) is POWERFAILED

One of those disks in the JBOD has probably power failed.

I suggest testing them with mstm, cstm, or xstm, and looking at the ioscan output to determine where the disk is.

SEP
Raj Briden
Frequent Advisor

Re: lbolts in syslog issued for bus 4....but whats on bus 4 ?

Use the `dd` command on the suspected disk to check the consistency of the disk:

# dd if=/dev/rdsk/cXtXdX of=/dev/null
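The same idea (read the whole device and see whether any read fails) can be sketched in Python, which makes it easy to report how far the read got before an error. This is only an illustration, not a replacement for dd: the function name `read_test` and the 1 MB block size are arbitrary choices made here.

```python
def read_test(path, block_size=1024 * 1024):
    """Read path sequentially to the end.

    Returns (bytes_read, None) on success, or (bytes_read, error)
    if opening or reading fails part-way through.
    """
    total = 0
    try:
        with open(path, "rb", buffering=0) as dev:
            while True:
                chunk = dev.read(block_size)
                if not chunk:          # end of device or file
                    return total, None
                total += len(chunk)
    except OSError as exc:             # I/O error, missing device, etc.
        return total, exc
```

Calling `read_test("/dev/rdsk/cXtXdX")` with the real device name filled in returns the number of bytes read and the first error encountered, if any.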

Re: lbolts in syslog issued for bus 4....but whats on bus 4 ?

Thanks for the replies so far guys - appreciated !

This has given us a far better understanding of the architecture and being able to identify the actual offending device.

The latest update is that we have moved the data off disk 15, which was flagging lbolts, and removed that disk from the JBOD. But now we are seeing the following:

Apr 27 15:08:48 shpe01 vmunix: SCSI: First party detected bus hang -- lbolt: 52066955, bus: 4
Apr 27 15:08:48 shpe01 vmunix: lbp->state: 3020
Apr 27 15:08:48 shpe01 vmunix: lbp->offset: 40
Apr 27 15:08:48 shpe01 vmunix: lbp->uPhysScript: a00000
Apr 27 15:08:48 shpe01 vmunix: From most recent interrupt:
Apr 27 15:08:48 shpe01 vmunix: ISTAT: 29, SIST0: 00, SIST1: 00, DSTAT: 84, DSPS: 0000000a
Apr 27 15:08:48 shpe01 vmunix: lsp: 0000000000000000
Apr 27 15:08:48 shpe01 vmunix: lbp->owner: 0000000049c0f500
Apr 27 15:08:48 shpe01 vmunix: bp->b_dev: cb046000
Apr 27 15:08:48 shpe01 vmunix: scb->io_id: 45334c3
Apr 27 15:08:48 shpe01 vmunix: scb->cdb: 4d 00 40 00 00 00 00 04 00 00
Apr 27 15:08:48 shpe01 vmunix: lbolt_at_timeout: 52065755, lbolt_at_start: 52065755
Apr 27 15:08:48 shpe01 vmunix: lsp->state: 10d
Apr 27 15:08:48 shpe01 vmunix: scratch_lsp: 0000000049c0f500
Apr 27 15:08:48 shpe01 vmunix: Pre-DSP script dump [0000000044012030]:
Apr 27 15:08:48 shpe01 vmunix: 980dff00 0000000a 78350800 00000000
Apr 27 15:08:48 shpe01 vmunix: 0e000004 00a00540 80000000 00000000
Apr 27 15:08:48 shpe01 vmunix: Script dump [0000000044012050]:
Apr 27 15:08:48 shpe01 vmunix: 870b0000 00a002d8 98080000 00000005
Apr 27 15:08:48 shpe01 vmunix: 721a0000 00000000 98080000 00000001
Apr 27 15:08:55 shpe01 vmunix: SCSI: Resetting SCSI -- lbolt: 52067055, bus: 4
Apr 27 15:08:55 shpe01 vmunix: SCSI: Reset detected -- lbolt: 52067055, bus: 4

So, based on the earlier explanation of how to decipher the actual device, I see:

bp->b_dev: cb046000

which I am guessing relates to:

lsdev -e 203 (decimal for hex cb)

which returns "ctl".

Then we have a SCSI bus of "04" and, finally, a device ID of "6".

But this doesn't seem to relate to any device listed by ioscan -fnC ctl:


# ioscan -fnC ctl
Class  I  H/W Path     Driver  S/W State  H/W Type  Description
================================================================
ctl    0  8/0.7.0      sctl    CLAIMED    DEVICE    Initiator
                       /dev/rscsi/c0t7d0
ctl    1  8/4.7.0      sctl    CLAIMED    DEVICE    Initiator
                       /dev/rscsi/c1t7d0
ctl    2  8/8.7.0      sctl    CLAIMED    DEVICE    Initiator
                       /dev/rscsi/c2t7d0
ctl    3  8/12.7.0     sctl    CLAIMED    DEVICE    Initiator
                       /dev/rscsi/c3t7d0
ctl    4  10/0.7.0     sctl    CLAIMED    DEVICE    Initiator
                       /dev/rscsi/c4t7d0
ctl    5  10/8.7.0     sctl    CLAIMED    DEVICE    Initiator
                       /dev/rscsi/c5t7d0
ctl    6  10/12/5.7.0  sctl    CLAIMED    DEVICE    Initiator
                       /dev/rscsi/c6t7d0


Can anyone confirm whether I am barking up the wrong tree, please?

TIA,

Sean
Matti_Kurkela
Honored Contributor

Re: lbolts in syslog issued for bus 4....but whats on bus 4 ?

In your earlier log, both SCSI IDs 14 and 15 on bus 4 were causing errors. This suggests you eliminated only one of them:

> Latest update is that we have moved the data off disk 15 which was flagging lbolts and now removed that disk from the JBOD but now we are seeing the following......

The interpretation rules for the minor device number are specific to each driver (identified by the major number). So the interpretation of a cbXXXXXX device ID won't necessarily follow exactly the same rules as a 1fXXXXXX device ID.

But we won't need to decode the ID to get the bus number - it's already decoded for us:

> SCSI: First party detected bus hang -- lbolt: 52066955, bus: 4

The major device "ctl" refers to a SCSI bus controller (HBA)... on the same bus that had problems before. As the controller is integrated into the Core I/O card, it might be difficult to replace it.

The log entries in your original posting were basically the disks with IDs 14 and 15 saying, "I'm dying." Now, your new log entries are the result of the SCSI controller of bus 4 seeing the bus hanging and resetting it to clear the jam.
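Since the kernel already prints the bus number, the SCSI events can be tallied per bus straight from syslog without decoding any device IDs. The sketch below is one way to do that, assuming every relevant line has the "SCSI: <event> -- lbolt: <n>, bus: <n>" shape seen in the log above; the pattern and the function name `count_bus_events` are written for this example only.

```python
import re
from collections import Counter

# Matches lines such as:
#   vmunix: SCSI: Resetting SCSI -- lbolt: 52067055, bus: 4
BUS_MESSAGE = re.compile(
    r"SCSI: (?P<event>.+?) -- lbolt: (?P<lbolt>\d+), bus: (?P<bus>\d+)"
)


def count_bus_events(lines):
    """Count SCSI events per (bus number, event text)."""
    counts = Counter()
    for line in lines:
        match = BUS_MESSAGE.search(line)
        if match:
            counts[(int(match.group("bus")), match.group("event"))] += 1
    return counts


sample = [
    "Apr 27 15:08:48 shpe01 vmunix: SCSI: First party detected bus hang -- lbolt: 52066955, bus: 4",
    "Apr 27 15:08:48 shpe01 vmunix: lbp->state: 3020",
    "Apr 27 15:08:55 shpe01 vmunix: SCSI: Resetting SCSI -- lbolt: 52067055, bus: 4",
    "Apr 27 15:08:55 shpe01 vmunix: SCSI: Reset detected -- lbolt: 52067055, bus: 4",
]
for (bus, event), n in sorted(count_bus_events(sample).items()):
    print("bus %d: %s x%d" % (bus, event, n))
```

Run against the whole syslog file, a tally like this shows quickly whether the hangs and resets are confined to bus 4 or are spreading to other buses.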

In this thread, Stephanie L. Davenport seems to know it could be caused by an old, unused disk failing on the SCSI bus:
http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1055496

If you moved your data off disk 14 too, but left it plugged in, it might be the one causing the trouble. Although the disk is no longer needed for data storage, its failing internal electronics may still be interfering with the proper functioning of SCSI bus 4.

MK