- lbolts in syslog issued for bus 4....but whats on ...
04-25-2010 01:19 PM
We have a K460 with an attached Jamaica JBOD and we have noticed plenty of lbolt messages appearing in the syslog file (see enclosed).
I've run an ioscan and all disks appear CLAIMED and fine (see enclosed).
There are plenty of references to "bus 4" as the offending device, or to something attached to "bus 4".
Can anyone give me some pointers on what to investigate next, and confirm what is on "bus 4"?
Thanks in advance,
Sean
04-25-2010 04:14 PM
Solution
The affected devices are identified as "1f04e000" and "1f04f000". The first two digits are the major device number: 1f is hexadecimal for 31. "lsdev -e 31" would tell you the name of the device driver that uses that major number, confirming that these are disk devices.
The next two digits are the SCSI bus number, the c part of the disk device name. 04 = /dev/[r]dsk/c4*.
The next single hex digit identifies the device ID on the SCSI bus. e=14, f=15.
The last 3 digits are for the LUN, which is always 0 on Jamaica disks.
So, the messages seem to indicate that disks /dev/dsk/c4t14d0 and /dev/dsk/c4t15d0 are about to fail.
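If you want to script the decoding, here is a minimal Python sketch. It assumes the field layout exactly as described above (2 hex digits of major number, 2 of bus, 1 of SCSI ID, 3 of LUN); the function name decode_disk_dev_t is made up for illustration and is not an HP-UX tool.

```python
def decode_disk_dev_t(dev_t: int) -> dict:
    """Split a 32-bit disk dev_t using the layout described in this post."""
    major = (dev_t >> 24) & 0xFF   # first two hex digits: major number
    bus = (dev_t >> 16) & 0xFF     # next two hex digits: the "c" number
    target = (dev_t >> 12) & 0xF   # next hex digit: SCSI ID, the "t" number
    lun = dev_t & 0xFFF            # last three hex digits, treated as the LUN
    return {
        "major": major,
        "bus": bus,
        "target": target,
        "lun": lun,
        "device": f"/dev/dsk/c{bus}t{target}d{lun}",
    }


for raw in ("1f04e000", "1f04f000"):
    print(raw, decode_disk_dev_t(int(raw, 16)))
```

This only applies to device IDs whose major number belongs to the disk driver; other drivers may lay out the remaining digits differently.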
The ioscan listing gives the hardware paths of these disks as 10/0.14.0 and 10/0.15.0. Following the path back towards the root brings us to the SCSI controller, at hardware path 10/0.
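Walking a hardware path back to its controller can be sketched the same way. This assumes a SCSI device path always ends in ".target.lun", which holds for the paths quoted in this thread but is not checked against anything else; split_hw_path is a hypothetical helper name.

```python
def split_hw_path(hw_path: str) -> tuple:
    """Return (controller_path, target, lun) for a path like '10/0.14.0'."""
    # Peel the last two dot-separated fields off the right-hand side.
    controller, target, lun = hw_path.rsplit(".", 2)
    return controller, int(target), int(lun)


for path in ("10/0.14.0", "10/0.15.0"):
    print(path, "->", split_hw_path(path))
```

Both problem disks resolve to the same controller path, 10/0, which is the one to look up in the service manual.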
If I remember correctly, these hardware paths are documented on the server's exterior... but now it's time to open the K-class Service Manual:
http://ftp.parisc-linux.org/docs/platforms/A2375-90004.pdf
What you want is the Path Addressing table on page 30 (Table 2-2).
It identifies the path 10.0 as "Core I/O card FW DIFF SCSI connector". The internal disks have smaller SCSI ID numbers, so our problem disks are probably external. It's time to use the old Eyeball Mark I: follow the HVD SCSI cable plugged into the Core I/O and find the SCSI disks on that cable with SCSI IDs set to 14 and 15.
The SCSI IDs on each slot of a Jamaica box are set using DIP switches. The switch pattern maps directly to the binary representation of the SCSI ID, i.e. 14 is ON-ON-ON-OFF and 15 is ON-ON-ON-ON.
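To work out the switch pattern for any other ID, a small Python sketch follows. It assumes four switches listed most-significant bit first with ON meaning 1, which is consistent with the two examples above; the physical order of the switches on the enclosure is something to verify on the hardware. The name dip_pattern is made up for illustration.

```python
def dip_pattern(scsi_id: int) -> str:
    """Return the 4-switch pattern for a SCSI ID, most-significant bit first."""
    if not 0 <= scsi_id <= 15:
        raise ValueError("SCSI ID must be between 0 and 15")
    bits = format(scsi_id, "04b")  # e.g. 14 -> '1110'
    return "-".join("ON" if bit == "1" else "OFF" for bit in bits)


for scsi_id in (14, 15):
    print(scsi_id, dip_pattern(scsi_id))
```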
MK
04-25-2010 05:38 PM
Re: lbolts in syslog issued for bus 4....but whats on bus 4 ?
04-25-2010 06:34 PM
LVM: vg[0]: pvnum=2 (dev_t=0x1f04e000) is POWERFAILED
One of those disks in the JBOD has probably power-failed.
I suggest testing them with mstm, cstm, or xstm, and looking at the ioscan output to determine where the disk is.
SEP
04-25-2010 08:00 PM
# dd if=/dev/rdsk/cXtXdX of=/dev/null
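The dd command above simply reads the whole raw device and throws the data away, so a failing disk shows up as a read error. The same idea as a rough Python sketch, in case you want to wrap it in a script: read_whole_device is a hypothetical helper, and the demo runs against a scratch file rather than a real /dev/rdsk path.

```python
import os
import tempfile


def read_whole_device(path: str, chunk_size: int = 1024 * 1024) -> int:
    """Read a file or raw device from start to end; return bytes read.

    Any OSError from a bad block is allowed to propagate to the caller.
    """
    total = 0
    with open(path, "rb", buffering=0) as dev:
        while True:
            data = dev.read(chunk_size)
            if not data:  # empty read means end of device
                break
            total += len(data)
    return total


if __name__ == "__main__":
    # Demo on a temporary file; substitute the raw disk path on a real system.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * 4096)
    try:
        print("bytes read:", read_whole_device(tmp.name))
    finally:
        os.unlink(tmp.name)
```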
04-27-2010 06:25 AM
This has given us a far better understanding of the architecture and of how to identify the actual offending device.
Latest update: we have moved the data off disk 15, which was flagging lbolts, and removed that disk from the JBOD, but now we are seeing the following:
Apr 27 15:08:48 shpe01 vmunix: SCSI: First party detected bus hang -- lbolt: 52066955, bus: 4
Apr 27 15:08:48 shpe01 vmunix: lbp->state: 3020
Apr 27 15:08:48 shpe01 vmunix: lbp->offset: 40
Apr 27 15:08:48 shpe01 vmunix: lbp->uPhysScript: a00000
Apr 27 15:08:48 shpe01 vmunix: From most recent interrupt:
Apr 27 15:08:48 shpe01 vmunix: ISTAT: 29, SIST0: 00, SIST1: 00, DSTAT: 84, DSPS: 0000000a
Apr 27 15:08:48 shpe01 vmunix: lsp: 0000000000000000
Apr 27 15:08:48 shpe01 vmunix: lbp->owner: 0000000049c0f500
Apr 27 15:08:48 shpe01 vmunix: bp->b_dev: cb046000
Apr 27 15:08:48 shpe01 vmunix: scb->io_id: 45334c3
Apr 27 15:08:48 shpe01 vmunix: scb->cdb: 4d 00 40 00 00 00 00 04 00 00
Apr 27 15:08:48 shpe01 vmunix: lbolt_at_timeout: 52065755, lbolt_at_start: 52065755
Apr 27 15:08:48 shpe01 vmunix: lsp->state: 10d
Apr 27 15:08:48 shpe01 vmunix: scratch_lsp: 0000000049c0f500
Apr 27 15:08:48 shpe01 vmunix: Pre-DSP script dump [0000000044012030]:
Apr 27 15:08:48 shpe01 vmunix: 980dff00 0000000a 78350800 00000000
Apr 27 15:08:48 shpe01 vmunix: 0e000004 00a00540 80000000 00000000
Apr 27 15:08:48 shpe01 vmunix: Script dump [0000000044012050]:
Apr 27 15:08:48 shpe01 vmunix: 870b0000 00a002d8 98080000 00000005
Apr 27 15:08:48 shpe01 vmunix: 721a0000 00000000 98080000 00000001
Apr 27 15:08:55 shpe01 vmunix: SCSI: Resetting SCSI -- lbolt: 52067055, bus: 4
Apr 27 15:08:55 shpe01 vmunix: SCSI: Reset detected -- lbolt: 52067055, bus: 4
So, based on the earlier explanation of how to decipher the actual device, I see:
bp->b_dev: cb046000
which I am guessing relates to:
lsdev -e 203 (203 is decimal for hex cb)
which returns "ctl".
Then we have a SCSI bus of "04",
and finally a device ID of "6".
But this doesn't seem to relate to any device listed by ioscan -fnC ctl:
# ioscan -fnC ctl
Class  I  H/W Path     Driver  S/W State  H/W Type  Description
===============================================================
ctl    0  8/0.7.0      sctl    CLAIMED    DEVICE    Initiator
                       /dev/rscsi/c0t7d0
ctl    1  8/4.7.0      sctl    CLAIMED    DEVICE    Initiator
                       /dev/rscsi/c1t7d0
ctl    2  8/8.7.0      sctl    CLAIMED    DEVICE    Initiator
                       /dev/rscsi/c2t7d0
ctl    3  8/12.7.0     sctl    CLAIMED    DEVICE    Initiator
                       /dev/rscsi/c3t7d0
ctl    4  10/0.7.0     sctl    CLAIMED    DEVICE    Initiator
                       /dev/rscsi/c4t7d0
ctl    5  10/8.7.0     sctl    CLAIMED    DEVICE    Initiator
                       /dev/rscsi/c5t7d0
ctl    6  10/12/5.7.0  sctl    CLAIMED    DEVICE    Initiator
                       /dev/rscsi/c6t7d0
Can anyone confirm whether I am barking up the wrong tree, please?
TIA,
Sean
04-27-2010 08:49 AM
> Latest update: we have moved the data off disk 15, which was flagging lbolts, and removed that disk from the JBOD, but now we are seeing the following:
The interpretation rules for the minor device number are specific to each driver (identified by the major number). So the interpretation of a cbXXXXXX device ID won't necessarily follow exactly the same rules as a 1fXXXXXX device ID.
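In other words, the only part that can be decoded safely without knowing the driver is the major number. A minimal Python sketch of that split, assuming an 8-bit major and a 24-bit minor (consistent with the two-hex-digit major used earlier in this thread); split_dev_t is a made-up name.

```python
def split_dev_t(dev_t: int) -> tuple:
    """Return (major, raw_minor) without interpreting the minor number."""
    major = (dev_t >> 24) & 0xFF    # driver-independent: first two hex digits
    raw_minor = dev_t & 0xFFFFFF    # meaning depends on the driver
    return major, raw_minor


for raw in ("1f04e000", "cb046000"):
    major, minor = split_dev_t(int(raw, 16))
    print(f"{raw}: major={major} minor=0x{minor:06x}")
```

The major number is what you would pass to lsdev -e; the minor is left as raw hex on purpose.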
But we won't need to decode the ID to get the bus number - it's already decoded for us:
> SCSI: First party detected bus hang -- lbolt: 52066955, bus: 4
The major device "ctl" refers to a SCSI bus controller (HBA)... on the same bus that had problems before. As the controller is integrated into the Core I/O card, it might be difficult to replace.
The log entries in your original posting were basically the disks with IDs 14 and 15 saying, "I'm dying." Now, your new log entries are the result of the SCSI controller of bus 4 seeing the bus hanging and resetting it to clear the jam.
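Since the kernel already prints the bus number, it can be pulled straight out of syslog. Here is a rough Python sketch; the regular expression is my own guess based only on the three "SCSI: ... -- lbolt: N, bus: N" lines quoted in this thread, so other message formats may not match.

```python
import re

# Pattern inferred from the messages quoted in this thread (an assumption).
BUS_EVENT = re.compile(
    r"SCSI: (?P<event>.+?) -- lbolt: (?P<lbolt>\d+), bus: (?P<bus>\d+)"
)


def parse_bus_event(line: str):
    """Return (event, lbolt, bus) for a SCSI bus message, or None."""
    match = BUS_EVENT.search(line)
    if match is None:
        return None
    return match.group("event"), int(match.group("lbolt")), int(match.group("bus"))


sample = [
    "Apr 27 15:08:48 shpe01 vmunix: SCSI: First party detected bus hang -- lbolt: 52066955, bus: 4",
    "Apr 27 15:08:48 shpe01 vmunix: lbp->state: 3020",
    "Apr 27 15:08:55 shpe01 vmunix: SCSI: Resetting SCSI -- lbolt: 52067055, bus: 4",
]
for line in sample:
    print(parse_bus_event(line))
```

Counting events per bus over a day of syslog would show whether the hangs stop after a suspect disk is pulled.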
In this thread, Stephanie L. Davenport indicates that it could be caused by an old, unused disk failing on the SCSI bus:
http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1055496
If you moved your data off disk 14 too, but left it plugged in, it might be the one causing the trouble. Although the disk is no longer needed for data storage, its failing internal electronics may still be interfering with the proper functioning of SCSI bus 4.
MK