
First party detected bus hang -- lbolt: 420673898, bus: 5 - K460 (help required please)

 
sean_h
Occasional Contributor


Hi there,

 

Can someone please help me with the following errors that I am seeing?

 

In short, the system was fine before the following error occurred:

 

Apr 25 07:39:21 vmunix: SCSI: First party detected bus hang -- lbolt: 420673898, bus: 5
Apr 25 07:39:21 vmunix: lbp->state: 5020
Apr 25 07:39:21 vmunix: lbp->offset: 80
Apr 25 07:39:22 above message repeats 4 times
Apr 25 07:39:21 vmunix: lbp->uPhysScript: f00000
Apr 25 07:39:21 vmunix: From most recent interrupt:
Apr 25 07:39:21 vmunix: ISTAT: 09, SIST0: 00, SIST1: 00, DSTAT: 84, DSPS: 00000001
Apr 25 07:39:21 vmunix: lsp: 0000000000000000
Apr 25 07:39:21 vmunix: lbp->owner: 0000000064fcbe00
Apr 25 07:39:21 vmunix: bp->b_dev: 1f05e300
Apr 25 07:39:21 vmunix: scb->io_id: 5fcf85a
Apr 25 07:39:21 vmunix: scb->cdb: 2a 00 00 09 16 d0 00 00 08 00
Apr 25 07:39:21 vmunix: lbolt_at_timeout: 420673798, lbolt_at_start: 420673298
Apr 25 07:39:21 vmunix: lsp->state: 5
Apr 25 07:39:21 vmunix: scratch_lsp: 0000000049af8300
Apr 25 07:39:21 vmunix: bp->b_dev: 1f05e300
Apr 25 07:39:21 vmunix: scb->io_id: 5fcf859
Apr 25 07:39:21 vmunix: scb->cdb: 2a 00 00 00 12 e0 00 00 08 00
Apr 25 07:39:21 vmunix: lbolt_at_timeout: 420674198, lbolt_at_start: 420671198
Apr 25 07:39:21 vmunix: lsp->state: 205
Apr 25 07:39:21 vmunix: Pre-DSP script dump [00000000440a6060]:
Apr 25 07:39:21 vmunix: 721a0000 00000000 98080000 00000001
Apr 25 07:39:21 vmunix: e0100004 00000000 80000000 00000000
Apr 25 07:39:21 vmunix: Script dump [00000000440a6080]:
Apr 25 07:39:21 vmunix: 890b0000 00f00198 880b0000 00f00198
Apr 25 07:39:21 vmunix: 83030000 00f00368 0b000001 00f005d0
Apr 25 07:39:22 vmunix: SCSI: Resetting SCSI -- lbolt: 420673998, bus: 5
Apr 25 07:39:22 vmunix: SCSI: Reset detected -- lbolt: 420673998, bus: 5
Apr 25 07:39:29 vmunix: LVM: Performed a switch for Lun ID = 0 (pv = 0x0000000048e96800), from raw device 0x1f05e300 (with priority: 0, and current flags: 0x40) to raw device 0x1f01e300 (with priority: 1, and current flags: 0x0).
Apr 25 07:39:29 vmunix:
Apr 25 07:39:29 vmunix: SCSI: Read error -- dev: b 31 0x05e300, errno: 126, resid: 2048,
Apr 25 07:39:29 vmunix: blkno: 8, sectno: 16, offset: 8192, bcount: 2048.
Apr 25 07:39:51 vmunix: LVM: Path (device 0x1f05e200) to PV 1 in VG 5 Failed!
Apr 25 07:39:51 vmunix: LVM: Path (device 0x1f05e100) to PV 2 in VG 5 Failed!
Apr 25 07:39:51 vmunix: LVM: Performed a switch for Lun ID = 0 (pv = 0x0000000049b4f040), from raw device 0x1f05e100 (with priority: 0, and current flags: 0x40) to raw device 0x1f01e100 (with priority: 1, and current flags: 0x0).
Apr 25 07:39:51 vmunix: LVM: Path (device 0x1f05e000) to PV 3 in VG 5 Failed!
Apr 25 07:39:51 vmunix: LVM: vg[5]: pvnum=0 (dev_t=0x1f01e300) is POWERFAILED
Apr 25 07:39:51 vmunix: LVM: vg[5]: pvnum=1 (dev_t=0x1f01e200) is POWERFAILED
Apr 25 07:39:51 vmunix: LVM: vg[5]: pvnum=2 (dev_t=0x1f01e100) is POWERFAILED
Apr 25 07:39:51 vmunix: LVM: vg[5]: pvnum=3 (dev_t=0x1f01e000) is POWERFAILED
Apr 25 07:39:51 vmunix: LVM: vg[5]: pvnum=0 (dev_t=0x1f01e300) is POWERFAILED
Apr 25 07:39:56 vmunix: LVM: Recovered Path (device 0x1f01e300) to PV 0 in VG 5.
Apr 25 07:39:56 vmunix: LVM: Restored PV 0 to VG 5.

 

An ioscan shows a chunk of SCSI-attached EMC disks in the NO_HW state (see the attached ioscan output).

I have also attached the dmesg output to assist.

 

The system is a 2-node cluster running ServiceGuard; both nodes are K460s.

 

  • Other forum threads I have found seem to indicate that the b_dev value is what I am looking for. Is that correct?
  • If so, I decipher b_dev: 1f05e300 as c5t14d3. Can anyone confirm whether this is correct, please?

 

I have the engineers checking the EMC disks at present, but I don't have an update yet on whether they can see any problem.

 

While I am waiting for them to come back to me, can anyone confirm the above, please? And if it was a disk problem that has now gone away, how do I deal with the NO_HW state?

Is this likely to be a disk problem? I notice that target 10/8.0 is also showing as NO_HW.

 

Will only a reboot sort this out?

 

Many thanks in advance to anyone who has the time to reply.

 

TIA

Sean

1 REPLY
Matti_Kurkela
Honored Contributor

Re: First party detected bus hang -- lbolt: 420673898, bus: 5 - K460 (help required please)

Your deciphering of b_dev looks correct to me.
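
For anyone wanting to repeat the decoding, here is a minimal sketch in Python. It assumes the legacy HP-UX sdisk layout: the top 8 bits of the dev_t are the major number, and the 24-bit minor number holds the card instance (8 bits), target (4 bits), LUN (4 bits) and option bits (8 bits). The function name decode_sdisk_dev is made up for this illustration; it is not an HP-UX tool.

```python
def decode_sdisk_dev(dev_hex: str) -> dict:
    """Split a 32-bit dev_t (hex string) into major, instance, target and LUN.

    Assumed layout (legacy sdisk device files):
      bits 31-24 major, 23-16 card instance, 15-12 target, 11-8 LUN, 7-0 options
    """
    dev = int(dev_hex, 16)
    major = (dev >> 24) & 0xFF
    minor = dev & 0xFFFFFF
    instance = (minor >> 16) & 0xFF
    target = (minor >> 12) & 0xF
    lun = (minor >> 8) & 0xF
    options = minor & 0xFF
    return {
        "major": major,
        "instance": instance,
        "target": target,
        "lun": lun,
        "options": options,
        "name": f"c{instance}t{target}d{lun}",
    }


# b_dev value taken from the syslog excerpt in the question
print(decode_sdisk_dev("1f05e300")["name"])  # c5t14d3
```

The major number comes out as 31, which matches the "dev: b 31 0x05e300" read error line in the log.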

 

It looks like all the LUNs and targets connected to SCSI adapter 10/8 are now in NO_HW state: only the SCSI initiator device (related to the SCSI adapter itself) is CLAIMED.

 

My first question would be, "Did someone or something accidentally disconnect or damage a cable connected to that SCSI adapter?" Remember to check the SCSI terminators too, if applicable.

 

If the cable is not the problem, then perhaps someone disabled the corresponding SCSI port on the EMC. If the EMC and the cable are both good, it might be that the SCSI adapter has failed.

 

You seem to have alternate paths configured on at least some of the PVs, and LVM is already switching to use the good paths. If all the PVs have alternate paths configured, LVM should fail them over as soon as each PV is accessed.
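
To see which paths LVM has already switched, the "Performed a switch" lines in syslog can be pulled out and translated to device names. This is a rough sketch only: the regular expression is written against the message format shown in the question, the minor-number layout (instance, target, LUN) is the assumed legacy sdisk one, and the helper names ctd and find_switches are invented for this example.

```python
import re

# Matches the two raw device numbers in an LVM path-switch message.
SWITCH_RE = re.compile(
    r"from raw device 0x([0-9a-fA-F]+) .*? to raw device 0x([0-9a-fA-F]+)"
)


def ctd(dev_hex: str) -> str:
    """Translate a dev_t hex string to cXtYdZ (assumed sdisk minor layout)."""
    minor = int(dev_hex, 16) & 0xFFFFFF
    return f"c{(minor >> 16) & 0xFF}t{(minor >> 12) & 0xF}d{(minor >> 8) & 0xF}"


def find_switches(lines):
    """Return (failed_path, new_path) pairs for every switch message found."""
    pairs = []
    for line in lines:
        match = SWITCH_RE.search(line)
        if match:
            pairs.append((ctd(match.group(1)), ctd(match.group(2))))
    return pairs


# Two lines copied from the syslog excerpt in the question
sample = [
    "Apr 25 07:39:29 vmunix: LVM: Performed a switch for Lun ID = 0 "
    "(pv = 0x0000000048e96800), from raw device 0x1f05e300 (with priority: 0, "
    "and current flags: 0x40) to raw device 0x1f01e300 (with priority: 1, "
    "and current flags: 0x0).",
    "Apr 25 07:39:51 vmunix: LVM: Performed a switch for Lun ID = 0 "
    "(pv = 0x0000000049b4f040), from raw device 0x1f05e100 (with priority: 0, "
    "and current flags: 0x40) to raw device 0x1f01e100 (with priority: 1, "
    "and current flags: 0x0).",
]

for old, new in find_switches(sample):
    print(f"{old} -> {new}")
```

In the posted excerpt only two of the four PVs in VG 5 have a switch message, so the other two are worth checking once they are accessed.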

 

If the problem is with the cable or something equally simple, the disks should return to the CLAIMED state once the problem is fixed and you run "ioscan -fn" again.

MK