
rx6440 external disk in HP MSA30 partially available after system crash

 
BERTRAND_7
Frequent Advisor

Hello,

Yesterday we had a system crash on an Integrity rx6440, even though its internal disks are mirrored.

The system could not reboot after the crash because it detected a failure on the primary root disk (c2t1d0).

We restarted the system from the mirrored root disk (c2t0d0). However, after the reboot, not only is c2t1d0 unavailable, but so are some external disks in an HP MSA30 MI.
These external disks remain fully available on the other node of the MC/ServiceGuard cluster.

Below is the output we get on the faulty node:
ioscan -nfC ctl
Class  I   H/W Path        Driver  S/W State  H/W Type  Description
============================================================================
ctl    0   0/0/3/0.0.7.0   sctl    CLAIMED    DEVICE    Initiator
                           /dev/rscsi/c0t7d0
ctl    1   0/0/3/0.1.7.0   sctl    CLAIMED    DEVICE    Initiator
                           /dev/rscsi/c1t7d0
ctl    2   0/1/1/0.7.0     sctl    CLAIMED    DEVICE    Initiator
                           /dev/rscsi/c2t7d0
ctl    3   0/1/1/1.7.0     sctl    CLAIMED    DEVICE    Initiator
                           /dev/rscsi/c3t7d0
ctl    4   0/2/1/0.6.0     sctl    CLAIMED    DEVICE    Initiator
                           /dev/rscsi/c4t6d0
ctl    5   0/2/1/1.6.0     sctl    CLAIMED    DEVICE    Initiator
                           /dev/rscsi/c5t6d0
ctl    10  0/2/1/1.15.0    sctl    CLAIMED    DEVICE    COMPAQ PROLIANT 7L7E*DB
                           /dev/rscsi/c5t15d0
ctl    8   0/3/2/0.7.0     sctl    CLAIMED    DEVICE    Initiator
                           /dev/rscsi/c6t7d0
ctl    9   0/3/2/1.7.0     sctl    CLAIMED    DEVICE    Initiator
                           /dev/rscsi/c7t7d0
ctl    6   0/5/1/0.6.0     sctl    CLAIMED    DEVICE    Initiator
                           /dev/rscsi/c8t6d0
ctl    7   0/5/1/1.6.0     sctl    CLAIMED    DEVICE    Initiator
                           /dev/rscsi/c9t6d0


The following entry, which we expected to see, is missing:
ctl 12 0/2/1/0.15.0 sctl CLAIMED DEVICE COMPAQ PROLIANT 7L7E*DB
/dev/rscsi/c4t15d0

So none of the disks c4t0d0 to c4t5d0 are available on this node.
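
A comparison like this (expected controller entries versus what ioscan actually reports) can be scripted. The following is only a minimal sketch: it assumes the output of `ioscan -nfC ctl` has been saved as text, and the function names `claimed_paths` and `missing_paths` are hypothetical helpers, not part of any HP-UX tool. The list of expected hardware paths has to be supplied by the administrator.

```python
def claimed_paths(ioscan_output):
    """Return the set of hardware paths that ioscan reports as CLAIMED.

    Assumes whitespace-separated rows in the order:
    Class, Instance, H/W Path, Driver, S/W State, H/W Type, Description.
    Device file lines (/dev/rscsi/...) and headers are skipped.
    """
    paths = set()
    for line in ioscan_output.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[0] == "ctl" and fields[4] == "CLAIMED":
            paths.add(fields[2])
    return paths


def missing_paths(ioscan_output, expected):
    """Return the expected hardware paths that are absent from the ioscan output."""
    return sorted(set(expected) - claimed_paths(ioscan_output))
```

Called with the output shown above and an expected list containing 0/2/1/0.15.0 and 0/2/1/1.15.0, `missing_paths` would report only 0/2/1/0.15.0, which matches the missing MSA30 entry on the c4 bus.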


We got the following errors in the syslog:

SCSI errors in /var/adm/syslog/syslog.log on local node
============================================================================
Feb 9 14:29:36 PLSVR2 vmunix: Initializing the Ultra320 SCSI Controller at 0/1/1/0. Controller firmware version is 01.03.35.65
Feb 9 14:29:36 PLSVR2 vmunix: SCSI Ultra320 0/1/1/0 instance 2: The driver is now online
Feb 9 14:29:36 PLSVR2 vmunix: Initializing the Ultra320 SCSI Controller at 0/1/1/1. Controller firmware version is 01.03.35.65
Feb 9 14:29:36 PLSVR2 vmunix: SCSI Ultra320 0/1/1/1 instance 3: The driver is now online
Feb 9 14:29:36 PLSVR2 vmunix: Initializing the Ultra320 SCSI Controller at 0/2/1/0. Controller firmware version is 01.03.35.65
Feb 9 14:29:36 PLSVR2 vmunix: SCSI Ultra320 0/2/1/0 instance 4: The driver is now online
Feb 9 14:29:36 PLSVR2 vmunix: Initializing the Ultra320 SCSI Controller at 0/2/1/1. Controller firmware version is 01.03.35.65
Feb 9 14:29:36 PLSVR2 vmunix: c8xx BUS: 6 SCSI C1010 Ultra Wide Single-Ended A6829-60101 assigned CPU: 1
Feb 9 14:29:36 PLSVR2 vmunix: SCSI Ultra320 0/2/1/1 instance 5: The driver is now online
Feb 9 14:29:36 PLSVR2 vmunix: c8xx BUS: 7 SCSI C1010 Ultra160 Wide LVD A6829-60101 assigned CPU: 0
Feb 9 14:29:36 PLSVR2 vmunix: Initializing the Ultra320 SCSI Controller at 0/5/1/0. Controller firmware version is 01.03.35.65
Feb 9 14:29:36 PLSVR2 vmunix: SCSI Ultra320 0/5/1/0 instance 8: The driver is now online
Feb 9 14:29:36 PLSVR2 vmunix: Initializing the Ultra320 SCSI Controller at 0/5/1/1. Controller firmware version is 01.03.35.65
Feb 9 14:29:36 PLSVR2 vmunix: SCSI Ultra320 0/5/1/1 instance 9: The driver is now online
Feb 9 14:29:36 PLSVR2 vmunix: SCSI Ultra320 0/2/1/0 instance 4: IO Type : MPT Domain Validation IO has timed-out. Target ID: 0, LUN ID: 0. Inquiry Command - CDB: 12 00 00 00 24 00
Feb 9 14:29:36 PLSVR2 vmunix: SCSI Ultra320 0/2/1/0 instance 4: Domain Validation failed, device offline. Target ID = 0.
Feb 9 14:29:36 PLSVR2 vmunix: SCSI Ultra320 0/2/1/0 instance 4: IO Type : MPT Domain Validation IO has timed-out. Target ID: 1, LUN ID: 0. Inquiry Command - CDB: 12 00 00 00 24 00
Feb 9 14:29:36 PLSVR2 vmunix: SCSI Ultra320 0/2/1/0 instance 4: Domain Validation failed, device offline. Target ID = 1.
...
...
...
Feb 9 14:29:36 PLSVR2 vmunix: SCSI Ultra320 0/2/1/0 instance 4: IO Type : MPT Domain Validation IO has timed-out. Target ID: 4, LUN ID: 0. Inquiry Command - CDB: 12 00 00 00 24 00
Feb 9 14:29:36 PLSVR2 vmunix: SCSI Ultra320 0/2/1/0 instance 4: IO Type : MPT Domain Validation IO has timed-out. Target ID: 5, LUN ID: 0. Inquiry Command - CDB: 12 00 00 00 24 00
Feb 9 14:29:36 PLSVR2 vmunix: SCSI Ultra320 0/2/1/0 instance 4: IO Type : MPT Domain Validation IO has timed-out. Target ID: 15, LUN ID: 0. Inquiry Command - CDB: 12 00 00 00 24 00
Feb 9 14:29:36 PLSVR2 vmunix: SCSI Ultra320 0/2/1/0 instance 4: Domain Validation failed, device offline. Target ID = 15.
Feb 9 14:29:36 PLSVR2 vmunix: SCSI Ultra320 0/2/1/0 instance 4: IO Type : MPT Domain Validation IO has timed-out. Target ID: 15, LUN ID: 0. Inquiry Command - CDB: 12 00 00 00 24 00
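
To see at a glance which controller and which targets are affected, the Domain Validation messages can be grouped per hardware path. This is a minimal sketch that assumes the message wording is exactly as in the excerpt above; `summarize` is a hypothetical helper name, and the regular expression has only been matched against these two message forms.

```python
import re
from collections import defaultdict

# Matches the two message forms seen in the excerpt:
#   "... MPT Domain Validation IO has timed-out. Target ID: 0, LUN ID: 0. ..."
#   "... Domain Validation failed, device offline. Target ID = 0."
PATTERN = re.compile(
    r"SCSI Ultra320 (?P<path>[\d/]+) instance \d+: "
    r".*?(?P<kind>timed-out|Domain Validation failed)"
    r".*?Target ID\s*[:=]\s*(?P<target>\d+)"
)


def summarize(lines):
    """Group affected target IDs by controller hardware path.

    Returns {path: {"timed_out": set_of_ids, "offline": set_of_ids}}.
    Lines that do not match either message form are ignored.
    """
    summary = defaultdict(lambda: {"timed_out": set(), "offline": set()})
    for line in lines:
        match = PATTERN.search(line)
        if not match:
            continue
        key = "timed_out" if match.group("kind") == "timed-out" else "offline"
        summary[match.group("path")][key].add(int(match.group("target")))
    return dict(summary)
```

In the excerpt, every such message refers to 0/2/1/0 (instance 4), the same bus on which the expected 0/2/1/0.15.0 entry is missing from the ioscan output.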

Moreover, while this node boots, it causes SCSI errors on the other node of the cluster:

SCSI errors in /var/adm/syslog/syslog.log on alternate node of the cluster
============================================================================
Feb 9 13:49:28 PLSVR1 vmunix: SCSI: Read error -- dev: b 31 0x050000, errno: 126, resid: 1024,
Feb 9 13:49:28 PLSVR1 vmunix: SCSI Ultra320 0/2/1/1 instance 5: External SCSI bus reset detected. Condition cleared, no intervention required.
Feb 9 13:49:34 PLSVR1 vmunix: SCSI Ultra320 0/2/1/1 instance 5: External SCSI bus reset detected. Condition cleared, no intervention required.
Feb 9 13:52:54 PLSVR1 vmunix: SCSI Ultra320 0/2/1/1 instance 5: External SCSI bus reset detected. Condition cleared, no intervention required.
Feb 9 14:24:51 PLSVR1 vmunix: SCSI Ultra320 0/2/1/1 instance 5: External SCSI bus reset detected. Condition cleared, no intervention required.
Feb 9 14:25:21 PLSVR1 vmunix: SCSI: Read error -- dev: b 31 0x052000, errno: 126, resid: 1024,
Feb 9 14:25:15 PLSVR1 vmunix: SCSI Ultra320 0/2/1/1 instance 5: External SCSI bus reset detected. Condition cleared, no intervention required.
Feb 9 14:25:21 PLSVR1 vmunix: SCSI: Read error -- dev: b 31 0x053000, errno: 126, resid: 1024,
Feb 9 14:25:21 PLSVR1 vmunix: SCSI Ultra320 0/2/1/1 instance 5: External SCSI bus reset detected. Condition cleared, no intervention required.
Feb 9 14:28:59 PLSVR1 vmunix: SCSI Ultra320 0/2/1/1 instance 5: External SCSI bus reset detected. Condition cleared, no intervention required.


Could someone explain what is wrong, given that we did not move any disks before the crash?
Any suggestions are welcome.


Bruno
Devender Khatana
Honored Contributor

Re: rx6440 external disk in HP MSA30 partially available after system crash

Hi,

These are symptoms of an adapter failure on the local host. The SCSI reset messages on the other node are to be expected: they are caused by the local node trying to access the devices.

This could also be caused by something else failing, i.e. a cable, a terminator, or the MSA30 controller. From the information given so far, the exact cause cannot be pinned down. As adapters do not fail very often, start by replacing the cables/terminators first.

HTH,
Devender
Impossible itself mentions "I m possible"
Alan_152
Honored Contributor

Re: rx6440 external disk in HP MSA30 partially available after system crash

Which SCSI adapter are you using for connection to the MSA30? Is it the shipped core I/O or a replacement?

Also, is the MSA30 an Ultra3 or a U320 model?
BERTRAND_7
Frequent Advisor

Re: rx6440 external disk in HP MSA30 partially available after system crash

This problem was solved by replacing the SCSI card on node PLSVR2 of the cluster.
Thanks to everyone for the help.
BERTRAND_7
Frequent Advisor

Re: rx6440 external disk in HP MSA30 partially available after system crash

See comment above