SCSI Problem with TruCluster 5.1b

Reda El-Masry
Occasional Contributor

SCSI Problem with TruCluster 5.1b

• The cluster was working properly without any problem. It is composed of three members:
o M1, whose boot disk is the local dsk12.
o M2, whose boot disk is the local dsk14.
o M3, whose boot disk is dsk6 on the shared storage (shelf 2, slot 0).
o The quorum disk is dsk0 on the shared storage (shelf 1, slot 0).
• I deleted member three and then created it again. Some checks were performed (clu_quorum, clu_check_config) and everything was OK (see the command sketch after the console output below).
• Then, to boot M3 while M1 was running, I halted M2 and booted from the M3 boot disk, but during the boot it hung with this message:

CNX MGR: Join operation complete
CNX MGR: membership configuration index: 6 (4 additions, 2 removals)
CNX MGR: Node cap105tmp 3 incarn 0x65820 csid 0x30002 has been added to the cluster
CNX MGR: Node cap105 1 incarn 0x2b035 csid 0x10001 has been added to the cluster
dlm: resuming lock activity
kch: resuming activity
CNX QDISK: Successfully claimed quorum disk, adding 1 vote.
CNX MGR: quorum (re)gained, (re)starting cluster operations.
Joining versw kch set.

At the same time, I got these messages continuously on member M1, which then hung:

cap105:/>
cap105:/> CNX QDISK: Cluster quorum disk 19,47 has become available again.
CNX QDISK: Cluster quorum disk 19,47 has become unavailable due to a write error (status 5).
CNX QDISK: Cluster quorum disk 19,47 has become available again.
CNX QDISK: Cluster quorum disk 19,47 has become unavailable due to a write error (status 5).
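
For reference, a minimal sketch of the member re-creation and the checks described above, run from a surviving cluster member. The member ID (3) and the boot disk (dsk6) are the ones from this post; everything else is illustrative and should be checked against the reference pages:

# Remove the old member 3 from the cluster (run on a remaining member).
clu_delete_member -m 3

# Re-create member 3; clu_add_member runs interactively and prompts for
# the member ID, the member boot disk (dsk6 here) and the interconnect.
clu_add_member

# Verify the quorum configuration (expected votes, node votes, quorum
# disk) and the overall cluster configuration.
clu_quorum
clu_check_config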

• So, after halting both members, I tried to boot M1, but it failed with these messages:

Registering CMS Services
CNX QDISK: Qdisk open failed (19) retrying every 20 seconds.
ee2: Autonegotiated, 100 Mbps full duplex
CNX MGR: insufficient votes to form cluster: have 1 need 2
CNX MGR: insufficient votes to form cluster: have 1 need 2
CNX MGR: Node cap105 id 1 incarn 0x31ed5 attempting to form or join cluster cap105cap205
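
A hedged reading of the "have 1 need 2" messages, assuming one vote per voting member plus one vote for the quorum disk (the post does not list the actual vote assignments, so the numbers below are an assumption):

# On a running member, clu_quorum with no options displays the current
# expected votes, node votes, quorum disk votes and the computed quorum.
clu_quorum

# TruCluster computes quorum as roughly floor((expected_votes + 2) / 2).
# For example, with expected_votes = 3 (two voting members plus the
# quorum disk), quorum = floor(5 / 2) = 2, which matches "need 2" above.
# M1 on its own contributes only its single member vote and cannot claim
# the quorum disk ("Qdisk open failed"), so it stays at 1 < 2 and cannot
# form the cluster.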

• After halting both of them, I tried to boot M2 and the cluster was formed. Then I booted M1, which joined the cluster successfully. During the boot I noted these failure messages for "dka0 (quorum)" and "dkc0 (M3 boot)" on the console of M1:

Testing the System
Testing the Disks (read only)
file open failed for dka0.0.0.1.1
file open failed for dkc0.0.0.2.1
Testing the Network
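
A quick way to cross-check what each member actually sees on the shared bus once it is up (a sketch; hwmgr is the standard Tru64 5.x hardware manager, and the device names are the ones from this post):

# List every device known to the member and the SCSI database entries;
# dsk0 (quorum) and dsk6 (M3 boot) should appear with the same bus,
# target and LUN on all members.
hwmgr -view devices
hwmgr -show scsi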

• The binary.errlog was sent to HP for analysis, but no problem was reported.
• Checking from M1, I found that the two disks, dsk0 (quorum) and dsk6 (M3 boot), are "stale". From M2 there is no problem, as they are "valid".
• At the same time, on the console of M1, a lot of "[700] scsi event" messages appear continuously.
• I concluded that the quorum disk is no longer valid for M1; that is why the cluster could be formed from M2 ("valid") and not from M1 ("stale").
• Suspecting the two disks, I decided to delete the quorum disk and create it again, and to replace the disks with new ones. After the replacement I found that I could not see them from M1, so I installed them in slot 6 of both shelves instead of slot 0. After a SCSI scan, both disks were seen and valid from both members (see the command sketch after this list).
• I then created the quorum disk and then M3, and some checks were performed (clu_quorum, clu_check_config); everything was OK.
• Then I booted M3, but this time with M2 running: I halted M1 and booted from the M3 boot disk. The exact same problem reoccurred, but this time with M2.
• So, in order to save time, and suspecting that the cluster OS was corrupted, I decided to create the cluster from the beginning by booting from the local "UNIX installation" disk. The problem is that I found that dsk0 and dsk6 are not seen even w
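
For reference, a sketch of the checks and the quorum disk re-creation described in the list above, under a few assumptions: drdmgr and hwmgr are used to inspect device state and rescan the bus, the disk names are reused from this post (a replaced disk will normally receive a new dskNN name after the rescan), and the exact clu_quorum options should be verified against clu_quorum(8):

# Show the device request dispatcher state of the suspect disks: which
# member serves them and whether every member has access to them.
drdmgr dsk0 dsk6

# After moving or replacing disks on the shared shelf, rescan the SCSI
# buses and re-list the devices on each member.
hwmgr -scan scsi
hwmgr -view devices

# Remove the old quorum disk definition and add the new disk with one vote.
clu_quorum -d remove
clu_quorum -d add dsk0 1

# Re-run the sanity checks.
clu_quorum
clu_check_config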
2 REPLIES
Michael Schulte zur Sur
Honored Contributor

Re: SCSI Problem with TruCluster 5.1b

Hi,

Your post was cut short due to its length.
Do you still have the last part of it, so that you can repost it?

greetings,

Michael
Han Pilmeyer
Esteemed Contributor

Re: SCSI Problem with TruCluster 5.1b

Be sure to describe the SCSI hardware involved and, in particular, how the shared shelf is connected, what kind of shelf it is and which disks you use. Also include which SCSI target IDs you are using for the host adapters on the shared bus(es).

From all of the nodes in the cluster, you should be able to do a "show dev" from the console and see all the shared disks.

Looks like you might have a problem with the configuration of your shared bus.
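
For illustration, the kind of console-level check being suggested here, as it would look on a typical AlphaServer SRM console (the pk* variable names depend on which adapter sits on the shared bus, so treat the names below as placeholders):

>>> show device
>>> show pk*

The *_host_id value of the adapter on the shared bus must be different on every host sharing that bus (for example 7 on one member and 6 on the other) and must not collide with any target ID used by the disks or the shelf; a duplicate ID can produce exactly the kind of intermittent write errors and hangs described above.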