Operating System - OpenVMS

One node is not booting in cluster

Hi,

We have two nodes (DS20 running AlphaVMS 7.3-1) in a cluster, and both nodes boot from the same disk drive. Sometimes node A boots successfully and node B hangs at the message "Stablish connection to Quorum disk" and does not progress further. Other times node B boots successfully and node A hangs at the same message.

Each node has 1 vote, and the quorum disk also has 1 vote.
Can anyone help me resolve this problem?

Best regards,
Mohammad
MKQ
9 REPLIES
Mobeen_1
Esteemed Contributor

Re: One node is not booting in cluster

Mohammad,
This looks like a scenario in which the total number of votes available to the booting node is less than the quorum.

Can you please check the following system parameters on both nodes:

1. VOTES

2. EXPECTED_VOTES

Keep the following definitions and formulas in mind:

1. When nodes in the OpenVMS Cluster boot, the connection manager uses the largest value for EXPECTED_VOTES of all systems present to derive an estimated quorum value according to the following formula:

   Estimated quorum = (EXPECTED_VOTES + 2)/2, rounded down

2. During a state transition, the connection manager dynamically computes the cluster quorum value to be the maximum of the following:

   a. The current cluster quorum value.
   b. The largest of the values calculated from the following formula, where EXPECTED_VOTES is the largest value specified by any node in the cluster:
      QUORUM = (EXPECTED_VOTES + 2)/2, rounded down
   c. The value calculated from the following formula, where VOTES is the total votes held by all cluster members:
      QUORUM = (VOTES + 2)/2, rounded down
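
To make the arithmetic concrete, here is a small Python sketch of the formula above applied to the configuration described in the question (1 vote per node, 1 vote for the quorum disk). The value EXPECTED_VOTES = 3 is an assumption, since the actual setting was not posted, and the helper names are only illustrative.

```python
def quorum(votes: int) -> int:
    """Quorum formula quoted above: (votes + 2) / 2, rounded down."""
    return (votes + 2) // 2


# Values from the question: each node has 1 vote, the quorum disk has 1 vote.
NODE_VOTES = 1
QDISK_VOTES = 1

# ASSUMPTION: EXPECTED_VOTES is set to the sum of all votes (2 nodes + disk).
EXPECTED_VOTES = 2 * NODE_VOTES + QDISK_VOTES

required = quorum(EXPECTED_VOTES)


def has_quorum(votes_present: int) -> bool:
    """True when the votes a node can count reach the required quorum."""
    return votes_present >= required


print("required quorum:", required)
print("one node + quorum disk:", has_quorum(NODE_VOTES + QDISK_VOTES))
print("one node, quorum disk not counted:", has_quorum(NODE_VOTES))
```

Under that assumption the quorum is 2, so a single node can proceed only if it can also count the quorum disk's vote. A node that cannot count the disk holds 1 vote and has to wait.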

If this turns out to be the issue, I advise you to review the following link:

http://broadcast.ipv7.net:81/openvms-manual/72final/4477/4477pro_002.html

Also verify that your system disk and quorum disk are shared and can be seen from both nodes in your cluster.

Let me know

regards
Mobeen
Uwe Zessin
Honored Contributor

Re: One node is not booting in cluster

Mohammad,

I have seen a similar symptom when there was a network problem.

When the second node boots, it reaches the point where it checks the quorum disk and, at the same time, listens for cluster hello packets from the other node.

If it doesn't receive any packets, it could think that there is no other cluster member running. However, by checking the quorum disk it learns that there _is_ one - it just cannot talk to it!

It is just unfortunate that the system does not give out a message - it just appears to hang.
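
The reasoning above can be written down as a toy decision table in Python. This only restates the description in this post; it is not the connection manager's actual logic, and every name in it is hypothetical.

```python
from enum import Enum


class BootOutcome(Enum):
    JOIN_EXISTING = "join the running cluster"
    FORM_NEW = "form a new cluster"
    WAIT = "wait (appears to hang)"


def boot_outcome(hears_hello: bool, qdisk_shows_member: bool) -> BootOutcome:
    """Toy model of a booting node, based only on the description above."""
    if hears_hello:
        # The other member is reachable over the cluster interconnect.
        return BootOutcome.JOIN_EXISTING
    if qdisk_shows_member:
        # The quorum disk says a member is running, but it cannot be reached.
        return BootOutcome.WAIT
    # No packets and no sign of another member on the quorum disk.
    return BootOutcome.FORM_NEW


for hello in (True, False):
    for member in (True, False):
        print(hello, member, "->", boot_outcome(hello, member).value)
```

The case of interest here is no hello packets while the quorum disk shows a running member, which is the one that looks like a silent hang.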

Re: One node is not booting in cluster

Hi Mobeen & Zessin,
Thank you for your response and advice. I do not think I have a problem with voting, because this cluster has been rebooted several times and we never had this kind of problem before.

Yes, this is most probably a LAN problem. But one question remains: I have disconnected the external LAN, so the systems are now using only the internal local LAN. Shouldn't it boot now?

Regards,
Mohammad
MKQ
Mobeen_1
Esteemed Contributor

Re: One node is not booting in cluster

Mohammad,
This is turning out to be an interesting problem. Uwe should be able to help us out here... he is the man :))

Well, I take your word that there is no issue with the votes and related settings. Now, looking beyond this...

Since you say the problem is intermittent and is not limited to one member of the cluster (i.e. it happens sometimes on node A and sometimes on node B), I would look at components that are common to both nodes:

1. Can you confirm that the quorum disk is reachable and can be seen from both members?

2. As yours is a two-node cluster, you need a quorum disk (so that either node can keep quorum on its own), and it must be accessible from both nodes.

3. If for some reason you prefer not to share disks in your environment, do you have a "Quorum Watcher" set up?

Finally, what type of cluster have you configured?

Also let us know whether these are production systems. If they are not, we can try a few things out :)
Thanks
Mobeen
Uwe Zessin
Honored Contributor

Re: One node is not booting in cluster

Hello Mobeen,
thank you for your nice words! Unfortunately I am a mere mortal - otherwise I would have fixed that problem already ;-)

Mohammad,
can you describe your network infrastructure a little bit? What cards, switches, ...

In the past I have seen negotiation problems between a DE500 network card and a switch whose vendor name I don't remember right now (see, Mobeen?). They happened only when I pulled the network cable and reseated it.
It worked when I did a reboot of the system.
Martin P.J. Zinser
Honored Contributor

Re: One node is not booting in cluster

Hello Mohammad,

How is the quorum disk connected? If it is on a SAN, you might be missing one of the paths through the fabric.

Greetings, Martin
Mike Naime
Honored Contributor

Re: One node is not booting in cluster

Also, what VMS version are you running? We had a similar issue on either 7.2-1H1 or 7.3 systems (I forget which), and they patched the LAN driver for us. We would shut down a node, and the other node would hang. Its root cause was a quorum issue.
VMS SAN mechanic
Uwe Zessin
Honored Contributor

Re: One node is not booting in cluster

Ah, that rings another bell.

It's been some time since I built a cluster with two BA356 boxes on separate buses between two servers. Sometimes I had strange system freezes during boots. It turned out that one of the switches for the termination setting in the I/O module had not snapped in properly.

Now, the symptoms are a bit different, but who knows...
Terry Yeomans
Frequent Advisor

Re: One node is not booting in cluster

We have had this problem, where one node boots successfully but the other hangs. In our case it was caused by the system (boot) disk being a shadow set. Because the boot caused shadow copying to start, the hung node could not get at the boot disk until the shadow copy of the system disk had completed. The only catch is that the problem occurred on our old VAX clusters running VMS 6.2.
Yours Terry.