Operating System - OpenVMS

Why the clusters member hang while use the memory channel?

 
SOLVED
olive_wide
Frequent Advisor

Why the clusters member hang while use the memory channel?

I have built a two-node cluster and configured a quorum disk. For cluster communication I selected Memory Channel and LAN. When I reboot or shut down one node, the other node hangs for about 30 minutes. But when I disconnect the Memory Channel cable, the other node does not hang and runs fine.
I don't know whether Memory Channel is the cause. What should I do?
6 REPLIES
olive_wide
Frequent Advisor

Re: Why the clusters member hang while use the memory channel?

I made a mistake: when one node shuts down or reboots, the other node hangs for about 30 SECONDS, NOT 30 MINUTES.
David B Sneddon
Honored Contributor

Re: Why the clusters member hang while use the memory channel?

Olive,

When a node leaves a cluster, there will always be
a delay for the "cluster transition". This allows
the remaining node(s) to sort things out in order
to continue.
One SYSGEN parameter that has an impact is RECNXINTERVAL
which has a default value of 20 seconds.
This is the length of time that a node will wait
until it is decided that a potentially dead node
is in fact dead. After this interval a cleanup
will take place. This parameter can be changed
depending on the needs of your cluster; it can be
reduced if you like.

Dave
olive_wide
Frequent Advisor

Re: Why the clusters member hang while use the memory channel?

David,
I have changed the value of RECNXINTERVAL, setting it to 10, 5, and 1, but the other node still hangs for about 30 seconds when the first node shuts down or reboots.
It seems better when I disconnect the Memory Channel cable.
What is the reason?
Volker Halle
Honored Contributor
Solution

Re: Why the clusters member hang while use the memory channel?

Olive,

The temporary hang of the other node after the first node shuts down (only when the MEMORY CHANNEL cable is connected!) sounds like a known problem with processing the 'LAST GASP' message on Memory Channel.

Please look at the connection manager messages in OPERATOR.LOG of the surviving node. If everything is o.k., you should see:

16:10:27.46 Node NODE1 (csid ...) lost connection to node NODE2
16:10:27.46 Node NODE1 (csid ...) timed-out lost connection to node NODE2

i.e. the "lost connection" and "timed-out lost connection" messages appear at the SAME time (on NODE1 when shutting down NODE2).

If those messages are several seconds apart (typically RECNXINTERVAL), this indicates that the LAST GASP processing did not work correctly.
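
For anyone who wants to automate this check, here is a minimal Python sketch that measures the gap between the two messages for each departed node. It assumes the lines have been copied off the VMS system as plain text and look exactly like the two sample messages above; the function names, the regular expression, and the 16:10:47.46 timestamp in the usage note are illustrative assumptions, not anything produced by OpenVMS.

```python
import re
from datetime import datetime, timedelta

# Assumed line shape, based only on the two sample messages quoted in this post:
#   HH:MM:SS.cc Node <local> (csid ...) [timed-out ]lost connection to node <remote>
LINE_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{2})\s+Node\s+(?P<local>\S+)\s+\(csid[^)]*\)\s+"
    r"(?P<timed_out>timed-out\s+)?lost connection to node\s+(?P<remote>\S+)"
)


def parse_time(text):
    """Parse a time of day such as '16:10:27.46' (hundredths of a second)."""
    return datetime.strptime(text, "%H:%M:%S.%f")


def last_gasp_gaps(lines):
    """Return (remote_node, gap_in_seconds) for each lost/timed-out message pair.

    A gap near zero suggests the last-gasp message was processed; a gap close
    to RECNXINTERVAL suggests the surviving node waited out the full interval.
    """
    lost_at = {}  # remote node name -> time of its "lost connection" message
    gaps = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a connection manager message of the assumed shape
        when = parse_time(match.group("time"))
        remote = match.group("remote")
        if match.group("timed_out") is None:
            lost_at[remote] = when
        elif remote in lost_at:
            delta = when - lost_at.pop(remote)
            if delta < timedelta(0):
                delta += timedelta(days=1)  # message pair straddled midnight
            gaps.append((remote, delta.total_seconds()))
    return gaps
```

Feeding it the two sample lines above gives a gap of 0.0 seconds for NODE2. If the second line instead carried a hypothetical timestamp of 16:10:47.46, the gap would be 20.0 seconds, which matches the default RECNXINTERVAL and would point to the last-gasp problem.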

You should still have the old OPERATOR.LOG files from your experiments, so please compare the messages from the shutdown with memory channel connected vs. memory channel NOT connected.

Make sure you have the most recent patches installed (especially the most recent SYS$PEDRIVER). For a description of this problem and the solution, please see VMS732_DRIVER-V0100 patch problem description 6.1.3 Cluster Hang.

Volker.
olive_wide
Frequent Advisor

Re: Why the clusters member hang while use the memory channel?

Volker,
Thanks a lot lot lot.
The cluster is running well after I installed the patch VMS732_DRIVER-V0200, and it no longer hangs when the other node leaves the cluster.

olive_wide
Frequent Advisor

Re: Why the clusters member hang while use the memory channel?

I have found a solution to this question as seen in the comments below.