Operating System - OpenVMS

Re: OpenVMS cluster FREEZE during one node reboot

 
smsc_1
Regular Advisor

OpenVMS cluster FREEZE during one node reboot


Hello all,
I'm going VERY VERY crazy, and I hope you can help me because now I need holidays.... :(

I have a two-node OpenVMS cluster on Itanium rx2660 systems. I rebooted NODE2 to check whether the applications keep running fine on NODE1; this is called redundancy! :D

And this is what happens:
NODE2 goes DOWN. When NODE2 leaves the cluster (I'm not entirely sure of the exact moment), NODE1 "FREEZES". It's not possible to run any commands on it. After about 12 seconds NODE1 becomes reachable again.

The same FREEZE happens while NODE2 is booting up, apparently again at the point where it asks to join the cluster.

Two months ago the same test worked, and I haven't changed anything in the OpenVMS settings.

Please please please, do you have any advice regarding this issue???

Why, if I reboot one node (NODE1 or NODE2, it's the same), does the other node FREEZE for 12 secs... It's not normal I think....

HELP!HELP!HELP! :(
./ Lucas
marsh_1
Honored Contributor
Solution

Re: OpenVMS cluster FREEZE during one node reboot

hi,

it looks like a normal cluster state transition is occurring, the other node needs time to detect that a node has definitely gone, then various other timeouts on cluster traffic / shadowset processing come into play before the remaining node decides it can continue as a viable cluster. see the cluster configuration manual for more info :-

http://h71000.www7.hp.com/doc/82final/6318/aa-q28lh-tk.pdf

hth

Hoff
Honored Contributor

Re: OpenVMS cluster FREEZE during one node reboot

Looks normal.

Something as yet unrecognized has clearly changed here.

Here's how to set up a two-node cluster:

http://labs.hoffmanlabs.com/node/569

I'm guessing you have a quorum disk here (a quorum disk can slow transitions significantly). It needs to be located on a shared bus, and the transition time (when the quorum disk votes need to be counted) is sensitive to the QDSKINTERVAL setting:

http://labs.hoffmanlabs.com/node/153

I'd expect your QDSKINTERVAL is 4, and your quorum disk may or may not be configured correctly.

In general with a two-node cluster, you need either shared SCSI or another shared interconnect for the quorum disk, or you need a third voting node.
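
The voting arithmetic behind that last point can be sketched in a few lines of Python. This is only an illustration: it uses the documented rule quorum = (EXPECTED_VOTES + 2) / 2 with the fraction dropped, and it assumes one vote per node and one vote for the quorum disk. The function names are made up for this sketch and are not OpenVMS commands.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES, fraction truncated."""
    return (expected_votes + 2) // 2


def keeps_quorum(expected_votes: int, surviving_votes: int) -> bool:
    """True if the remaining votes are enough to keep the cluster running."""
    return surviving_votes >= quorum(expected_votes)


# Assumed vote layouts (one vote each), for illustration only.
# Two voting nodes, no quorum disk: the survivor has 1 of 2 votes.
print("2 nodes, no qdisk:", keeps_quorum(expected_votes=2, surviving_votes=1))

# Two voting nodes plus a quorum disk: the survivor plus the disk hold 2 of 3.
print("2 nodes + qdisk:  ", keeps_quorum(expected_votes=3, surviving_votes=2))

# Three voting nodes: any two survivors hold 2 of 3.
print("3 voting nodes:   ", keeps_quorum(expected_votes=3, surviving_votes=2))
```

With only two votes in total, losing either node drops the survivor below quorum, which is why a quorum disk or a third voting node is needed.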
smsc_1
Regular Advisor

Re: OpenVMS cluster FREEZE during one node reboot

Thanks for the reply.

QDSKINTERVAL is set to 2, and the other settings are the same as two months ago, when we never got this kind of FREEZE/HANG.

OK, I already know about the cluster state transition, but I think 12 secs is not a normal time!

So if you have any ideas about which parameters I can set to avoid this FREEZE, or at least to shorten it, that would be very much appreciated!!!!!! ;)

./ Lucas
Robert Gezelter
Honored Contributor

Re: OpenVMS cluster FREEZE during one node reboot

smsc,

As Hoff noted, some pause is inevitable in a cluster transition.

My first suspect is always that an unthinking change in parameters occurred. This change may go back to before the previous experiment (if the nodes have rebooted since).

I would recommend starting with the parameters. I would also check to see if there has been a change in the LAN link between the two systems.

- Bob Gezelter, http://www.rlgsc.com
smsc_1
Regular Advisor

Re: OpenVMS cluster FREEZE during one node reboot


Robert, can you please be more specific?
You say "change in parameters occurred". OK, but which parameters???

Then "there has been a change in the LAN link between the two systems".
What does "LAN link" mean?

And for sure, as I already said, I know about "the pause", but 12 secs is too long for a pause.. or not??
./ Lucas
marsh_1
Honored Contributor

Re: OpenVMS cluster FREEZE during one node reboot

hi,

have a look through this presentation from keith parris about the hp dt test it ran for all its OSes, it mentions some of the parameters involved here, the recovery time in that instance was 13.71 secs :-

www2.openvms.org/kparris/Bootcamp2008_DT_Block_1.ppt

Robert Gezelter
Honored Contributor

Re: OpenVMS cluster FREEZE during one node reboot

smsc,

In over twenty-five years of working with clusters, I have probably seen incorrect settings on most of the parameters that can affect a cluster (as, I'm sure, have Hoff and other active members of the community).

Generally, I recommend checking the entire set of cluster-related parameters against their definitions. The list can be extracted using SYSMAN or SYSGEN, in addition to the parameters being accessible from F$GETSYI.
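
Once the parameters have been extracted at two points in time, finding what changed is a simple comparison. The Python sketch below assumes each listing has been reduced to plain text with one `NAME VALUE` pair per line; that is a hypothetical format chosen for this example, not the raw SYSGEN output. The "before" values are made up purely to show the mechanics, and the helper names are illustrative.

```python
def parse_params(text: str) -> dict:
    """Parse 'NAME VALUE' lines into a dict; skip blanks and '!' comments."""
    params = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("!"):
            continue
        parts = stripped.split()
        if len(parts) >= 2:
            params[parts[0].upper()] = parts[1]
    return params


def changed_params(old: dict, new: dict) -> dict:
    """Return {name: (old_value, new_value)} for every parameter that differs."""
    names = sorted(set(old) | set(new))
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}


# Made-up snapshots, for illustration only.
BEFORE = """
! saved two months ago (hypothetical values)
RECNXINTERVAL 20
QDSKINTERVAL 2
"""
NOW = """
RECNXINTERVAL 10
QDSKINTERVAL 2
"""

for name, (was, now) in changed_params(parse_params(BEFORE), parse_params(NOW)).items():
    print(f"{name}: {was} -> {now}")
```

Keeping a dated copy of the parameter listing after each deliberate change makes this kind of comparison possible the next time behavior shifts unexpectedly.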

In terms of probability, there is a good chance that Hoff is correct about the handling of the quorum disk, but I have seen all manner of incorrect settings.

- Bob Gezelter, http://www.rlgsc.com
smsc_1
Regular Advisor

Re: OpenVMS cluster FREEZE during one node reboot


I don't know if this can help, but I have attached all the SYSGEN parameters. Can someone check whether something is wrong?? I don't know which settings are the correct ones...
./ Lucas
Richard Brodie_1
Honored Contributor

Re: OpenVMS cluster FREEZE during one node reboot

You have RECNXINTERVAL set to 10 seconds, so possibly you are waiting a RECNXINTERVAL. Maybe the shutdown wasn't clean or something.

Do you see a gap of 10 seconds between the lost connection message and the cluster reconfiguration message on the console?
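
One way to answer that question is to note the timestamps of the two console messages and compute the difference. The Python sketch below assumes the timestamps are in the usual OpenVMS absolute time format (DD-MMM-YYYY HH:MM:SS.CC) and that month abbreviations parse under an English locale. The two timestamps are invented for illustration, not taken from this cluster, and the half-second tolerance is an arbitrary choice.

```python
from datetime import datetime

# OpenVMS-style absolute time, e.g. 12-AUG-2010 10:15:32.45
VMS_TIME = "%d-%b-%Y %H:%M:%S.%f"

RECNXINTERVAL = 10  # seconds, the value reported in this thread


def gap_seconds(lost_at: str, reconfigured_at: str) -> float:
    """Seconds between the lost-connection and reconfiguration timestamps."""
    start = datetime.strptime(lost_at, VMS_TIME)
    end = datetime.strptime(reconfigured_at, VMS_TIME)
    return (end - start).total_seconds()


# Invented example timestamps; substitute the ones from your console log.
gap = gap_seconds("12-AUG-2010 10:15:32.45", "12-AUG-2010 10:15:42.60")

if abs(gap - RECNXINTERVAL) < 0.5:
    print(f"gap of {gap:.2f} s is about one RECNXINTERVAL")
else:
    print(f"gap of {gap:.2f} s does not match RECNXINTERVAL ({RECNXINTERVAL} s)")
```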