
smsc_1
Regular Advisor

OpenVMS cluster FREEZE during one node reboot


Hello all,
I'm going VERY VERY crazy, and I hope you can help me because now I need holidays.... :(

I have a two-node OpenVMS cluster on Itanium rx2660 servers. I rebooted NODE2 to check that the applications keep running on NODE1; this is called redundancy! :D

And this is what happens:
NODE2 goes DOWN. When NODE2 leaves the cluster (I'm not entirely sure of the exact moment), NODE1 "FREEZES": it is not possible to run any command on it. After about 12 seconds NODE1 becomes responsive again.

The same FREEZE happens while NODE2 is booting up, apparently at the moment it asks to join the cluster.

Two months ago the same test worked, and I haven't changed any OpenVMS settings since.

Please please please, do you have any advice regarding this issue???

Why, if I reboot one node (NODE1 or NODE2, it makes no difference), does the other node FREEZE for 12 secs... It's not normal, I think....

HELP!HELP!HELP! :(
./ Lucas
marsh_1
Honored Contributor
Solution

Re: OpenVMS cluster FREEZE during one node reboot

hi,

it looks like a normal cluster state transition is occurring, the other node needs time to detect that a node has definitely gone, then various other timeouts on cluster traffic / shadowset processing come into play before the remaining node decides it can continue as a viable cluster. see the cluster configuration manual for more info :-

http://h71000.www7.hp.com/doc/82final/6318/aa-q28lh-tk.pdf

hth

Hoff
Honored Contributor

Re: OpenVMS cluster FREEZE during one node reboot

Looks normal.

Something as yet unrecognized has clearly changed here.

Here's how to set up a two-node cluster:

http://labs.hoffmanlabs.com/node/569

I'm guessing you have a quorum disk here (and a quorum disk can slow transitions significantly). The quorum disk needs to be located on a shared bus, and the transition time (when the quorum disk votes need to be, and are to be, counted) is sensitive to the QDSKINTERVAL setting:

http://labs.hoffmanlabs.com/node/153

I'd expect your QDSKINTERVAL is 4, and your quorum disk may or may not be configured correctly.

In general with a two-node cluster, you need either shared SCSI or another shared interconnect for the quorum disk, or you need a third voting node.
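
For anyone following along, here is a minimal sketch of how the quorum-related settings mentioned above could be read from DCL on either node. It assumes an interactive session on a cluster member; the text labels are only illustrative, and the names passed to F$GETSYI are standard item codes and system parameter names.

```dcl
$ ! Quorum disk configuration as seen by this node
$ WRITE SYS$OUTPUT "DISK_QUORUM    = ", F$GETSYI("DISK_QUORUM")
$ WRITE SYS$OUTPUT "QDSKVOTES      = ", F$GETSYI("QDSKVOTES")
$ WRITE SYS$OUTPUT "QDSKINTERVAL   = ", F$GETSYI("QDSKINTERVAL")
$ ! Votes contributed and expected by this node
$ WRITE SYS$OUTPUT "VOTES          = ", F$GETSYI("VOTES")
$ WRITE SYS$OUTPUT "EXPECTED_VOTES = ", F$GETSYI("EXPECTED_VOTES")
$ ! Current cluster-wide totals
$ WRITE SYS$OUTPUT "CLUSTER_QUORUM = ", F$GETSYI("CLUSTER_QUORUM")
$ WRITE SYS$OUTPUT "CLUSTER_VOTES  = ", F$GETSYI("CLUSTER_VOTES")
```

Running the same lines on both nodes and comparing the output is a quick way to spot a quorum disk that only one member has configured.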
smsc_1
Regular Advisor

Re: OpenVMS cluster FREEZE during one node reboot

Thanks for the reply.

QDSKINTERVAL is set to 2, and the other settings are the same as two months ago, when we never got this kind of FREEZE/HANG.

OK, I already know about the cluster state transition, but I think 12 secs is not a normal time!

So if you have any ideas about which parameters I can set to avoid this FREEZE, or at least to shorten it, it would be very much appreciated!!!!!! ;)

./ Lucas
Robert Gezelter
Honored Contributor

Re: OpenVMS cluster FREEZE during one node reboot

smsc,

As Hoff noted, some pause is inevitable in a cluster transition.

My first suspect is always an inadvertent change in parameters. This change may go back to before the previous experiment (if the nodes have rebooted since).

I would recommend starting with the parameters. I would also check to see if there has been a change in the LAN link between the two systems.

- Bob Gezelter, http://www.rlgsc.com
smsc_1
Regular Advisor

Re: OpenVMS cluster FREEZE during one node reboot


Robert, can you please be more specific?
You say "change in parameters occurred". OK, but which parameters???

Then "there has been a change in the LAN link between the two systems".
What does "LAN link" mean?

And for sure, as I already said, I know about "the pause", but 12 secs is too long for a pause.. or not??
./ Lucas
marsh_1
Honored Contributor

Re: OpenVMS cluster FREEZE during one node reboot

hi,

have a look through this presentation from keith parris about the hp dt test that was run for all its OSes; it mentions some of the parameters involved here. the recovery time in that instance was 13.71 secs :-

www2.openvms.org/kparris/Bootcamp2008_DT_Block_1.ppt

Robert Gezelter
Honored Contributor

Re: OpenVMS cluster FREEZE during one node reboot

smsc,

In over twenty-five years of working with clusters, I have probably seen incorrect settings on most of the parameters that can affect a cluster (as, I'm sure, have Hoff and other active members of the community).

Generally, I recommend checking the entire set of cluster-related parameters against their definitions. The list can be extracted using SYSMAN or SYSGEN, in addition to the parameters being accessible from F$GETSYI.
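
As a rough sketch of that check, the session below displays the active values without changing anything. It assumes a suitably privileged account; RECNXINTERVAL and QDSKINTERVAL are picked only because they come up elsewhere in this thread, and the same SHOW commands work for any other parameter name.

```dcl
$ ! SYSGEN: active values on the local node (read-only)
$ MCR SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SHOW RECNXINTERVAL
SYSGEN> SHOW QDSKINTERVAL
SYSGEN> SHOW /CLUSTER
SYSGEN> EXIT
$ ! SYSMAN: the same parameters on every cluster member
$ MCR SYSMAN
SYSMAN> SET ENVIRONMENT/CLUSTER
SYSMAN> PARAMETERS SHOW RECNXINTERVAL
SYSMAN> PARAMETERS SHOW QDSKINTERVAL
SYSMAN> EXIT
$ ! F$GETSYI: a single value from DCL
$ WRITE SYS$OUTPUT "RECNXINTERVAL = ", F$GETSYI("RECNXINTERVAL")
```

The SYSMAN form is the convenient one here, since it makes a mismatch between NODE1 and NODE2 visible side by side.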

In terms of probability, there is a good chance that Hoff is correct about the handling of the quorum disk, but I have seen all manner of incorrect settings.

- Bob Gezelter, http://www.rlgsc.com
smsc_1
Regular Advisor

Re: OpenVMS cluster FREEZE during one node reboot


I don't know if this can help, but I have attached all the SYSGEN parameters. Can someone check whether something is wrong?? I don't know which settings are correct...
./ Lucas
Richard Brodie_1
Honored Contributor

Re: OpenVMS cluster FREEZE during one node reboot

You have RECNXINTERVAL set to 10 seconds, so possibly you are waiting a RECNXINTERVAL. Maybe the shutdown wasn't clean or something.

Do you see a gap of 10 seconds between the lost connection message and the cluster reconfiguration message on the console?
Robert Gezelter
Honored Contributor

Re: OpenVMS cluster FREEZE during one node reboot

smsc,

A question on the wording of the original posting that started this thread.

When the post says "NODE2 goes DOWN.", what precisely happened? Was SHUTDOWN run or was the machine simply halted and rebooted?

- Bob Gezelter, http://www.rlgsc.com
smsc_1
Regular Advisor

Re: OpenVMS cluster FREEZE during one node reboot


"NODE2 GOES DOWN" means I used the "REBOOT" command.
./ Lucas
Steve Reece_3
Trusted Contributor

Re: OpenVMS cluster FREEZE during one node reboot

As others have said, smsc, I'd reckon that the time that you're seeing is just about right. You can speed things up by reducing the relevant system parameters (RECNXINTERVAL, for example), but that can leave you susceptible to system crashes if the interconnect gets flooded or briefly disconnected.

This all depends on what your application is, though: if you're running an ERP system then it's probably OK. If you're managing a satellite then it probably isn't. YMMV.
Robert Gezelter
Honored Contributor

Re: OpenVMS cluster FREEZE during one node reboot

smsc,

With all due respect, there is no REBOOT command on OpenVMS. One can define a symbol which forces a reboot (and indeed, there is just such a definition in the LOGIN.COM for SYSTEM; but it is not clear if that definition is the one in use).

If I understand correctly that you type "REBOOT" at the "$" prompt, please do a SHOW SYMBOL REBOOT and post the result.

- Bob Gezelter, http://www.rlgsc.com

smsc_1
Regular Advisor

Re: OpenVMS cluster FREEZE during one node reboot


Good morning Robert and all,
this is the Reboot symbol:

SYSTEM> sho sym reboot
REBOOT == "@sys$system:shutdown 0 shutdown no yes later yes save"
./ Lucas
Joseph Huber_1
Honored Contributor

Re: OpenVMS cluster FREEZE during one node reboot

REBOOT == "@sys$system:shutdown 0 shutdown no yes later yes save"

I'm not really sure, but adding the option "REMOVE_NODE" (i.e. SAVE,REMOVE_NODE) might reduce the cluster reconfiguration time, since the cluster does not have to wait for the reconnection interval.
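
A sketch of what such a symbol might look like, for illustration only. The symbol name REBOOT_REMOVE is made up here, the parameter comments reflect the usual SHUTDOWN.COM ordering as I understand it, and the option list is quoted so that it is passed as a single P7 value; please check it against SYS$SYSTEM:SHUTDOWN.COM on your own system before relying on it.

```dcl
$ ! P1 = minutes until shutdown      P2 = reason
$ ! P3 = spin down disks?            P4 = run site-specific SYSHUTDWN.COM?
$ ! P5 = expected reboot time        P6 = automatic reboot?
$ ! P7 = comma-separated shutdown options
$ REBOOT_REMOVE == "@SYS$SYSTEM:SHUTDOWN 0 shutdown no yes later yes ""SAVE_FEEDBACK,REMOVE_NODE"""
$ SHOW SYMBOL REBOOT_REMOVE
```

This only changes how a planned shutdown is announced to the other member; it says nothing about how the cluster reacts when a node disappears without warning.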
http://www.mpp.mpg.de/~huber
smsc_1
Regular Advisor

Re: OpenVMS cluster FREEZE during one node reboot


REMOVE_NODE helps, but this is a failover test for software or hardware faults, and I don't think a hardware fault uses REMOVE_NODE!!! ;)

By the way, thanks to all for the documentation. I have read about the cluster state transition: it's normal behaviour!

Points assigned and thread closed! ;)
./ Lucas
smsc_1
Regular Advisor

Re: OpenVMS cluster FREEZE during one node reboot

OOPS! Forgot to close!
./ Lucas