
VAXCluster Node Won't Boot Properly

Phillip McCollum
Occasional Visitor

Hello all,

Disclaimer: I'm a network guy, know next to nothing about OpenVMS and VaxClusters...

... but I have been tasked with figuring out why all satellite nodes will not boot from a MOP server unless the cluster is physically isolated from the rest of the network (which happens to contain other non-clustered VAX Machines). I believe the OS version is 6.2. Is there any logging I can do on the node and MOP server to gather more clues? Has anyone else run into something similar before?

Thanks all!
Phillip
11 REPLIES
Phillip McCollum
Occasional Visitor

Re: VAXCluster Node Won't Boot Properly

Thought I should provide a little more info. The MOP server is up and running. When the satellite node is booted, it appears to connect to the MOP server and also says "%VAXCluster, not authorized to perform conversational bootstrap." It then stays there for a while and states "%VAXCluster, no connection to disk server." It then repeats the cycle.

When the node first connects, I see a message pop up on the MOP server stating "Events Load request completed."

But that's it.. Hope that helps a bit. I apologize for not being a little more technical on the VAX/VMS aspects.

Your support is appreciated!
Phillip

Hoff
Honored Contributor

Re: VAXCluster Node Won't Boot Properly

If you're in a hurry, get somebody in for a look.

Chances are, however, that your networking folks have segmented the network here; you need a VLAN or bridge or the like in place, and the switch(es) must pass traffic that your average switch jockey has never seen before, and that a subset of the switch jockeys are then aghast to realize doesn't involve IP. At all.

The usual approach for debug is to watch the console or an operator terminal (REPLY /ENABLE) to see the messages as the satellite bootstraps are requested and (if the stuff is working) as the downloads occur.

As for documentation, start here:

http://h71000.www7.hp.com/doc/731final/4477/4477pro_023.html#ci_appendix

You'll need the switches to pass Ethernet 60-01, 60-02, and 60-07, and probably a few other protocols. Or you'll need a bridge.
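For anyone checking a capture against that list, here is a rough Python sketch that labels a raw Ethernet frame by the protocol types named above. The function names and the short descriptions are illustrative assumptions, not anything from the OpenVMS tools; it only handles Ethernet II frames with at most one 802.1Q VLAN tag.

```python
# Protocol types mentioned in this thread. The descriptions are the
# commonly documented DEC assignments; treat them as a quick reference.
DEC_ETHERTYPES = {
    0x6001: "MOP dump/load (satellite boot requests and downloads)",
    0x6002: "MOP remote console",
    0x6007: "SCS / LAN cluster traffic",
}

def ethertype_of(frame: bytes) -> int:
    """Return the EtherType of an Ethernet II frame (hypothetical helper).

    Bytes 0-5 are the destination, 6-11 the source, 12-13 the type.
    A single 802.1Q tag (0x8100) pushes the real type to bytes 16-17.
    """
    if len(frame) < 14:
        raise ValueError("frame too short to hold an Ethernet header")
    ethertype = int.from_bytes(frame[12:14], "big")
    if ethertype == 0x8100 and len(frame) >= 18:
        ethertype = int.from_bytes(frame[16:18], "big")
    return ethertype

def describe(frame: bytes) -> str:
    """Label a frame as one of the DEC protocols above, or 'other'."""
    ethertype = ethertype_of(frame)
    return DEC_ETHERTYPES.get(ethertype, f"other (0x{ethertype:04X})")
```

If frames of these types show up on the satellite's port but never on the boot server's port (or the reverse), the switch path between them is the first place to look.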

Cluster satellite bootstraps via MOP are basically the same as DECserver bootstraps, and I have some materials posted on DECserver MOP bootstraps and on NCP here:

http://labs.hoffmanlabs.com/node/183
http://labs.hoffmanlabs.com/node/271

NCP is the most common way to operate here, though you could also be using DECnet-Plus or LANCP here. (And there are links to that stuff at the above URLs.)
Hoff
Honored Contributor

Re: VAXCluster Node Won't Boot Properly

Ah, nuts. You slipped that reply in.

Check the console boot flags for whichever VAX satellites are in use here; the console syntax varies. Post up the particulars of the VAX model(s) involved, and we can get further.

The VAX hardware gear here can also be old enough that the console batteries have been drained, and weird stuff can happen then. Lost time values, settings that won't stick.
Robert Gezelter
Honored Contributor

Re: VAXCluster Node Won't Boot Properly

Phillip,

Two notes to amplify what Hoff mentioned.

First, if the batteries are dead (meaning NVRAM settings are lost), the system will power up and all settings will be default. Depending on which systems, you may also get messages asking about the language (e.g., English) etc. for the console dialog. It would be helpful to know which hardware models the satellites are.

Secondly, a LAN monitor (e.g., Wireshark) is invaluable in this type of situation. A good step is to see if the MOP dialogue is behaving as expected by monitoring precisely what is being seen by the boot host.

There are many other possibilities. As Hoff noted, it may pay to get outside assistance (Disclaimer: We provide such services, as do Hoff and others who are active in this forum).

- Bob Gezelter, http://www.rlgsc.com
Phillip McCollum
Occasional Visitor

Re: VAXCluster Node Won't Boot Properly

Ahh, great replies gents. Many thanks. I will post the boot parameters as soon as I get them. I did manage a network capture via Wireshark, but because I lack knowledge of how the cluster bootstrap communication works, I couldn't make heads or tails of it. Is there a good primer on this somewhere? I tried to google it, but came up empty.

Thanks again,
Phillip
Robert Gezelter
Honored Contributor

Re: VAXCluster Node Won't Boot Properly

Phillip,

The hardware model number (e.g., VAXserver, VAXstation nnnnn) would also be helpful.

The DECnet protocol suite was documented publicly. A set of the documents appears to be online at the DECnet for Linux site on SourceForge at http://linux-decnet.sourceforge.net/docs/doc_index.html.

The actual MOP 3.0 specification appears to be at http://linux-decnet.sourceforge.net/docs/maintop30.txt.

- Bob Gezelter, http://www.rlgsc.com

Phillip McCollum
Occasional Visitor

Re: VAXCluster Node Won't Boot Properly

The machines in question are all VaxStation 4000 Series 60 machines.
Hoff
Honored Contributor

Re: VAXCluster Node Won't Boot Properly

This console has:

>>> HELP

Key for troubleshooting the problem at hand are these two commands:

>>> SHOW BFLG
>>> SHOW BOOT

The boot flags value (for instance E0000001) usually combines the boot root (E) and the flags (here 00000001, conversational boot).
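As a rough illustration of how a value like E0000001 splits apart, here is a small Python sketch. It assumes the usual layout in which the top four bits select the system root and bit 0 requests a conversational boot; the function name and the returned field names are made up for the example.

```python
def decode_boot_flags(value: int) -> dict:
    """Split a 32-bit boot flags value into root and flag bits.

    Assumption: bits 31-28 hold the system root number and bit 0 is the
    conversational-boot flag, as in the E0000001 example.
    """
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("boot flags must fit in 32 bits")
    root = (value >> 28) & 0xF      # top hex digit, e.g. E
    flags = value & 0x0FFFFFFF      # remaining 28 bits
    return {
        "root": f"SYS{root:X}",     # E -> SYSE
        "flags": flags,
        "conversational": bool(flags & 0x1),
    }

if __name__ == "__main__":
    print(decode_boot_flags(0xE0000001))
```

A satellite whose flags decode with the conversational bit set would be consistent with the "not authorized to perform conversational bootstrap" message reported earlier in the thread.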

The boot device (for a satellite boot) is usually ESA0.

If the administrator in the environment is truly paranoid, most console access may be locked.

Reading:

http://vt100.net/mirror/mds-199909/cd1/vax/pmarioma.pdf
Steve Reece_3
Trusted Contributor

Re: VAXCluster Node Won't Boot Properly

You might also want to look at the MOP server (hopefully that's a cluster member of the same cluster as the satellites are booting into?) and issue a REPLY/ENABLE command on a terminal window. Watch this screen for any messages about the booting node as it tries to boot into the cluster.

Steve
Volker Halle
Honored Contributor

Re: VAXCluster Node Won't Boot Properly

Phillip,

if you say everything is fine when 'the cluster is physically isolated from the rest of the network', then I would conclude that the boot parameters and the MOP and cluster configuration are OK, and that the satellites all boot in an isolated network segment.

Note that much of the cluster formation code relies on multicast messages. There are situations where network switches drop (or filter out) those cluster multicast messages, which prevents your cluster from working!

If you are a network guy, look for Ethernet frames with a protocol ID of 60-07. The cluster multicast address will be AB-00-04-01-xx-xx; you can obtain your cluster's full MC address with $ MC SYSMAN CONF SHOW CLUSTER on the boot server.
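To pick those frames out of a list of captured destination addresses, something along these lines could help. It is only a sketch: the helper names are invented for the example, and it checks nothing beyond the AB-00-04-01 prefix given above.

```python
# First four bytes of the cluster multicast address (AB-00-04-01-xx-xx).
CLUSTER_MC_PREFIX = bytes.fromhex("AB000401")

def parse_mac(text: str) -> bytes:
    """Parse 'AB-00-04-01-12-34' or 'ab:00:04:01:12:34' into 6 bytes."""
    parts = text.replace(":", "-").split("-")
    if len(parts) != 6:
        raise ValueError(f"not a 6-byte MAC address: {text!r}")
    return bytes(int(part, 16) for part in parts)

def is_cluster_multicast(mac: str) -> bool:
    """True if the address falls in the AB-00-04-01-xx-xx range."""
    return parse_mac(mac)[:4] == CLUSTER_MC_PREFIX

def cluster_destinations(macs):
    """Return only the addresses that look like cluster multicast."""
    return [mac for mac in macs if is_cluster_multicast(mac)]
```

Running it over the destination addresses seen on each side of a switch shows quickly whether the cluster multicast makes it across.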

Volker.
Steve Reece_3
Trusted Contributor

Re: VAXCluster Node Won't Boot Properly

Could it also be that, when the cluster is connected to the rest of the network, the wrong VAX system is responding to the request for a network boot? The satellite would then find that it can't join that standalone node, yet it also can't communicate with the right node, for instance because the network switches are dropping the packets.