
Re: Set host fails

 
Randy Hancock
Advisor

Set host fails

We had a FATAL BUGCHECK error which caused a reboot of our ES40 this morning. Running OpenVMS 7.2-2. Since then, whenever we try to set host to another machine, we encounter the following error:

$ set host es47
%RMS-F-DEV, error in device name or inappropriate device type for operation

We also cannot copy files from one node to another. Is this related to decnet, and if so, what should I be looking for? Or am I barking up the wrong tree?

Whether related or not, here is 1st page of dump error log:

OpenVMS (TM) Alpha Operating System, Version V7.2-2 -- System Dump Analysis 13-MAR-2006 08:49:02.40 Page 1
Crashdump Summary Information:



Crash Time: 13-MAR-2006 08:49:02.40
Bugcheck Type: CLUEXIT, Node voluntarily exiting VMScluster
Node: ES40 (Cluster)
CPU Type: AlphaServer ES40
VMS Version: V7.2-2
Current Process: NULL
Current Image:
Failing PC: FFFFFFFF.BDCE9580 CNX$BUGCHECK_CLUSTER_C+00020
Failing PS: 28000000.00000804
Module: SYS$CLUSTER (Link Date/Time: 8-MAY-2002 15:06:17.48)
Offset: 00011580

Boot Time: 12-MAR-2006 23:57:22.00
System Uptime: 0 08:51:40.40
Crash/Primary CPU: 00/00
System/CPU Type: 2208
Saved Processes: 525
Pagesize: 8 KByte (8192 bytes)
Physical Memory: 10240 MByte (4194304 PFNs, discontiguous memory)
Dumpfile Pagelets: 1309644 blocks
Dump Flags: writecomp,errlogcomp,dump_style
Dump Type: compressed,selective,shared_mem
EXE$GL_FLAGS: poolpging,init,bugdump
Paging Files: 1 Pagefile and 1 Swapfile installed
12 REPLIES
Hein van den Heuvel
Honored Contributor
Solution

Re: Set host fails

Your network did not start up correctly.
It appears you lost a physical network connection for a while. Check the error log and the operator log for other recent entries.
That would explain both the CLUEXIT and the telnet failure, but it does not explain how the node was allowed back up. Something is still wrong.

$ HELP/MESSAGE CLUEXIT

"CLUEXIT, node voluntarily exiting VMScluster

Facility: BUGCHECK, System Bugcheck
Explanation: To avoid partitioning of the cluster, this system exited the cluster by crashing. ...

Generally, this condition arises when the intracluster communications of one or more systems is delayed in some way."

fwiw,
Hein.



Karl Rohwedder
Honored Contributor

Re: Set host fails

The node left the cluster voluntarily. For example, if it can no longer see all the other nodes, it assumes a fatal hardware condition and leaves the cluster to avoid data corruption (this is a very simplified explanation!).

Are you sure that the node rejoined the cluster? Use SHOW CLUSTER, for example, to check whether all nodes are available.
Perhaps you have some problems with your Ethernet interfaces. Use DIAGNOSE to analyze the error log, and check the console for error messages during boot.
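
The membership check suggested above can be sketched as follows. SHOW CLUSTER is the command named in the post; the /CONTINUOUS form is an addition here, assuming a standard OpenVMS Cluster installation.

```dcl
$ ! One-shot snapshot of cluster membership as seen from this node
$ SHOW CLUSTER
$ ! Same display, refreshed periodically; press Ctrl/Z to leave it
$ SHOW CLUSTER/CONTINUOUS
```

If the other node is missing from the display on either system, the two machines are probably not communicating as cluster members at all.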

regards Kalle
Arch_Muthiah
Honored Contributor

Re: Set host fails

Randy,
The CLUEXIT error is a type of bugcheck initiated by the Connection Manager, the OpenVMS Cluster software component that manages the interaction of cooperating OpenVMS Cluster computers. Most such bugchecks are triggered by conditions resulting from failures in communications paths, configuration errors, system management errors, and hardware failures.

Btw, did you do any upgrade on your system before this CLUEXIT bugcheck?

Archunan
Regards
Archie
Randy Hancock
Advisor

Re: Set host fails

Actually, we had some of our systems guys in last night upgrading parts of our network and I am just discovering that BOTH our ES40 AND our ES47 crashed with the same error about 37 minutes apart. We are not live with the ES47 yet, so I was not immediately aware of this, and just discovered it moments ago. Both of these systems were disconnected from the switch they were connected to during this upgrade process, and I am guessing that may be why we had the crashes.

So at this point, while we investigate the crashes in more detail, the problem that remains is that apparently DECNET did not start on our ES40 (but did on the ES47). Any pointers on logs I should be checking?

And thanks to all who are assisting. I am mainly a programmer trying to learn more about VMS systems administration, and our systems guys are mainly Windows oriented (!!).
Arch_Muthiah
Honored Contributor

Re: Set host fails

Randy,

If you look at the operator log, you should see the details of the communication link failure.

What exactly was upgraded on the network before the systems crashed? Your network software version details would also be helpful.

Also, have you tried to start DECnet manually?


Archunan
Regards
Archie
Volker Halle
Honored Contributor

Re: Set host fails

Randy,

your other thread with the same question, on which I had added some troubleshooting information, has apparently been removed by the ITRC moderators (as a duplicate ?), so here we go again:

The second-best approach to capturing console output during startup, if you have not connected your console to a console manager system, is to set the system parameter STARTUP_P2="D". This will cause all messages from the STARTUP process to be written to SYS$SYSTEM:STARTUP.LOG and will allow you to easily find errors during startup.
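
Setting that parameter could look roughly like the sketch below. It assumes SYSGEN is used directly and that STARTUP_P2 holds nothing else that needs to be preserved (check the SHOW output first). Sites that manage parameters through MODPARAMS.DAT and AUTOGEN would want to record the change there as well.

```dcl
$ ! Sketch only: make STARTUP write its messages to a log file on the next boot
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE CURRENT
SYSGEN> SHOW STARTUP_P2
SYSGEN> SET STARTUP_P2 "D"
SYSGEN> WRITE CURRENT
SYSGEN> EXIT
$ ! After the next reboot, page through the log and look for startup errors
$ TYPE/PAGE SYS$SYSTEM:STARTUP.LOG
```

WRITE CURRENT updates the parameter file used at the next boot, so the logging only takes effect after a reboot.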

If you don't have the console output from the boot after the crash, finding out why DECnet did not start will be mostly speculation and guesswork. DECnet Phase IV would have needed to be started with @STARTNET from SYSTARTUP_VMS.COM; DECnet Phase V is started automatically. Which one are you running on your system? Were there any other products that were not started either?
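
If the node runs DECnet Phase IV, a manual check and start might look like the sketch below. It assumes the standard SYS$MANAGER:STARTNET.COM procedure and a suitably privileged account; on a Phase V (DECnet-Plus) system the startup procedure is different and these NCP steps do not apply as written.

```dcl
$ ! Which network services does this node currently report?
$ SHOW NETWORK
$ ! Phase IV only: show the state of the local (executor) node
$ MCR NCP SHOW EXECUTOR
$ ! Phase IV only: start DECnet by hand, as SYSTARTUP_VMS.COM would have done
$ @SYS$MANAGER:STARTNET
$ ! Loopback test: log in to the local node over DECnet
$ SET HOST 0
```

Any error printed by STARTNET when it is run interactively is likely the same one that was lost on the console during the boot.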

Regarding the CLUEXIT crashes: if you disrupt communication between the cluster members for more than RECNXINTERVAL seconds, at least one of them will have to leave the cluster with a CLUEXIT crash after communication has been re-established. RECNXINTERVAL is a dynamic parameter, so you could have increased it before clobbering your network (a concept probably not understood by your network/Windows people).
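
Because RECNXINTERVAL is dynamic, it can be raised on the running system before planned network work. The sketch below uses 180 seconds purely as an illustrative value, not a recommendation: pick a figure that covers the expected outage, apply it on every cluster member, and restore the original value afterwards.

```dcl
$ ! Sketch only: raise RECNXINTERVAL on the running system (repeat on each node)
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SHOW RECNXINTERVAL
SYSGEN> SET RECNXINTERVAL 180
SYSGEN> WRITE ACTIVE
SYSGEN> EXIT
```

WRITE ACTIVE changes only the running system, so the value reverts at the next reboot unless it is also written to the current parameter file. Note the value shown by SHOW before changing it so that it can be put back once the network work is finished.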

If you are interested in finding more details about those crashes, please consider posting the CLUE files (from CLUE$COLLECT:CLUE$node_ddmmyy_hhmm.LIS) as text attachments. Please also describe the setup of your cluster (is a quorum disk in use?).

Volker.
comarow
Trusted Contributor

Re: Set host fails

99.9% of the time a CLUEXIT is a network problem. The problem appears to be isolated to that node.

Likely causes could be the controller, the cable, the port, the network, or the switch.

To help identify whether it's the local controller:

$ ANALYZE/CRASH_DUMP SYSDUMP.DMP
SDA> SHOW LAN/COUNTERS

and look for any failures.

Also, on the live system:

$ ANALYZE/SYSTEM
SDA> SHOW LAN/COUNTERS

Can you telnet to the system? That would indicate whether it is a networking problem or a software problem.

Can you do a SET HOST 0?
That helps test the software: you log in without going through the network.

Try changing the cable and plugging into another port.

Good luck.
Heinz W Genhart
Honored Contributor

Re: Set host fails

Hi Randy

The reason why SET HOST ES47 does not work could be that there is a logical name ES47.
Can you check this?
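
One way to make that check, as a minimal sketch run from the same account and session in which SET HOST fails:

```dcl
$ ! Search the default logical name tables (process, job, group, system)
$ SHOW LOGICAL ES47
$ ! Show the first translation found for the name, if any
$ SHOW TRANSLATION ES47
```

If a translation turns up, the node name is being replaced by that equivalence string before the command is processed, which could account for an RMS device-name error.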

Regards

Heinz
Randy Hancock
Advisor

Re: Set host fails

One of our network guys was finally able to track down some port problems between our two alpha systems yesterday. We have addressed those and so far so good, we have not had any problems in the last 12 hours.

Thanks again to everyone for your assistance.

It is amazing to me that after programming in VMS environments for nearly 20 years, there are still new things that appear from time to time that I have never run into before.
Duncan Morris
Honored Contributor

Re: Set host fails

Hi Randy,
from your Forum Profile:


I have assigned points to 3 of 15 responses to my questions.



Maybe you can find some time to do some assigning?

http://forums1.itrc.hp.com/service/forums/helptips.do?#33

Mind, I do NOT say you necessarily need to give lots of points. It is fully up to _YOU_ to decide how many. If you consider an answer is not deserving any points, you can also assign 0 ( = zero ) points, and then that answer will no longer be counted as unassigned.
Consider, that every poster took at least the trouble of posting for you!

To easily find your streams with unassigned points, click your own name somewhere.
This will bring up your profile.
Near the bottom of that page, under the caption "My Question(s)" you will find "questions or topics with unassigned points " Clicking that will give all, and only, your questions that still have unassigned postings.

Thanks on behalf of your Forum colleagues.

PS. Nothing personal in this. I try to post it to everyone with this kind of assignment ratio in this forum. If you have received a posting like this before, please do not take offence; none is intended!

Randy Hancock
Advisor

Re: Set host fails

My apologies for not assigning points in a more consistent/timely manner. I had forgotten how easy it is, and yes, if someone takes the time to respond, they deserve that acknowledgement.

Asking questions here and getting useful answers is much easier (not to mention less costly!) than contacting HP directly.

Duncan Morris
Honored Contributor

Re: Set host fails

Thanks for doing that Randy.

For your info you can assign 0 points to an answer rather than leaving it unassigned.

As an example, you can do this to my two postings here!!