Self Restart on One VMS Cluster Member

Ricky Pardede · 03-09-2009

Dear Friends,
I have an OpenVMS cluster that consists of 3 nodes.
One node just rebooted by itself.
I tried to find the root cause, but without success so far.
OPERATOR.LOG only shows when the node disappeared from the cluster and when it rejoined.

Please suggest what I should check to find the root cause.

Many Thanks,
Ricky Pardede

Volker Halle · 03-09-2009

Ricky,

the node may have just crashed and rebooted automatically. You should at least find a bugcheck entry in ERRLOG.SYS.

Depending on OpenVMS version and architecture, there is other crash information to be found.

Does $ ANAL/CRASH SYS$SYSTEM: on that node report any crash information?

Volker.

Ricky Pardede · 03-09-2009

Thanks for the quick reply.
My machine runs OpenVMS V7.3-2.

As you suggested:
ID72:SYSTEM> ANAL/CRASH SYS$SYSTEM:
%SDA-I-SINGLEMEM, single member shadow set; accessing dump file via _DSA0:

OpenVMS (TM) system dump analyzer
...analyzing an Alpha compressed selective memory dump...

Dump taken on 9-MAR-2009 16:22:25.69
MACHINECHK, Machine check while in kernel mode

I am still learning how to use this utility.

Volker Halle · 03-09-2009

Ricky,

for OpenVMS Alpha V7.3-2, you can easily get a crash summary by typing:

$ TYPE CLUE$HISTORY

This will show one line for each crash. There is also a more detailed CLUE summary file for each crash: CLUE$COLLECT:CLUE$NODE_ddmmyy_hhmm.LIS

A MACHINECHK crash is most likely caused by a hardware problem. You need to examine the error log (with DECevent or WEBES/SEA, depending on the machine type).

You can easily extract the most recent errlog entries from the crashdump itself:

$ ANAL/CRASH SYS$SYSTEM:
SDA> CLUE ERRLOG

This will show the most recent errors and also extract them to SYS$SCRATCH:CLUE$ERRLOG.SYS. You can then use this file for detailed analysis of the error leading to the crash.

Volker.

Ricky Pardede · 03-09-2009

Hi Volker,

Thanks for the great help.

$ TYPE CLUE$HISTORY
=> I don't find an entry for today's crash.

SDA> CLUE ERRLOG
------------------------------------
Sequence  Date         Time         Error Message Type
--------  -----------  -----------  --------------------------------
   13818   9-MAR-2009  16:22:25.29  unknown entry
   13819   9-MAR-2009  16:22:25.69  Machine Check 670
   13820   9-MAR-2009  16:22:25.69  * Crash Entry

I think I need the WEBES tool for further investigation.
Can I get a WEBES license for free?

Thanks,
Ricky Pardede
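
Editorial aside: if the CLUE ERRLOG listing is saved to a text file and copied off the node, the entries logged just before the crash can be picked out with a short script. The following is only a minimal Python sketch, assuming the listing has the four columns shown in this post (sequence, date, time, message). The names `parse_errlog` and `before_crash` are hypothetical helpers, not part of SDA or CLUE.

```python
import re

# One listing row: sequence number, VMS-style date, time, free-text message.
ENTRY = re.compile(
    r"^\s*(?P<seq>\d+)\s+"
    r"(?P<date>\d{1,2}-[A-Z]{3}-\d{4})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{2})\s+"
    r"(?P<msg>.+?)\s*$"
)

def parse_errlog(listing):
    """Return one dict per data row; header and ruler lines are skipped."""
    entries = []
    for line in listing.splitlines():
        m = ENTRY.match(line)
        if m:
            entries.append({
                "seq": int(m["seq"]),
                "date": m["date"],
                "time": m["time"],
                "msg": m["msg"],
            })
    return entries

def before_crash(entries):
    """Return the entries that precede the first 'Crash Entry' row."""
    for index, entry in enumerate(entries):
        if "Crash Entry" in entry["msg"]:
            return entries[:index]
    return []

# Rows copied from the listing quoted in this post.
SAMPLE = """\
Sequence Date Time Error Message Type
-------- ----------- ----------- --------------------------------
13818 9-MAR-2009 16:22:25.29 unknown entry
13819 9-MAR-2009 16:22:25.69 Machine Check 670
13820 9-MAR-2009 16:22:25.69 * Crash Entry
"""

entries = parse_errlog(SAMPLE)
last = before_crash(entries)[-1]
print(last["seq"], last["date"], last["time"], last["msg"])
# prints: 13819 9-MAR-2009 16:22:25.69 Machine Check 670
```

This only narrows down which entry to look at. Decoding the machine check itself still needs DECevent or WEBES, as discussed in the replies.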

Volker Halle · 03-09-2009

Ricky,

the CLUE$SDA process should be run automatically during startup and produce both the entry in CLUE$HISTORY and the CLUE$COLLECT:CLUE$node_ddmmyy_hhmm.LIS file. If this does not work, check SYS$MANAGER:CLUE$STARTUP_node.LOG for errors.

You can download WEBES for free. I would suggest that you download and install the Windows version. It can also analyze OpenVMS ERRLOG.SYS files.

http://h18023.www1.hp.com/support/svctools/

For older system types, use DECevent.

Volker.

Hoff · 03-09-2009

I'd probably call in your hardware service organization earlier rather than later.

Machine checks can be fairly simple to fix (re-seating or swapping DIMMs), or can be more involved.

Most any services organization is familiar with the steps involved here, and with decoding the machine check information that will be produced by DECevent or WBEM/WEBES, etc.

The sooner services is on-line for a diagnostic pass, the sooner the box gets fixed.

Ricky Pardede · 03-09-2009

Hi Volker and Hoff,

Thanks a lot for the great help.
I will call HP asap to help analyze the root cause.
Thanks for the new knowledge.

Regards,
Ricky Pardede

Ricky Pardede · 03-09-2009

Hi Volker,

It seems the CLUE$SDA process is not running on any of the nodes.
Can you suggest how to enable CLUE$SDA properly?

ID72:SMSC> PIPE SHO SYS /CLUSTER | SEARCH SYS$INPUT NODE, sda, clue
OpenVMS V7.3-2 on node SMID71 10-MAR-2009 01:18:19.95 Uptime 84 08:25:06
OpenVMS V7.3-2 on node SMID72 10-MAR-2009 01:18:19.96 Uptime 0 08:51:31
OpenVMS V7.3-2 on node SMID73 10-MAR-2009 01:18:19.97 Uptime 35 07:32:42
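
Editorial aside: the header lines above already show which node rebooted and roughly when, because boot time is the reported timestamp minus the uptime. The arithmetic can be sketched in Python as below. This is a rough illustration that assumes the header layout shown in this post. `boot_times` is a hypothetical helper name, not an OpenVMS facility.

```python
import re
from datetime import datetime, timedelta

# Month abbreviations as they appear in VMS timestamps.
MONTHS = {name: number for number, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}

# Header layout assumed from the output quoted in this post.
HEADER = re.compile(
    r"on node (?P<node>\S+)\s+"
    r"(?P<day>\d{1,2})-(?P<mon>[A-Z]{3})-(?P<year>\d{4})\s+"
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2})\.(?P<cs>\d{2})\s+"
    r"Uptime\s+(?P<ud>\d+)\s+(?P<uh>\d{2}):(?P<um>\d{2}):(?P<us>\d{2})"
)

def boot_times(text):
    """Map node name to estimated boot time (timestamp minus uptime)."""
    result = {}
    for match in HEADER.finditer(text):
        g = match.groupdict()
        now = datetime(
            int(g["year"]), MONTHS[g["mon"]], int(g["day"]),
            int(g["h"]), int(g["m"]), int(g["s"]),
            int(g["cs"]) * 10_000,  # hundredths of a second to microseconds
        )
        uptime = timedelta(
            days=int(g["ud"]), hours=int(g["uh"]),
            minutes=int(g["um"]), seconds=int(g["us"]),
        )
        result[g["node"]] = now - uptime
    return result

# Header lines copied from the output quoted in this post.
SAMPLE = """\
OpenVMS V7.3-2 on node SMID71 10-MAR-2009 01:18:19.95 Uptime 84 08:25:06
OpenVMS V7.3-2 on node SMID72 10-MAR-2009 01:18:19.96 Uptime 0 08:51:31
OpenVMS V7.3-2 on node SMID73 10-MAR-2009 01:18:19.97 Uptime 35 07:32:42
"""

boots = boot_times(SAMPLE)
latest = max(boots, key=boots.get)  # most recently booted node
print(latest, boots[latest].strftime("%Y-%m-%d %H:%M:%S"))
# prints: SMID72 2009-03-09 16:26:48
```

For the sample above this gives a boot time of about 16:26 on 9-MAR-2009 for SMID72, roughly four minutes after the dump timestamp (16:22:25.69) reported by SDA.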

Volker Halle · 03-09-2009

Ricky,

the CLUE$SDA process only runs temporarily during startup. It exits after analyzing the dump file. Look at SYS$MANAGER:CLUE$STARTUP_node.LOG for possible error messages.

Volker.
