Operating System - OpenVMS

Frequently Alpha DS20E get halted...

 
Alanjones_1
Advisor

Dear Friends,

I'm working with a 4-node AlphaServer cluster, and one node, a DS20E, frequently goes into a halted state. What might be the cause? When I analysed the server after rebooting it, I found no errors in the error log or the operator log.

Is there any other method for analysing the entire server? Can anyone...

Alan
U CAN !!! U CAN !!!
9 REPLIES
Volker Halle
Honored Contributor

Re: Frequently Alpha DS20E get halted...

Alan,

If you get an unexpected halt on an OpenVMS Alpha system, it's best to set the console environment variable AUTO_ACTION to RESTART. This causes the console to restart OpenVMS after an unexpected HALT, so that OpenVMS writes a crash dump and reboots.

This will save the errlog entries and make the halt analyzable.
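
As a rough sketch of the AUTO_ACTION change described above, the commands below are entered at the SRM console prompt (>>>) while the system is halted. The SHOW steps are only there to check the value before and after; this assumes the standard SRM console on the DS20E.

```
>>> SHOW AUTO_ACTION
>>> SET AUTO_ACTION RESTART
>>> SHOW AUTO_ACTION
```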

Alternatively, you need to capture the console output from the halt; this at least gives some clue about the underlying problem.

Volker.
labadie_1
Honored Contributor

Re: Frequently Alpha DS20E get halted...

Hello

You have not said which VMS version you use.
Depending on the version, different tools show the hardware errors:
- ana/error
- diagnose
- Compaq Analyze (SEA)
- ana/err/elv translate

Check the memory with the command
$ sh memory
Do you see any bad pages?

Check the CPU and memory errors with a procedure such as
http://dcl.openvms.org/stories.php?story=06/03/21/8098045

Post your VMS version.
Volker Halle
Honored Contributor

Re: Frequently Alpha DS20E get halted...

Gerard,

If an OpenVMS system halts and you haven't set AUTO_ACTION to RESTART, you need to go to the console and type >>> B to get the system up again.

In this case no ERRLOG buffers are saved from the time preceding the halt, and absolutely no information is available about the halt situation.

Volker.
Peter Zeiszler
Trusted Contributor

Re: Frequently Alpha DS20E get halted...

Also make sure your dump file is not on a shadowed disk. Check your parameters if your dump file is on a different disk than the system disk.

Logical - CLUE$DOSD_DEVICE
sysgen - DUMPSTYLE
http://h71000.www7.hp.com/DOC/82FINAL/6549/6549pro_001.html#rb_tab1
Andy Bustamante
Honored Contributor

Re: Frequently Alpha DS20E get halted...

You should also check the SYSGEN parameter DUMPSTYLE. For an AlphaServer I'd generally use 9, but see the help in SYSGEN. Make sure the SYSDUMP.DMP file is large enough for your crash dump; AUTOGEN will tell you if changes are recommended.
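
A minimal sketch of how the dump settings mentioned in this thread could be inspected from a privileged account. It only displays values and changes nothing. The dump file location SYS$SYSTEM:SYSDUMP.DMP is an assumption (the usual location on the system disk); if the dump goes to another disk, the logical CLUE$DOSD_DEVICE should point to that device instead.

```
$ MCR SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SHOW DUMPSTYLE
SYSGEN> EXIT
$ SHOW LOGICAL CLUE$DOSD_DEVICE
$ DIRECTORY/SIZE=ALL SYS$SYSTEM:SYSDUMP.DMP
```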

Another option would be to schedule downtime and run console diagnostics. I'd recommend using a PC and a terminal emulator so that you can scroll back through the display.


Andy
If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net
Dean McGorrill
Valued Contributor

Re: Frequently Alpha DS20E get halted...

Hi Alan,
if it happens again, do a >>> exam pc, exam ps, exam r26 at the console.
Then, on the running system, see if you can find out where the processor was
by using anal/system and examining the PC address.
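
The steps above might look roughly like this. The address FFFFFFFF8001A2B4 is a made-up placeholder; substitute the PC value the console actually prints. The SDA MAP command, which is not mentioned in the post, is added here as one way to see which image the address falls in.

```
>>> examine pc
>>> examine ps
>>> examine r26

$ ANALYZE/SYSTEM
SDA> EXAMINE/INSTRUCTION ^XFFFFFFFF8001A2B4
SDA> MAP ^XFFFFFFFF8001A2B4
SDA> EXIT
```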
kari salminen
Advisor

Re: Frequently Alpha DS20E get halted...

I've seen this a few times, and the cause has been a bad memory module.

If the bad memory resides within the system area, and the exception code
gets hit by the bad memory, then no dump file or error log entry is written;
you just get a plain HALT.

Ask your local HP service to perform a full memory diagnostic.
Wim Van den Wyngaert
Honored Contributor

Re: Frequently Alpha DS20E get halted...

Just this weekend I had bad memory, and this resulted in a hang, not a crash. Control-P no longer reacted (GS160).

After a few restarts we got a crash instead of a hang, and then the ECC errors were reported.

Wim
Jim Geier_1
Regular Advisor

Re: Frequently Alpha DS20E get halted...

We have a DS20e, and it was also halting by itself with no crash and no entries in the error log, so there was no evidence at all as to the nature of the problem. Some investigation revealed an obscure problem, also seen at other sites, with the front control panel board (the OCP board). Since the front control panel board was replaced, that DS20e has not halted once.