Operating System - OpenVMS
alphaserver 255 continuously crashing

 
Hi all
My alphaserver is crashing every day with messages like this.

halted CPU 0
halt code = 1
operator initiated halt
pc = ffffffff8011ae04
>>>

If I type "b" it boots without any problem.

There is no dump file at all:
SYS>SET DEF DSA0:[SYS0.SYSCOMMON.SYSERR]
SYS>DIR /SINCE=20-MAR-2009 CLUE*.*;* /DATE
%DIRECT-W-NOFILES, no files found
SYS>

Attached is the ERRLOG.CVT from yesterday (the last crash).

Cheers
Juan


22 REPLIES
Jess Goodman
Esteemed Contributor

Re: alphaserver 255 continuously crashing

Are there any lines with "BUGCHECK" in them on the console? If not, then your system did not really crash. Hitting a particular key on the console keypad may cause the system to halt. There is also a halt button on the front of the server.

Dump files are not created automatically - you must create one and then reboot. The simplest way to do this is to run:
$ @SYS$UPDATE:SWAPFILES
Check your memory size, SYSGEN parameter DUMP_STYLE, and the VMS documentation to decide what size to make it.

Even without a dump file you may get some information after a crash with:
$ TYPE CLUE$HISTORY
(I'm not sure if that file will be updated when there is no dump file.)
I have one, but it's personal.
Jess Goodman
Esteemed Contributor

Re: alphaserver 255 continuously crashing

Oh, just re-read your post. By default the system dump file is:
SYS$SPECIFIC:[SYSEXE]SYSDUMP.DMP
I have one, but it's personal.
cnb
Honored Contributor
Solution

Re: alphaserver 255 continuously crashing

Check the OPERATOR.LOG files to see who or what is shutting the system down.

Could it be an environmental issue (temp, low input power, etc...)?

The error log wasn't much to go on, apart from an excessive number of earlier device errors from devices attached to PKA0:, but the system clock was way off (check the TOD battery).

Does it always shut down at the same time of day?

hth,

Re: alphaserver 255 continuously crashing

It does exist, but it was last modified some time ago (27-FEB-2009). The crashes began a couple of days ago.

SYS>dir SYSDUMP.DMP /full

Directory SYS$SPECIFIC:[SYSEXE]

SYSDUMP.DMP;1 File ID: (8280,2,0)
Size: 148071/148086 Owner: [1,1]
Created: 31-MAY-2004 16:46:09.28
Revised: 27-FEB-2009 12:04:06.75 (6)
...
Total of 1 file, 148071/148086 blocks.
SYS>show def
SYS$SPECIFIC:[SYSEXE]
SYS>
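
For scale when comparing the dump file against installed memory, the block counts in the directory listing above can be converted to bytes. This is only a minimal sketch, assuming the standard 512-byte OpenVMS disk block; the function name is purely illustrative.

```python
# OpenVMS DIRECTORY reports file sizes in disk blocks of 512 bytes each.
VMS_BLOCK_BYTES = 512

def blocks_to_mib(blocks: int) -> float:
    """Convert an OpenVMS block count to mebibytes."""
    return blocks * VMS_BLOCK_BYTES / (1024 * 1024)

used_blocks = 148071       # first number in "Size: 148071/148086"
allocated_blocks = 148086  # second number in "Size: 148071/148086"

print(f"used:      {blocks_to_mib(used_blocks):.1f} MiB")
print(f"allocated: {blocks_to_mib(allocated_blocks):.1f} MiB")
```

Both figures come out at roughly 72 MiB, which is the number to weigh against the memory size and the DUMP_STYLE setting.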
Hoff
Honored Contributor

Re: alphaserver 255 continuously crashing

ITRC conspired to eat my previous reply here, so here is a fast re-write of the key items from that vaporized reply.

AlphaStation 255/233 box with hard halt on OpenVMS Alpha V7.3-2.

This can be bad hardware or bad software.

Set the SRM console AUTO_ACTION to restart, as that gets crashdumps written on halt operations.

Apply mandatory and UPDATE ECO kits to bring to current for OpenVMS V7.3-2. Apply mandatory ECO kits for any kernel-mode software in use here other than OpenVMS. This includes TCP/IP Services ECO kits (or whatever IP stack) as well as any other third-party packages.

Use ANALYZE /ERROR /ELV here, and TRANSLATE /SINCE=YESTERDAY. That'll get you a better list of errors.

Start a new error log: RENAME ERRLOG.SYS ERRLOG-END-27-MAR-2009.OLD or such. That'll help target the error information.

Following all proper personal safety procedures and proper electrostatic protection procedures, plan to unplug the box from power and then re-seat all memory sticks and all PCI widgets, and unplug and plug all cables and connectors. Inside the box and outside. Blow out the inevitably accumulated dust, too.

Also start planning to replace at least some hardware parts here, or (probably more economical) start to plan to move to something newer than this box. This case could be anything from a bad disk to a bad memory stick to a processor error to a bad or loose or ill-terminated cable to buggy driver software... But one thing is certain: this AlphaStation 255 is ancient, and ancient parts will fail.

Re: alphaserver 255 continuously crashing

Hi all
Thanks a lot to everybody for replying. I finally checked the hardware and replaced the TOD battery. Since then the machine has been up for more than one day.

Uptime 1 02:07:04

Let's see how things go today.

Re: alphaserver 255 continuously crashing

Hi cnb

In the end it has crashed twice since I replaced the battery. The funny thing is that it was always at the same time (more or less).
What does this really mean?

Cheers
Juan
cnb
Honored Contributor

Re: alphaserver 255 continuously crashing

Did it generate a good dump file this time? Check the operator and error logs for any device or error entries before the crash. Did you start a new error log file per Hoff's instructions? If so, please post it. Crashing always at or around the same time might indicate a power spike. Is it on an isolated, properly grounded circuit?

FYI, if anyone's answers are helpful, please assign points to those who've taken the time to help.

hth,
Kumar_Sanjay
Regular Advisor

Re: alphaserver 255 continuously crashing

Juan,

Try to recall what changed last before this problem started. Have you installed any incompatible hardware or software?


Sanjay Kumar
Wim Van den Wyngaert
Honored Contributor

Re: alphaserver 255 continuously crashing

We once had similar crashes due to microcuts (very short power cuts). Of course with no crash dump.

Wim
Wim
Volker Halle
Honored Contributor

Re: alphaserver 255 continuously crashing

Juan,

you would normally get a halt message like this if you press the HALT button or type CTRL-P on the serial console line.

You can try to look at the program counter in the running system:

$ ANAL/SYS
SDA> EXA/INS 8011AE04-40;50
SDA> EXIT

to determine which module the HALT PC is in.

What is the setting of AUTO_ACTION? Did you set it with >>> SET AUTO_ACTION RESTART?

If this happens again, try

>>> CRASH

to force a system crash in this situation. This may preserve error log entries, if there were any HW errors prior to this halt.

Volker.

Re: alphaserver 255 continuously crashing

Hi Volker

With
>ANAL/SYS
SDA> EXA/INS 8011AE04-40;50

I get a table, but I can't understand it.
AUTO_ACTION is set to boot and I didn't try >>> CRASH

(If it happens again, I'll try to force the crash.)

Anyway, yesterday HP replaced the mainboard in the Alpha. It has been up since yesterday; I'm waiting a couple more days just to be sure.

Thank you very much
Volker Halle
Honored Contributor

Re: alphaserver 255 continuously crashing

Juan,

SDA> EXA/INS ...

should show you the instruction stream. Please just post this data and I can decode or interpret it for you.

Volker.

Re: alphaserver 255 continuously crashing

Ok Volker
Here you are:
SYS>ANAL/SYS

OpenVMS (TM) system analyzer

SDA> EXA/INS 8011AE04-40;50
SCH$CALC_CPU_LOAD_C+002B4: BLBC R28,#X000004
SCH$CALC_CPU_LOAD_C+002B8: LDQ R26,#XFF88(R13)
SCH$CALC_CPU_LOAD_C+002BC: LDQ R27,#XFF90(R13)
SCH$CALC_CPU_LOAD_C+002C0: BIS R31,#X32,R0
SCH$CALC_CPU_LOAD_C+002C4: JSR R26,(R26)
SCH$CALC_CPU_LOAD_C+002C8: BIS R31,#X03,R16
SCH$CALC_CPU_LOAD_C+002CC: BIS R31,R0,R27
SCH$CALC_CPU_LOAD_C+002D0: BIS R31,R1,R26
SCH$CALC_CPU_LOAD_C+002D4: MTPR IPL
SCH$CALC_CPU_LOAD_C+002D8: BIS R31,R27,R0
SCH$CALC_CPU_LOAD_C+002DC: BIS R31,R26,R1
SCH$CALC_CPU_LOAD_C+002E0: BIS R31,R31,R8
SCH$CALC_CPU_LOAD_C+002E4: LDL R17,#X0688(R6)
SCH$CALC_CPU_LOAD_C+002E8: AND R17,#X08,R25
SCH$CALC_CPU_LOAD_C+002EC: BEQ R25,#X00004A
SCH$CALC_CPU_LOAD_C+002F0: LDQ R24,#X0590(R6)
SCH$CALC_CPU_LOAD_C+002F4: ADDQ R24,#X01,R23
SCH$CALC_CPU_LOAD_C+002F8: STQ R23,#X0590(R6)
SCH$CALC_CPU_LOAD_C+002FC: LDL R22,#X0068(R6)
SCH$CALC_CPU_LOAD_C+00300: BEQ R22,#X000053
SCH$CALC_CPU_LOAD_C+00304: LDQ R28,#XFEC8(R13)
SDA> exit
Volker Halle
Honored Contributor

Re: alphaserver 255 continuously crashing

Juan,

assuming that you didn't change system parameters and that the instruction stream shown is the actual instruction stream present at these virtual addresses at the time of the 'operator initiated halt', this HALT must have been forced 'externally' (someone pressing the HALT button or entering CTRL-P). There is NO instruction in that instruction stream that could have caused this halt. This means this could NOT have been an OpenVMS software problem.
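
As a rough cross-check of where the halt PC sits in the posted listing, the address arithmetic behind EXA/INS 8011AE04-40;50 can be sketched as follows. This rests on two assumptions that the thread does not confirm: that the first listed line (+002B4) is at the start address of the examined range, and that every Alpha instruction is 4 bytes long. The function names are illustrative only.

```python
# Sketch: locate the halt PC within the SDA listing posted above.
# Assumptions (not confirmed in the thread): the first listed line, +002B4,
# sits at the start address of the EXAMINE range, and every Alpha
# instruction is 4 bytes long.

HALT_PC = 0xFFFFFFFF8011AE04    # "pc = ffffffff8011ae04" from the console
BACK_OFFSET = 0x40              # the "-40" in EXA/INS 8011AE04-40;50
FIRST_LISTED_OFFSET = 0x2B4     # SCH$CALC_CPU_LOAD_C+002B4
INSTRUCTION_BYTES = 4

def halt_symbol_offset(first_offset: int, back: int) -> int:
    """Offset from SCH$CALC_CPU_LOAD_C of the instruction at the halt PC."""
    return first_offset + back

def listing_index(back: int, width: int = INSTRUCTION_BYTES) -> int:
    """Zero-based position of the halt PC's instruction in the listing."""
    return back // width

range_start = HALT_PC - BACK_OFFSET
offset = halt_symbol_offset(FIRST_LISTED_OFFSET, BACK_OFFSET)

print(f"range starts at {range_start:016X}")
print(f"halt PC is at SCH$CALC_CPU_LOAD_C+{offset:05X}")
print(f"listing line number {listing_index(BACK_OFFSET) + 1}")
```

Under those assumptions the halt PC falls on SCH$CALC_CPU_LOAD_C+002F4 (ADDQ R24,#X01,R23), the 17th line of the listing, in the middle of ordinary load, add and store instructions.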

Volker.

Re: alphaserver 255 continuously crashing

Hi Volker
This machine is in a secure room; only I have access to it. Could it be a reset button problem?

Cheers
Juan
Volker Halle
Honored Contributor

Re: alphaserver 255 continuously crashing

Juan,

I can't recall whether the AlphaStation 255 has a separate HALT button or a combined Reboot/Halt button that needs to be jumpered for either HALT or REBOOT.

If there is a separate HALT button or if the single Reboot/Halt button is jumpered to HALT, then a problem with this button is certainly possible.

Volker.
Volker Halle
Honored Contributor

Re: alphaserver 255 continuously crashing

Juan,

the system just has a 'Reset Push Button'. Its function can be switched between 'Halt Request' and 'Reset' via Switch 4 on the motherboard.

See the Digital AlphaStation™ 255 Family User Information Guide, pages B-10 and B-11:

http://h18002.www1.hp.com/alphaserver/download/ek-vllxa-ui-b01.pdf

Volker.

Re: alphaserver 255 continuously crashing

Hi Volker
I checked the switch and it was (and is) in the reset position. I checked the reset button and reseated it firmly in the front panel.
The machine has now been up for three days.

Thanks a lot
Juan

Re: alphaserver 255 continuously crashing

Hi Volker
Thanks for your help (and to everyone else too). The machine is behaving well now, so closing the thread.

Juan
Ali.nazarof
Occasional Visitor

Re: alphaserver 255 continuously crashing

Dear Juan

Since I am experiencing the same problem with my DS15 AlphaServer running Tru64 OS, can you please tell me whether your problem was solved by changing the mainboard or by checking the halt/reset button?

Thanks,
Ali

Re: alphaserver 255 continuously crashing

Hi Ali.nazarof

What I did was check the front cover of my AlphaStation 255 and make sure it was firmly seated, so that it did not press slightly on the reset button. I didn't change any hardware at all.
I did change the mainboard battery.

Juan