Operating System - Tru64 Unix

Alpha just resets and reboots

 
A.W.R
Frequent Advisor

Hi,

We have an ES40 Alpha server with four EV6.7 (21264A) 667 MHz processors and 3 GB of RAM, running Tru64 UNIX V5.1B (Rev. 2650).

Periodically the system just resets and, because we have the AUTO_ACTION parameter set to reboot, it reboots. There are no errors in the binary error log and no messages in the messages files.

If anyone can provide input on why this may be occurring, it would be greatly appreciated.

Thanks
Andrew
7 REPLIES
Steven Schweda
Honored Contributor

Re: Alpha just resets and reboots

> Periodically [...]

Do you really mean "periodically", that is,
"at regular intervals", like, say, every day
at 00:01, or do you really mean
"occasionally", as in "repeatedly and
unpredictably"?

Are you getting crash dumps?
John Manger
Valued Contributor

Re: Alpha just resets and reboots

It's probably worthwhile attaching a serial console to capture what happens during these 'restarts'.
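
A minimal sketch of how the serial console capture suggested above could be logged with timestamps, so that the last console output before a reset can be matched against the reboot time. It assumes a separate logging host wired to the ES40 console port; the function name `stamp_lines`, the device path, the line speed, and the log file name are all illustrative assumptions, not something from this thread.

```shell
#!/bin/sh
# Prefix every line read from stdin with a UTC timestamp.
# stamp_lines is a hypothetical helper name used only for this sketch.
stamp_lines() {
    cr=$(printf '\r')
    while IFS= read -r line; do
        # Serial consoles usually end lines with CR LF; drop the trailing CR.
        line=${line%"$cr"}
        printf '%s %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$line"
    done
}

# Hypothetical usage on the logging host (device path and speed are assumptions,
# and the stty syntax shown is the GNU/Linux form):
#   stty -F /dev/ttyS0 9600 raw -echo
#   stamp_lines < /dev/ttyS0 >> es40-console.log
```

Stamping on the logging host rather than on the ES40 itself means the record survives even if the server loses power without writing anything to its own logs.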

Also, check further back over recent weeks/months in the binary errlog and see if there were any h/w related warnings.

John M
Nobody can serve both God and Money
A.W.R
Frequent Advisor

Re: Alpha just resets and reboots

Hi,

This happens at irregular intervals, for no apparent reason that we can diagnose. The binary errlog is clean and there are no crash dumps. There are no messages in the messages file.

Thanks
Andrew
Steven Schweda
Honored Contributor

Re: Alpha just resets and reboots

If it's just winking out and rebooting, with
no sign of an organized crash or shutdown,
then I'd tend to suspect a power problem
(supply interruption, or failing power
supply). I'm not sure how one might easily
test that bad-power-supply hypothesis,
however, other than swapping out the
suspect(s).
Vladimir Fabecic
Honored Contributor

Re: Alpha just resets and reboots

ES40 machines had problems with memory.
I had several cases like yours. The problem was a bad memory DIMM.
Schedule some downtime for the machine and run memory tests.

>>> memexer 3

It can take a very long time to find the failing DIMM.
In vino veritas, in VMS cluster
cnb
Honored Contributor

Re: Alpha just resets and reboots

Most likely it's memory, or an intermittent OCP or PS, but you can try some things first...

Use the console and RMC to check the system health via the env and status commands. Clear any ALERTS and look for anything out of spec. See these guides for reference:

http://h18000.www1.hp.com/alphaserver/download/es40fg_revb.pdf

http://h18000.www1.hp.com/alphaserver/download/es40og_revb.pdf


After the next unplanned restart, use the RMC env command to see which alerts are set.

Use '# consvar -s auto_action halt'; the next time it halts, look in the NVRAM error logs via SRM commands.

Capture the following...

P00>>> show power
P00>>> cat el
P00>>> show fru
P00>>> show error
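
The Tru64 side of the auto_action change described above might be scripted roughly as follows. This is a sketch under assumptions: only `consvar -s auto_action halt` comes from this thread, while the `-g` (get) and `-a` (save to nonvolatile storage) options are as I recall them from consvar(8) and should be checked on your system. The `run` wrapper is a hypothetical helper that only prints the commands unless RUN=1 is set, so the sequence can be reviewed before it touches the console variables.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical helper): prints each command by default,
# and executes it only when RUN=1 is set on the Tru64 host.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

# Record the current setting so it can be restored after diagnosis.
run consvar -g auto_action
# Stop at the SRM prompt after the next reset instead of rebooting.
run consvar -s auto_action halt
# Assumption: -a saves the console variables to nonvolatile storage.
run consvar -a
```

Once the diagnosis is done, the same wrapper can set auto_action back to the value recorded in the first step.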

Rgds,
cnb
Honored Contributor

Re: Alpha just resets and reboots

Also, run sys_check -escalate during off hours to see if anything else looks weird.

Have you tried using evmget or evmwatch?

Just some more ideas...

Rgds,