System Administration

Why does my Tru64 machine often reboot by itself?

xxl_1
Frequent Advisor

Two Compaq DS20E systems form an ASE cluster, but one of the machines often reboots by itself.
My OS is Tru64 V4.0G and the cluster software is ASE 1.6.
The attached document is the binary.errlog.
Can someone tell me why this happens?
6 REPLIES
Han Pilmeyer
Esteemed Contributor

Re: my tru64 machine often reboot by itself?

It's difficult to say from the binary error log what happened. You're using uerf to translate the data in the binary error log and that doesn't help either. For DS20E systems, you should be using DECevent (the dia utility).

In this case I would start looking in the /var/adm/messages file to see if there are any hints there. The console log (if you have one) may help also.

It's possible that the system which reboots itself detected a cluster problem and rebooted itself as a safeguard. In this case I would expect messages in the log files. You may also want to check the log files on the surviving system.
xxl_1
Frequent Advisor

Re: my tru64 machine often reboot by itself?

To Han Pilmeyer:
First, thank you for your reply!
The attached document was produced with the "dia" tool. There seem to be no additional hints in the /var/adm/messages file.
I checked the cluster configuration with the command "clu_ivp -v", and the output says the configuration is correct.
What else could the problem be?
Manish PATHAK_2
Regular Advisor

Re: my tru64 machine often reboot by itself?

Can you prepare a sys_check report and post it to the forum?

Also verify all the CPU and memory cards, as well as environmental readings such as temperature.
Han Pilmeyer
Esteemed Contributor

Re: my tru64 machine often reboot by itself?

V4.0G is getting old. It seems I can't remember that far back.

Instead of DECevent (dia) you should use SEA (a.k.a. Compaq Analyze), which is part of WEBES. There were a lot of events in the error log that were not properly translated. SEA should be able to translate those.

When a system reboots in an ASE cluster there must be messages about that in the log file of the surviving node. Do you get messages at all?

sys_check sounds like a good suggestion. That should gather all relevant data.
Joris Denayer
Respected Contributor

Re: my tru64 machine often reboot by itself?

Han is right.

There are a lot of entries that are not translated correctly.
You had 2 machine checks (entries 1479 and 1481), on Feb-11 and Apr-02 respectively.
Only the raw (untranslated) data is shown.

It is best to contact the local HP Service Organization. Send them the file binary.errlog for inspection.
The latest analyser packages should be able to translate these entries into more meaningful text.

In the attachment you will find a shorter overview of the events that you posted earlier.

Enjoy

To err is human, but to really foul things up requires a computer
fred mudgett_2
Occasional Advisor

Re: my tru64 machine often reboot by itself?

Hello, an even simpler way to find what caused the system to reboot is to look in /var/adm/messages and grep for the panic string. If it's a machine check causing the crash, it'll have register dumps which can be read, compared, etc.
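
The search described above could be sketched roughly as follows, for anyone who prefers to run it on a copy of the messages file on another machine. This is only an illustration, not a tested procedure for Tru64: it assumes Python 3 is available where the copy is examined, the function name `find_panic_lines` is made up for this example, and the default search word "panic" is a guess at what the panic line contains.

```python
import sys
from typing import Iterable, List, Tuple


def find_panic_lines(lines: Iterable[str], needle: str = "panic") -> List[Tuple[int, str]]:
    """Return (line_number, text) for each line containing `needle`.

    The match is case-insensitive. `needle` defaults to "panic", which is
    an assumption about how the panic string appears in the log.
    """
    needle = needle.lower()
    hits = []
    for number, line in enumerate(lines, start=1):
        if needle in line.lower():
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # The path is supplied by the caller, e.g. a copy of /var/adm/messages.
    # errors="replace" keeps the scan going if the file has odd bytes.
    with open(sys.argv[1], errors="replace") as log:
        for number, text in find_panic_lines(log):
            print(f"{number}: {text}")
```

Run it with the path to the copied file as the only argument. The printed line numbers show where to look in the file for the register dumps that follow a machine check.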

Fred