Tru64unix:ES40 server restarting automatically

Learn_1
Regular Advisor

Hi,
I am having a problem with an AlphaServer ES40 running Tru64 UNIX. A few days ago, a dirty power failure combined with a UPS failure brought the server down unplanned. Since that date the server restarts by itself about once a day, at no fixed time. Last night I shut it down properly, along with the controllers. Which log files can help me diagnose the cause, and which monitoring tools can help pinpoint the problem? The server also runs an Oracle database. One more thing to mention: the following error is reported on the desktop.
"Console log for ranjha.mobilink.net.pk
STDC Assertion failed:"childp"
file=actionTt.c
line=394 "

Your detailed suggestions will help me resolve this issue.
11 REPLIES
Ralf Puchner
Honored Contributor

Re: Tru64unix:ES40 server restarting automatically

First, check whether a crash dump exists in /var/adm/crash. If so, post it here.

If not, check whether binary.errlog (dia -R) contains machine check errors. If so, open a call with the support center for hardware analysis.

Next, check messages and daemon.log for information; maybe the emvd timers and settings are too low.
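The order of checks above can be sketched as a small Python helper. This is only an illustration under stated assumptions: the function name `triage` is hypothetical, the error log is assumed to have already been decoded to plain text (for example, the saved output of dia -R), and the phrase "machine check" is an assumption about how that decoded text labels such events.

```python
from pathlib import Path


def triage(crash_dir, decoded_errlog_text):
    """Return the next step suggested by the checklist above (hypothetical helper)."""
    crash_path = Path(crash_dir)
    # Step 1: any files in the crash directory are worth posting or sending on.
    if crash_path.is_dir():
        dumps = sorted(p.name for p in crash_path.iterdir() if p.is_file())
        if dumps:
            return "crash dump files found: " + ", ".join(dumps)
    # Step 2: look for machine check entries in the decoded error log text.
    # The search phrase is an assumption, not a documented log format.
    if "machine check" in decoded_errlog_text.lower():
        return "machine check found: open a hardware support call"
    # Step 3: nothing conclusive, so fall back to the plain-text logs.
    return "no dump or machine check: review messages and daemon.log"
```

It could be called as `triage("/var/adm/crash", open("errlog.txt").read())`, where errlog.txt is a hypothetical file holding the decoded log.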
Help() { FirstReadManual(urgently); Go_to_it;; }
Dave Bechtold
Respected Contributor

Re: Tru64unix:ES40 server restarting automatically

Hello,

I agree with Ralf about checking all the various logs. The Environmental Monitoring daemon (envmond) should report power supply status to the logs: daemon.log, kern.log, or messages. It will often also broadcast critical failures to all terminals connected at the time of the failure. You can use "man envmond" and envconfig to configure the envmond daemon parameters.

But if envmond is not running, you will not get a warning.

Review the binary.errlog using the appropriate tool - DECEvent - dia for ES40.

The ES40 can have multiple power supplies, and it's possible that one of them is now marginal after taking a power hit. Try reducing the number of power supplies to isolate the suspect, or have them checked out by Field Service.

As for the "STDC Assertion failed ..." message, this is most likely being produced by a C program using G-LIB (the GNU freeware C library) and/or the gcc compiler. Find out which application or user is using the file 'actionTt.c' and work it from that angle.

Hope that helps,
Dave Bechtold
Learn_1
Regular Advisor

Re: Tru64unix:ES40 server restarting automatically

Hi,
Attached is the crash dump file for further suggestion.
Learn_1
Regular Advisor

Re: Tru64unix:ES40 server restarting automatically

Attached is the log for further consideration.
This error log refers to cpu0.
Ralf Puchner
Honored Contributor

Re: Tru64unix:ES40 server restarting automatically

The log indicates a machine check error, which simply means a fault in the CPU, cache, or memory.
Log a call with the HP support center and send them binary.errlog for further investigation.
Help() { FirstReadManual(urgently); Go_to_it;; }
Learn_1
Regular Advisor

Re: Tru64unix:ES40 server restarting automatically

Hi Ralf,
Thanks for your guidance. I will be logging a support call with HP soon. For your opinion, I am also forwarding you the binary.errlog. This log reports errors in the power supply and memory. I have checked the power supply status, which is OK, and there isn't any error during the memory test. Please check this binary.errlog and suggest any further action that can help me pinpoint the faulty component, so that I can bring this server back online in a stable condition as soon as possible.
Once again, thanks.
Ralf Puchner
Honored Contributor

Re: Tru64unix:ES40 server restarting automatically

The memory test does not catch every kind of problem (and it does not test the cache memory of the CPU).

A machine check requires special tools (memcheck decoder) to analyze the root cause of the problem. These tools are only available for hardware specialists within the support center.

Help() { FirstReadManual(urgently); Go_to_it;; }
Learn_1
Regular Advisor

Re: Tru64unix:ES40 server restarting automatically

Hi Ralf,
I logged a call with HP and was advised to replace all the DIMM modules, as they diagnosed these modules as faulty.
I am arranging the modules. In the meantime, we had an identical system in the cabinet that was not being used, so to keep work going we moved the system disks into the spare server and started it up along with the storage. The system kept working fine for a week, and just today it restarted again. Now all the hardware is new, even the memory modules. What do you think the problem could be?
Ralf Puchner
Honored Contributor

Re: Tru64unix:ES40 server restarting automatically

Why not send the binary.errlog and crash-data (if available) to the support center again? It seems to be the same root cause again.
Help() { FirstReadManual(urgently); Go_to_it;; }
Vikash_2
Occasional Visitor

Re: Tru64unix:ES40 server restarting automatically

Did you try sys_check with the escalate option? A long time back I landed in the same situation, where I had to replace a few memory banks.
HP support is going to suggest that :)-
Regards
Vikash
Bernard-Granger
Occasional Visitor

Re: Tru64unix:ES40 server restarting automatically

Hello,

I think it's a memory problem:
Full Description:
A CPU0 uncorrectable double-bit memory fill load event at address
x0588E3240 on an indeterminate data bit has been diagnosed. This System
event may require qualified service to one or more probable field
replaceable unit(s) listed below.


FRU List:
Probability: High
Manufacturer: Compaq
Device Type: Memory DIMM
Physical Location: Slot Array 0 Set 0 MMB0-J1 or MMB0-J2 or MMB1-J1 or MMB1-J2
FRU Part Number: Unavailable
FRU Serial Number: Unavailable
FRU Firmware Rev: Unavailable
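
The FRU list above narrows the fault to a set of DIMM positions. As a minimal sketch, assuming the report has been saved as plain text in the layout shown here, the candidate positions could be pulled out like this. The function name `candidate_dimms` and the `MMB<n>-J<n>` pattern are illustrative assumptions based only on this one report.

```python
import re


def candidate_dimms(report_text):
    """Return the DIMM positions named on the 'Physical Location' line."""
    for line in report_text.splitlines():
        # Split "Key: value" at the first colon only.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Physical Location":
            # Pattern inferred from the sample report, e.g. MMB0-J1.
            return re.findall(r"MMB\d+-J\d+", value)
    return []


report = (
    "Device Type: Memory DIMM\n"
    "Physical Location: Slot Array 0 Set 0 "
    "MMB0-J1 or MMB0-J2 or MMB1-J1 or MMB1-J2\n"
)
print(candidate_dimms(report))
```

The report alone does not say which of the listed positions holds the bad DIMM, so the result is a list of suspects rather than a diagnosis.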

You have to upgrade the AlphaServer firmware (6.1 minimum) and be sure that the memory_test parameter is set to full (from SRM: >>> set memory_test full). I can't tell which DIMM is bad among the listed locations (Slot Array 0 Set 0 MMB0-J1 or MMB0-J2 or MMB1-J1 or MMB1-J2).
regards,