HPE 9000 and HPE e3000 Servers

RP5450 rebooted with few traces left

 
mgais
Advisor

Hello,

I have an RP5450 server (running HP-UX 11i v1) that rebooted unexpectedly the other night, leaving very few traces:
- nothing in syslog
- nothing in shutdownlog
- in /var/crash there is no crash file
- in /var/tombstones there is a ts99 with a correct Unix timestamp, but on reading it, the registers are all zeros and the HPMC info says "no valid timestamps"
- the 'last' command reports a "reboot system boot" entry

A similar event happened about two months ago: a reboot during the night with very few traces left.

Where do I begin to investigate the possible cause? The machine is used mainly for SCM tasks.
13 REPLIES
Warren_9
Honored Contributor
Solution

Re: RP5450 rebooted with few traces left

hi,

How about the MP console log? Did any messages show up on the console? Maybe check the other MP logs as well.

GOOD LUCK!!
Enrico P.
Honored Contributor

Re: RP5450 rebooted with few traces left

Hi,
for the log entries leading up to the reboot, check /var/adm/syslog/OLDsyslog.log.

For the crash file, you need to set SAVECRASH=1 and SAVECRASH_DIR= in the /etc/rc.config.d/savecrash file.


Enrico
RAC_1
Honored Contributor

Re: RP5450 rebooted with few traces left

Any problems with power supply/UPS?

Did you check the console logs? And the chassis logs?
Have you configured envd? If yes, is there anything logged by it?
There is no substitute to HARDWORK
mgais
Advisor

Re: RP5450 rebooted with few traces left

@Warren: sorry, I'm not sure I understand what you mean by MP console. Nothing is connected to the physical console (/dev/console).
I have now checked the logs from the GSP and there are some events logged there. I have to analyze those.

@Enrico: currently there are no SAVECRASH settings in crashconf, but there is a CRASHCONF_ENABLED=1. Is it the same? Anyway, the server has dumped a crash file in some other circumstances.

Thanks to both!
Warren_9
Honored Contributor

Re: RP5450 rebooted with few traces left

hi,

MP = GSP. Hopefully there is some hint in there...

GOOD LUCK!!
Warren_9
Honored Contributor

Re: RP5450 rebooted with few traces left

BTW,

the SAVECRASH setting is in /etc/rc.config.d/savecrash,
not in crashconf.
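
For reference, the relevant part of that file might look like the fragment below. This is only a sketch: the two variable names are the ones mentioned in this thread, and the directory value is an assumption taken from the /var/crash location named in the first post, so substitute whatever directory your system actually uses.

```shell
# /etc/rc.config.d/savecrash (illustrative fragment)
# Ask savecrash to copy the dump to the filesystem at boot.
SAVECRASH=1
# Assumed value: /var/crash is the directory named in the first post.
SAVECRASH_DIR=/var/crash
```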

rariasn
Honored Contributor

Re: RP5450 rebooted with few traces left

Hi Massimo,

Also check:

/etc/shutdownlog
/etc/rc.log
/etc/rc.log.old
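
The files suggested so far in this thread can be checked in one pass. The following is a minimal sketch, not a complete diagnostic: `check_trace` is a hypothetical helper name, and the path list only collects locations already named in the posts above.

```shell
# check_trace PATH: report whether a log file exists and has any content.
check_trace() {
    if [ -s "$1" ]; then
        echo "present: $1"
    elif [ -e "$1" ]; then
        echo "empty: $1"
    else
        echo "missing: $1"
    fi
}

# Paths taken from this thread; adjust for your own system.
for p in /var/adm/syslog/OLDsyslog.log /etc/shutdownlog \
         /etc/rc.log /etc/rc.log.old /var/tombstones/ts99; do
    check_trace "$p"
done
```

A file reported as present still has to be read by hand for entries around the time of the reboot.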

rgs,

ran
Enrico P.
Honored Contributor

Re: RP5450 rebooted with few traces left

Hi,
crashconf configures the dump itself; savecrash handles the creation of the crash files.

To see the MP log:

Log in to the console:

ctrl-b

user=Admin
password=Admin (default)

sl (show log)

select error log

Enrico
mgais
Advisor

Re: RP5450 rebooted with few traces left

Ok, I have 11 events logged in MP.
The oldest entry is:

ALERT LEVEL: 12 = Software failure

SOURCE: 1 = processor
SOURCE DETAIL: 1 = processor general SOURCE ID: 0
PROBLEM DETAIL: 0 = no problem detail

CALLER ACTIVITY: B = system panic STATUS: 0
CALLER SUBACTIVITY: 80 = implementation dependent
REPORTING ENTITY TYPE: E = HP-UX REPORTING ENTITY ID: 00

0xF8E000C01100B800 00000000 0000B800 type 31 = legacy PA HEX chassis-code
0x58E008C01100B800 00006A04 06130723 type 11 = Timestamp 05/06/2006 19:07:35
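
The timestamp printed next to the second chassis code can apparently be recovered from its two data words. The sketch below assumes a byte layout (year minus 1900, zero-based month, then day, hour, minute, second) that is inferred only from this single sample, not from HP documentation; `decode_chassis_ts` is a hypothetical helper name.

```shell
# decode_chassis_ts WORD1 WORD2
# WORD1/WORD2 are the two 8-digit hex data words of a "type 11" entry.
# Assumed layout (inferred from one sample only):
#   WORD1 = 00 00 YY MM   (YY = year - 1900, MM = month - 1)
#   WORD2 = DD hh mm ss
decode_chassis_ts() {
    w1=$((0x$1))
    w2=$((0x$2))
    year=$(( ((w1 >> 8) & 0xFF) + 1900 ))
    month=$(( (w1 & 0xFF) + 1 ))
    day=$(( (w2 >> 24) & 0xFF ))
    hour=$(( (w2 >> 16) & 0xFF ))
    min=$(( (w2 >> 8) & 0xFF ))
    sec=$(( w2 & 0xFF ))
    printf '%02d/%02d/%04d %02d:%02d:%02d\n' \
        "$month" "$day" "$year" "$hour" "$min" "$sec"
}

decode_chassis_ts 00006A04 06130723   # prints 05/06/2006 19:07:35
```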

Next is an event of type Software failure/system panic/major change in the system state, at the same time.

Then comes another pair of similar "Software failure" events 3 minutes later, then six "Non-urgent operator attention required" events 1 second later, and finally two warnings, "System blocked waiting for operator input", a couple of minutes later.

Nothing in /etc/rc.log*

About SAVECRASH: all the parameters there are commented out, which by default I think means enabled. However, I will explicitly enable it, to be on the safe side.
Saurav_1
Valued Contributor

Re: RP5450 rebooted with few traces left

Hi,

Please check the /var/adm/syslog/OLDsyslog.log

As root, type 'elm' and answer y to the next two prompts. You will see the mail in the folder.
If you find any Event Monitor notifications, investigate them. Please attach them to your next post if you find anything interesting.

Run set_fixed -L (as root) and check for devices in a down state. This may give some hint.
mgais
Advisor

Re: RP5450 rebooted with few traces left

Saurav,
thanks for your hints, but:
- NO messages are logged in any syslog (I keep syslogs even older than the last OLDsyslog, in case a server reboots twice in a row)
- set_fixed says all devices are OK (and in fact, if I hadn't noticed the other day that a session on that server had dropped, I wouldn't have noticed the reboot)

It seems the only traces of a 'sudden death' that left no time to write any log are the MP logs posted yesterday.

I guess I have to open a call with HP to interpret them.
Michael Steele_2
Honored Contributor

Re: RP5450 rebooted with few traces left

When you have a HW problem, like you do now, the O/S has been taken out of the loop. So take the GSP logs and start a call with HW support. HW support will be able to interpret the codes for you. :-)
Support Fatherhood - Stop Family Law
Jeff Schussele
Honored Contributor

Re: RP5450 rebooted with few traces left

Hi,

Most of the time when the ts99 contains no valid timestamps I've found that a bad CPU has panicked the system.
But HP can analyze the ts99 & tell you this for sure and also which CPU is the culprit.

HTH,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!