
rp5470 random halts

 
Timo J
Frequent Advisor


Anyone to interpret tombstone information on following problem?

One of our rp5470 servers halts at random, roughly every 1-3 weeks. There is nothing in syslog and nothing in /var/adm/crash (maybe because RAM is 6 GB and /var has only 2.5 GB of free space). The only clue that might help is the tombstone. The last tombstone is attached.
Torsten.
Acclaimed Contributor

Re: rp5470 random halts

There is nothing attached.

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

Timo J
Frequent Advisor

Re: rp5470 random halts


Uhm sorry. Now with better luck.
Matti_Kurkela
Honored Contributor
Solution

Re: rp5470 random halts

Have you checked the GSP logs?

If you have a LAN cable connected to the GSP, log on to the GSP, type the Ctrl-E c f key sequence, and then press Ctrl-B to reach the GSP prompt.

If you're using the local serial console, just press Ctrl-B.

There are various GSP commands that might be useful in this case:

- PS (Power Status) can be used to verify that all your system's PSUs and fans are OK.

- SL displays hardware-level error logs, one message at a time. Type SL, then E for Error log, press Enter for no filtering and then view the messages one at a time.

- CL displays the console log, i.e. what has been sent to the console terminal in recent past. It's most useful when viewed immediately after the crash/reboot: if the kernel displayed a panic message, you can review it. Note that the console log is a ring buffer: when new data is added by any actions on the system console, the oldest data is thrown away.
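The ring-buffer behaviour of the console log can be illustrated with a small Python sketch. This is only an analogy, not GSP code: the capacity of 3 entries and the message strings are made up for the example. It shows why CL is most useful right after the crash, since enough later console output pushes the panic message out.

```python
from collections import deque

# Hypothetical capacity; the real size of the GSP console log is not stated in this thread.
console_log = deque(maxlen=3)

# Made-up console output: a panic message followed by later console activity.
for line in ["panic message", "reboot banner", "login prompt", "operator command"]:
    console_log.append(line)  # once the buffer is full, the oldest entry is discarded

print(list(console_log))
```

After the fourth line is appended, the panic message is no longer in the buffer.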

MK
F Verschuren
Esteemed Contributor

Re: rp5470 random halts

I am not a hardware engineer, but what I do know is that if your ts99 looks like this, you will need a hardware engineer...

Going through the ts?? file, I see 4 processors, and three of them have no valid time stamp. Do you normally use all 4 CPUs?

mits
Respected Contributor

Re: rp5470 random halts

Your TS file has the timestamp Mon Sep 17 03:33:03 GMT 2007, which is too old to be related to the recent halts. So I assume your system did not halt with an HPMC. As Matti Kurkela suggested, you need to capture the hardware logs to understand your problem. You could also describe how the system halted: did it just halt, or was its DC power shut down?
Bill Hassell
Honored Contributor

Re: rp5470 random halts

There won't be anything in syslog for system crashes or power failures because there is no OS running at that moment. Look at /etc/shutdownlog for some clues. When you say "halts", do you mean that the machine stops running? Check the lights and the GSP/MP port to see if it has really halted. Or do you mean that it stopped and rebooted automatically? In that case, a system crash may have taken place. Check /etc/rc.config.d/savecrash to make sure SAVECRASH=1 is uncommented. It does not matter if /var is too small: with SAVECRASH=1, a partial dump will be created in /var/adm/crash.
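As a rough sketch of the "SAVECRASH=1 is uncommented" check, the Python helper below scans the text of the config file for an active SAVECRASH=1 line. The function name savecrash_enabled is made up for this example, and it assumes the setting appears as a plain SAVECRASH=1 assignment on its own line.

```python
def savecrash_enabled(config_text: str) -> bool:
    """Return True if an uncommented SAVECRASH=1 line is present."""
    for raw in config_text.splitlines():
        # Drop anything after '#' and surrounding whitespace, so a
        # commented-out "# SAVECRASH=1" does not count as enabled.
        line = raw.split("#", 1)[0].strip()
        if line == "SAVECRASH=1":
            return True
    return False


# Made-up file content: one commented-out line, one active line.
sample = "# SAVECRASH=1\nSAVECRASH=1\n"
print(savecrash_enabled(sample))
```

On the server itself, the text to check would come from reading /etc/rc.config.d/savecrash.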


Bill Hassell, sysadmin
Timo J
Frequent Advisor

Re: rp5470 random halts

Yep, ts99 was a little bit old. The timestamp of the file itself got updated when I powered up the system.

The last entries in the GSP error log are attached. It looks like there were some problems with DC power, though the system was up and running for at least a couple of hours after 16:17:29, when that DC error was logged.

In fact the system didn't halt (cleanly) as I said in the initial post. It just went down so fast that it couldn't write anything to syslog etc. As Mits thought, it might have been just a DC power shutdown. So the next job is to identify the faulty part...

Steve Post
Trusted Contributor

Re: rp5470 random halts

I've had this happen. The UPS is running fine. It is set up to periodically test the batteries. But the batteries are dead. So when it tests those, the system crashes. To add insult to injury it does not complain about the dead batteries, even though that was the purpose of the test. (an old K570)

Here's another one, the UPS is running fine. The cable between the UPS and the server doesn't run so hot. When the serial connection drops, so does the power. (this one was from the early 1990's).

Third one. The UPS and server are fine. I plug a laptop into the UPS with a serial cord. The UPS doesn't like the serial cord I use. It powers off and takes the server with it. (this happened to me 3 months ago....ow).
Michael Steele_2
Honored Contributor

Re: rp5470 random halts

Dear Mikko:

There are a lot of places to look when you have a panic. Tombstones are only one of them, and they are there for recording HPMCs, which are CPU-related problems. You don't have any HPMCs.

/etc/shutdownlog will give you an idea of whether or not the panic was O/S or HW related. So paste the last lines of this file for evaluation.

GSP error logs can be the most valuable. Another responder has already gone over this. This is usually where most admins start. You're looking for ALERT levels, e.g. ALERT LEVEL 10, 12, etc. Get this data.
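If the GSP error log has been captured to a text file, picking out the high alert levels could be sketched as below in Python. Everything about the input format is an assumption: the sketch only expects the words "ALERT LEVEL" followed by a number on the same line, and the real log layout may differ. The helper name high_alerts, the sample entries, and the threshold of 10 (taken from the example levels above, not from any documented cutoff) are all illustrative.

```python
import re

# Assumed pattern: "ALERT LEVEL", up to three separator characters, then digits.
ALERT_RE = re.compile(r"ALERT LEVEL\D{0,3}(\d+)", re.IGNORECASE)


def high_alerts(log_lines, minimum=10):
    """Return (level, line) pairs whose alert level is at least `minimum`."""
    hits = []
    for line in log_lines:
        match = ALERT_RE.search(line)
        if match and int(match.group(1)) >= minimum:
            hits.append((int(match.group(1)), line.strip()))
    return hits


# Made-up log entries, for illustration only.
sample = [
    "Log entry A  ALERT LEVEL 2  (made-up text)",
    "Log entry B  Alert Level: 12  (made-up text)",
]
for level, text in high_alerts(sample):
    print(level, text)
```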

There are also the /etc/opt/resmon logs. You should check these as well: persistence, registrar, etc. There are half a dozen that you should cross-reference against the time stamp of the panic.