HPE 9000 and HPE e3000 Servers

System hang detected via timer popping

 
Wim Rombauts
Honored Contributor


I know there are already a lot of threads about this message, but none of them are conclusive:

One of our cluster nodes (rp5470, 2-CPU) suddenly went down with only the following message in the GSP (version B.02.14):

************* SYSTEM ALERT **************
SYSTEM NAME: krimson-gsp
DATE: 05/14/2004 TIME: 06:46:51
ALERT LEVEL: 13 = System hang detected via timer popping

REASON FOR ALERT
SOURCE: 1 = processor
SOURCE DETAIL: 1 = processor general SOURCE ID: 0
PROBLEM DETAIL: 4 = timeout

LEDs:  RUN  ATTENTION  FAULT  REMOTE  POWER
       OFF  OFF        OFF    OFF     FLASH
System Power is Off.

0x78E000D41100F000 00000003 00000001 - type 15 = Activity Level/Timeout
0x58E008D41100F000 00006804 0E062E33 - type 11 = Timestamp 05/14/2004 06:46:51
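
As an aside, the two data words of the type 11 entry above appear to encode the timestamp one byte per field. The sketch below shows that reading. The byte layout (year as an offset from 1900, zero-based month, then day, hour, minute, second) is an assumption inferred from this single log line, not taken from HP documentation, and `decode_gsp_timestamp` is a hypothetical helper name.

```python
def decode_gsp_timestamp(high_word: str, low_word: str) -> str:
    """Decode the two 32-bit data words of a type 11 (Timestamp) entry.

    Assumed layout, inferred from one example only:
      high word: 00 00 YY MM  (YY = year - 1900, MM = zero-based month)
      low word:  DD hh mm ss
    """
    hi = int(high_word, 16)
    lo = int(low_word, 16)

    year = 1900 + ((hi >> 8) & 0xFF)
    month = (hi & 0xFF) + 1          # stored zero-based in this reading
    day = (lo >> 24) & 0xFF
    hour = (lo >> 16) & 0xFF
    minute = (lo >> 8) & 0xFF
    second = lo & 0xFF

    return f"{month:02d}/{day:02d}/{year:04d} {hour:02d}:{minute:02d}:{second:02d}"


# Data words copied from the alert above.
print(decode_gsp_timestamp("00006804", "0E062E33"))  # 05/14/2004 06:46:51
```

With the words from the alert, this reproduces the date and time the GSP printed, which is the only evidence for the assumed layout.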

Let me be clear:
The system did not HANG, so a TOC is not an option to find more info.
The GSP command "SS" reported that system power was off. I could turn the system back on with the PC command, and after that it booted normally.
There is no entry in syslog.log, OLDsyslog.log or shutdownlog.
There are no other GSP messages just before the above GSP alert.
Since it happened (Thursday, May 20), it has not happened again, and no new logs indicate any problem.
We are running a Serviceguard cluster, but I don't believe that anything in the cluster could cause the power to be switched off. It could halt, reboot or TOC a system, but not power it off (if I'm right).

I am already in contact with HP support, but they don't seem to be able to put their finger on the heart of the problem.

I know many of you have seen the same issue. Could you tell me what solved this in your case?

Regards, and many thanks in advance
Kent Ostby
Honored Contributor

Re: System hang detected via timer popping

Wim --

I have only seen this when I've had a true hang on the box, but I suppose the box could "hang" long enough to trigger the alert if it had a "mini-hang" that went on for some time and then righted itself.

Something quick to check is "uptime" on the box. Does this show that we really weren't down at this time (i.e. that we've been up longer than 4 days)?

Nothing in shutdownlog doesn't mean the box didn't go down .. there are cases (especially powerfails) that can reboot a box without a shutdownlog entry.

"uptime" on the other hand will tell us how long we've been running.
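
The check amounts to simple date arithmetic: subtract the reported uptime from the current time to get the boot time, and see whether that falls after the alert. A minimal sketch, assuming the uptime has already been read off the box by hand; the "now" value and the uptime figures are made up for illustration, and `rebooted_since` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def rebooted_since(alert_time: datetime, now: datetime, uptime: timedelta) -> bool:
    """Return True if the box booted after the alert was logged."""
    boot_time = now - uptime
    return boot_time > alert_time


# Alert time taken from the GSP message in the first post.
alert = datetime(2004, 5, 14, 6, 46, 51)

# Hypothetical moment at which "uptime" is run.
now = datetime(2004, 5, 20, 12, 0, 0)

# Up for 4 days: booted after the alert, so the box did go down.
print(rebooted_since(alert, now, timedelta(days=4)))

# Up for 10 days: booted before the alert, so the box stayed up through it.
print(rebooted_since(alert, now, timedelta(days=10)))
```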

Best regards,

Kent M. Ostby
"Well, actually, she is a rocket scientist" -- Steve Martin in "Roxanne"
Wim Rombauts
Honored Contributor

Re: System hang detected via timer popping

Hello Kent.

I think you misunderstood. The system went down. There is no discussion about that. If the power is off, I doubt that any system can keep running :-)
Kent Ostby
Honored Contributor

Re: System hang detected via timer popping

Wim .. sorry .. I misread the part about "The system did not HANG" to mean that it continued normal operations.

Given what you have said, is there any data in the /var/tombstones/ts99 file?

You can attempt to re-save a system crash dump with the following:

savecrash -r -v <directory>

where <directory> is the destination where you would like the crash dump to end up.

"Well, actually, she is a rocket scientist" -- Steve Martin in "Roxanne"
Roopashree B
Occasional Visitor

Re: System hang detected via timer popping

Hi Wim,

I too have an rp2470 with a single CPU which just hangs, giving a similar "alert level 13" message.
The GSP firmware version I am using is C.02.14.
Could you please tell me what the solution to your problem was?

Thanks,
Roopa
Wim Rombauts
Honored Contributor

Re: System hang detected via timer popping

I don't know if I can help you.
My system was an rp5470 and uses a different GSP firmware. Firmware version B.02.20 contained a fix for the symptoms we had on our server.
After the GSP firmware upgrade, the problem has not returned.
Wim Rombauts
Honored Contributor

Re: System hang detected via timer popping

I didn't even know this thread was still open ...