HPE 9000 and HPE e3000 Servers

Server down! GSP message: "System Hang detected via timer popping"

 
Yogeeraj_1
Honored Contributor

Hello experts!

Yesterday, my RP5430 went down by itself and the GSP showed the following error message on the server console:
============================================================
System Alert
Reason for alert
source: 1 = processor
Source detail: 1 = processor guard Sourceid = 0
Problem Detail: 4 = timeout

ALERT LEVEL: 13 = System Hang detected via timer popping.
SOURCE: 1=processor general

LEDs:
RUN: OFF
Attention: OFF
Fault: OFF
Remote: OFF
Power: Flash

System power is OFF.
============================================================

Also,
- /var/adm/syslog/syslog.log does not indicate any errors!
- There are also no "crash dumps" in /var/adm/crash
- /var/adm/shutdownlog does not indicate any shutdown
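
The three checks above can be gathered into one function for repeat use. This is a minimal sketch for a POSIX shell; the function name, the keyword list, and the crash.N directory naming are assumptions, not something stated in this thread.

```shell
# Sketch: summarize the three places checked in the post above.
# Defaults are the paths named in the post; arguments allow other locations.
check_crash_evidence() {
    syslog=${1:-/var/adm/syslog/syslog.log}
    crashdir=${2:-/var/adm/crash}
    shutdownlog=${3:-/var/adm/shutdownlog}

    # Count syslog lines containing trouble keywords (keyword list is an assumption).
    if [ -r "$syslog" ]; then
        hits=$(grep -c -i -E 'panic|error|fail' "$syslog" || true)
        echo "syslog: $hits matching line(s)"
    else
        echo "syslog: not readable"
    fi

    # Count saved dump directories (crash.N naming is an assumption).
    dumps=0
    for d in "$crashdir"/crash.*; do
        if [ -d "$d" ]; then dumps=$((dumps + 1)); fi
    done
    echo "crash dumps: $dumps"

    # Show the most recent shutdown record, if any.
    if [ -s "$shutdownlog" ]; then
        echo "last shutdownlog entry: $(tail -n 1 "$shutdownlog")"
    else
        echo "shutdownlog: empty or missing"
    fi
}
```

Calling check_crash_evidence with no arguments uses the three default paths.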

How do I proceed with the troubleshooting? What could be the real cause?

Thank you in advance for your replies.

Best Regards
Yogeeraj
No person was ever honoured for what he received. Honour has been the reward for what he gave (Calvin Coolidge)
30 REPLIES
Ravi_8
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

Hi, Yogee

It looks like power was disrupted, so the machine went down. When the power supply resumed, the power LED started flashing.
Could you try to boot the machine and let us know?
never give up
T G Manikandan
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

If there are no entries in the syslog or shutdownlog,

can you please check the GSP logs?
This could also be caused by a processor problem.

Check the other GSP logs and report back.
Eugeny Brychkov
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

I agree with TG. Check the GSP logs for anything pointing to a CPU, power supply or fan failure. Two more useful GSP commands: SS and PS
Eugeny
Yogeeraj_1
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

hi,
Additional info:
================
1. there was no power-cut.
2. GSP log
============================================================
Log Entry # 1:
SYSTEM NAME : SLX1
DATE : 01/19/2003 TIME:09:02:12
ALERT LEVEL : 13 = SYSTEM HANG DETECTED VIA TIMER POPPING

SOURCE : 1 = PROCESSOR
SOURCE DETAIL: 1 = PROCESSOR GENERAL SOURCE ID = 0
PROBLEM DETAIL : 4 = TIMEOUT

CALLER ACTIVITY : F =DISPLAY_ACTIVITY () UPDATE STATUS=0
CALLER SUBACTIVITY:00 = IMPLEMENTATION DEPENDENT
REPORTING ENTITY TYPE : E = HP-UX REPORTING = ENTITY ID : 00

0x78E000D41100F000 00000003 00000000 TYPE 15 = ACTIVITY LEVEL / TIMEOUT
0x58E008D41100F000 00006700 1309020C TYPE 11 = TIMESTAMP 01/19/2003 09:02:12
===========================================================
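
The two hex words on the "TYPE 11 = TIMESTAMP" line above appear to encode the same date and time that the GSP prints next to them. A minimal sketch of the decoding follows; the byte layout is inferred only from the log entries in this thread (not from HP documentation), the function name is made up, and the zero-based month is a guess because every sample here is from January.

```shell
# Sketch: decode the two data words of a "TYPE 11 = TIMESTAMP" GSP log line.
# Assumed layout, inferred from this thread's entries:
#   word 1: .. .. YY MM   (YY = years since 1900, MM = zero-based month)
#   word 2: DD hh mm ss
decode_gsp_timestamp() {
    w1=$1
    w2=$2
    yy=$(printf '%s' "$w1" | cut -c5-6)
    mo=$(printf '%s' "$w1" | cut -c7-8)
    dd=$(printf '%s' "$w2" | cut -c1-2)
    hh=$(printf '%s' "$w2" | cut -c3-4)
    mi=$(printf '%s' "$w2" | cut -c5-6)
    ss=$(printf '%s' "$w2" | cut -c7-8)
    # Convert each hex byte to decimal and print as MM/DD/YYYY hh:mm:ss.
    printf '%02d/%02d/%04d %02d:%02d:%02d\n' \
        $((0x$mo + 1)) $((0x$dd)) $((0x$yy + 1900)) \
        $((0x$hh)) $((0x$mi)) $((0x$ss))
}

decode_gsp_timestamp 00006700 1309020C   # prints 01/19/2003 09:02:12
```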

thank you for your reply.

Regards
Yogeeraj
Rajeev Shukla
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

Hi Yogeeraj,
Get in touch with HP; it's a CPU problem.
I got this error once too, on an A500 server, and it was finally diagnosed as a CPU problem.

Cheers
Rajeev
Yogeeraj_1
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

hi Rajeev,

How can you conclude that this is a CPU problem?

Does the GSP message contain sufficient information that confirms that?

thank you for a reply.

Best Regards
Yogeeraj
T G Manikandan
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

Yogeeraj,

Also do check the firmware version on the machine.

echo "selclass qualifier cpu;info;wait;infolog"|cstm > /tmp/firmware

Revert
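
The one-liner above can be wrapped so that the device class and output file are parameters. This is only a sketch: the function names are invented here, and which class names cstm accepts depends on the installed STM version.

```shell
# Sketch: build the cstm command script for a given device class.
stm_info_script() {
    class=$1
    printf 'selclass qualifier %s;info;wait;infolog\n' "$class"
}

# Pipe the script into cstm and save the result.
# Returns 1 with a message if cstm is not on the PATH.
collect_stm_info() {
    class=$1
    outfile=$2
    if ! command -v cstm >/dev/null 2>&1; then
        echo "cstm not found; are the online diagnostics installed?" >&2
        return 1
    fi
    stm_info_script "$class" | cstm > "$outfile"
}
```

With these definitions, `collect_stm_info cpu /tmp/firmware` runs the same pipeline as the command above.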
Yogeeraj_1
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

hi,
uploading the output from:
echo "selclass qualifier cpu;info;wait;infolog"|cstm > /tmp/firmware

Regards
Yogeeraj
Patrick Wessel
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

How about starting with a more analytical approach?

What does "ALERT LEVEL: 13 = System Hang detected via timer popping." mean?
All we can say is that the operating system hung. The next step, after this message occurs, is to perform a transfer of control (TOC) and analyze the dump file. Anything else is wild guessing.
You can configure your system for an automatic restart. Any time an alert level 13 occurs, it will TOC itself without manual interaction.
There is no good troubleshooting with bad data
Jeff Schussele
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

Hi Yogeeraj,

I've seen this before & what is timing out is a CPU in the system. It timed out because it's waiting for a resource that a "hung" CPU is never going to release. I believe the default timeout value is one minute. Anyway, you'll have to analyze the crash dump to determine just which CPU hung. If it didn't create a dump, you'll have to TOC it the next time it hangs - and it probably will. The bad CPU won't fix itself.

Rgds,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Eugeny Brychkov
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

According to your attachment there's no valid timestamp in the tombstone, so no error was registered by this processor. You posted only the latest event from the GSP log, about the CPU hanging. You should look at the preceding GSP events; maybe the answer is there.
Eugeny
Yogeeraj_1
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

Dear Patrick,
Can you please elaborate a bit on this transfer of control (TOC)? Should I do it manually, or is it something automatic?

Hi Jeff,
Hang: in fact, we did not experience any hang! The server just went down and, as I said, we had no other option but to power it back ON.


Hi Eugeny,
Note that there are no other errors at the GSP level!


the previous message was from 17/01/2003:
==========================================================================================
Log Entry # 4:
SYSTEM NAME : SLX1
DATE : 01/17/2003 TIME:09:53:06
ALERT LEVEL : 10 = BOOT POSSIBLE, FUNCTIONALITY LOST

SOURCE : 3 = PDH
SOURCE DETAIL: 6 = INTERCONNECT MEDIUM SOURCE ID : 0
PROBLEM DETAIL : 3 = NON-RESPONDING, MAY NEED GSP RESET

CALLER ACTIVITY : 2 =OPERATION STATUS : 0
CALLER SUBACTIVITY:02 = PLATFORM INTERNAL INTERCONNECT REPORTING ENTITY TYPE : 1 = SERVICE PROCESSOR REPORTING ENTITY ID : 00

0x581008A336002020 00006700 11093506 TYPE 11 = TIMESTAMP 01/17/2003 09:53:06
==========================================================================================

Also note that I ran the GUI version of STM today and there were no error messages, except that CPU (5e0) 33 was yellow with the "Information Killed" message, which disappeared when we checked its information (it became "Exercise Successful")!

Please help! This problem is still a mystery...

Best Regards
Yogeeraj
Patrick Wessel
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

Yogeeraj,
A TOC forces the system to write a memory dump. This is usually the only way to find out what happened on a hanging system (and your box definitely hung).
The manual way to perform a TOC is to log onto the GSP and enter: TC
Another way is to use the AR command of the GSP to configure an automatic restart. Whenever the system runs into a hang, the GSP will TOC the system automatically.
Your support provider will help you to analyze the memory dump produced by the TOC to find the reasons for the hang.
There is no good troubleshooting with bad data
Yogeeraj_1
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

Dear Patrick,

If my server is DOWN (LEDs: RUN=ATTENTION=FAULT=REMOTE=OFF; POWER=FLASH), is it true that the "TC" command at the GSP can still write a memory dump?


"Whenever the system runs into a hang the GSP will TOC the system automatically."

Is this something configurable?

How do we analyze this "memory dump produced by the TOC" ?


Thank you a lot for your time and precious guidance.

Best Regards
Yogeeraj
Eugeny Brychkov
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

Yogeeraj,
it's useless to do a TOC while the server is functioning. If the server hangs, as in your case, and the GSP does a TOC, then there should be a crash dump and tombstones saved after the system restarts.
Since you stated that no such files appeared, I do not believe it was a TOC; it was simply a power off. Why did this power off occur? For a solution I think you should call HP. In any case, the last GSP event you posted, about the interconnect error, is not good and from my point of view points to a hardware issue. And my last guess: when something goes wrong with a server (too many fan failures, a PS failure, etc.), the GSP logs the 'real' event that caused the hang/power off before this 'timer popping' event. Since in your case NOTHING was logged by the GSP, I suspect that the server has a GSP issue.
Eugeny
Antonio Franco
Occasional Visitor

Re: Server down! GSP message: "System Hang detected via timer popping"

This Alert in itself is just saying that the
GSP detected that HPUX is no longer responding.

1) Check preceding messages for additional
information.

2) If the system didn't TOC on its own, issue
a "TC" command from the GSP prompt.

3) Call HP to have them look at the coredump
and /var/tombstones/ts99 for possible cause.


*** DO NOT automatically assume that a "processor" is bad. Processor PDC is just
the messenger ***
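
Before calling HP as in step 3, it can help to confirm that the tombstone file was actually written. A minimal sketch, assuming a POSIX shell; the function name is invented, and the default path is the one named in the post above.

```shell
# Sketch: report whether a tombstone file exists and is non-empty.
check_tombstone() {
    ts=${1:-/var/tombstones/ts99}
    if [ -s "$ts" ]; then
        echo "present: $ts"
        ls -l "$ts"          # the modification time shows when it was last written
    elif [ -e "$ts" ]; then
        echo "empty: $ts"
    else
        echo "missing: $ts"
    fi
}
```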
Patrick Wessel
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

Yogeeraj,
A TC helps you to collect troubleshooting data when a system hangs. Anything other than analyzing the TOC dump is wild guessing when you deal with a system hang (and the GSP message you posted was a hang).
The TC is only helpful when the system hangs, not after a system crash.

You can configure your system to perform an automatic restart. To do so, go to the GSP and enter AR. Set the automatic restart for alert level 13.

Don't worry about the "interconnect medium is not responding" message; that is a whole new ballgame and has nothing to do with the system hang. It might be solved with a firmware update on the GSP, but it's not a hardware defect.
There is no good troubleshooting with bad data
Guilherme Belinelo
Occasional Advisor

Re: Server down! GSP message: "System Hang detected via timer popping"

Yogeeraj,

I had the same problem last week and HP team classified it as a hardware problem.

When I talked to the technician, he asked me if the disks were "freezing" and told me to take the disks out and reconnect them after a system hang (my disks are hot-swap). It worked!

After that we updated the disk firmware using ODE and dfdutils2 and everything is OK.

Hope it helps.

Regards,

Guilherme
Steven E. Protter
Exalted Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

Sorry to bother you Yogeeraj.

I have the same problem.

Right now.

How did you eventually fix it?

Hardware is here and he doesn't know what to do.

SEP
G. Vrijhoeven
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

Hi,

We had the same error message. Is your GSP CLAIMED when you perform an ioscan?

Gideon
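
One way to answer that question quickly is to filter the ioscan listing for anything that is not CLAIMED. This is a sketch only: the function name is invented, and it assumes the usual `ioscan -f` layout with two header lines (column titles and a separator).

```shell
# Sketch: read `ioscan -f` output on stdin and print devices that are not CLAIMED.
# A plain `grep -v CLAIMED` would be wrong because "UNCLAIMED" contains "CLAIMED",
# so each whitespace-separated field is compared exactly.
not_claimed() {
    awk 'NR > 2 && NF > 0 {
        claimed = 0
        for (i = 1; i <= NF; i++) if ($i == "CLAIMED") claimed = 1
        if (!claimed) print
    }'
}
```

Usage would be `ioscan -f | not_claimed`; no output means every listed device is CLAIMED.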
G. Vrijhoeven
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

 
Ron Thompson
Advisor

Re: Server down! GSP message: "System Hang detected via timer popping"

We have had this same problem on several of our systems. It has always been repaired by having the platform monitor board replaced. Other indications are a GSP error log entry for a fan not working when all fans are working, and crashes that keep happening. Sometimes you cannot get the system to boot until you physically remove the power connection from the machine and then plug it back in.
Jeff Schussele
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

Hi Vrijhoeven,

That has to be an all-time record for length of post.
Somebody call Guinness!!!

Cheers my friend,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Tobias Hartlieb
Trusted Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

Hi Yogeeraj,

I don't know why you assigned Vrijhoeven only 0 points for his (lengthy) response, because he was actually dead right:

Updating the GSP FW will prevent the system from having this problem again. It is likely caused by a busy I2C bus (you can check the I2C bus, btw, from GSP>XD ...). GSP FW version B.02.17 and later has fixed it.
It might help to update OnlineDiagnostic, too!
Diagnostics use this I2C bus as well. Have a Diagnostic version > A.30.00 (the start screen of 'cstm' shows the revision).

For most recent GSP Version B.02.20 (just out), check:
http://itrc.hp.com
=> individual patches
=> Firmware
=> Search PF_CCANGSPB0220

Note:
Patch Dependencies:
s800: 11.00: PHNE_27393 PHCO_27370
s800: 11.11: PHNE_26326 PHCO_27243

...and the Warning:
If the currently installed firmware is older than B.02.15 then updating to GSP firmware revision B.02.20 requires updating to B.02.15 first.
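
The warning above amounts to a small decision rule on the installed revision. A minimal sketch of that rule follows; the function name is invented, and it assumes revisions always have the form B.NN.NN as in this post.

```shell
# Sketch: print the GSP firmware update steps implied by the warning above.
gsp_update_path() {
    rev=$1
    case $rev in
        B.[0-9][0-9].[0-9][0-9]) ;;
        *) echo "unrecognized revision: $rev" >&2; return 1 ;;
    esac
    rest=${rev#B.}        # e.g. 02.14
    major=${rest%%.*}     # e.g. 02
    minor=${rest#*.}      # e.g. 14
    # Strip one leading zero so values like 08 are not read as octal.
    n=$(( ${major#0} * 100 + ${minor#0} ))
    if [ "$n" -lt 215 ]; then
        echo "B.02.15 B.02.20"     # older than B.02.15: two-step update
    elif [ "$n" -lt 220 ]; then
        echo "B.02.20"             # B.02.15 or later: direct update
    else
        echo "none"                # already at B.02.20 or newer
    fi
}
```

For example, `gsp_update_path B.02.14` prints both steps, while `gsp_update_path B.02.17` prints only B.02.20.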



For OnlineDiag, look at
http://www.software.hp.com/cgi-bin/swdepot_parser.cgi/cgi/displayProductInfo.pl?productNumber=B6191AAE


Regards.

Tobias