HPE 9000 and HPE e3000 Servers

Server down! GSP message: "System Hang detected via timer popping"

 
Jeff Schussele
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

Hi Yogeeraj,

I've seen this before, and what is timing out is a CPU in the system. It timed out because it was waiting for a resource that a "hung" CPU is never going to release. I believe the default timeout value is one minute. In any case, you'll have to analyze the crash dump to determine which CPU hung. If the system didn't create a dump, you'll have to TOC it the next time it hangs - and it probably will. The bad CPU won't fix itself.

Rgds,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Eugeny Brychkov
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

According to your attachment, there is no valid timestamp in the tombstone, so no error was registered by this processor. You posted only the latest event from the GSP log, the one about the CPU hanging. You should look at the preceding GSP events; maybe the answer is there.
Eugeny
Yogeeraj_1
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

Dear Patrick,
Can you please elaborate a bit on this transfer of control (TOC)? Should I do it manually, or is it something automatic?

Hi Jeff,
Hang: in fact, we did not experience any hang! The server just went down and, as I said, we had no other option but to power it back on.


Hi Eugeny,
Note that there are no other errors at the GSP level!


The previous message was from 17/01/2003:
==========================================================================================
Log Entry # 4:
SYSTEM NAME : SLX1
DATE : 01/17/2003 TIME:09:53:06
ALERT LEVEL : 10 = BOOT POSSIBLE, FUNCTIONALITY LOST

SOURCE : 3 = PDH
SOURCE DETAIL : 6 = INTERCONNECT MEDIUM
SOURCE ID : 0
PROBLEM DETAIL : 3 = NON-RESPONDING, MAY NEED GSP RESET

CALLER ACTIVITY : 2 = OPERATION
STATUS : 0
CALLER SUBACTIVITY : 02 = PLATFORM INTERNAL INTERCONNECT
REPORTING ENTITY TYPE : 1 = SERVICE PROCESSOR
REPORTING ENTITY ID : 00

0x581008A336002020 00006700 11093506 TYPE 11 = TIMESTAMP 01/17/2003 09:53:06
==========================================================================================
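For anyone who wants to cross-check the raw words in a log entry like the one above against the printed date and time, here is a minimal Python sketch. The byte layout is an assumption inferred only from this single entry (it is not taken from any GSP documentation), and the function name `decode_timestamp_words` is hypothetical.

```python
from datetime import datetime


def decode_timestamp_words(word_hi: str, word_lo: str) -> datetime:
    """Decode the two 32-bit data words of a TYPE 11 (timestamp) entry.

    ASSUMED layout, guessed from the one entry posted in this thread:
      word_hi = 00 00 YY MM   (YY = years since 1900, MM = zero-based month)
      word_lo = DD hh mm ss   (each field one binary byte, shown in hex)
    """
    hi = bytes.fromhex(word_hi)
    lo = bytes.fromhex(word_lo)
    year = 1900 + hi[2]
    month = hi[3] + 1
    day, hour, minute, second = lo
    return datetime(year, month, day, hour, minute, second)


# The raw line exactly as it appears in the posted log entry.
raw_line = ("0x581008A336002020 00006700 11093506 "
            "TYPE 11 = TIMESTAMP 01/17/2003 09:53:06")
fields = raw_line.split()

decoded = decode_timestamp_words(fields[1], fields[2])
printed = datetime.strptime(" ".join(fields[-2:]), "%m/%d/%Y %H:%M:%S")

# Under the assumed layout, the two values should agree for this entry.
print(decoded, "matches printed timestamp:", decoded == printed)
```

If the guessed layout does not hold for other entries, the comparison at the end will show the mismatch rather than hide it.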

Also note that I ran the GUI version of STM today and there were no error messages, except that CPU (5e0) 33 was yellow with the "Information Killed" message, which disappeared when we checked its information (it became "Exercise Successful").

Please help! This problem is still a mystery...

Best Regards
Yogeeraj
No person was ever honoured for what he received. Honour has been the reward for what he gave (Calvin Coolidge)
Patrick Wessel
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

Yogeeraj,
A TOC forces the system to write a memory dump. This is usually the only way to find out what happened on a hanging system (and your box definitely hung).
The manual way to perform a TOC is to log on to the GSP and enter: TC
Another way is to use the AR command of the GSP to configure an automatic restart. Whenever the system runs into a hang, the GSP will TOC the system automatically.
Your support provider will help you to analyze the memory dump produced by the TOC to find the reasons for the hang.
There is no good troubleshooting with bad data
Yogeeraj_1
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

Dear Patrick,

If my server is DOWN (LEDs: RUN=ATTENTION=FAULT=REMOTE=OFF; POWER=FLASH), is it true that the "TC" command at the GSP can still write a memory dump?


"Whenever the system runs into a hang the GSP will TOC the system automatically."

Is this something configurable?

How do we analyze this "memory dump produced by the TOC" ?


Thank you a lot for your time and precious guidance.

Best Regards
Yogeeraj
No person was ever honoured for what he received. Honour has been the reward for what he gave (Calvin Coolidge)
Eugeny Brychkov
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

Yogeeraj,
It's useless to do a TOC while the server is functioning. If the server hangs, as in your case, and the GSP does a TOC, then there should be a crash dump and tombstones saved after the system restarts.
Since you stated that no such files appeared, I do not believe it was a TOC; it was simply a power-off. Why did this power-off occur? For a solution, I think you should call HP. In any case, the last GSP event you posted, about the interconnect error, is not good and, from my point of view, points to a hardware issue. And my last guess: when something goes wrong with the server (too many fan failures, a PS failure, etc.), the GSP logs the 'real' event that caused the hang or power-off before this 'timer popping' event. Since in your case NOTHING was logged by the GSP, I suspect that the server has a GSP issue.
Eugeny
Antonio Franco
New Member

Re: Server down! GSP message: "System Hang detected via timer popping"

This alert in itself is just saying that the
GSP detected that HP-UX is no longer responding.

1) Check preceding messages for additional
information.

2) If the system didn't TOC on its own, issue
a "TC" command from the GSP prompt.

3) Call HP to have them look at the core dump
and /var/tombstones/ts99 for a possible cause.


*** DO NOT automatically assume that a "processor" is bad. Processor PDC is just
the messenger. ***
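Before calling HP as in step 3, it can help to confirm that the evidence actually exists on disk after the reboot. Below is a minimal Python sketch of that check. The tombstone path is the one named above; the crash directory `/var/adm/crash` is an assumption (it is not mentioned in this thread, so check where savecrash writes on your system), and the helper name `find_hang_evidence` is hypothetical. Python may also not be installed on the server, in which case the same check is easy to do by hand with ls.

```python
from pathlib import Path


def find_hang_evidence(tombstone: Path, crash_dir: Path) -> dict:
    """Report which post-hang artifacts are present on disk.

    tombstone: file expected to hold the saved tombstones.
    crash_dir: directory ASSUMED to hold one subdirectory per saved dump.
    """
    if crash_dir.is_dir():
        dumps = sorted(p.name for p in crash_dir.iterdir() if p.is_dir())
    else:
        dumps = []
    return {
        "tombstone_present": tombstone.is_file(),
        "crash_dumps": dumps,
    }


# Paths: the first comes from the post above, the second is an assumption.
report = find_hang_evidence(Path("/var/tombstones/ts99"),
                            Path("/var/adm/crash"))
print("tombstone file present:", report["tombstone_present"])
print("crash dump directories:", report["crash_dumps"] or "none found")
```

If both come back empty, that supports the view that no TOC took place and there is nothing yet for support to analyze.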
Patrick Wessel
Honored Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

Yogeeraj,
A TC helps you to collect troubleshooting data when a system hangs. Anything other than analyzing the TOC dump is wild guessing when you deal with a system hang (and the GSP message you posted was a hang).
The TC is only helpful while the system is hung, not after a system crash.

You can configure your system to perform an automatic restart. To do so, go to the GSP and enter AR. Set the automatic restart for alert level 13.

Don't worry about the "interconnect medium is not responding" message; that is a whole new ballgame and has nothing to do with the system hang. It might be solved with a firmware update on the GSP, but it's not a hardware defect.
There is no good troubleshooting with bad data
Guilherme Belinelo
Occasional Advisor

Re: Server down! GSP message: "System Hang detected via timer popping"

Yogeeraj,

I had the same problem last week and HP team classified it as a hardware problem.

When I talked to the technician, he asked me whether the disks were "freezing" and told me to take the disks out and reconnect them after a system hang (my disks are hot-swap). It worked!

After that we updated the disk firmware using ODE and dfdutils2, and everything is OK.

Hope it helps.

Regards,

Guilherme
Steven E. Protter
Exalted Contributor

Re: Server down! GSP message: "System Hang detected via timer popping"

Sorry to bother you Yogeeraj.

I have the same problem.

Right now.

How did you eventually fix it?

Hardware support is here and he doesn't know what to do.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com