
Re: System hang (rp7400)

 
Anh.Quan
Frequent Advisor

System hang (rp7400)

Dear Expert,

I have a two-node cluster (rp7400 servers). One day node 2 hung. I checked syslog and found nothing logged. Please help me!

This is content of GSP log.
==========================================
ALERT LEVEL 13: = System hang detected via timer popping
SOURCE :1 = PROCESSOR
SOURCE DETAIL: 1 = PROCESSOR GENERAL SOURCE ID: 0
PROBLEM DETAIL: 4 = timeout
CALLER ACTIVITY :F = display_activity() update STATUS 0
................
==========================================

Many thanks
Q.Vu.
7 REPLIES
Victor Fridyev
Honored Contributor

Re: System hang (rp7400)

Sameer_Nirmal
Honored Contributor

Re: System hang (rp7400)

Hi,

I would suggest logging a hardware call with HP and providing them the logs mentioned below to get rid of the issue ASAP.

Alert level 13 indicates that the GSP has detected that the HP-UX OS is hung. There is a heartbeat mechanism, along with a timer, between the MP and HP-UX. If a timeout occurs, this event is logged and the system is TOC'ed by the GSP.
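To illustrate the idea only: a heartbeat with a timer boils down to checking how long ago the OS last reported in. The sketch below is not how the GSP firmware is implemented; the function name, the timestamps, and the 60-second timeout are all made up for the example.

```shell
# Return 0 (true) if the last heartbeat is older than the timeout.
# All three arguments are integer seconds; names and values are illustrative only.
heartbeat_expired() {
    last_beat=$1
    now=$2
    timeout=$3
    [ $((now - last_beat)) -gt "$timeout" ]
}

# Example: last heartbeat seen at t=100, checked at t=170, timeout 60 s (assumed).
if heartbeat_expired 100 170 60; then
    echo "hang detected: a watchdog would log the alert and force a TOC"
else
    echo "OS is alive"
fi
```

A real watchdog would run this check periodically and reset the timer each time a heartbeat arrives.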

As the log shows the SOURCE as processor, it may be that the hang occurred because one of the processors failed to respond or died.

There should be "chassis code" entries just below the message you posted, along with other log entries (GSP activity and error logs) that are important to note.

Since the system is working now, you can check for errors in dmesg, OLDsyslog, and /etc/shutdownlog.
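A small helper along these lines could do a first pass over those files. This is only a sketch: the keyword list is arbitrary, and the /var/adm/syslog/OLDsyslog.log path in the usage comment is an assumption about a typical HP-UX 11.x layout, so adjust both for your system.

```shell
# Print lines mentioning common trouble keywords from the given log files.
# Files that are missing or unreadable are skipped silently.
scan_logs() {
    for f in "$@"; do
        [ -r "$f" ] || continue
        # Adding /dev/null makes grep prefix each match with the file name.
        grep -i -E 'panic|hpmc|error|fail' "$f" /dev/null || true
    done
}

# Usage on the affected node (paths are assumptions, not run here):
#   scan_logs /var/adm/syslog/OLDsyslog.log /etc/shutdownlog
#   dmesg | grep -i -E 'panic|hpmc|error|fail'
```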

System chassis logs can also be retrieved in two ways:
1) From the command line:
# cclogview /var/stm/logs/os/ccerrlog > test
2) Via the LOGTOOL "Chassis / View Error Log" utility in STM (cstm, mstm, or xstm).
When using LOGTOOL to gather this data, be sure to specify "DETAILS" to get the detailed chassis code data.
You need to have the latest version of STM installed.

The crash file generated in /var/adm/crash
may need to be analyzed to find the cause.
Steven E. Protter
Exalted Contributor

Re: System hang (rp7400)

Shalom Q.Vu,

System hangs can also occur because a system is not properly patched.

The timer popping issue never caused a hang on any of my systems.

I recommend turning crash dumps on in the configuration file /etc/rc.config.d/savecrash by setting the first variable to 1.
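For reference, the relevant line would look roughly like this. It assumes the first variable in the file is named SAVECRASH, which is the usual name on HP-UX 11.x, so check your own copy of the file before editing.

```shell
# /etc/rc.config.d/savecrash (fragment)
# Assumption: SAVECRASH is the first variable in this file.
# 1 = save a crash dump at boot after a panic or TOC, 0 = do not save.
SAVECRASH=1
```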

You may have a processor problem. I'd recommend comparing how many processors the system sees with how many you have documented as installed.
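One way to make that comparison, sketched here with hypothetical helper names: count the processor entries that ioscan reports and compare against the number you expect. The count of 4 in the usage comment is a placeholder. Treat the result as a rough check only, since a processor can misbehave and still be listed.

```shell
# Count processor entries in `ioscan -kfnC processor` style output read from stdin.
# Header and separator lines are ignored because their first field is not "processor".
count_processors() {
    awk '$1 == "processor" { n++ } END { print n + 0 }'
}

# Compare the visible count with the documented count passed as $1.
check_cpu_count() {
    expected=$1
    seen=$(count_processors)
    if [ "$seen" -eq "$expected" ]; then
        echo "OK: $seen processors visible"
    else
        echo "MISMATCH: $seen visible, $expected expected"
        return 1
    fi
}

# Usage on the HP-UX host (4 is a placeholder for your documented count):
#   ioscan -kfnC processor | check_cpu_count 4
```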

There should be an HPMC (High Priority Machine Check) if a processor has been lost.

If your system hangs, use GSP/CSP to issue a TOC (transfer of control). This will crash the system with a crash dump, which you should process with the q4 utility and send to HP for further analysis.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Anh.Quan
Frequent Advisor

Re: System hang (rp7400)

Dear All,

How can I identify the faulty CPU so that I can replace it?

Thanks.
Rocco Foti
Occasional Advisor

Re: System hang (rp7400)

Hi,
Can you see the results of the initial CPU tests?

regards
Rocco
Anh.Quan
Frequent Advisor

Re: System hang (rp7400)

Yes, the CPU test result is OK, but the system is still down.
Anh.Quan
Frequent Advisor

Re: System hang (rp7400)

Yes, the CPU test results are OK, but the system is still down.