
System hang issues

navin
Super Advisor


Hello All,

We had a system hang and had to reboot the box. Since it was a hang and not a crash, we did want to do a clean shutdown, so we did not do a TOC. If we had done the TOC, we would have collected a crash dump, but the system had not crashed, so we did not want to do that. HP could not find the cause of the hang from what we provided (syslog, etc.), and they want to see the crash dump. In this situation, what else besides a TOC can we provide to HP so they can analyze the hang? Please advise.
Learning ...
Patrick Wallek
Honored Contributor

Re: System hang issues

Not much.

Without the crash dump, about all you have available is the logs. If you have MeasureWare running, you could send the MeasureWare data files so that performance data for the time of the hang can be extracted.

At this point, "after the fact," it is very difficult to diagnose a "system hang."
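If the logs are all that is left, it may help to hand HP only the syslog entries leading up to the hang rather than the whole file. The Python sketch below assumes the classic syslog line prefix "Mon DD HH:MM:SS" (no year) and that you know roughly when the hang started; the function name lines_before_hang and its parameters are hypothetical, not part of any HP tool. On HP-UX the pre-reboot log is usually rotated to /var/adm/syslog/OLDsyslog.log at boot, so check that file as well as syslog.log.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def lines_before_hang(lines: Iterable[str], hang_time: datetime,
                      minutes: int = 30) -> List[str]:
    """Return syslog lines stamped within `minutes` before `hang_time`."""
    start = hang_time - timedelta(minutes=minutes)
    selected = []
    for line in lines:
        try:
            # Classic syslog prefix is "Mon DD HH:MM:SS" (15 chars, no year),
            # so borrow the year from hang_time. Assumption: the window does
            # not cross New Year's Eve.
            stamp = datetime.strptime(f"{hang_time.year} {line[:15]}",
                                      "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped line (e.g. continuation text)
        if start <= stamp <= hang_time:
            selected.append(line.rstrip("\n"))
    return selected
```

For example, open the log with open(path, errors="replace") and pass the file handle, plus the approximate hang time as a datetime, to lines_before_hang.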
Tingli
Esteemed Contributor

Re: System hang issues

Sorry, ignore my post as it is for VMS.
Bill Hassell
Honored Contributor

Re: System hang issues

A 'hang' really means that something is not coming back from the server over specific channels (i.e., LAN connections). When the system hangs, you must go to the real console (not a network session) and see if you can get a prompt and log in. If not, the system cannot be diagnosed without a crash dump. If you can log in, you need to capture real-time information such as top, vmstat, ps -ef, and Glance (if present).
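When you can still log in at the console, the point is to save that real-time state before rebooting. A minimal Python sketch, assuming Python 3.7 or later is available on the box: it runs a list of commands, each with a timeout so a stalled command cannot block the rest, and writes each command's output to a timestamped file you can send to HP. The function name capture_snapshot, the default command list, and the vmstat interval/count ("5 5") are assumptions; top and Glance can be appended using whatever batch-mode syntax your version supports, which this sketch does not assume.

```python
import shlex
import subprocess
from datetime import datetime
from pathlib import Path
from typing import Dict, Sequence

# Assumed defaults; vmstat's "5 5" means five samples, five seconds apart.
DEFAULT_COMMANDS = ("uptime", "ps -ef", "vmstat 5 5")


def capture_snapshot(outdir: Path, commands: Sequence[str] = DEFAULT_COMMANDS,
                     timeout: float = 60.0) -> Dict[str, Path]:
    """Run each command and save its output to a timestamped file."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    written: Dict[str, Path] = {}
    for i, cmd in enumerate(commands):
        argv = shlex.split(cmd)
        # Index prefix keeps two commands with the same name from colliding.
        target = outdir / f"{stamp}-{i:02d}-{Path(argv[0]).name}.txt"
        try:
            result = subprocess.run(argv, capture_output=True, text=True,
                                    timeout=timeout)
            body = (result.stdout + result.stderr
                    + f"# exit status {result.returncode}\n")
        except FileNotFoundError:
            body = f"# command not found: {cmd}\n"
        except subprocess.TimeoutExpired:
            # A stalled command is itself useful evidence during a hang.
            body = f"# timed out after {timeout}s: {cmd}\n"
        target.write_text(f"# {cmd}\n{body}")
        written[cmd] = target
    return written
```

For example, capture_snapshot(Path("/var/tmp/hang")) (a hypothetical directory) returns a mapping from each command to the file it wrote. Writing a note instead of aborting when a command is missing or times out means one bad command does not cost you the rest of the snapshot.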

A hang may be caused by a bad disk or a bad NFS server, but if your system isn't being patched regularly, missing patches are probably the real cause.
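To check the bad-disk or bad-NFS-server theory while the box is still reachable, one approach is to stat every mounted filesystem from a short-lived child process and flag any that do not answer within a timeout, since a stat on a dead NFS server can block indefinitely. A sketch under these assumptions: the mount table is whitespace-separated with the mount point in the second column (true of /proc/mounts on Linux, and assumed here for /etc/mnttab on HP-UX), and the helper names mount_points and probe_mounts are hypothetical.

```python
import subprocess
import sys
from typing import Dict, Iterable, List


def mount_points(mount_table: str) -> List[str]:
    """Read mount points (second column) from a mnttab-style file."""
    points = []
    with open(mount_table) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 3 and not line.startswith("#"):
                points.append(fields[1])
    return points


def probe_mounts(paths: Iterable[str], timeout: float = 5.0) -> Dict[str, str]:
    """Map each path to 'ok', 'error' or 'timeout' based on a stat() probe."""
    status: Dict[str, str] = {}
    for path in paths:
        # stat() runs in a child process so a dead NFS server cannot block us.
        proc = subprocess.Popen(
            [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        try:
            status[path] = "ok" if proc.wait(timeout=timeout) == 0 else "error"
        except subprocess.TimeoutExpired:
            # On a hard NFS mount the child may ignore SIGKILL; don't wait on it.
            proc.kill()
            status[path] = "timeout"
    return status
```

For example, probe_mounts(mount_points("/etc/mnttab")) lists any mount that did not respond. Running each stat in a separate process keeps the probe itself from hanging; the trade-off is that a child stuck on a hard-mounted NFS path may ignore the kill and linger until the server returns.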


Bill Hassell, sysadmin
Duncan Edmonstone
Honored Contributor

Re: System hang issues

>> we did want to do the clean shutdown so we did not do a toc

How were you going to do a clean shutdown if the system was hung and you couldn't log in? That implies you were able to log in, which in turn implies the system wasn't *completely* hung, so you could have done some diagnosis with HP there (assuming you had the time). Presumably you were under pressure to return the system to service and had to reboot? At that stage you should have made it clear to your management that the choices were i) do a clean reboot and lose vital data for diagnosing the root cause, or ii) do a TOC and collect a full crash dump for analysis.

Why were you worried about doing a TOC anyway? If the system was already "hung" (or at least "semi-hung"), then presumably your application was already in an unknown state; the hard stop that a TOC implies could hardly have made things worse...

>> Please advice in this situvation - instead of toc what else we can provide to hp to analyze about the system hung

This is like asking aircraft crash investigators to determine exactly why a plane crashed without the black box: there's only so much to go on, and a lot of the evidence has been destroyed. Without a crash dump, you will just have to wait for the hang to happen again and then do the right thing...

HTH

Duncan
