
Re: help needed with system hanging

 
Ian_77
Occasional Contributor

help needed with system hanging

I have an L2000 box with HP-UX 11.11 that keeps hanging. We can't telnet or ftp, and the console stops responding, but we can still ping it. We have checked the system logs and GSP logs and can't see anything that could help us find what is causing the hang. We had the personality card in the machine replaced, but it still hangs. We are stuck, so any suggestions on what we could try would be appreciated. I've attached a file listing the patches we have applied to this machine.
Shaikh Imran
Honored Contributor

Re: help needed with system hanging

Hi,
I feel it is very rare for HP RISC-based systems to hang.
You can check a few things:
1) Any new hardware added recently.
2) Whether the kernel was tuned recently.
3) Network-related checks:
a) Speed, duplex and negotiation settings on both the server and the switch side.
b) Also watch the behaviour through the console when it hangs. I mean, if you are unable to telnet or ping but the console still works, then the network is where to look.
4) Root disk health: check for stale PEs.
5) Run STM and collect the information, and if possible run verify (it does not require a license) to get information on any device failure.
6) Check for PDC firmware updates.
7) If the system hangs frequently, please try copying the current kernel, then reboot from a different kernel and observe.
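
For the stale PE check in point 4, a rough sketch of one way to do it follows. It assumes the root disk is under LVM in a volume group named vg00 and that lvdisplay -v marks out-of-date extents with the word "stale"; `count_stale` is a hypothetical helper name, not a standard command.

```shell
# Count lines mentioning "stale" in LVM display output read from stdin.
count_stale() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -ci 'stale' || true
}

# Example use on the root volume group (assumed here to be vg00):
#   for lv in /dev/vg00/lvol*; do
#       echo "$lv: $(lvdisplay -v "$lv" | count_stale) stale line(s)"
#   done
```

A non-zero count is only a prompt to look more closely at that logical volume and the disk behind it, not a diagnosis in itself.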

Many More to Go......


Regards,



I'll sleep when i am dead.
Naveej.K.A
Honored Contributor

Re: help needed with system hanging

hi,

One of the reasons your system hangs may be a defective root hard disk. I faced a similar problem with an old D370. The system would not write anything about the hardware failure to the log files; it would just hang, behaving in almost the same way as your system.
You can run a thorough check of your root disk with the cstm utility and see whether any errors are logged during testing.

With best wishes
Naveej
practice makes a man perfect!!!
Ralf Seefeldt
Valued Contributor

Re: help needed with system hanging

Hi,

I don't know why this server is hanging. I have the same problem sometimes and am mostly able to solve it with a reboot.

I would suggest writing a short script that logs some system status (uptime, network use, filesystems, CPU use, the most active processes, and so on), maybe every minute.
You may find some more hints about your problem in this logfile.
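
Such a logging script could look roughly like the sketch below. The log path, the function names and the choice of commands are all assumptions to adapt; any command not installed on the box is simply skipped.

```shell
# Hypothetical log location; pick a filesystem with enough free space.
LOG=${LOG:-/var/tmp/hangwatch.log}

# Run a command only if it exists, labelling its output in the log.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "--- $*"
        "$@" 2>&1 || echo "(exit status $?)"
    fi
}

# Append one timestamped snapshot of system state to the log file.
snapshot() {
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S')"
        run_if_present uptime        # load averages
        run_if_present vmstat 1 2    # memory and CPU, two samples
        run_if_present df -k         # filesystem usage
        run_if_present netstat -i    # per-interface packet counters
        run_if_present ps -ef        # full process list
    } >> "$LOG"
}
```

Saved as a script that ends with a call to `snapshot`, it could be run from cron every minute with an entry such as `* * * * * /usr/local/bin/hangwatch.sh` (the path is hypothetical). After a hang, the last few snapshots before the gap in timestamps are the ones to read. The log grows quickly at this rate, so it needs trimming from time to time.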

I hope this helps a little
Ralf
David Child_1
Honored Contributor

Re: help needed with system hanging

Two things to consider:

1. Do you know if it is truly hung or perhaps just running very, very slowly? I have had a couple servers start thrashing so badly that they appeared hung. I could start a telnet session and it would finally ask for the password a couple minutes later. Unfortunately it would time out before I could complete the session. I found out that the DBAs had increased the amount of memory allocated to their databases and this is what ate up all the memory. If you have MeasureWare running on these servers you could look back at what was going on during these lock-ups.

2. Do you have EMC PowerPath on this server? Depending on which version of PowerPath you have, it could be causing the problem. I had this a while back. Sometimes the server would lock up and sometimes it would crash. It mostly occurred when someone was using Glance. There are patches for PowerPath to fix this issue. I'd had PowerPath installed for a long time, but the problem didn't appear until I applied some HP patches to my servers.

David
Anupam Anshu_1
Valued Contributor

Re: help needed with system hanging

I suggest you check the following:

1. Check whether there are any NFS mounts on the system (if an NFS mount becomes stale, it can make the system hang).

2. Check whether any while(1) loop or similar program is running on the system. Such programs can hog resources to the max, and other users may perceive this as a hang.

3. Check the network card and connection. If you can, borrow a network card from another machine and test it in your system. Also check the network cable, i.e. the whole physical connection; if possible, move the system to a different location where another system is working fine. This will confirm that there is no problem in the physical connection.

4. Check for any program with a memory leak, or any recursive program that has been running for a long time. Verify the size of such processes using top/glance/ps, and kill them.

Generally such hangs are rare in HP-UX.

Best of luck,

Anshu

Sanjay_6
Honored Contributor

Re: help needed with system hanging

Hi,

Are you getting a crash dump? Check in /var/adm/crash. If so, you can do a crash dump analysis, which will help identify the problem. Do a TOC if you are not getting a crash dump.

Hope this helps.

Regds
KapilRaj
Honored Contributor

Re: help needed with system hanging

The system may behave like this when it runs out of memory! Have a cron script run and log vmstat so that you know what state the system was in when you had the problem.
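
As a rough sketch of the vmstat idea: the helper below pulls the value under the "free" column out of vmstat output, so a cron job can log one number per run. `vmstat_free` is a hypothetical name, and the unit of the value is whatever the local vmstat reports (check vmstat(1) on the box).

```shell
# Print the value under the "free" column from vmstat output on stdin.
vmstat_free() {
    awk '
        # Header line: remember which field is labelled "free".
        { for (i = 1; i <= NF; i++) if ($i == "free") col = i }
        # Data lines start with a number; keep the value from the last one.
        col && $1 ~ /^[0-9]+$/ { val = $col }
        END { if (val != "") print val }
    '
}

# Example use from a cron script (log path is an assumption):
#   echo "$(date '+%Y-%m-%d %H:%M:%S') free=$(vmstat 1 2 | vmstat_free)" >> /var/tmp/vmstat_free.log
```

Looking the column up by name rather than by position keeps the helper working even where vmstat lays its columns out differently. A steady fall in the logged value in the run-up to a hang would point towards memory exhaustion.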

Kaps
Nothing is impossible
Brian King
Advisor

Re: help needed with system hanging

You don't mention whether the system boots fine and is operational for a period of time and then hangs. If this is the case, then you should check (via Perf or some other capturing method, as mentioned by others in this thread) whether there were any performance issues when the system started to hang. High memory utilization will definitely cause a problem if the system starts to thrash. Once in this state, the system will start to kill processes that appear to have been idle for a period of time, including inetd, which obviously prevents telnet and ftp sessions. I've witnessed this firsthand.

Hope this helps,

Brian
Anupam Anshu_1
Valued Contributor

Re: help needed with system hanging

Hi Ian,

Let us know if your problem is over and what you did to resolve it.

If the problem is not resolved, then let us know about the problems you are facing and any error messages.

Hope we will be able to help and resolve the issue.

Cheers,

Anshu

PS: When do you plan to assign points to your questions? Until now you have not assigned any points on any of your questions. It looks (feels) like the forum couldn't help you resolve any of your issues.
Ian_77
Occasional Contributor

Re: help needed with system hanging

There have been a lot of responses, so I will try to reply to everything mentioned. We are still having the issue.

EMS software - we are loading it tonight and hope it will give us an indication of what is causing the machine to crash
PDC revision - updated but still crashes
tombstones - nothing in here
crash files - nothing in here - what is TOC?
This morning it hung again. We had to do a reset to get it up. It came up with an error about disk 6 in the GSP log for the first time, so possibly a faulty disk?
I tried to run mstm on the console, but it did not seem to clear the screen and kept partly overwriting it, so I couldn't get any further with this. I was working with an HP engineer over the phone, and he wanted me to analyse memory and CPU, but I couldn't because the screen was not clearly visible, so I'm stuck on this one.

When the machine hangs we can still get to the GSP prompt. The log files have shown nothing besides the disk error, so we really need to run the mstm tests but can't get this working.

Kernel parameters that we have changed at some point:

* Tunable parameters

STRMSGSZ 65535
dnlc_hash_locks 512
max_thread_proc 256
maxdsiz 1879048192
maxdsiz_64bit 1879048192
maxfiles 2048
maxfiles_lim 2048
maxssiz 0X800000
maxssiz_64bit 0X800000
maxswapchunks 2048
maxtsiz 0X4000000
maxtsiz_64bit 0X40000000
maxuprc 300
maxusers 250
maxvgs 20
ncsize 10120
nfile 40000
nflocks 1000
ninode 5000
npty 180
nstrpty 180
nstrtel 180
semmni 300
semmns 1500
semmnu 1500
shmmax 0X4000000
system: END
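
The tunables above mix decimal and hexadecimal byte values, which makes them hard to compare at a glance. Here is a minimal sketch for converting either form to megabytes, assuming a shell whose printf accepts C-style hex constants; `to_mb` is a hypothetical helper name.

```shell
# Convert a byte count given in decimal or hex (0x.../0X...) to whole MB.
to_mb() {
    # printf '%d' parses both decimal and C-style hexadecimal constants.
    bytes=$(printf '%d' "$1") || return 1
    echo $(( $bytes / 1048576 ))
}

# Example use with values from the list above:
#   to_mb 1879048192    # maxdsiz
#   to_mb 0X800000      # maxssiz
#   to_mb 0X4000000     # maxtsiz
#   to_mb 0X40000000    # maxtsiz_64bit
```

With the values listed, this gives 1792 MB for maxdsiz, 8 MB for maxssiz, 64 MB for maxtsiz and shmmax, and 1024 MB for maxtsiz_64bit.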

We have only one application running on this server. The server was crashing, although not as frequently, before the application was put on.

Cpu states:
CPU   LOAD   USER   NICE    SYS   IDLE  BLOCK  SWAIT   INTR   SSYS
 0    0.72 100.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%
 1    0.41   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
---   ----  -----  -----  -----  -----  -----  -----  -----  -----
avg   0.57  49.5%   0.0%   0.0%  50.5%   0.0%   0.0%   0.0%   0.0%

Memory: 348480K (316468K) real, 385980K (342868K) virtual, 1728596K free  Page# 1/11

CPU TTY   PID USERNAME PRI NI   SIZE    RES STATE    TIME %WCPU  %CPU COMMAND
 0   ?   3343 planner  241 20   301M   291M run    320:39 99.99 99.82 smauto


Every time, we have to do a physical reset. It has booted fine every time except this morning, when we needed to boot it twice because of a GSP error on disk 6.

Once we have the problem resolved I will assign points to the people who helped lead us to a solution. At this stage we are looking at perhaps replacing our server, since it is a critical server and is crashing about 3-4 times a week now. Thank you for all your ideas so far.

I think I may have just got mstm to work, so I will look at analysing things now.
Ian_77
Occasional Contributor

Re: help needed with system hanging

This server seems to not be crashing now. Although we didn't find out for sure what was causing it to crash, your input helped in trying to find a solution.

In the process we:
installed emc, but didn't have enough space for a crash dump
ran a program to analyse the system to try to find faults
replaced a faulty fan
replaced the main box
replaced the root disks
installed hpux 11.0 again and pointed it to the disk array

We still don't know for sure what was causing it to crash. Perhaps it could have been the root disks, which we replaced in the end, or something to do with the o/s, which was installed from scratch.

Thank you for your help. At this stage it has not crashed for about one week.