
HP-UX boxes machine hung

 
romano r
Frequent Advisor

HP-UX boxes machine hung

hello,
this is the 2nd time in a couple of weeks that 2 of my UX11i workstations have hung (B2000 with 1.2 GB RAM, C3000 with 1.5 GB RAM, patched as of Dec '06):
Both machines hung and it was not possible to log in, not even from the console (ping still worked). I had to hard-reboot them, after which they seemed fine, but after a short while (about 5 minutes) they hung again. I rebooted again and now they are OK.
On one machine, before it hung the second time, I noticed that the rbootd daemon was at the top of the CPU usage, but I don't know whether that is meaningful. I also checked the system logs but found nothing relevant.
The cause of the hang seems to come "from outside", because the 2 machines had the same problem in the same time slot, about 8:00am - 8:30am. The main external factor at that time is that the NetBackup daemon (bpbkar32) is started from the central backup server. A problem also occurred on an rp3440: there the main issue I noticed was that users could not log in with "exceed" or any graphical terminal, although the machine was still accessible with telnet/ftp/rlogin. After about 1 hour (without my doing anything) the problem disappeared, the backup had ended, and the rbootd daemon load had also dropped.
On other UX10.20 machines (even while under backup) I don't see any problem.
On another rp3440 I noticed nothing like the above.
All the machines mount the same NFS exports.

Any ideas about the cause and how to fix it?
thank you
Romano
5 REPLIES
Steven E. Protter
Exalted Contributor

Re: HP-UX boxes machine hung

Shalom,

I'd investigate the possibility that machines are jumping on the network with the same IP address during this time window.

I'd also check the crontab file for all users for heavy hitting jobs that eat the whole system.

Your investigative process to date is good.

Check the logs of the NFS servers that are being mounted.

If there is an IP conflict on a system that relies on remote NFS mounts, this will quickly and easily hang the system.
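One way to look for such a conflict is to collect ARP table snapshots during the 8:00am - 8:30am window and flag any IP address that shows up with more than one MAC address. The following is a minimal sketch, assuming a Python 3 interpreter is available on some machine on the same subnet and that the `arp -a` output has lines shaped like `host (ip) at mac ...`. The function names and the sample addresses are hypothetical, and the line format should be checked against the real output before relying on it.

```python
import re
from collections import defaultdict

# Assumed line shape: "ws1 (192.168.1.10) at 0:10:83:aa:bb:cc ether"
# Entries without a resolved MAC (e.g. "(incomplete)") are skipped.
ARP_LINE = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3})\)\s+at\s+([0-9A-Fa-f:]+)")


def normalize_mac(mac):
    """Lower-case the MAC and pad each octet to two hex digits."""
    return ":".join(part.lower().zfill(2) for part in mac.split(":"))


def record_snapshot(seen, arp_output):
    """Add every (ip, mac) pair found in one ARP table dump to `seen`."""
    for match in ARP_LINE.finditer(arp_output):
        ip, mac = match.groups()
        seen[ip].add(normalize_mac(mac))


def conflicts(seen):
    """Return the IPs that were observed with more than one MAC."""
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}
```

Start with `seen = defaultdict(set)`, call `record_snapshot` with each captured dump, and inspect `conflicts(seen)` afterwards. An IP with two MACs is only a hint: a replaced network card or a failover address would produce the same pattern.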

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Todd McDaniel_1
Honored Contributor

Re: HP-UX boxes machine hung

Romano,

Are these backups every day? or only once a week? It seems a bit odd that the hangs only happen every 2 weeks, if you have backups running every day.

Do you have active logins on the boxes during the hang? I would leave a user logged in so that you can check when the box hangs, to see what is available at the time of the hang.

I have seen cases where new connections are refused but existing IDs logged in were able to work normally.

Also, I would monitor your system resources during this time of the backups. It may be that you are experiencing some bottleneck that can hang a box.
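As a rough illustration of that kind of monitoring, the sketch below turns `uptime` output into timestamped load-average records that can be appended to a log file. It assumes a Python 3 interpreter is available and that `uptime` prints a `load average: a, b, c` field. The function names are hypothetical, and the log should be written to a local disk rather than an NFS mount so that logging keeps working if NFS stalls.

```python
import re
import time

# Assumed field shape in uptime output: "load average: 0.52, 0.48, 0.45"
LOAD_RE = re.compile(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")


def parse_load(uptime_output):
    """Return the three load averages as floats, or None if not found."""
    match = LOAD_RE.search(uptime_output)
    if match is None:
        return None
    return tuple(float(value) for value in match.groups())


def format_sample(uptime_output, now=None):
    """Build one log line: local timestamp followed by the load averages."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    load = parse_load(uptime_output)
    if load is None:
        return "%s load=unknown" % stamp
    return "%s load=%.2f,%.2f,%.2f" % ((stamp,) + load)


def append_sample(log_path, uptime_output):
    """Append one sample to a log file (use a local, non-NFS path)."""
    with open(log_path, "a") as log:
        log.write(format_sample(uptime_output) + "\n")
```

Calling `append_sample` once a minute from shortly before 8:00am until the backup finishes would leave a record of how the load developed up to the moment of a hang.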


Regards,

Todd A McDaniel
Unix, the other white meat.
romano r
Frequent Advisor

Re: HP-UX boxes machine hung

Thank you for your help!
Steven, the IP conflict suggestion is definitely something I'll follow up on.
Todd, you're right, the backups shouldn't be the problem. Yes, users were logged in during the hang, but their sessions were stuck. I know what you mean, but that is not the case here.

Regards
Romano
Todd McDaniel_1
Honored Contributor

Re: HP-UX boxes machine hung

Sounds to me like a memory leak.

I have seen cases where a memory leak causes issues like what you are seeing: the system runs fine for a while, then after a period of time it locks up.
Unix, the other white meat.
Bill Hassell
Honored Contributor

Re: HP-UX boxes machine hung

> All the machines mount the same NFS exports.

If something on the network clobbers the NFS server then every NFS client machine will immediately hang. Actually, not exactly hang -- programs that neither use the NFS mountpoints nor scan all the filesystems (as bdf or login do) will probably work OK.
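To tell this situation apart from a true system hang, one can probe each NFS mountpoint with a time limit so that the probe itself does not get stuck. The following is a minimal sketch, assuming a Python 3 interpreter is available on the client. `probe_path` is a hypothetical helper and the 5-second default is an arbitrary choice.

```python
import os
import threading


def probe_path(path, timeout=5.0, stat_fn=os.stat):
    """Return "ok", "error: ..." or "hung" for a single path.

    The stat call runs in a daemon thread, so the caller gets an answer
    after `timeout` seconds even if the call never returns.
    """
    result = {}

    def worker():
        try:
            stat_fn(path)
            result["status"] = "ok"
        except OSError as exc:
            result["status"] = "error: %s" % exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        # The stat call is still blocked, typical for a dead NFS server.
        return "hung"
    return result.get("status", "unknown")
```

Running `probe_path` on each NFS mountpoint from an already open session during the 8:00am window would show whether the mounts stop responding while local paths such as `/tmp` still answer. A thread blocked on a hard NFS mount stays blocked until the server comes back, so the probing process itself may be slow to exit.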


Bill Hassell, sysadmin