Operating System - HP-UX


 
Binu Raj
Occasional Contributor

K370 Repeatedly Crashes...


Hi

We have a K370 development server that is giving us a lot of trouble.

The server suddenly stops responding to telnet, rlogin, and other internet services, although it still answers ping.
The console just goes blank. The LCD panel displays F19F, and
top shows a system load of 75 or higher that keeps climbing (if top was started from a telnet
session that was already active). The top display shows one user process occupying 95% of the CPU.
But the users claim that this program works perfectly well at customer sites and that it has not been modified
in the last year.

We have to do a TC and restart the server. Twice the server did not boot normally because /home and another FS were
corrupted. We were unable to mount the FS: when we try to mount it, the system does not return the prompt and
shows the same symptoms described above. We set this LV aside and never reused the space, thinking it might be useful
for debugging. After a few days it happened to another user filesystem (each of them on a different disk).

No HPMC error codes were generated.

A core dump was taken after the TC and sent for analysis, and I am waiting for the response.

Can anybody suggest how to track which process causes the system to go busy, please?


Hardware Configuration

Server K370 Single CPU
2 GB RAM / 1 x 18 GB root disk / 1 x 36 GB (user data)

Software Configuration

B3901BA B.10.20.15 HP C/ANSI C Developer's Bundle for HP-UX 10.20 (S800)
B3913DB A.01.21.15 HP aC++ Compiler S800
B3919EA_AGP B.10.20 HP-UX 64-User License
B6267AA A.02.22 Unicenter TNG Framework For HP-UX 10.20_800
B7682AA B.10.20.HWE.3 Hardware Extensions 3.0 for HP-UX 10.20
B8342AA B.10.20.03 Netscape Communicator 4.72
HPUXEngCR800 B.10.20 English HP-UX CDE Runtime Environment
OnlineDiag B.10.20.18.13 HPUX 10.0 Support Tools Bundle
XSW800GR1020 B.10.20.49.3 General Release Patches for HP-UX 10.20 Servers (June 2000)
XSW800HWCR1020 B.10.20.49.3 Hardware Enablement and Critical Patches for HP-UX 10.20 Servers (June 2000)
#
# Product(s) not contained in a Bundle:
#

MQSERIES B.10.510.00 MQSeries for HP-UX
perl 5.6.0 perl
Berlene Herren
Honored Contributor

Re: K370 Repeatedly Crashes...

I would take a look at the nettl log to see if anything network-related is being logged there.

#netfmt -f /var/adm/nettl.LOG00 > /tmp/file.out

Check for duplicate MAC/IP addresses, disaster messages at the time of the lockout, etc.
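
One way to make that check is to search the formatted output for the relevant keywords. This is only a sketch: it assumes the file written by the netfmt command above, and it assumes the entries of interest contain the word DISASTER (a nettl log class) or the word "duplicate". The exact message wording is an assumption, and scan_nettl is a made-up helper name.

```sh
# scan_nettl FILE: print the lines (with line numbers) of a netfmt-formatted
# log that mention a DISASTER-class entry or a duplicate address.
# The keywords are an assumption about the message text, not a known format.
scan_nettl() {
    grep -n -i -E 'disaster|duplicate' "$1"
}
```

For example, `scan_nettl /tmp/file.out` lists the matching lines; the timestamps near each hit can then be compared with the time the server stopped responding.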

Also, turn on verbose logging, with #inetd -l

Berlene
http://www.mindspring.com/~bkherren/dobes/index.htm
Rita C Workman
Honored Contributor

Re: K370 Repeatedly Crashes...

FxnF
(HP-UX) Indicates the system is running. An F in the first and fourth digits indicates the system is running normally. The x is updated every five seconds with the length of the run queue at that time (an instantaneous reading not an average). It indicates the number of processes. Loads higher than nine display as A. The n indicates the number of processors (1 or 2).
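
As a small illustration of the layout just described, the sketch below splits a four-character code into its parts. It follows this description only (F, run queue digit or A, processor count, F); decode_lcd is a hypothetical helper, and codes seen on a real panel may not fit this layout exactly.

```sh
# decode_lcd CODE: interpret a running-state LCD code of the form FxnF,
# where x is the run queue length (A means more than nine) and n is the
# number of processors, as described in the text above.
decode_lcd() {
    code=$1
    case $code in
        F[0-9A][0-9]F) ;;
        *) echo "not a running-state code: $code"; return 1 ;;
    esac
    runq=$(printf '%s' "$code" | cut -c2)
    ncpu=$(printf '%s' "$code" | cut -c3)
    if [ "$runq" = "A" ]; then
        runq="more than 9"
    fi
    echo "run queue: $runq, processors: $ncpu"
}
```

For instance, `decode_lcd F31F` reports a run queue of 3 on one processor.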

As for tracking which process is responsible, I would run glance and watch it very closely. At first I thought your FS problems were disk problems, but you say it's happening on other disks too. I agree you need to confirm there is nothing flaky with your network, check your hosts file, etc. I think you're looking at multiple problems, some of them unrelated, like your console blanking out. Check the console connections and make sure they're solid.
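
If glance is not installed, a rough substitute is to log the busiest processes to a file at intervals, so the last snapshot before a hang is still there after the restart. The sketch below assumes the XPG4 form of ps (on HP-UX, enabled by setting UNIX95 for that one command) and that its -o option accepts pcpu, pid, user and comm; top_cpu and log_top_cpu are made-up helper names.

```sh
# top_cpu N: read "pcpu pid user command" lines on stdin (a header line is
# allowed) and print the N busiest processes, highest %CPU first.
top_cpu() {
    n=${1:-5}
    # Keep only lines whose first field is a number, then sort descending.
    awk '$1 ~ /^[0-9.]+$/' | sort -rn | head -n "$n"
}

# log_top_cpu LOGFILE [INTERVAL]: append a timestamped snapshot of the five
# busiest processes every INTERVAL seconds (default 60) until interrupted.
# Assumption: UNIX95= ps -e -o ... is available on the system.
log_top_cpu() {
    logfile=$1
    interval=${2:-60}
    while :; do
        date >> "$logfile"
        UNIX95= ps -e -o pcpu,pid,user,comm | top_cpu 5 >> "$logfile"
        sleep "$interval"
    done
}
```

Running `log_top_cpu /var/tmp/topcpu.log 30 &` keeps a history to read after the next TC; the log is best kept on a filesystem other than the ones that were corrupted.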
For the telnet issue, check hosts, nsswitch, and nslookup to see how IPs are getting resolved. If telnets are delayed or timing out, try adding that IP to the hosts file and see if it picks up. Check npty and the related parameters and see if they need to be increased. Is it possible that more processes are being started than the system is set to handle? I would look at nfile (number of open files allowed) and nproc (number of active processes allowed). You might also want to look at what nflock (number of file locks allowed) is set to.
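
A quick way to see how close the process and file tables are to their limits is to look at the used/max pairs reported by sar -v. The sketch below assumes the HP-UX column order (time, text-sz, ov, proc-sz, ov, inod-sz, ov, file-sz, ov); that layout, and the helper names usage_pct and check_tables, are assumptions. It does not cover nflock, which sar -v does not report.

```sh
# usage_pct USED/MAX: print the integer percentage of a table that is in use.
usage_pct() {
    used=${1%/*}
    max=${1#*/}
    if [ "$max" -eq 0 ]; then
        echo 0
    else
        echo $(( used * 100 / max ))
    fi
}

# check_tables [LIMIT]: read sar -v lines on stdin and report any table
# whose usage is at or above LIMIT percent (default 90).
# Assumed columns: time text-sz ov proc-sz ov inod-sz ov file-sz ov
check_tables() {
    limit=${1:-90}
    while read -r _time _text _ov1 proc _ov2 inode _ov3 file _ov4; do
        for pair in "nproc:$proc" "ninode:$inode" "nfile:$file"; do
            name=${pair%%:*}
            value=${pair#*:}
            # Skip headers and N/A entries; only used/max pairs are checked.
            case $value in
                [0-9]*/[0-9]*) ;;
                *) continue ;;
            esac
            pct=$(usage_pct "$value")
            if [ "$pct" -ge "$limit" ]; then
                echo "$name at ${pct}% ($value)"
            fi
        done
    done
    return 0
}
```

Typical use would be `sar -v 5 12 | check_tables 90`, run while the load is building up.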
Check the network for any possible problems.
But, after being put through the wringer by a couple of folks who swore it couldn't be their app, I would be looking hard at the application.
Sorry for rambling....just some thoughts,
/rcw
Steven Sim Kok Leong
Honored Contributor

Re: K370 Repeatedly Crashes...

Hi,

If you have q4, run this crash dump analysis tool on the core dump.

Hope this helps. Regards.

Steven Sim Kok Leong
Brainbench MVP for Unix Admin
http://www.brainbench.com
Steven Sim Kok Leong
Honored Contributor

Re: K370 Repeatedly Crashes...

Hi,

Just to add on: I once had a server that crashed repeatedly. It never finished saving the entire crash dump before crashing again. That was a nightmare.

And it was all caused by an improperly installed driver. After the driver was removed, everything went back to normal.

Hope this helps. Regards.

Steven Sim Kok Leong
Brainbench MVP for Unix Admin
http://www.brainbench.com
Binu Raj
Occasional Contributor

Re: K370 Repeatedly Crashes...

We then installed the latest patch bundle (General Release Patch Bundle XSW800GR1020, March 2001) and have been observing the system ever since.
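
For anyone wanting to confirm which revision of a bundle is actually installed, the bundle listing from swlist can be filtered by name. This is a minimal sketch that assumes the name and revision are the first two columns, as in the software listing in the first post; bundle_revision is a made-up helper name.

```sh
# bundle_revision NAME: read a swlist bundle listing on stdin and print the
# revision (second column) of the bundle called NAME, if it is present.
bundle_revision() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}
```

Used as `swlist -l bundle | bundle_revision XSW800GR1020`, it prints the installed revision, or nothing if the bundle is missing.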

To my surprise, the problem never occurred again. I guess either some critical OS filesets were corrupted or there was a VxFS bug that got patched.

Thanks everyone for your replies.


Binu Raj