Operating System - HP-UX

Re: Panic - causing reboots across multiple 11i systems

 
Scott McDade
Frequent Advisor

Panic - causing reboots across multiple 11i systems

I have 14 C3600s running 11i that are controllers for some semiconductor test equipment. They are set up so that all of the *.libs and *.sl for the application come from a single NFS mount. They are also set up using NIS across subnets. They have been running without incident for approx 16 months. No patches or S/W have been loaded since 6/1/03.

Just the other day we started seeing the systems rebooting themselves. When I look in /etc/shutdownlog it tells me "01:38 Sun Sep 01 2003. Reboot after panic: Conditional trap". I checked the tombstone files and there was nothing in there that HP support could use to identify the problem. HP has also reviewed the "/var/adm/crash" files and the cause is inconclusive.

The strange thing is that the systems are all identical and were created from make_recovery images. 10 out of the 14 systems are panicking at the same time every morning, at 1:38. I have disconnected one from the network and it still panics. I have shut down all applications prior to 1:38 and they still panic. I am running out of ideas. The "#vmstat 30 1000" output tells me I still have plenty of memory prior to the panic. I did notice that while the systems are running I hear the system beep every second or so. Any ideas as to what could be causing this panic, or has anyone seen this before?
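To confirm that the panics really do cluster at one time of day on each box, the log can be tallied. This is only a rough sketch: it assumes every entry in /etc/shutdownlog starts with HH:MM, as in the line quoted above, and the function name `panic_times` is made up for illustration.

```shell
#!/bin/sh
# Tally "Reboot after panic" entries by time of day.
# Reads shutdownlog text on stdin and prints "<count> <HH:MM>" per distinct time.
# Assumption: the first field of each entry is the HH:MM timestamp.
panic_times() {
    awk '/Reboot after panic/ { n[$1]++ }
         END { for (t in n) print n[t], t }' | sort -k2
}
```

Run it on each system as `panic_times < /etc/shutdownlog` and compare the results between the boxes that panic and the four that do not.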

-Scott
Keep it Simple!~
Robert Gamble
Respected Contributor

Re: Panic - causing reboots across multiple 11i systems

The first thing I would check is:

at -l # to check for at jobs
crontab -l # to check for cronjobs that kick off around 01:30-01:38

It's possible a normal cron script that runs at night became corrupted or was recently changed.

Hope this helps!
Steven E. Protter
Exalted Contributor

Re: Panic - causing reboots across multiple 11i systems

It is possible the systems are having a genuine crash. It might be time for crash dump analysis if there are files in /var/adm/crash.

After analysis via the cookbook I'm attaching, a missing patch might need to be installed.

It could also be a hardware fault. Try dmesg on the boxes and look for problems such as lbolt messages.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Massimo Bianchi
Honored Contributor

Re: Panic - causing reboots across multiple 11i systems

Another check I would do: are there any Windows boxes on the same network?

The Blaster worm can cause network problems, such as killed DCE processes or traffic jams on the net.

You said that you use NIS and NFS. Problems on the network can cause that kind of kill.

http://forums.itrc.hp.com/cm/QuestionAnswer/1,,0xe7a81584b3a3d0409fb2d8520d6ee02a,00.html

http://forums.itrc.hp.com/cm/QuestionAnswer/1,,0xb7e270a9afd29c48afc938ef5aa8a492,00.html

Massimo
Elena Leontieva
Esteemed Contributor

Re: Panic - causing reboots across multiple 11i systems

Scott,

You may want to check not just root's cron jobs scheduled to run around 1:30 am, but those of other users too, especially adm.
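Something along these lines could do that check for every user in one pass. It is a minimal sketch, not a tested HP-UX tool: the function names are hypothetical, the 01:30-01:38 window is taken from the panic time reported above, and only plain numbers and "*" in the minute and hour fields are interpreted. Anything with lists or ranges is just flagged for a manual look.

```shell
#!/bin/sh
# Print crontab entries that could fire between 01:30 and 01:38.
# Reads crontab text on stdin.
cron_near_panic() {
    awk '
        /^[ \t]*#/ || NF < 6 { next }    # skip comments and short lines
        {
            min = $1; hr = $2
            # Lists, ranges and steps are not parsed, only flagged.
            if (hr != "*" && hr !~ /^[0-9]+$/)   { print "REVIEW: " $0; next }
            if (min != "*" && min !~ /^[0-9]+$/) { print "REVIEW: " $0; next }
            if ((hr == "*" || hr + 0 == 1) &&
                (min == "*" || (min + 0 >= 30 && min + 0 <= 38)))
                print "MATCH: " $0
        }'
}

# Run the filter over every file in a crontab directory (one file per user).
scan_crontabs() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        echo "== $(basename "$f")"
        cron_near_panic < "$f"
    done
}
```

As root, try `scan_crontabs /var/spool/cron/crontabs`. That path is the usual crontab location as far as I know, but treat it as an assumption and adjust it if your systems differ. Remember to check `at -l` separately, since at jobs are not covered here.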

Elena.
Steven Gillard_2
Honored Contributor

Re: Panic - causing reboots across multiple 11i systems

While I'm not usually an advocate of "shotgun patching", that would be my next step in this case because they're crashing so frequently. Download the latest Support Plus patch bundle and install it on one of the systems.

Are you running any 3rd party kernel modules? If the crash is happening there you'll need to contact the vendor because HP won't touch it.

Otherwise, if they still crash after patching, send the new /var/adm/crash files to HP and don't accept "inconclusive" as the answer... they have (used to have anyway!) expert centres and labs who do nothing else all day except read dumps, and they should at least be able to put in place an action plan to capture more information next time the problem happens.

Regards,
Steve