
common system failure events

 
Mladen Despic
Honored Contributor

I need to make a list of the most common HP-UX
system failure events with brief descriptions
of common preventive and/or troubleshooting
strategies that minimize system and/or application
down time. Something like:

failure event        solution strategies
---------------------------------------------------------
disk failure         1. MirrorDisk/UX, RAIDs, etc.
                     2. STM
                     3. EMS
                     4. HP Predictive Support
                     5. ???

kernel parameter     1. monitor system table
limit reached           utilization using cron
                     2. Measureware alarms
                     3. ???

file system full     1. isolate critical directories
                        to separate filesystems
                     2. monitor file systems
                        using cron, or OpenView
                     3. ???
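
To make the "monitor file systems using cron" item concrete, here is a minimal sketch of a check that a cron job could run. It assumes a Python 3 interpreter is available on the host, which is not a given on HP-UX; the function name over_threshold, the 90% limit, and the command-line usage are illustrative choices, not part of any HP tool.

```python
import shutil
import sys


def over_threshold(paths, limit_pct=90.0, usage=shutil.disk_usage):
    """Return (path, percent_used) for each path at or above limit_pct.

    `usage` is injectable so the check can be exercised without real mounts.
    """
    full = []
    for path in paths:
        total, used, _free = usage(path)
        if total == 0:
            # Skip pseudo file systems that report no capacity.
            continue
        pct = 100.0 * used / total
        if pct >= limit_pct:
            full.append((path, round(pct, 1)))
    return full


if __name__ == "__main__":
    # Mount points are passed as arguments, e.g. /var /tmp (hypothetical usage).
    for path, pct in over_threshold(sys.argv[1:]):
        print(f"WARNING: {path} is {pct}% full")
```

Run from cron with the critical mount points as arguments, the script prints one warning line per file system at or above the limit and nothing otherwise, so the output can be mailed or fed into whatever alerting is already in place.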

The list should be a lot longer, so I am trying to save
some time by finding a relevant web link or other
type of reference that can be used.

Any suggestions?









Steven Sim Kok Leong
Honored Contributor

Re: common system failure events

Hi,

I noticed that your list does not include the system logs. Because the OS and most of its applications write to them, the logs are useful both for identifying symptoms and for tracing the source of a system problem, including a system crash.

Some of the important logs you may wish to check are:
- /var/adm/syslog/syslog.log
- /var/adm/syslog/OLDsyslog.log
- /etc/rc.log
- /etc/rc.log.old
- /etc/shutdownlog
- /var/adm/cron/log
- /var/adm/cron/OLDlog

In addition, system crash files and tombstones are also essential in identifying the cause of a system crash.
- /var/adm/crash
- /var/tombstones
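
For the text logs listed above, a rough first pass can be automated. The following is only a sketch, assuming Python 3 is available on the system; the keyword pattern and the function names suspicious_lines and scan_file are illustrative, and real messages vary by subsystem, so the pattern would need tuning.

```python
import re

# Illustrative keywords only; adjust to the messages your systems actually log.
PATTERN = re.compile(r"\b(error|fail(ed|ure)?|panic|warning)\b", re.IGNORECASE)


def suspicious_lines(lines, pattern=PATTERN):
    """Yield (line_number, text) for every line that matches the pattern."""
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            yield number, line.rstrip("\n")


def scan_file(path, pattern=PATTERN):
    """Scan one log file, tolerating undecodable bytes."""
    with open(path, errors="replace") as handle:
        return list(suspicious_lines(handle, pattern))
```

Calling scan_file("/var/adm/syslog/syslog.log") would return the matching lines with their line numbers. It does not replace reading the log in context, but it narrows down where to start.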

Hope this helps. Regards.

Steven Sim Kok Leong
Brainbench MVP for Unix Admin
http://www.brainbench.com
Vincent Stedema
Esteemed Contributor

Re: common system failure events

file system full:

3. Use OnlineJFS to minimize impact.

Vincent