HPUX Health Checks for script

likid0
Honored Contributor

Hi,

I know this topic has come up before, and that you can use HP SIM or the HP health check tools.

With this in mind, I'm working on a tool that checks the output of one of our scripts, which gathers information from HP-UX servers, for hardware and software configuration errors.
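A minimal sketch of the reporting side of such a checker follows. It assumes a POSIX shell and an HTML output file like the Errors.html mentioned in the list of checks. REPORT, html_escape, report_start, report_error and report_end are hypothetical names chosen for this example. They are not part of the original script or of any HP tool.

```shell
#!/bin/sh
# Hypothetical reporting harness for a checker that reads collected command
# output and writes findings to an HTML file. REPORT and all function names
# are illustrative; they are not part of any HP tool.

REPORT=${REPORT:-Errors.html}

# Escape the HTML metacharacters &, < and > so raw output is safe to embed.
html_escape() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Start a fresh report whose title is $1.
report_start() {
    title=$(printf '%s\n' "$1" | html_escape)
    {
        printf '<html><head><title>%s</title></head><body>\n' "$title"
        printf '<h1>%s</h1>\n' "$title"
    } > "$REPORT"
}

# Append one finding: heading $1, followed by the offending lines read from stdin.
report_error() {
    summary=$(printf '%s\n' "$1" | html_escape)
    printf '<h2>%s</h2>\n<pre>\n' "$summary" >> "$REPORT"
    html_escape >> "$REPORT"
    printf '</pre>\n' >> "$REPORT"
}

# Close the HTML document.
report_end() {
    printf '</body></html>\n' >> "$REPORT"
}
```

Each check would call report_error only when it finds a problem, piping the offending lines into it. After all checks have run, report_end closes the document. Escaping matters here because collected error text can easily contain `<` or `&`.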

I already have these checks, mostly hardware-related:
#Put the title in Errors.html
#Devices in no-hardware (NO_HW) state in ioscan
#Bad label lvlnboot
#unavailable VG/PV, stale PEs
#check failed disks in SAS RAID controller
#Check Smart Array RAID controller
#check for failed components in partstatus(cpu/mem/fans)
#CSTM memory.
#EMS LOGS Check for Critical/Warning
#Read/write errors CSTM disk.
#shutdown.log check for recent panics/machine checks/hpmcs/etc
# presence of files in /var/tombstones
#LAN:netfmt lost link errors
#LAN:Network devices with failed ports in configuration
#fcmsutil output check for topology, link speed, driver state and probably some link statistics (loss of signal, etc.)
#"olrad -q" output check for slot anomalies
#HDW:cprop checking for failing component status
#check errors in check_patch
#check filesets not in configured state
#LOG:/var/adm/syslog/syslog.log checking for different errors in vmunix
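Two of the listed checks can be written as filters over the collected output: devices in NO_HW state in ioscan, and vmunix errors in syslog.log. The sketch below assumes the collector saved plain `ioscan -f` output and a copy of /var/adm/syslog/syslog.log. The function names and the default error pattern are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of two of the checks above. Each function reads
# collected command output on stdin, prints any offending lines, and
# returns 1 if it found something (0 otherwise).

# Devices in NO_HW state: look for a field that is exactly NO_HW in
# `ioscan -f` output, so the check does not depend on exact column positions.
check_no_hw() {
    awk '
        { for (i = 1; i <= NF; i++) if ($i == "NO_HW") { print; found = 1; break } }
        END { exit (found ? 1 : 0) }
    '
}

# Kernel messages in syslog.log: keep lines whose fifth field is "vmunix:"
# (the usual "Mon DD HH:MM:SS host vmunix: ..." layout) and whose text matches
# an extended regular expression. Matching is case-insensitive, so give the
# pattern in lowercase. The default pattern is an assumption; pass your own as $1.
check_syslog_vmunix() {
    pattern=${1:-"error|fail|timeout"}
    awk -v pat="$pattern" '
        $5 == "vmunix:" && tolower($0) ~ pat { print; found = 1 }
        END { exit (found ? 1 : 0) }
    '
}
```

For example, `check_no_hw < ioscan.out` or `check_syslog_vmunix 'scsi|error' < syslog.log`, where the file names are hypothetical. The nonzero return value tells the caller whether to write an entry to Errors.html.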

Can you suggest other hardware-related tests or checks?

Any configuration health checks you can think of would also be welcome.

Thanks for your help.

Windows? No thanks.
Bill Hassell
Honored Contributor

Re: HPUX Health Checks for script

There are several sanity checks you should do for any server, especially to make sure it will reboot the next time:

  • From setboot, verify that the primary and alternate paths are valid.
  • Check the LIF area on boot disks
  • Is /stand almost full (i.e., less than 20 MB left)?
  • Does /stand have the current and previous vmunix kernels present and more than zero bytes?
  • Are the vmunix files type s800 or ELF-64?
  • Check for the ioconfig file in /stand and /etc
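  • Check that the rootconf file is valid:
      # ErrMsg is assumed to be an error-reporting function defined elsewhere
      # in the script; substitute your own reporting routine.
      ROOTCONF=/stand/rootconf
      # The first line of the xd hex dump holds the magic number in fields 2 and 3
      MAGIC="$(xd "$ROOTCONF" | head -1 | awk '{print $2 $3}')"
      [[ $(echo "$MAGIC" | grep -c deadbeef) -ne 1 ]] &&
        ErrMsg "$ROOTCONF magic number wrong, should be deadbeef (hex)" "rootconf = $MAGIC (hex)"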
  • Check that /stand/bootconf has both primary and alternate boot paths and that they are valid.
  • Check that dead gateway detection is disabled:
      CHECKDEADGW="ndd -get /dev/tcp ip_ire_gw_probe"
      [[ $(eval "$CHECKDEADGW") -ne 0 ]] &&
        echo "DEAD GATEWAY detection is enabled\n  $CHECKDEADGW = 1"
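The /stand checks from the list above fit naturally into small shell functions. This is a sketch under several assumptions. It parses `bdf` output and uses the 20 MB threshold given above. It also takes /stand/vmunix and /stand/vmunix.prev as the default current and previous kernels, which may not match every HP-UX release. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of the /stand and kernel checks above. Each function
# prints a message and returns 1 when the check fails.

# Free space in /stand, read from `bdf /stand` output on stdin (header line
# first; a POSIX `df -P -k /stand` listing has the same column order). The
# available-KB value is the third field from the end of the data line; lines
# with fewer than five fields (a long device name wrapped onto its own line)
# are skipped.
check_stand_space() {
    min_kb=${1:-20480}    # 20 MB, the threshold suggested above
    awk -v min="$min_kb" '
        NR > 1 && NF >= 5 { avail = $(NF - 2) }
        END {
            if (avail == "") { print "no /stand usage data found"; exit 1 }
            if (avail + 0 < min + 0) {
                printf "/stand has only %d KB free (minimum %d KB)\n", avail, min
                exit 1
            }
        }
    '
}

# Kernel files: each named file must exist and be non-empty. The default
# names (/stand/vmunix and /stand/vmunix.prev) are an assumption; pass the
# kernel paths used by your HP-UX release instead.
check_kernels() {
    if [ $# -eq 0 ]; then
        set -- /stand/vmunix /stand/vmunix.prev
    fi
    rc=0
    for k in "$@"; do
        if [ ! -s "$k" ]; then
            echo "kernel file $k is missing or empty"
            rc=1
        fi
    done
    return $rc
}

# Kernel file type: $1 is the `file` output for one kernel; it should
# mention ELF-64 or s800.
check_kernel_type() {
    case $1 in
        *ELF-64*|*s800*) return 0 ;;
        *) echo "unexpected kernel file type: $1"; return 1 ;;
    esac
}
```

Typical calls would be `bdf /stand | check_stand_space`, `check_kernels`, and `check_kernel_type "$(file /stand/vmunix)"`.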

For a very complete acceptance test script, see Dusan Baljevic's excellent script at:

http://www.circlingcycle.com.au/Unix-sources/HP-UX-check-OAT.pl.txt

It's written in Perl, with good coding structure and some comments.

Bill Hassell, sysadmin