
Matthias Eble
Occasional Advisor

proliant health monitoring

Hi all,

I have some questions about monitoring ProLiant health on Linux. We have a couple of DL and BL servers running Linux. In the past, we have never installed the PSP/agents, because we think the installation procedure does too many things we can't really follow (e.g. new modules, running processes, etc.).

The hpsmhd looks nice, but I really dislike that kind of process since I can't really trust it.

My goal is to monitor the servers as well as possible using Nagios.

For monitoring power supplies, temperature and fans I installed the hpasm package.
The disks can be monitored using hpacucli or cmaeventd. So I wrote a Perl wrapper script that executes hplog with some parameters. It also calls hpacucli and runs "physicaldrive all show" for every controller.
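
For illustration, here is a minimal sketch of the disk part of such a wrapper as a Nagios-style check. It is written in Python rather than Perl and is not the script described above. The sample line format in the comment, the rule that the last comma-separated field is the drive status, and the function names are all assumptions; check them against the real output of your hpacucli version before relying on them.

```python
import re
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed line shape (verify against your hpacucli version):
#   physicaldrive 1:0 (port 1:id 0, Parallel SCSI, 72.8 GB, OK)
DRIVE_RE = re.compile(r"^\s*physicaldrive\s+(\S+)\s+\((.*)\)\s*$")


def parse_drives(output):
    """Return {drive_id: status} from 'physicaldrive all show' style text."""
    drives = {}
    for line in output.splitlines():
        match = DRIVE_RE.match(line)
        if match:
            # Assumption: the status is the last comma-separated field.
            drives[match.group(1)] = match.group(2).split(",")[-1].strip()
    return drives


def evaluate(drives):
    """Map drive statuses to a Nagios (exit_code, message) pair."""
    if not drives:
        return UNKNOWN, "UNKNOWN: no physical drives found in output"
    bad = {d: s for d, s in drives.items() if s != "OK"}
    if bad:
        detail = ", ".join(f"{d}={s}" for d, s in sorted(bad.items()))
        return CRITICAL, f"CRITICAL: {detail}"
    return OK, f"OK: {len(drives)} physical drives healthy"


def check_slot(slot):
    """Run hpacucli for one controller slot (needs root) and evaluate it."""
    cmd = ["hpacucli", "ctrl", f"slot={slot}", "pd", "all", "show"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except OSError as exc:
        # Binary missing or not executable.
        return UNKNOWN, f"UNKNOWN: cannot run hpacucli: {exc}"
    return evaluate(parse_drives(result.stdout))
```

A plugin built on this would print the message from check_slot() and exit with the returned code. Keeping the parsing separate from the command call means the logic can be tested on captured output without root privileges.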

The big drawback of this solution is the need for root privileges. There is sudo, of course, but I don't like that solution either.

Recently, we had a problem on a BL20p G2.
The only way to find the cause was to boot from a SmartStart CD and run the diagnostics.

Can the diagnostics/tests be run from a shell, too? Which packages are needed to do so?

Wouldn't it be great if there were a single binary to query complete system health?

I'd really like to know how you monitor your server health.

Thank you very much
matthias
2 REPLIES
Rick Garland
Honored Contributor

Re: proliant health monitoring

Via the PSP.
This is integrated with HPSIM as well.

Ross Minkov
Esteemed Contributor

Re: proliant health monitoring

We install the PSP for Linux on all of our ProLiant/Linux servers. Part of the PSP for Linux is the hp Server Management Drivers and Agents (hpasm -- stands for hp Advanced System Management) package. hpasm is a collection of drivers and tools which enable monitoring of fans, power supplies, temperature and other management events. This package includes the basic server support. ProLiant servers are equipped with hardware and firmware to monitor certain abnormal conditions such as abnormal temperature readings, fan failures, ECC memory errors, etc. The Management Drivers and Agents monitor these conditions and notify the system administrator when they occur.

The following is a list of some features supported by hpasm:

- Monitoring abnormal temperature conditions
- Monitoring fan failures
- Monitoring the system Fault Tolerant Power Supply
- Monitoring ECC memory errors
- Automatic Server Recovery (ASR)

We also use Nagios for UP/DOWN state & remote service monitoring.

HTH,
Ross