Server Management - Systems Insight Manager

Glen R Martin
Occasional Contributor

Predictive Failure Notification

We are running HP SIM 5.1. We only have four servers at this location, but it has been worthwhile to set up specifically for the alerting on hardware failures.

I recently found that one of my servers (ML370 G4) has a predictive failure on one of the hard drives. I discovered this accidentally while in the server room (orange light flashing on the drive - which is behind the door of the server in a tower configuration). The problem is, SIM does not alert on this condition, as it is considered a "Minor Event". I have alerting turned on for Major and Critical Events, but I have not included Minor Events because of the level of noise produced. Does anyone have a creative way to catch these predictive failures, while still keeping the noise level down? For example, filtering the Minor Event capturing to specific machines, or possibly (but less preferable) filtering it to specific events? Of course the second option means you have to know what you specifically want to capture, which usually happens after the first crisis you encounter :>)
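The two filtering ideas above (by machine, or by event type) can be sketched as a small predicate. This is only an illustration of the logic, not HP SIM's API: the `Event` type, the host name `ml370-01`, and the event-type strings are assumptions made up for the example.

```python
from dataclasses import dataclass

# Severity ranks; the level names follow those mentioned in the post.
SEVERITY_RANK = {"Minor": 1, "Major": 2, "Critical": 3}


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; not an HP SIM data structure."""
    host: str
    event_type: str
    severity: str


# Hypothetical values, for illustration only.
WATCHED_HOSTS = {"ml370-01"}
WATCHED_EVENT_TYPES = {"Physical Drive Status Change"}


def should_alert(event: Event) -> bool:
    """Always alert on Major/Critical; alert on Minor only for watched hosts or event types."""
    if SEVERITY_RANK[event.severity] >= SEVERITY_RANK["Major"]:
        return True
    return event.host in WATCHED_HOSTS or event.event_type in WATCHED_EVENT_TYPES
```

With these assumed values, a Minor drive event from any host would pass the filter, while an unrelated Minor event from an unwatched host would still be dropped.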
Rancher
Honored Contributor

Re: Predictive Failure Notification

You can set the trap for the hard drive predictive failure to critical, and then you will get notifications. I have done this for a few different things.
Glen R Martin
Occasional Contributor

Re: Predictive Failure Notification

Thanks for the quick answer! I was not aware you could do that. How do you change the status level of a trap?
Rancher
Honored Contributor

Re: Predictive Failure Notification

You are very welcome!

We also have ML370 G4s, so hopefully this will be the correct trap for you. If not, your system logs will tell you which MIB and which trap you want to change and you can use the same procedure.

In SIM, go to Options, Events, SNMP Trap Settings.
Click the drop-down box by MIB name and select cpqida.mib.
Click the drop-down box by Trap name and select (SNMP) Physical Drive Status Change (3046).
In the Severity box, select Critical.

This is what I have done to get alerts.
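Conceptually, the procedure above changes one entry in a severity lookup keyed by MIB and trap. A minimal sketch of that idea follows; it is not how SIM stores its settings, and the default severity shown is an assumed placeholder, not a value taken from cpqida.mib.

```python
# Illustrative defaults keyed by (MIB name, trap name); the "Minor" value is an assumption.
DEFAULT_SEVERITY = {
    ("cpqida.mib", "Physical Drive Status Change (3046)"): "Minor",
}

# User-made changes, kept separate so the shipped defaults stay untouched.
overrides = {}


def set_severity(mib: str, trap: str, severity: str) -> None:
    """Record a user override for one trap."""
    overrides[(mib, trap)] = severity


def effective_severity(mib: str, trap: str) -> str:
    """Return the override if one exists, otherwise the default, otherwise 'Unknown'."""
    key = (mib, trap)
    return overrides.get(key, DEFAULT_SEVERITY.get(key, "Unknown"))


# Mirror the steps in the post: raise the drive status trap to Critical.
set_severity("cpqida.mib", "Physical Drive Status Change (3046)", "Critical")
```

Because alerting is configured for Major and Critical events, raising a single trap this way brings it into the existing alert rule without enabling all Minor events.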
Glen R Martin
Occasional Contributor

Re: Predictive Failure Notification

Sorry for the rookie questions...you are already far deeper into SIM than I have ever delved :>) I have a basic understanding of MIB concepts, but no experience working with them at this level.

I have followed your directions, but when I get to the Trap Name drop-down, I don't have any entries that start with SNMP. What I have is numerous cpqDA(x) entries, where (x) is 2-7, plus a number of cpqDA entries with no number after the name. So there are several "PhyDrvStatusChange" entries, but all of these already appear to be set to "Critical".

What you indicated (Physical Drive Status Change (3046)) shows up in the field below Trap Name titled "Event Type". The number (3046) is different for each of the cpqDA entries.

I have double checked the basic alerting functionality by sending a test trap, and it does appear to be working. I have the SNMP configured, and I do occasionally receive other alerts.

Thanks again for the help.
Glen R Martin
Occasional Contributor

Re: Predictive Failure Notification

I still haven't got a solution for this - anyone?
Glen R Martin
Occasional Contributor

Re: Predictive Failure Notification

I am still waiting for a final answer to this. It's killing me that I am 90% of the way there. Can anyone provide the answer to my last question to Rancher?

Thanks.
marsh_1
Honored Contributor

Re: Predictive Failure Notification

Hi,

Did you check the logs, as Rancher suggested, to identify the MIB and trap involved?