Server Management - Systems Insight Manager


 
NAVCANADA
Occasional Contributor

HPSIM 7.2 - Hardware traps not picked up

Hi all,

 

  I am having an enormous amount of fun trying to get HPSIM 7.2 working as I would like it.

 

  The current need is for HPSIM to simply monitor and send us alerts when there is a hardware failure on one of the monitored devices.

 

  At this point in time, HPSIM is not generating alerts when a piece of hardware fails (HDD or P\S).  My SNMP is properly configured (I have checked this config many times: community strings match, localhost and the CMS FQDN are listed as accepted hosts, and services have been restarted all around).

 

 My servers are all discovered properly (hardware info is all discovered...) - I get alerts when the monitored devices go off\online no problem.

 

  Generating SNMP test traps seems to be hit and miss.  They worked last week, but since Friday I get no joy; no test traps are received by HPSIM.

 

  My questions are:

 

 1.  Why would my SNMP tests be so flaky?  Does anyone have any idea what could cause traps to work one day, then not the next?

 

 2. Has anyone ever encountered hardware faults not generating alerts in HPSIM while everything else pertaining to communication seems to work?

 

 

 Additional info:

 

  CMS - VMware virtual server - 2008 R2 STD.  DB running on a SQL instance hosted on our SAN

  Test client - Physical ProLiant DL360 G6 - 2008 R2 STD - PSP 9.1 installed

 

 

shocko
Honored Contributor

Re: HPSIM 7.2 - Hardware traps not picked up

When you state 'no traps are being received', how have you verified this? Run a sniffer when generating the test traps. Also, does the CMS server have a single IP address?
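
If a full sniffer is not handy, a small UDP listener can at least confirm whether datagrams reach the CMS at all. The following is a minimal Python sketch and not part of HP SIM; the function names open_trap_socket and collect_datagrams are made up for illustration. It assumes UDP port 162 (the standard SNMP trap port) is free, which will usually not be the case while a trap receiver is running on the CMS, so it would have to be run with that service stopped or pointed at a spare port.

```python
import socket


def open_trap_socket(host="0.0.0.0", port=162):
    """Bind a UDP socket; raises OSError if the port is already in use."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def collect_datagrams(sock, timeout=30.0, max_packets=5):
    """Return (source_ip, byte_count) per datagram.

    Stops after max_packets, or once `timeout` seconds pass with no datagram.
    """
    seen = []
    sock.settimeout(timeout)
    try:
        while len(seen) < max_packets:
            data, (src_ip, _src_port) = sock.recvfrom(65535)
            seen.append((src_ip, len(data)))
    except socket.timeout:
        pass  # nothing more arrived within the timeout window
    return seen


# Example use (hypothetical): bind, send a test trap from a managed server,
# then see which source addresses showed up.
#   sock = open_trap_socket()
#   print(collect_datagrams(sock))
#   sock.close()
```

This only shows that something arrived and from which address; it does not decode the trap, so a real sniffer is still the better tool for looking at community strings or OIDs.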

If my post was helpful please award me Kudos! or Points :)
NAVCANADA
Occasional Contributor

Re: HPSIM 7.2 - Hardware traps not picked up

So, it turns out the traps are being received by the CMS.  They are just not showing up as Critical\Major events.

 

For example, I have a server with a failed HDD, which should trigger a Critical alarm, but it does not.

 

When I view the impacted system in 'All Systems', I see a major event in the HS column; when I hover over it, I get a pop-up with 'minor event' that says:

 

Drive Array-There is a minor problem that is causing limited interference.

 

It is the same for all systems and all events which should be critical\major except when a system goes offline (which means the PING events are properly handled).

LGentile
Trusted Contributor

Re: HPSIM 7.2 - Hardware traps not picked up

So, a general observation, because I have a similar issue with very specific events.

There is an event type used for certain hardware components that is called "Status Change". In case you are not aware, you can alter the severity of every SNMP event within SIM (Tools - Events - SNMP Trap Settings). However, the "Status Change" events seem to cover many values, each with its own severity, and those I cannot seem to change.

For example, this one: (SNMP) Accelerator Board Status Change (3038). It can be Minor, Major, or Critical depending on the type of status change. Generally I am OK with this, if only there were a way to alter the behavior. The severity for a "low battery charge" event is Minor. If the module fails, it is Critical. That's great by default, but what if you want to change those? I cannot figure out how. The end result is that there are a few variants of this event that come through at a severity level I don't want, or fail to come through at one I do want. Our autoticketing system is defined to accept Critical/Major only, so we either miss certain events or get too many of them, since I can't change this behavior of the 3038 event. There is a hard drive event (3046) that has the same issue. I realize this may be by design, but not being able to adjust the individual value events is troublesome.

Now, that may not be your issue. I would suggest looking at the event details for the MIB that is handling the event in question, then going to the SNMP Trap Settings page and adjusting the severity on the specific events you need to change. For example, if you want all HD/controller issues to generate Major severity events, just go to cpqida.mib and look for your event(s). For example, the tape cleaning events used to be major/critical - I made those all Informational...

So, two things - you can adjust the severity level for most events, but you may also be seeing the variance in the "status change" events that I cannot figure out how to adjust. I've opened the MIB files and I can't tell how SIM is determining the end severity result from the OID value.
NAVCANADA
Occasional Contributor

Re: HPSIM 7.2 - Hardware traps not picked up

Thanks for the info, LGentile.  I did play with the SNMP Trap Settings a few weeks ago - all the traps pertaining to HDD status change are set to Critical.  I just double-checked them all.

LGentile
Trusted Contributor

Re: HPSIM 7.2 - Hardware traps not picked up

Can you paste the full details of the events you are having trouble with (the SNMP event number, etc.)?
Bendom
Frequent Advisor

Re: HPSIM 7.2 - Hardware traps not picked up

  My questions are:

 

 1.  Why would my SNMP tests be so flaky?  Does anyone have any idea what could cause traps to work one day, then not the next?

 

It is standard behavior :) But there is always a reason, big or small, and in the end it always makes sense.

 

 2. Has anyone ever encountered hardware faults not generating alerts in HPSIM while everything else pertaining to communication seems to work?


I am kind of confused here. The way it works for me: SNMP traps are generated on the managed system and sent to HP SIM, and HP SIM just handles them and sends mail to our mail server. Check the public/private community permissions and the IP/DNS name the traps are supposed to be sent to. If those are OK, then try to find out why the CMS is refusing the traps or why HP SIM drops them: firewall, open ports, and in HP SIM there is an option where you can set which traps you would like to receive, plus a white/blacklist of subnets from which traps can be received.
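
For the subnet list in particular, it can help to sanity-check whether a managed server's address actually falls inside the ranges that were entered. Below is a rough Python sketch of that check using the standard ipaddress module. It does not read anything from HP SIM and says nothing about how SIM itself evaluates its filter; the function name is_sender_allowed and the example ranges are assumptions for illustration only.

```python
import ipaddress


def is_sender_allowed(sender_ip, allowed_subnets):
    """Return True if sender_ip falls inside at least one of the given subnets."""
    addr = ipaddress.ip_address(sender_ip)
    # strict=False tolerates entries written with host bits set, e.g. 10.1.2.3/16
    return any(
        addr in ipaddress.ip_network(subnet, strict=False)
        for subnet in allowed_subnets
    )


# Hypothetical ranges, typed in by hand from whatever the filter page shows.
example_subnets = ["10.1.0.0/16", "192.168.20.0/24"]
print(is_sender_allowed("10.1.44.7", example_subnets))
print(is_sender_allowed("172.16.0.5", example_subnets))
```

A server with more than one NIC may send traps from an address other than the one you expect, so it is worth checking every address the box has.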

 

If you are able to get traps into HP SIM and are concerned only about event handling, check Options > Events; there are maybe four options there, which are easy to understand (they cover the processing of a received trap).

 

If I have misunderstood you and you are dealing with some other problem, please let me know.

 

BR,

 

Martin

shocko
Honored Contributor

Re: HPSIM 7.2 - Hardware traps not picked up

Ok, so we know:

 

  1. The server side of things (i.e., the monitored servers) is working fine. The servers are generating Windows events for hardware failures and sending an SNMP trap to SIM.
  2. The issue seems to be that SIM is not handling these properly.

So, for the time being I would ignore the overall hardware status of a machine in SIM. Take a test system and do the following:

 

  1. Pull a hard disk
  2. View the Windows event locally on the server indicating that the disk has failed
  3. On the CMS, open that server and look at the events list. There should be an event for the hard disk failure. What is its criticality?

Now, if you have an email alert handler for that event/criticality, it should fire. I have seen issues in the past, though, with thousands of .xml files in the Traps folder under the SIM installation directory. Can you check this directory and clear it down if there are a lot of .xml files in there ;)
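
To check that folder without clicking through Explorer, something like the following Python sketch would count the .xml files. The folder path is deliberately left as a parameter: pass in wherever the 'Traps' folder sits under your own SIM installation directory. The function name count_xml_files is hypothetical, and the script only counts; it does not delete anything.

```python
from pathlib import Path


def count_xml_files(traps_dir):
    """Count regular files ending in .xml (any case) directly inside traps_dir."""
    folder = Path(traps_dir)
    if not folder.is_dir():
        raise NotADirectoryError(f"not a directory: {folder}")
    return sum(
        1
        for entry in folder.iterdir()
        if entry.is_file() and entry.suffix.lower() == ".xml"
    )


# Example use (the path is a placeholder, fill in your own):
#   print(count_xml_files(r"<SIM installation directory>\Traps"))
```

What counts as "a lot" is a judgment call; the report above was about thousands of files.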

LGentile
Trusted Contributor

Re: HPSIM 7.2 - Hardware traps not picked up

shocko, where is this XML directory? I wasn't aware of it myself.

Thanks
shocko
Honored Contributor

Re: HPSIM 7.2 - Hardware traps not picked up

As stated above, in the 'Traps' folder under the SIM installation directory.
