
ILO2 MIBs and SNMP Alert's IODs

 
AlonsoRojas
Occasional Advisor


Hi All

I was wondering if anyone knows the OIDs for the iLO2 alerts listed on page 66 of the iLO2 User Guide. These alerts are:

 

ALERT_TEST

ALERT_SERVER_POWER

ALERT_SERVER_RESET

ALERT_ILLEGAL_LOGIN

ALERT_LOGS_FULL

ALERT_SECURITY_DISABLED

ALERT_SELFTEST_FAILURE

ALERT_SECURITY_ENABLED

ALERT_THRESHOLD_BREACH

 

For example, after configuring the iLO for SNMP alerts and testing, I know ALERT_ILLEGAL_LOGIN corresponds to OID 1.3.6.1.4.1.232.9.2.2.9, but I need the full list and there are some I can't test.

 

Neither the HP MIB kit nor any MIB website seems to provide any information on the actual OID numbers.

Thanks for the help

 



 

Matti_Kurkela
Honored Contributor

Re: ILO2 MIBs and SNMP Alert's IODs

A search on OID 1.3.6.1.4.1.232.9.2.2.9 at mibdepot.com indicates this OID seems to belong to CPQSM2-MIB, which is included in the HP MIB kit.

 

But it does not seem to be a trap OID: in symbolic form, it's an SNMP integer object named ...enterprises.compaq.cpqSm2.cpqSm2Component.cpqSm2Cntlr.cpqSm2CntlrSelfTestErrors, which is a bit field of various iLO self-test error flags.

 

The trap OID corresponding to ALERT_ILLEGAL_LOGIN would seem to be .1.3.6.1.4.1.232.0.9003, i.e. enterprise .1.3.6.1.4.1.232, trap code 9003. This trap information (and 15 other traps relevant to iLO and its predecessor Remote Insight) is also listed at the end of the CPQSM2-MIB.

 

Are you sure you (or the tool you used) decoded the trap message correctly?

MK
AlonsoRojas
Occasional Advisor

Re: ILO2 MIBs and SNMP Alert's IODs

Hi Matti, thank you for answering.


I agree that the OID I presented as an example points to MIB CpqSM2.MIB and that the trap code is 9003. My concern is that I used two tools for the traces, snmputil and Wireshark, and both of them point to 1.3.6.1.4.1.232.9.2.2.9, which, as you said, seems to be cpqSm2CntlrSelfTestErrors.

But if I look up the description "cpqSm2CntlrBadLoginAttemptsThresh" instead, I get 1.3.6.1.4.1.232.9.2.2.14, so we see .14 instead of .9, which confuses me (http://oid-info.com/get/1.3.6.1.4.1.232.9.2.2.14).


Matti_Kurkela
Honored Contributor

Re: ILO2 MIBs and SNMP Alert's IODs

In a trap message, the main meaning is conveyed by the combination of the "enterprise" and "trap code" fields. All the other OIDs are just optional added information.

 

cpqSm2CntlrBadLoginAttemptsThresh (OID 1.3.6.1.4.1.232.9.2.2.14) is an SNMP variable that tells you how many bad login attempts the iLO firmware must see before the "bad login" trap (properly identified as cpqSm2UnauthorizedLoginAttempts) is sent.

 

If e.g. cpqSm2CntlrBadLoginAttemptsThresh value is 2, the iLO user is allowed to make a typo in username or password twice without sending the alarm; only the third login attempt (respectively) causes a trap message to be sent. This may be useful if you want to avoid nuisance alarms caused by human error.

 

The name of the trap 9003 in the MIB file is cpqSm2UnauthorizedLoginAttempts.

The description of the trap in the MIB file is:

cpqSm2UnauthorizedLoginAttempts TRAP-TYPE
  ENTERPRISE compaq
  VARIABLES  { sysName, cpqHoTrapFlags, cpqSm2CntlrBadLoginAttemptsThresh}
  DESCRIPTION
     "Remote Insight/ Integrated Lights-Out Unauthorized Login Attempts.

     The Remote Insight/ Integrated Lights-Out firmware has detected
     unauthorized login attempts."
  ::= 9003

 

The "VARIABLES" field indicates the SNMP variables that may be attached to this trap message as pertinent information. With this particular trap, it includes:

  • sysName (standard OID 1.3.6.1.2.1.1.5, i.e. the name of the system that originated the trap),
  • cpqHoTrapFlags (described in CPQHOST-MIB: OID 1.3.6.1.4.1.232.11.2.11.1, looks unimportant for this particular trap)
  • cpqSm2CntlrBadLoginAttemptsThresh (OID 1.3.6.1.4.1.232.9.2.2.14)

In plain language, the trap could be interpreted as: "Hey, I'm <sysName> and you've asked me to tell you whenever there are <cpqSm2CntlrBadLoginAttemptsThresh> or more consecutive failed logins. I have that situation now."

 

 

According to the CPQSM2-MIB, the only trap that includes 1.3.6.1.4.1.232.9.2.2.9 (cpqSm2CntlrSelfTestErrors) as a variable is trap 9005, aka cpqSm2SelfTestError. Here is its description in the MIB file:

  cpqSm2SelfTestError TRAP-TYPE
      ENTERPRISE compaq
      VARIABLES  { sysName, cpqHoTrapFlags, cpqSm2CntlrSelfTestErrors }
      DESCRIPTION
          "Remote Insight/ Integrated Lights-Out Self Test Error.

          The Remote Insight/ Integrated Lights-Out firmware has 
          detected a Remote Insight self test error."
      ::= 9005

 

Of course, it is possible that your particular iLO firmware version has a bug that causes it to attach a wrong variable to the UnauthorizedLoginAttempts trap, attaching the 1.3.6.1.4.1.232.9.2.2.9 instead of .14.

 

I'd recommend that you get a MIB browser utility of some sort. For example, this one seems to be free for personal use on Windows. I use Linux, so I use mbrowse instead. In my opinion, even just the consolidated view of the MIB tree that is offered by the MIB browser utilities is reason enough to install one and configure it with the applicable MIBs: it makes it much easier to find a particular OID and see its description. The ability to send SNMP requests and interpret the answers using a GUI is a good thing too.

MK
AlonsoRojas
Occasional Advisor

Re: ILO2 MIBs and SNMP Alert's IODs

Hi Matti and Forum

I totally agree with you on everything and appreciate you taking the time to answer and help. Most of the things you mention I have checked too. As you stated I see in the network trace that this trap has

Enterprise: .1.3.6.1.4.1.232
Specific-Trap: 9003
Variable Bindings:

 

  • sysName (standard OID 1.3.6.1.2.1.1.5, i.e. the name of the system that originated the trap),
  • cpqHoTrapFlags (described in CPQHOST-MIB: OID 1.3.6.1.4.1.232.11.2.11.1, looks unimportant for this particular trap)
  • cpqSm2CntlrSelfTestErrors (OID 1.3.6.1.4.1.232.9.2.2.9)  >> This one is an integer with a value of 3, which is the threshold configured in the iLO for login attempts, so the value matches even though the description doesn't.



I agree that we could have a bug in the iLO firmware (I have tested a couple of versions, but they could all have the same issue). If I use a MIB browser and load MIBs such as CPQSM2.mib downloaded from HP.com, I see what you saw too: 1.3.6.1.4.1.232.9.2.2.9 is for ControllerSelfTestError and 1.3.6.1.4.1.232.9.2.2.14 is for BadLoginAttemptsThresh.

The interesting thing is that if I have the iLO send the trap to SIM (which uses the same CPQSM2.mib I loaded in the MIB browser), SIM returns the correct message about login failures and not about self-test errors. This makes me believe that SIM only cares about the Enterprise ID and the Specific Trap in order to decode the message.


With this information I have to deduce the rest of the list.

11003 ALERT_TEST
9002 ALERT_SERVER_POWER
9001 ALERT_SERVER_RESET
9003 ALERT_ILLEGAL_LOGIN
9011 ALERT_LOGS_FULL
9012 ALERT_SECURITY_DISABLED
9005 ALERT_SELFTEST_FAILURE
9013 ALERT_SECURITY_ENABLED
11018 ALERT_THRESHOLD_BREACH
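
As a rough sketch of how a trap receiver could use this list, the lookup below identifies an alert purely from the Enterprise ID and the Specific Trap number, ignoring the attached variables. The codes are the ones deduced above; the names `ILO2_ALERTS` and `identify_alert` are made up for illustration and are not part of any HP tool.

```python
# Enterprise OID shared by these traps ("compaq" in the HP MIBs).
COMPAQ_ENTERPRISE = "1.3.6.1.4.1.232"

# Specific-trap codes as deduced in this thread (not an official HP table).
ILO2_ALERTS = {
    11003: "ALERT_TEST",
    9002: "ALERT_SERVER_POWER",
    9001: "ALERT_SERVER_RESET",
    9003: "ALERT_ILLEGAL_LOGIN",
    9011: "ALERT_LOGS_FULL",
    9012: "ALERT_SECURITY_DISABLED",
    9005: "ALERT_SELFTEST_FAILURE",
    9013: "ALERT_SECURITY_ENABLED",
    11018: "ALERT_THRESHOLD_BREACH",
}


def identify_alert(enterprise, specific_trap):
    """Return the iLO2 alert name for a trap, or None if it is not in the list.

    Only the enterprise OID and the specific-trap code are consulted;
    the variable bindings are deliberately ignored.
    """
    # Accept the enterprise OID with or without a leading dot.
    if enterprise.lstrip(".") != COMPAQ_ENTERPRISE:
        return None
    return ILO2_ALERTS.get(specific_trap)


if __name__ == "__main__":
    print(identify_alert(".1.3.6.1.4.1.232", 9003))
```

A trap from another enterprise, or an unlisted code, simply yields None, so the caller can fall back to a generic "unknown trap" alert.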

I know all of them will have at least the sysName and TrapFlags, but some have more variables, like ALERT_THRESHOLD_BREACH, which has 5 variables. If the OID in that other trap doesn't match the MIB list, I am afraid these variables might not match either:

&snmptrap_cpqPwrWarnType,
&snmptrap_cpqPwrWarnThreshold,
&snmptrap_cpqPwrWarnDuration,
&snmptrap_cpqSerialNum,
&snmptrap_cpqServerUUID,


The thing is that I need all the correct OIDs in order to monitor them

Matti_Kurkela
Honored Contributor

Re: ILO2 MIBs and SNMP Alert's IODs

So the login threshold variable in the 9003 trap seems to have the correct data, just mislabelled with the wrong OID?

 

Then it would seem that the MIB has correct information after all: it tells you which variables to expect in each trap type, and maybe even the expected order of the variables in the trap message.

 

You can confirm this by doing an SNMP GET operation for the appropriate OIDs of the iLO:

  • OID 1.3.6.1.4.1.232.9.2.2.9 (cpqSm2CntlrSelfTestErrors) should have a value of 0
  • OID 1.3.6.1.4.1.232.9.2.2.14 (cpqSm2CntlrBadLoginAttemptsThresh) should have a value of 3

If that's true, the problem is a mislabeling of the variable in the trap 9003 message by the code responsible for sending that particular trap, and the trap variable list in the MIB file should have priority in interpreting the meanings of the trap variables.

 

On the other hand, if cpqSm2CntlrSelfTestErrors has a non-zero value, then the iLO is reporting a self-test error, and we would have to assume the fault might also cause the iLO to have errors in any data it sends out, including the trap messages. A cpqSm2CntlrSelfTestErrors value of 3 would indicate a combination of "Busmaster I/O read error" and "Memory test error".
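
The check described above can be written down as a small decision function. This is only a sketch: it assumes you have already fetched the two values with SNMP GET by whatever means you prefer, and the function name and the three result strings are invented for this example.

```python
def diagnose_trap_9003(self_test_errors, bad_login_thresh, trap_value):
    """Classify the odd variable seen in trap 9003.

    self_test_errors: value read via SNMP GET from 1.3.6.1.4.1.232.9.2.2.9
    bad_login_thresh: value read via SNMP GET from 1.3.6.1.4.1.232.9.2.2.14
    trap_value:       value the trap carried under the .9 OID
    """
    if self_test_errors != 0:
        # The iLO really reports a self-test error, so anything it
        # sends (including traps) has to be treated with suspicion.
        return "self-test-error"
    if bad_login_thresh == trap_value:
        # Correct data, wrong OID label: trust the MIB's variable list.
        return "mislabeled-variable"
    # Neither explanation fits the observed values.
    return "inconclusive"


if __name__ == "__main__":
    # Values matching the situation described in this thread.
    print(diagnose_trap_9003(0, 3, 3))
```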

 

> [...] SIM will return the correct message about login failures and not about self test errors.  This makes me believe that SIM only cares about the Enterprise ID and the Specific Trap in order to decode the message.

 

Yes, exactly. Once you've decoded the Enterprise ID and the Specific Trap value, you'll know the overall meaning of the trap, and can look up the expected variables from the trap definition in the MIB if you choose to do so. You could just use the MIB to translate the attached variable OIDs to human-readable names, attach the variable names & values to the monitoring alert as additional data, and let the human receiving the alert worry whether the variable data makes sense or not.

 

Or, for maximally robust trap decoding, you would need a per-sender configuration option in your trap receiver software, like: "If the variables in traps coming from IP address A.B.C.D don't exactly match the MIB definition, A) assume the MIB variable list is correct, B) assume the OIDs of the attached variables are correct?" 
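
A per-sender option like that might look roughly like the sketch below. Everything here is hypothetical (the policy names "trust-mib" and "trust-oid", the `POLICY_BY_SENDER` table, the example address 192.0.2.10); only the OIDs and variable names come from the MIBs discussed above.

```python
# Variable order for trap 9003 as given in the CPQSM2-MIB definition.
TRAP_9003_VARIABLES = [
    "sysName",
    "cpqHoTrapFlags",
    "cpqSm2CntlrBadLoginAttemptsThresh",
]

# OID-to-name table for the objects mentioned in this thread.
OID_NAMES = {
    "1.3.6.1.2.1.1.5": "sysName",
    "1.3.6.1.4.1.232.11.2.11.1": "cpqHoTrapFlags",
    "1.3.6.1.4.1.232.9.2.2.9": "cpqSm2CntlrSelfTestErrors",
    "1.3.6.1.4.1.232.9.2.2.14": "cpqSm2CntlrBadLoginAttemptsThresh",
}

# Hypothetical per-sender configuration; unlisted senders use the default.
POLICY_BY_SENDER = {"192.0.2.10": "trust-mib"}
DEFAULT_POLICY = "trust-oid"


def name_for_oid(oid):
    """Translate an OID (with optional instance suffix) to a MIB name."""
    oid = oid.lstrip(".")
    for base, name in OID_NAMES.items():
        if oid == base or oid.startswith(base + "."):
            return name
    return oid  # unknown OIDs are passed through unchanged


def label_varbinds(sender_ip, varbinds, expected_names):
    """Label (oid, value) pairs according to the sender's policy."""
    policy = POLICY_BY_SENDER.get(sender_ip, DEFAULT_POLICY)
    labeled = {}
    for position, (oid, value) in enumerate(varbinds):
        if policy == "trust-mib" and position < len(expected_names):
            # Option A: the MIB's variable list wins; OIDs are ignored.
            labeled[expected_names[position]] = value
        else:
            # Option B: believe the OID attached to each variable.
            labeled[name_for_oid(oid)] = value
    return labeled


if __name__ == "__main__":
    trap = [
        ("1.3.6.1.2.1.1.5.0", "ilo-host"),
        ("1.3.6.1.4.1.232.11.2.11.1.0", 0),
        ("1.3.6.1.4.1.232.9.2.2.9.0", 3),
    ]
    print(label_varbinds("192.0.2.10", trap, TRAP_9003_VARIABLES))
    print(label_varbinds("192.0.2.99", trap, TRAP_9003_VARIABLES))
```

With the first sender the value 3 ends up labeled as the login threshold; with the second it keeps the self-test label that was actually attached.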

 

> The thing is that I need all the correct OIDs in order to monitor them

 

When you have seen a mislabeled variable in one trap message, assuming that other traps might also have mislabeled variables would be robust programming. However, assuming that all the other traps definitely have mislabeled variables based on just one sample would be overly pessimistic, I think.

 

For example, according to the CPQHOST-MIB, the trap 11018 for Enterprise .1.3.6.1.4.1.232 ("compaq", now owned by HP) has proper name "cpqHo2PowerThresholdTrap" and it has 7 variables, not 5. Apparently the sysName and cpqHoTrapFlags are prefixed to the variable list for most traps of this Enterprise.

  • sysName (standard OID 1.3.6.1.2.1.1.5)
  • cpqHoTrapFlags (OID 1.3.6.1.4.1.232.11.2.11.1)
  • cpqPwrWarnType (OID 1.3.6.1.4.1.232.11.2.16.1)
  • cpqPwrWarnThreshold (OID 1.3.6.1.4.1.232.11.2.16.2)
  • cpqPwrWarnDuration (OID 1.3.6.1.4.1.232.11.2.16.3)
  • cpqSerialNum (OID 1.3.6.1.4.1.232.11.2.16.4)
  • cpqServerUUID (OID 1.3.6.1.4.1.232.11.2.16.5)

With the MIB browser, this information was really easy to look up: just find the trap object 11018 under the "compaq" enterprise branch, and then look at its attributes.

 

When your monitoring system receives this trap, it should generally emit an alert like:

Power threshold exceeded at <trap_sender_IP> (<hostname by reverse DNS lookup etc.>).
Additional information:
- System name <sysName>
- Warning type <cpqPwrWarnType>
- Threshold level <cpqPwrWarnThreshold>
- Condition existed at least for <cpqPwrWarnDuration> minutes
- Server serial number <cpqSerialNum>
- Server UUID <cpqServerUUID>

 

If some (or even all) variables cannot be found in the trap message, the alert should still be emitted, but with appropriate placeholders (like "[DATA_MISSING]") in place of each variable item that could not be found in the message.
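
A minimal sketch of such an alert formatter, assuming the trap variables have already been collected into a dictionary keyed by MIB name. The function name `format_power_alert` and the exact output layout are illustrative choices, not something prescribed by the MIB.

```python
MISSING = "[DATA_MISSING]"

# (label, MIB variable name, optional unit) for trap 11018.
POWER_TRAP_FIELDS = [
    ("System name", "sysName", ""),
    ("Warning type", "cpqPwrWarnType", ""),
    ("Threshold level", "cpqPwrWarnThreshold", ""),
    ("Condition existed at least for", "cpqPwrWarnDuration", " minutes"),
    ("Server serial number", "cpqSerialNum", ""),
    ("Server UUID", "cpqServerUUID", ""),
]


def format_power_alert(sender_ip, variables):
    """Build the alert text; absent variables become placeholders."""
    lines = [
        f"Power threshold exceeded at {sender_ip}.",
        "Additional information:",
    ]
    for label, name, unit in POWER_TRAP_FIELDS:
        if name in variables:
            lines.append(f"- {label} {variables[name]}{unit}")
        else:
            # Still emit the line so the reader sees what was expected.
            lines.append(f"- {label} {MISSING}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Only two of the six variables were found in this example trap.
    print(format_power_alert("192.0.2.10",
                             {"sysName": "ilo-host", "cpqPwrWarnDuration": 5}))
```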

 

Or if you want to be maximally informative, you might have your monitoring system send SNMP GET requests for the missing data items, since the MIB file tells you the OIDs you'd need.

 

Just don't let the GET requests delay the sending of the actual alert too much (definitely implement a timeout for the GET requests if you do this!). A timely alert with some data missing is usually better than all the information but delivered too late. If the trap message is about some critical situation, the trap message might be the last thing the system sends before doing a hard shutdown to prevent hardware damage, so the subsequent GETs might well fail.
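
One way to bound the delay is to run the follow-up GETs in worker threads and give up after an overall deadline. The sketch below assumes a caller-supplied `fetch(oid)` function that performs the actual SNMP GET; that function, `fill_missing`, and the 2-second default are all placeholders for illustration.

```python
import concurrent.futures
import time

MISSING = "[DATA_MISSING]"


def fill_missing(variables, wanted_oids, fetch, timeout_s=2.0):
    """Return a copy of variables with absent items fetched via fetch(oid).

    wanted_oids maps MIB names to OIDs. Any item that fails or is not
    back before the overall deadline is set to the placeholder instead.
    """
    result = dict(variables)
    missing = {n: o for n, o in wanted_oids.items() if n not in result}
    if not missing:
        return result

    executor = concurrent.futures.ThreadPoolExecutor(max_workers=len(missing))
    futures = {n: executor.submit(fetch, o) for n, o in missing.items()}
    deadline = time.monotonic() + timeout_s
    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            result[name] = future.result(timeout=remaining)
        except Exception:
            # Timeout or GET failure: the alert must go out regardless.
            result[name] = MISSING
    # Do not wait for stragglers; the alert should not be held up.
    executor.shutdown(wait=False)
    return result


if __name__ == "__main__":
    def fake_fetch(oid):
        # Stand-in for a real SNMP GET; one OID is made artificially slow.
        if oid.endswith(".16.4"):
            time.sleep(1.0)
        return "value-for-" + oid

    print(fill_missing(
        {"sysName": "ilo-host"},
        {"sysName": "1.3.6.1.2.1.1.5",
         "cpqPwrWarnType": "1.3.6.1.4.1.232.11.2.16.1",
         "cpqSerialNum": "1.3.6.1.4.1.232.11.2.16.4"},
        fake_fetch,
        timeout_s=0.2,
    ))
```

The deadline is shared across all requests, so the total extra delay stays near `timeout_s` no matter how many variables are missing.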

 

MK
AlonsoRojas
Occasional Advisor

Re: ILO2 MIBs and SNMP Alert's IODs

Matti,

 

Once again we have reached similar conclusions, and I believe they are correct. For example, in the network trace for that trap I get the 2 variables that we know always come (Server ID and Flags), plus 1.3.6.1.4.1.232.9.2.2.9 with a value of 3.

 

This is what I believe should be 1.3.6.1.4.1.232.9.2.2.14 with that value of 3, since just by looking at the iLO configuration one can tell that its default is 3 login attempts before it creates the alert. So I agree that it seems to be a simple mislabeling that SIM is able to overcome by using the Enterprise ID and the Specific Trap number, in this case 9003.

I also agree that it is unrealistic to believe that all other traps will have mislabeled variables, and I believe this was the only one. From the list of iLO traps I posted, only 4 have more than the 2 known standard variables. Out of those 4 traps I know 2 of them:

 

ALERT_TEST with a matching OID of 1.3.6.1.4.1.232.11.2.8.1 and ALERT_ILLEGAL_LOGIN with the mislabeled 1.3.6.1.4.1.232.9.2.2.9. So, being optimistic, I can just form the two missing traps' variables from the information on the web.

 

Since I am unable to replicate the other two traps, ALERT_THRESHOLD_BREACH and ALERT_SELFTEST_FAILURE, I will optimistically go with the information on the web, and only time will tell whether it is correct.

 

Thanks for all the help.