
DL180 - SS 8.70 - Management Agents SNMP (6039) (6032) (6034)


 

We have about 29 of these servers running amok with false error messages.

 

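The case for them being false is very strong: the servers are distributed over 250,000 square miles and are very secure, so it is not plausible that fans and power supplies are really being pulled from all of them.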

 

They randomly emit SNMP traps in large batches, which are sent to system admins, who then waste time confirming that they are false.

 

The problems seem to emerge from the Insight Management Agents monitoring the server hardware, specifically the "Server Agents".

 

These systems run Windows Server 2008 R2 x64 with the 8.70.0.0 Management Agents.

 

There is a control panel for managing and enabling/disabling the components.

 

We are considering disabling the broken component, since monitoring the storage arrays and their hard drives is our main concern. Eventually it would be nice to have accurate fan and power supply information from the agents, but they serve no purpose at this time: given the volume of output, the information must be discarded and assumed false.

 

However, there is no one-to-one MIB-to-agent correspondence. The Windows event logs say "Server Agents" created the event, but the Management Agents are not named that way in the list of components that can be enabled or disabled. Trial and error seems to be the only method available to us at this time.

 

Here is the data collected so far on the problem:

 


ProLiant DL180 G6
System ROM Firmware-020 (active) 2011.01.24
product id: 507168-B21

Lights Out 100
IPMI 2.0
Firmware 4.21


Thu 6/30/2011 5:07 AM : (SNMP)  Fan Removed (6039): HP SIM Alert
Thu 6/30/2011 5:07 AM : (SNMP)  Fan Removed (6039): HP SIM Alert
Thu 6/30/2011 5:07 AM : (SNMP)  Fan Removed (6039): HP SIM Alert
Thu 6/30/2011 5:07 AM : (SNMP)  Fan Removed (6039): HP SIM Alert
Thu 6/30/2011 5:07 AM : (SNMP)  Fan Removed (6039): HP SIM Alert
Thu 6/30/2011 5:07 AM : (SNMP)  Fan Removed (6039): HP SIM Alert
Thu 6/30/2011 5:07 AM : (SNMP)  Fan Removed (6039): HP SIM Alert
Thu 6/30/2011 5:07 AM : (SNMP)  Fan Removed (6039): HP SIM Alert
Thu 6/30/2011 5:07 AM : (SNMP)  Fan Removed (6039): HP SIM Alert
Thu 6/30/2011 5:07 AM : (SNMP)  Power Redundancy Lost (6032): HP SIM Alert
Thu 6/30/2011 5:07 AM : (SNMP)  Fan Removed (6039): HP SIM Alert
Thu 6/30/2011 5:07 AM : (SNMP)  Fan Removed (6039): HP SIM Alert
Thu 6/30/2011 5:07 AM : (SNMP)  Fan Removed (6039): HP SIM Alert
Thu 6/30/2011 5:07 AM : (SNMP)  Fan Removed (6039): HP SIM Alert
Thu 6/30/2011 5:07 AM : (SNMP)  Power Redundancy Lost (6032): HP SIM Alert
Thu 6/30/2011 5:07 AM : (SNMP)  Power Supply Removed (6034): HP SIM Alert
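
For anyone who wants to see how a burst like the one above breaks down per trap, here is a minimal Python sketch that tallies alert lines by trap number. The regular expression is an assumption based only on the line layout shown in this post, and tally_alerts is a made-up helper name, not part of HP SIM.

```python
import re
from collections import Counter

# Matches the trap name and number in lines such as
# "Thu 6/30/2011 5:07 AM : (SNMP)  Fan Removed (6039): HP SIM Alert"
ALERT_RE = re.compile(r"\(SNMP\)\s+(?P<name>.+?)\s+\((?P<trap>\d+)\)")

def tally_alerts(lines):
    """Count alerts per (trap number, trap name); lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = ALERT_RE.search(line)
        if match:
            counts[(int(match.group("trap")), match.group("name"))] += 1
    return counts

# A few lines copied from the batch above.
sample = [
    "Thu 6/30/2011 5:07 AM : (SNMP)  Fan Removed (6039): HP SIM Alert",
    "Thu 6/30/2011 5:07 AM : (SNMP)  Power Redundancy Lost (6032): HP SIM Alert",
    "Thu 6/30/2011 5:07 AM : (SNMP)  Fan Removed (6039): HP SIM Alert",
    "Thu 6/30/2011 5:07 AM : (SNMP)  Power Supply Removed (6034): HP SIM Alert",
]

for (trap, name), count in sorted(tally_alerts(sample).items()):
    print(trap, name, count)
```

Feeding it a whole mailbox export instead of the four sample lines would show at a glance which traps dominate each burst.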


HP Insight Management Agents for Windows Server 2003/2008 x64 Editions 8.70.0.0
HP Insight Diagnostics Online Edition for Windows Server 2003/2008 x64 Editions 8.7.0.3946
HP ProLiant 100-Series Management Controller Driver for Windows Server 2003/2008 x64 edition 1.4.0.0
HP Insight Management WBEM Providers for Windows Server 2003/2008 x64 Editions 2.8.0.0


Windows 2008 R2 x64

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
  <Provider Name="Server Agents" />
  <EventID Qualifiers="33845">1126</EventID>
  <Level>3</Level>
  <Task>4</Task>
  <Keywords>0x80000000000000</Keywords>
  <TimeCreated SystemTime="2011-06-30T10:07:05.000000000Z" />
  <EventRecordID>26674</EventRecordID>
  <Channel>System</Channel>
  <Computer>server.compromised.agents.com</Computer>
  <Security />
  </System>
<EventData>
  <Data>0</Data>
  <Data>1</Data>
  <Data>!=</Data>
  <Data>0</Data>
  <Data>1</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Binary>5D009D000001030202000000010000000000000000000000EE0200000000000000000000000000000000000000000000000000000000000000000000003541514E42304234445A4937315A000000000000000000000000000000000000000000000001020000000000000000000000003531313737382D3030310000000000000000000000000000000000000000000000556E6B6E6F776E00010000000200000000000000000000005058130000000000010000000000000000000000660435840700010001000000FF000000E0AC2F00000000009D005D005E00FFFFFEFF000000000000000000000100000000000000000000000000000000000000000001000100FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF0100020000000100000000000100010000000000000000000000000000000000000000000000020000000000000060962F0000000000</Binary>
  </EventData>
  </Event>
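
The layout of the Binary payload is not documented anywhere in this thread, so the following is only a rough Python sketch for pulling readable text out of it: it scans the decoded bytes for runs of printable ASCII. The function name ascii_runs and the minimum run length of 4 are arbitrary choices, and the sample is two short excerpts of the Binary value above joined together for brevity.

```python
import re

def ascii_runs(hex_blob, min_len=4):
    """Return printable-ASCII runs of at least min_len bytes found in a hex string."""
    raw = bytes.fromhex(hex_blob)
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, raw)]

# Two excerpts from the first event's <Binary> value, joined for brevity.
fragment = (
    "0000" "3531313737382D303031" "0000"   # first excerpt
    "556E6B6E6F776E" "0001"                # second excerpt
)

print(ascii_runs(fragment))
```

What the recovered strings mean is a separate question; the sketch only surfaces them.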

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
  <Provider Name="Server Agents" />
  <EventID Qualifiers="33845">1128</EventID>
  <Level>3</Level>
  <Task>4</Task>
  <Keywords>0x80000000000000</Keywords>
  <TimeCreated SystemTime="2011-06-30T10:07:05.000000000Z" />
  <EventRecordID>26675</EventRecordID>
  <Channel>System</Channel>
  <Computer>server.compromised.agents.com</Computer>
  <Security />
  </System>
<EventData>
  <Data>2</Data>
  <Data>2</Data>
  <Data>=</Data>
  <Data>0</Data>
  <Data>2</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Binary>5D009D00000202020100000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000556E6B6E6F776E00010000000000000000000200000000000000000000000000020000000000000000000000680435840700000001000000FF0000007DAD2F00000000009D005D000200FFFFFEFF020000000000000000000100000000000000000000000000000000000000000001000100FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF01000200000001000000000001000100000000000000000000000000000000000000000000000200000000000000B03F300000000000</Binary>
  </EventData>
  </Event>

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
  <Provider Name="Server Agents" />
  <EventID Qualifiers="33845">1133</EventID>
  <Level>3</Level>
  <Task>4</Task>
  <Keywords>0x80000000000000</Keywords>
  <TimeCreated SystemTime="2011-06-30T10:07:05.000000000Z" />
  <EventRecordID>26676</EventRecordID>
  <Channel>System</Channel>
  <Computer>server.compromised.agents.com</Computer>
  <Security />
  </System>
<EventData>
  <Data>2</Data>
  <Data>2</Data>
  <Data>=</Data>
  <Data>0</Data>
  <Data>3</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Binary>5E00100000030302030102000100020100000000000002000000000000000000000000000300000000000000000000006D0435840700000001000000FF000000C08631000000000010005E000300FFFFFEFF020000000000000000000100000000000000000000000000000000000000000001000100FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF01000200000001000000000001000100000000000000000000000000000000000000000000000800000000000000B043300000000000</Binary>
  </EventData>
  </Event>

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
  <Provider Name="Server Agents" />
  <EventID Qualifiers="33845">1133</EventID>
  <Level>3</Level>
  <Task>4</Task>
  <Keywords>0x80000000000000</Keywords>
  <TimeCreated SystemTime="2011-06-30T10:07:05.000000000Z" />
  <EventRecordID>26677</EventRecordID>
  <Channel>System</Channel>
  <Computer>server.compromised.agents.com</Computer>
  <Security />
  </System>
<EventData>
  <Data>2</Data>
  <Data>2</Data>
  <Data>=</Data>
  <Data>0</Data>
  <Data>4</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Binary>5E00100000040302030102000100020100000000000002000000000000000000000000000400000000000000000000006D0435840700000001000000FF000000D08631000000000010005E000300FFFFFEFF020000000000000000000100000000000000000000000000000000000000000001000100FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF01000200000001000000000001000100000000000000000000000000000000000000000000000800000000000000B043300000000000</Binary>
  </EventData>
  </Event>

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
  <Provider Name="Server Agents" />
  <EventID Qualifiers="33845">1133</EventID>
  <Level>3</Level>
  <Task>4</Task>
  <Keywords>0x80000000000000</Keywords>
  <TimeCreated SystemTime="2011-06-30T10:07:05.000000000Z" />
  <EventRecordID>26678</EventRecordID>
  <Channel>System</Channel>
  <Computer>server.compromised.agents.com</Computer>
  <Security />
  </System>
<EventData>
  <Data>2</Data>
  <Data>2</Data>
  <Data>=</Data>
  <Data>0</Data>
  <Data>5</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Binary>5E00100000050302030102000100020100000000000002000000000000000000000000000500000000000000000000006D0435840700000001000000FF000000E08631000000000010005E000300FFFFFEFF020000000000000000000100000000000000000000000000000000000000000001000100FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF01000200000001000000000001000100000000000000000000000000000000000000000000000800000000000000B043300000000000</Binary>
  </EventData>
  </Event>

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
  <Provider Name="Server Agents" />
  <EventID Qualifiers="33845">1133</EventID>
  <Level>3</Level>
  <Task>4</Task>
  <Keywords>0x80000000000000</Keywords>
  <TimeCreated SystemTime="2011-06-30T10:07:05.000000000Z" />
  <EventRecordID>26679</EventRecordID>
  <Channel>System</Channel>
  <Computer>server.compromised.agents.com</Computer>
  <Security />
  </System>
<EventData>
  <Data>2</Data>
  <Data>2</Data>
  <Data>=</Data>
  <Data>0</Data>
  <Data>6</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Data>0</Data>
  <Binary>5E001000000603020301020001000201000000000000020000000000000000000000000006000000D0EF1300000000006D0435840700000001000000FF000000F08631000000000010005E000300FFFFFEFF020000000000000000000100000000000000000000000000000000000000000001000100FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF01000200000001000000000001000100000000000000000000000000000000000000000000000800000000000000B043300000000000</Binary>
  </EventData>
  </Event>
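
To summarize exported events like these without reading each one, a small Python sketch using the standard library's ElementTree can count them per provider and event ID. It assumes each event is available as a well-formed XML string (that is, without the "-" and "+" expand/collapse markers that the XML viewer adds); the sample events below are trimmed to the fields the sketch reads.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by the Windows event XML shown above.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

def summarize(event_xml_docs):
    """Count events per (provider name, event ID)."""
    counts = Counter()
    for doc in event_xml_docs:
        root = ET.fromstring(doc.strip())
        provider = root.find("e:System/e:Provider", NS).get("Name")
        event_id = int(root.find("e:System/e:EventID", NS).text)
        counts[(provider, event_id)] += 1
    return counts

def make_event(event_id):
    """Build a trimmed sample event containing only the fields read above."""
    return f"""
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Server Agents" />
    <EventID Qualifiers="33845">{event_id}</EventID>
    <Level>3</Level>
  </System>
</Event>
"""

# Same mix of event IDs as the six events above.
events = [make_event(i) for i in (1126, 1128, 1133, 1133, 1133, 1133)]
for (provider, event_id), count in sorted(summarize(events).items()):
    print(provider, event_id, count)
```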

 

 

10 REPLIES

Re: DL180 - SS 8.70 - Management Agents SNMP (6039) (6032) (6034)

Update

 

It appears this is coming from the "health agent":

 

SNMP TRAP: 6039 in CPQHLTH.MIB

 

A similar problem appeared on the ML350 back in 8.30, where it was related to the ilo.sys driver:

 

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c01916283&jumpid=reg_R1002_USEN

 

 

NT Event ID: 1133 (Hex)0x8435046d (cpqsvmsg.dll)
Log Severity: Warning (2)
Log Message: Fan Removed. A hot-plug fan has been removed from the system.
SNMP Trap: cpqHe3FltTolFanRemoved - 6039 in CPQHLTH.MIB
Symptom: A Fault Tolerant Fan has been removed from the specified chassis and fan location.
Supporting SNMP Trap Data:
• sysName
• cpqHoTrapFlags
• cpqHeFltTolFanChassis
• cpqHeFltTolFanIndex
Supporting SNMP Trap Description: “The Fan Removed on Chassis [cpqHeFltTolFanChassis], Fan
[cpqHeFltTolFanIndex].”

 

NT Event ID: 1126 (Hex)0x84350466 (cpqsvmsg.dll)
Log Severity: Warning (2)
Log Message: The Fault Tolerant Power Supply Sub-system has lost redundancy. Restore power or replace any
failed or missing power supplies.
SNMP Trap: cpqHe3FltTolPowerRedundancyLost - 6032 in CPQHLTH.MIB
Symptom: The Fault Tolerant Power Supplies have lost redundancy for the specified chassis.
Supporting SNMP Trap Data:
• sysName
• cpqHoTrapFlags
• cpqHeFltTolPowerSupplyChassis
Supporting SNMP Trap Description: “The Power Supplies are no longer redundant on Chassis
[cpqHeFltTolPowerSupplyChassis].”

 

NT Event ID: 1128 (Hex)0x84350468 (cpqsvmsg.dll)
Log Severity: Warning (2)
Log Message: Fault Tolerant Power Supply Removed. A hot-plug fault tolerant power supply has been removed
from the system.
SNMP Trap: cpqHe3FltTolPowerSupplyRemoved - 6034 in CPQHLTH.MIB
Symptom: A Fault Tolerant Power Supply has been removed from the specified chassis and bay location.
Supporting SNMP Trap Data:
• sysName
• cpqHoTrapFlags
• cpqHeFltTolPowerSupplyChassis
• cpqHeFltTolPowerSupplyBay
Supporting SNMP Trap Description: “The Power Supply Removed on Chassis [cpqHeFltTolPowerSupplyChassis],
Bay [cpqHeFltTolPowerSupplyBay].”
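
The hex values quoted above tie directly to the event XML in the first post: the Qualifiers attribute (33845, which is 0x8435) is the high 16 bits of the identifier and the EventID is the low 16 bits. The Python sketch below shows that arithmetic, using the standard Windows event identifier layout (2 severity bits, customer and reserved bits, 12 facility bits, 16 code bits). The function name is made up for illustration.

```python
def decode_event_id(qualifiers, event_id):
    """Combine Qualifiers and EventID into the full 32-bit identifier and split its fields."""
    full = (qualifiers << 16) | event_id
    return {
        "full_hex": f"0x{full:08x}",
        "severity": full >> 30,             # 0 success, 1 informational, 2 warning, 3 error
        "facility": (full >> 16) & 0x0FFF,  # 12-bit facility code
        "code": full & 0xFFFF,              # the number Event Viewer shows as the Event ID
    }

# Qualifiers and EventID values taken from the event XML in the first post.
for event_id in (1126, 1128, 1133):
    print(event_id, decode_event_id(33845, event_id))
```

The severity field comes out as 2 for all three, which agrees with the "Log Severity: Warning (2)" lines in the reference guide excerpts.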

 

The Microsoft Windows Event ID and SNMP Traps Reference Guide associates the file [cpqsvmsg.dll] with the [Server Agent].

 

Agent descriptions
• Foundation/Host Agent: cpqhsmsg
• Server Agent: cpqsvmsg
• Storage Agent: cpqstmsg
• NIC Agent: cpqnimsg

 

 

Re: DL180 - SS 8.70 - Management Agents SNMP (6039) (6032) (6034)

 

A solid update.

 

Began removing the [Insight Management Agents] and re-discovering the servers in HPSIM. The WBEM providers were already in place from the initial SmartStart or PSP installs.

 

Twenty servers yesterday, thirty more servers today, fifteen more tomorrow. This is small-scale testing; if it works, we'll deploy the procedure more widely.

 

So far (fingers crossed) there have been no false positive error messages.

 

Learned a few things too.

 

WBEM, pronounced "WB-EM", stands for "Web-Based Enterprise Management" and requires accurate authentication credentials for discovery to work. I used Security - Global to set the domain\username and password for an Active Directory domain account with administrator rights to the systems. A normal user account did not work.

 

The SNMP discovery steps still detected the SNMP service, but because its namespace was no longer extended by the Insight Management Agents (SNMP Foundation Agents), the SNMP results were classified as insufficient to declare the system "manageable". After these steps the WBEM steps began and "failed", but then succeeded when WBEM with a WMI proxy was used, which is one of the steps further down the discovery ladder.

 

The systems were discovered and registered, and no warnings were reported during the registration.

 

Another thing I learned was that it helps to "re-discover" the Management Processors for the servers just before re-discovering the servers' primary agent address. These always produce yellow cones (or triangles) warning that minor things occurred during registration, but they are of no consequence. After this the servers were rediscovered and produced solid green globes of success.

 

It takes a while to do this, because none of the servers are on the same subnet, and a collection of individual servers from different subnets is not a discoverable "range" as far as I can tell. It would be a nice feature to add to HPSIM.

 

The servers are all over the State of Texas on various private and public networks in hostile and friendly regions of netspace.

 

Came to this procedure after poring over six years of release notes, guides and user manuals for the 100, 300 and 400 series servers, and rather conflicting opinions in the documentation over how various servers "should be supported" after the G6 generation.

 

My best guess was that the SNMP agents were not as well tested, and would not be going forward, in deference to the preference for WBEM. And since WBEM is protected by HTTP(S), it should be a more secure way of managing servers than the generally unauthenticated plain-text methods available with SNMP. As far as I know, SNMP v3 is not supported under HPSIM, only v1 and v2c.

 

We still have a lot of Linux and Unix servers, and some Mac servers, so SNMP isn't totally out of the picture; WBEM isn't as well supported on those platforms. One curious document said specifically that openWBEM "is not supported", but on those platforms it's easy enough to hijack an OID manually and insert a script or static value.
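
As one illustration of inserting a script behind an OID, here is a minimal Python sketch of a handler for the Net-SNMP agent's "pass" mechanism. That the Linux/Unix hosts run Net-SNMP is an assumption, and the OID and the static value are placeholders chosen for the example, not anything HP defines. Under the pass convention the agent runs the program with -g OID (GET) or -n OID (GETNEXT) and expects three output lines: OID, type, value.

```python
import sys

# Placeholder OID and value, for illustration only.
OID = ".1.3.6.1.4.1.8072.9999.9999.1.0"
VALUE = "ok"

def oid_key(oid):
    """Turn a dotted OID string into a tuple of ints so OIDs compare numerically."""
    return tuple(int(part) for part in oid.strip(".").split("."))

def respond(flag, requested):
    """Return the three reply lines (OID, type, value), or [] when nothing matches."""
    if flag == "-g" and oid_key(requested) == oid_key(OID):
        return [OID, "string", VALUE]
    if flag == "-n" and oid_key(requested) < oid_key(OID):
        # GETNEXT: our single OID is the next one after anything that sorts before it.
        return [OID, "string", VALUE]
    return []

if __name__ == "__main__" and len(sys.argv) >= 3:
    reply = respond(sys.argv[1], sys.argv[2])
    if reply:
        print("\n".join(reply))
```

It would be hooked up with a line along the lines of "pass .1.3.6.1.4.1.8072.9999.9999 /usr/bin/python3 /path/to/handler.py" in snmpd.conf, where both paths are hypothetical. Replacing VALUE with the result of a local health check is the "script" variant of the same idea.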

 

 

Re: DL180 - SS 8.70 - Management Agents SNMP (6039) (6032) (6034)

 

Had a distraction today that delayed re-discovering more servers. RODCs (read-only domain controllers) are brand new and have their instabilities; I'd recommend caution if you deploy these, even on a small scale.

 

None of the servers that had the [Insight Management Agents (SNMP)] removed reported false positives in the last 48 hours, which is a huge improvement. We did have one storage event reporting a failed drive in a RAID5 array, which is encouraging because it was reported using WBEM alone. That means it can be handled proactively, which is the intent of using HPSIM.

 

About 3 AM, however, I got about 100 false positives from a single server that still had the [Insight Management Agents (SNMP)] loaded; at least it was about 100 by the time I was rousted from sleep by all the alert chimes from my phone. The error numbers appear to have changed, but the error descriptions are the same.

 

I've yet to investigate, but I suspect the agents were updated and the server was rebooted by WSUS scheduled updates following the recent Microsoft Patch Tuesday.

 

(SNMP)  Power Supply OK (6048)

(SNMP)  Power Supply Inserted (6033)

(SNMP)  Fan Inserted (6038)

(SNMP)  Power Redundancy Restored (6054)

 

The first thing I did at 3:15 AM was remove the [Insight Management Agents (SNMP)] from the server, and then I went back to sleep. I have not received an alert from it since.

 

The server is a ProLiant DL180 G6.

 

No corresponding WBEM events were logged in the same time frame, so I no longer trust the reliability of the [Insight Management Agents (SNMP)] on the DL180 G6.

 

 

 

AStackpole_USCOURTS
Occasional Visitor

Re: DL180 - SS 8.70 - Management Agents SNMP (6039) (6032) (6034)

I have been having the exact same issue with SMH (8.7) alerts about power supply redundancy on DL180 G6 servers running Red Hat Enterprise Linux 5.6. The alerts come from only some of our servers, not all of them. I definitely need SMH/HP-Health on the servers to monitor disk and controller health, but would like to stop getting erroneous messages. Will try uninstalling and then installing the latest versions of the utilities.

Ralph Frampton
Frequent Advisor

Re: DL180 - SS 8.70 - Management Agents SNMP (6039) (6032) (6034)

In my experience:

  • Yes, it's annoying, but these are not necessarily erroneous alerts from the DL180 G6 and could be by design (anyone from HP care to comment?). On our systems, alerts of this type only occur when we reboot the servers; it appears that these servers dynamically re-discover their hardware during boot-up.
  • WBEM is getting better but may not show you the whole picture of what is wrong with a server (e.g. the Integrated Management Log major status may not show on the System Management Homepage if you aren't using SNMP mode).

I'm guessing you are using a system default email event handler. Have you considered creating your own custom event handlers? You can pick and choose which alerts should be shown or processed.

  • We manage servers all across Canada & a few in US/UK & I have an email event handler for each site, primarily so we can turn email alerts off for each site individually during known maintenance periods.
  • Each of these site handlers uses the same custom event collection, which has been adjusted for hardware failure events, since those matter most to us for after-hours reporting.
  • This way we get emailed about the critical hardware failures as soon as possible & otherwise the informational alerts are simply tracked in HPSIM for our review later when we are so inclined.
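
The setup described in these bullets (one email handler per site that can be muted, all sharing one event collection) boils down to a two-part check. The Python sketch below only models that logic; it is not HP SIM's API, and the Alert class, the site names and the contents of the collection are all made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Alert:
    site: str
    trap: int
    description: str

def should_email(alert, muted_sites, failure_traps):
    """Email only if the alert's site is not muted and its trap is in the shared collection."""
    return alert.site not in muted_sites and alert.trap in failure_traps

# Arbitrary example collection: which traps count as "hardware failure" is a local decision.
FAILURE_TRAPS = {6032, 6034}

alerts = [
    Alert("site-a", 6039, "Fan Removed"),
    Alert("site-a", 6034, "Power Supply Removed"),
    Alert("site-b", 6034, "Power Supply Removed"),
]
muted = {"site-b"}  # e.g. a site in a known maintenance window

to_send = [a for a in alerts if should_email(a, muted, FAILURE_TRAPS)]
print(to_send)
```

Alerts that fail the check are not lost in this model; they are simply not emailed, which matches the idea of leaving informational events in HPSIM for later review.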

RF

 

 

 

CRidgley
Occasional Advisor

Re: DL180 - SS 8.70 - Management Agents SNMP (6039) (6032) (6034)

The issue is "HP ProLiant 100-Series G6/G7 Servers - Windows Server May Display False Power Supply Status Messages". A new document that addresses the problem will be coming out. There is already a document that addresses the problem on Red Hat Linux: http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c02499696&jumpid=reg_R1002_USEN and we have found that the same fix works for Windows Server. In the meantime, this is the fix:

Download and install SoftPaq SP50423, CPLD Version 1.12, from the following FTP site:

ftp://ftp.hp.com/pub/softlib2/software1/pubsw-windows/p3231552/v64980. 

 

The fix applies to any of the following ProLiant servers:

  • ProLiant DL160 G6 server
  • ProLiant DL160se G6 server
  • ProLiant DL160 G7 server
  • ProLiant DL180 G6 server
  • ProLiant DL180se G6 server
  • ProLiant ML150 G6 server

 

Julie Barnes
Occasional Contributor

Re: DL180 - SS 8.70 - Management Agents SNMP (6039) (6032) (6034)

I've installed the SoftPaq, but still occasionally receive the power supply error on a DL160 G6 Windows 2008 R2 server with only 1 power supply. Curious if this fix did not work for anyone else. In the meantime, we will just ignore the false error. Thanks!

pgustavsson
Frequent Advisor

Re: DL180 - SS 8.70 - Management Agents SNMP (6039) (6032) (6034)

Hi,

 

I have exactly the same issue on my DL180 G6, running HPC Server 2008 R2 (x64).

 

I have just flashed the SoftPaq and am checking whether this solves the issue; judging by the post above, however, it sounds unlikely.

 

I have two PSUs in the server.

 

Will post back again if I see the issue in the future.

 

Any other new solutions to this issue out there?

 

Br

Patric

 

ryan_1212
Advisor

Re: DL180 - SS 8.70 - Management Agents SNMP (6039) (6032) (6034)

Did that update work for anyone? I am seeing this problem on a DL180 G6 with PSP 9.0 on Windows Server 2008 R2 SP1. I opened a support case with HP.

pgustavsson
Frequent Advisor

Re: DL180 - SS 8.70 - Management Agents SNMP (6039) (6032) (6034)

No, did not work for me.

 

I get the messages less frequently now, but about once every 7 days or so I apparently lose both of my power supplies and all fans, while the server stays up and running.