Server Management - Systems Insight Manager

SIM continues to flag problems after they are fixed (false positive alerts)

 
zeroagemain
Frequent Advisor


Hi All,

Since upgrading to HP Insight Control 7.1.2 recently, we have alerts which we cannot clear from SIM.

 

They nearly always relate to genuine issues that occurred and were fixed. However, although SMH and iLO confirm full health has returned and IML has been marked as repaired, SIM continues to alert, with the root cause often given as “One or more POST errors occurred”. Clearing these events in SIM makes no difference: they usually re-appear, and even when they don’t, SIM still says an alert is being received via iLO even though iLO reports health OK.

 

It’s possible the alerts originate in the OS and iLO is just forwarding them (this affects G1-G7, on both Windows and Linux), but regardless, the underlying issue is FIXED and we just end up with false positives in SIM. It only started after we upgraded to 7.1.2: we currently have 4 false positives of this kind, and we only upgraded a month ago.

 

We have a call open with HP, whose advice at the moment is to reboot each time it happens, which we clearly can’t do once a system is back in production.

 

Anyone else seen this, or anyone have any ideas?

 

We are set up so that OS and iLO alerts are both passed through iLO, so we can’t really turn off SNMP forwarding in iLO.

 

Regards,

ZaM

6 REPLIES
shocko
Honored Contributor

Re: SIM continues to flag problems after they are fixed (false positive alerts)

On the servers flagging these events, what is in the IML log? Any uncleared events?

If my post was helpful please award me Kudos! or Points :)
zeroagemain
Frequent Advisor

Re: SIM continues to flag problems after they are fixed (false positive alerts)

Hi Shocko,
As I mentioned "...although SMH and iLO confirm full health has returned and IML has been marked as repaired, SIM continues to alert.."

So there's nothing un-repaired or uncleared in IML on any of the affected servers.
shocko
Honored Contributor

Re: SIM continues to flag problems after they are fixed (false positive alerts)

Apologies, I should have read that a little more closely. Can you verify with a network trace at the SIM server whether the server is in fact sending the alerts over the network? It might possibly be something to do with the clearing down of the events not being fully committed to the DB. What is your DB size/type?
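
As a rough alternative to a full packet capture, here is a minimal Python sketch that counts incoming UDP datagrams per source IP on the SNMP trap port, which would at least show whether the affected servers (or their iLOs) are still sending anything. Assumptions: the function names are made up for illustration; traps arrive on the standard UDP port 162; and the script runs on a spare host added as an extra trap destination, since on the SIM server itself port 162 is normally already held by the trap receiver. It does not decode the trap contents, it only counts packets.

```python
import socket
from collections import Counter


def open_trap_socket(host="0.0.0.0", port=162):
    """Bind a UDP socket on the SNMP trap port (hypothetical helper).

    Binding to port 162 usually needs administrator/root rights and will
    fail if another trap receiver already owns the port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def count_traps(sock, max_packets=100, timeout=60.0):
    """Count datagrams per source IP.

    Stops after max_packets datagrams, or as soon as no datagram has
    arrived for `timeout` seconds. Payloads are not parsed.
    """
    sock.settimeout(timeout)
    seen = Counter()
    try:
        while sum(seen.values()) < max_packets:
            _data, (src_ip, _src_port) = sock.recvfrom(65535)
            seen[src_ip] += 1
    except socket.timeout:
        pass  # quiet period reached; return what was seen so far
    return seen
```

Usage would be something like `count_traps(open_trap_socket(), max_packets=50, timeout=600)` and then checking whether the IPs of the "fixed" servers show up in the result. If they do not, the stale state is more likely on the SIM/DB side than on the wire.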

zeroagemain
Frequent Advisor

Re: SIM continues to flag problems after they are fixed (false positive alerts)

Hi Shocko,

I'm pretty sure the alerts are coming over the network: if we delete the systems from SIM entirely and rediscover them from scratch at a later date, the error eventually returns on the affected systems.

 

I don't know what DB we are running or the DB size, we just have a default install (IC 7.1.2 provides SQL 2008 R2 Express???). If anyone knows how to check the size that would be appreciated.
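
On checking the size: a minimal sketch, assuming the default install really is SQL Server 2008 R2 Express (not confirmed in this thread). The T-SQL in `SIZE_QUERY` reads the `sys.master_files` catalog view, whose `size` column is in 8 KB pages, and can be run in any query tool connected to the instance. The Python helper (a hypothetical name, not part of any HP or Microsoft tool) just turns the returned rows into MB and compares the data files against the 10 GB per-database limit of the Express edition.

```python
# T-SQL to run against the SQL Server instance used by SIM.
# Lists total pages per database and file type (ROWS = data, LOG = log).
SIZE_QUERY = """
SELECT DB_NAME(database_id) AS db_name, type_desc, SUM(size) AS pages
FROM sys.master_files
GROUP BY database_id, type_desc;
"""

PAGE_BYTES = 8 * 1024                 # sys.master_files.size is in 8 KB pages
EXPRESS_LIMIT_BYTES = 10 * 1024 ** 3  # 2008 R2 Express: 10 GB of data per database


def summarize_db_sizes(rows):
    """Convert (db_name, type_desc, pages) rows into a per-database report.

    Returns {db_name: {"data_mb", "log_mb", "pct_of_express_limit"}}.
    Only ROWS files count toward the Express limit; other file types
    besides LOG are ignored in this sketch.
    """
    report = {}
    for db_name, type_desc, pages in rows:
        entry = report.setdefault(db_name, {"data_mb": 0.0, "log_mb": 0.0})
        megabytes = pages * PAGE_BYTES / 1024 ** 2
        if type_desc == "ROWS":
            entry["data_mb"] += megabytes
        elif type_desc == "LOG":
            entry["log_mb"] += megabytes
    for entry in report.values():
        data_bytes = entry["data_mb"] * 1024 ** 2
        entry["pct_of_express_limit"] = round(100 * data_bytes / EXPRESS_LIMIT_BYTES, 1)
    return report
```

Feeding the query results into `summarize_db_sizes` gives a quick view of how close the SIM database is to the Express cap; the database name to look for depends on the install and is not stated here.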

 

I can see plenty of people have had this issue in the past, and it was always the HP OS agents at fault. As it occurred immediately after a recent upgrade of SIM and the HP agents, I suspect we'll be stuck with it until we upgrade again.

 

Frustrating thing is that a reboot fixes it but we don't want to keep rebooting production systems simply to clear false positive SIM alerts.


zeroagemain
Frequent Advisor

Re: SIM continues to flag problems after they are fixed (false positive alerts)

The error is: "One or more POST errors occurred. Power On Self-Test (POST) errors occur during the server restart process. Details of the POST error messages can be found in Integrated Management Log"

 

It only happens (since the IC 7.1.2 upgrade) on systems that HAVE had faults, where the faults have been fixed and the IML completely addressed so that no issues remain, and it happens after the systems have gone back into production.

 

This info is also provided: "The associated MIB File Name for this trap is cpqhlth.mib and the MIB identifier CPQHLTH-MIB"

 

...but the SMH and iLO are completely clear of faults. It is only SIM that keeps reporting them as faulty, presumably because it keeps receiving the above alert.

Michael Leu
Honored Contributor

Re: SIM continues to flag problems after they are fixed (false positive alerts)

This error state seems to only disappear if you reboot when the server has zero issues.

 

We get this all the time with HP techs unplugging one disk prior to some application maintenance that includes a reboot, then going back into production and plugging the second disk back in at a later date.

 

I've never found a way to fix this without a reboot and it pisses me off quite a bit having all these minor health states on ProLiants I know are fine...