Server Management - Systems Insight Manager

MattLavallee2
Frequent Advisor

Clearing related events

Someone else brought this up last year and never really got a solid answer.

Is there a way to configure event handling to clear previous related events (the same way that the "system is reachable" event clears "system is unreachable")?

For example, Link Up/Link Down, Power Redundancy Lost/Power Redundancy Restored, etc.
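To make the request concrete, here is a minimal sketch of the kind of pairing rule being asked about. It is plain Python and not an HP SIM feature or API; the `CLEARS` table, the `Event` class, and `apply_event` are all hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# Hypothetical pairing table: clearing event name -> event names it resolves.
CLEARS = {
    "Link Up": {"Link Down"},
    "Power Redundancy Restored": {"Power Redundancy Lost"},
}

@dataclass
class Event:
    system: str
    name: str
    cleared: bool = False

def apply_event(open_events, incoming):
    """Mark earlier events on the same system as cleared if `incoming` resolves them."""
    resolved = CLEARS.get(incoming.name, set())
    for ev in open_events:
        if ev.system == incoming.system and ev.name in resolved:
            ev.cleared = True
    open_events.append(incoming)
    return open_events
```

With a table like this, a "Link Up" on one server would clear that server's earlier "Link Down" and leave other servers' events alone.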

We get a lot of spurious events when a system is rebooted that leave "critical" events needing to be cleared.

Thanks,
-Matt
6 REPLIES
marsh_1
Honored Contributor

Re: Clearing related events

Matt,

AFAIK this is not possible in HP SIM. You tend to find this capability only in the 'pay for it' products such as HP OV, IBM Tivoli, BMC Patrol, etc.
Hope this helps.

MattLavallee2
Frequent Advisor

Re: Clearing related events

Just seems strange that they would knowingly have this functionality and intentionally withhold it.

I honestly wasn't aware that there was something comparable to SIM in the OV portfolio... do you know what the direct analog is? You'd think if that were the case they'd have ads all over SIM.

-Matt
marsh_1
Honored Contributor

Re: Clearing related events

Matt,

OpenView, Tivoli, et al. encompass a huge portfolio of management software. At the core of most of them are basic infrastructure monitoring tools using ping, SNMP, and agents, just like SIM, but these tools go beyond that by deduplicating and correlating events by system or service and providing root cause analysis of issues. This could go on forever, but that's why you pay for them. There are integrations for HP SIM into HP OVO, Tivoli Netcool, etc., and some are smarter than others: for example, acknowledging an event at the top level (say, in HP OVO) filters down into HP SIM without your having to do it separately.
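As a rough idea of what "deduplicate by system" means in practice, here is a minimal Python sketch. It is not how OpenView or Tivoli implement it; the `deduplicate` function and the `(system, name)` tuple shape are assumptions made for the example.

```python
def deduplicate(events):
    """Collapse repeated (system, name) pairs into one entry with a count.

    `events` is an iterable of (system, name) tuples in arrival order.
    """
    counts = {}
    for system, name in events:
        key = (system, name)
        counts[key] = counts.get(key, 0) + 1
    # dicts keep insertion order, so the first occurrence decides the position.
    return [(system, name, count) for (system, name), count in counts.items()]
```

The commercial tools layer correlation and root cause analysis on top of this kind of grouping, which is where most of the complexity lives.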

Have a look at the OpenView pages on HP's site.

Good luck.

David Claypool
Honored Contributor

Re: Clearing related events

You should use 'Suspend Monitoring' during system reboots; that is what it is for.

There is a critical difference in your examples above. System reachable/unreachable are the result of an HP SIM action; that is, based on something HP SIM does (in this case status polling) the event is generated. All of the others are the result of receiving an event in the form of an SNMP trap or WBEM indication. It would be a non-trivial job to keep up with all of the possible events and would require some AI to attempt to correlate them; this is clearly out of the scope of a product like HP SIM.
MattLavallee2
Frequent Advisor

Re: Clearing related events

Thanks for the response, David. I understand your intent with the proactivity of suspending systems, but there are grander cases where it's still impractical. For example, we recently had one of our redundant whole-room UPSes serviced, which caused Type 3, Type 4, Critical/Power Supply Failed, Critical/Power Redundancy Lost, and Informational/Power Redundancy Restored events for every server in the room.

Clearly, this is a cascading event, but we certainly don't want to suspend all monitoring for three hours because of one known issue. Ideally, SIM would clear the other events when the power to that PS was restored, which is a genuine representation of the current state. Instead, we had to go in and clear the individual events manually, one by one.
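The "current state" idea could be sketched like this: instead of treating every event as something to clear by hand, fold the stream into the latest known state per system. This is only an illustration in plain Python, not anything SIM exposes; the `STATE_OF` mapping (including the assumption that "Power Redundancy Restored" supersedes the earlier power events) and `current_state` are hypothetical.

```python
# Hypothetical mapping from event name to the (subsystem, state) it implies.
STATE_OF = {
    "Power Supply Failed": ("power", "degraded"),
    "Power Redundancy Lost": ("power", "degraded"),
    "Power Redundancy Restored": ("power", "ok"),
}

def current_state(events):
    """Return {(system, subsystem): state} from (system, name) tuples in arrival order."""
    state = {}
    for system, name in events:
        if name in STATE_OF:
            subsystem, value = STATE_OF[name]
            # A later event for the same subsystem overrides the earlier one.
            state[(system, subsystem)] = value
    return state
```

Under those assumptions, a server that reported a failure and then a restore ends up as "ok", while one that never sent the restore event stays "degraded" and still needs attention.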

The same could be said for switch servicing, VM migrations, etc.

-Matt
marsh_1
Honored Contributor

Re: Clearing related events

Matt,

The example you give is catered for in OV/Tivoli/Patrol setups, where infrastructure is tied together as services and integrated with change management so that planned outages are handled automatically. But these are expensive and complex software solutions, generally found only at larger sites where the reduced management overhead and associated personnel savings can cost-justify them.