How to tame the HP Operations Manager message storms! – Part 1

GirishMatti ‎05-04-2015 07:02 AM - edited ‎06-11-2015 10:26 PM

Co-written by Tobias Mauch, a very senior and much respected engineer on the HP OM team.

 

Generally, when a storm hits, you simply have to weather it and hope it does not damage your property. In the context of HP Operations Manager, these storms often consist of a huge number of messages or events that hit Operations Manager (HP OM) in a short period of time. The source of these messages or events is the HP Operations agent, which is part of the infrastructure monitoring software. In many cases, these storms are triggered by events that report the same failure.

 

Any customer with a large installation of agents has potentially faced message storms or floods. As you know, handling and weathering such floods costs considerable time and effort.

 

Here are three easy methods to detect and prevent these storms. The first two approaches work on the HP OM server and the last one is provided by the HP Operations agent.

 

  1. Event Correlation Services (ECS) based message storm detection
  2. HP OMU 9.20 Event Storm Filter
  3. HP Operations agent Message Storm Suppression

 

In this blog post, I will introduce the first approach. In the next two posts, I will explain more about the other options.

 

 

Event Correlation Services based message storm detection.

 

In this method, Event Correlation Services (ECS) circuits are used to prevent message storms (either message-based or policy-based). This approach has been around the longest.

 

Message storm detection/suppression is done on the management server by an ECS policy. You will need to enable output of all messages to the MSI in Divert mode for this and you will need to assign the ECS policy to the management server itself. The configuration, including defining the rate of incoming events and the interval, is performed by changing lines in the ECS fact store file for the ECS policy.
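To illustrate what "rate of incoming events" and "interval" mean in practice, here is a minimal Python sketch of rate-based storm detection over a sliding time window. It is not the ECS circuit and does not reflect the fact store syntax; the class name `StormDetector` and the parameters `max_messages` and `interval_seconds` are assumptions made up for this illustration.

```python
from collections import deque


class StormDetector:
    """Hypothetical sketch: reports a storm when more than `max_messages`
    arrive within `interval_seconds`. Not the actual ECS circuit."""

    def __init__(self, max_messages, interval_seconds):
        self.max_messages = max_messages
        self.interval_seconds = interval_seconds
        self._timestamps = deque()

    def observe(self, timestamp):
        """Record one message arrival (in seconds) and return True
        while the configured rate is exceeded."""
        self._timestamps.append(timestamp)
        # Drop arrivals that have fallen out of the sliding window.
        cutoff = timestamp - self.interval_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_messages


detector = StormDetector(max_messages=3, interval_seconds=10)
results = [detector.observe(t) for t in (0, 1, 2, 3, 20)]
```

In this sketch the fourth message inside the 10-second window trips the detector, and the quiet period before the last message clears it again. A real deployment would take both values from the ECS policy configuration rather than hard-coding them.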

 

 Message flow scenarios:

 

Figure A: Message flow when suppression is enabled.

 

Possible message flows:

• Normal flow 1 -> 2 -> 3

• Flow when detecting a message storm 1 -> 2 -> 4 -> 5 -> 6 -> 7

• Flow after a message storm 1 -> 2 -> 3 & 3 -> 8 -> 9

Figure A (diagram not reproduced)

 

Figure B: Message flow when suppression is disabled.

Possible message flows:

• Normal flow 1 -> 2 -> 3

• Flow when detecting a message storm 1 -> 2 -> 4 -> 5 -> 6 -> 7 & 2 -> 10

• Flow after a message storm 1 -> 2 -> 3 & 3 -> 7 -> 8

 

In addition to the steps described for "Suppression enabled", step 10 is performed: messages are sent to the message browser even when a message storm has been detected.

Figure B (diagram not reproduced)

  

You can configure the circuit so that it does not send the messages received by the management server to the message browser until the message storm has stopped. (Note that for policy-based message storm detection, it is also possible to create exceptions, so that some policies, nodes, or combinations of both are never disabled.)
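The difference between the two scenarios comes down to one decision per message. The following Python sketch shows that decision under stated assumptions: the function name `route_message` and its two boolean parameters are hypothetical, and the return values are labels for this illustration only, not ECS or MSI terms.

```python
def route_message(storm_active, suppression_enabled):
    """Hypothetical sketch of where a message ends up.

    Returns "browser" if the message is forwarded to the message
    browser, or "suppressed" if it is held back during a storm.
    """
    if not storm_active:
        # Normal flow: no storm, so the message is forwarded.
        return "browser"
    if suppression_enabled:
        # Storm detected and suppression on: keep it out of the browser.
        return "suppressed"
    # Storm detected but suppression off: the storm is still reported,
    # and the message is forwarded as well.
    return "browser"
```

With suppression enabled, operators see the storm notification but not the flood itself; with suppression disabled, they see both.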

 

 

There are two ECS circuits to choose from:

 

a) MsgStorm_Dectect : This ECS policy suppresses messages if the number of messages from a particular node exceeds the configured limit.

By default, the ECS policy will create an automatic action that will stop the agent on the affected managed node—but you can configure the action to do nothing.

 

b) PolicyStorm_Dectect : This ECS policy suppresses messages if the number of messages from a particular policy on a managed node exceeds the configured limit.

By default, this ECS policy will create an automatic action that will disable the affected policy on the managed node—but you can configure the action to do nothing.
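The two circuits differ mainly in what they count by: the node alone, or the combination of node and policy. The Python sketch below illustrates that difference for a batch of messages assumed to fall within one detection interval. All names here (`storm_key`, `find_storm_sources`, `default_action`, and the `node` and `policy` dictionary keys) are hypothetical and chosen for illustration; they are not part of ECS.

```python
from collections import Counter


def storm_key(message, per_policy):
    """Group by node only, or by (node, policy) for policy-based detection."""
    if per_policy:
        return (message["node"], message["policy"])
    return (message["node"],)


def find_storm_sources(messages, limit, per_policy=False):
    """Return the keys whose message count exceeds `limit`."""
    counts = Counter(storm_key(m, per_policy) for m in messages)
    return sorted(key for key, count in counts.items() if count > limit)


def default_action(key):
    """Describe the default automatic action for a storm source."""
    if len(key) == 2:
        return f"disable policy {key[1]} on node {key[0]}"
    return f"stop agent on node {key[0]}"


messages = [
    {"node": "web01", "policy": "disk_mon"},
    {"node": "web01", "policy": "disk_mon"},
    {"node": "web01", "policy": "disk_mon"},
    {"node": "web01", "policy": "cpu_mon"},
    {"node": "db01", "policy": "disk_mon"},
]
by_node = find_storm_sources(messages, limit=3)
by_policy = find_storm_sources(messages, limit=2, per_policy=True)
```

Counting per policy is the more surgical option: in this example only the noisy policy on `web01` would be disabled, while the rest of the agent keeps reporting.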

 

For more information on this method, read the Message-Storm Detection White Paper here.

 

For more information on ECS itself, see here:

 

For more information on how HP Operations Manager can help you with infrastructure monitoring, visit the product home page here.

 

 


Comments
INOC - Network Operations Center
on ‎05-11-2015 01:10 AM

Every situation definitely varies, and one really has to assess the issue first before choosing a preferred method that could tame HP Operations messages. These are very easy-to-follow methods, however. Thank you for sharing!

GirishMatti
on ‎05-11-2015 01:29 AM

Agreed. In our next posts, we will point out two more methods for solving this problem.

Thanks for your feedback.
