Switches, Hubs, and Modems

PCM Alerts

 
numberaut
Occasional Advisor

I'm using ProCurve Manager (PCM) version 3. Alerts used to work by looking for "device is unreachable" in the events, but for some reason, over the past two days, these events have not been generated, and it's as if PCM doesn't even see that devices are down -- it shows them green and good. I checked the events, and "device is unreachable" is not among them. What would cause this? I confirmed the events were not paused, the firewall is off on the Windows box, etc. I need to get this fixed, as it is bad when IT doesn't know a site is down.
7 REPLIES
Trevor Commulynx
Regular Advisor

Re: PCM Alerts

What is your discovery cycle? PCM is going to register a device as down when it is non-responsive to the discovery process, within the preset timeout etc.
Tore Valberg
Trusted Contributor

Re: PCM Alerts

Do you get any events at all? If not, do you see any error messages related to MySQL in Event Viewer?

If so, contact support to get it sorted.
numberaut
Occasional Advisor

Re: PCM Alerts

Trevor,

"What is your discovery cycle? PCM is going to register a device as down when it is non-responsive to the discovery process, within the preset timeout etc."

I set PCM's "Device Status Pollings" interval to 5 minutes after a reply here suggested it, back when it was set to an hour. This did help, and we received alerts in a timely manner afterwards. I just checked, and it is still set to 5 minutes.
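To put rough numbers on why the shorter interval helped, here is a minimal sketch of the worst-case delay before a poller notices an outage. The function name, the 5-second timeout, and the 3 attempts are assumptions for illustration only; they are not PCM's actual settings.

```python
def worst_case_detection_minutes(poll_interval_min, timeout_s=5.0, attempts=3):
    """Upper bound on minutes between a device failing and the poller noticing.

    timeout_s and attempts are hypothetical values, not PCM defaults.
    """
    # Worst case: the device dies right after a successful poll, so a full
    # interval passes before the next poll, which then has to time out on
    # every attempt before the device is declared down.
    return poll_interval_min + (timeout_s * attempts) / 60.0


for interval in (60, 5):
    print(interval, worst_case_detection_minutes(interval))
# prints:
# 60 60.25
# 5 5.25
```

Under these assumptions the polling interval dominates the delay, which matches the experience above of alerts arriving in a timely manner once the interval dropped from an hour to 5 minutes.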
numberaut
Occasional Advisor

Re: PCM Alerts

Tore,

"Do you get any events at all?"

Yes, for the site that went down on 10/7/2010 and then the other site that went down on 10/8/2010, routers and switches all show events on those days, before and after they went down. For the 10/7/2010 site, I had someone there power-cycle the UPS (which resolved it), and a switch there generated, "ColdStart - Device has crashed or power plug removed or SNMP entity is re-initializing itself." It generated other events prior to this that morning that were informational, minor and major. It simply did not generate a "device is unreachable" event, which is what the alerts are looking for.


"If not, do you see any error messages related to MySQL in Event Viewer?"

I see nothing in any of the events on any device that shows the word "MySQL." Does PCM use MySQL?


"If so contact support to get it sorted"

I will contact support today sometime. I have found the forums helpful too.
numberaut
Occasional Advisor

Re: PCM Alerts

OK, I forgot that the event filter comes from the "Agent Groups" log, and not from the drilled-down individual switch or router logs. When I check those on the 2 dates in question (10/7 and 10/8), 5 events show for one site. These are all informational. The 2nd event that day says, "end note unreachable warning." I get nothing for when it went down and became unreachable. _And_ the filter text I use for device up is "device is reachable." I guess the text differs from device to device? Note: I followed HP's docs/support on creating these notices.

For 10/8, I do see "device unreachable warning." But the filter text I created way back -- and that still works as I tested it on a test 2610 switch today -- is "device is unreachable."

So, part of the problem is it never generated any "unreachable" text on 10/7, and on 10/8, generated _different_ unreachable text. I suppose I just need to keep adding/changing the filter text until it catches all, but honestly, I just get the feeling PCM is not the best alerting system. I also have PRTG running, and it is solid, doing a much better job with sensors, and it doesn't fail. I'm going to add more sensors to PRTG and rely on it.
numberaut
Occasional Advisor

Re: PCM Alerts

Let me correct this: on 10/7 and 10/8, it did generate "unreachable" text, but different text than the alert was set up with, and the text the alert was set up with still works on certain switches -- just not on the sites in question. Note: the 2 sites that went down have 2626 and 2608 switches, and 7102 routers. I'm not sure if this is a simple problem of older equipment, but it was my thought that PCM generates its own unique events, not gleaning any event language from the remote devices. One of the 2 sites does have a single 2610-POE switch. The only difference between it and my test switch is that it is on an older revision.

I'm thinking of just changing the alert text to simply have "unreachable" for down and "reachable" for up, but I fear this will generate spam and the IT staff won't like that.... Still, getting too many alerts will be better than getting none....
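Before broadening the filter text, it may help to see how the event strings quoted in this thread behave under simple matching. This is only a sketch: it assumes the filter is a case-insensitive substring match, which is not confirmed for PCM, and the `matches` and `classify` helpers are hypothetical names.

```python
# Event texts quoted earlier in this thread.
EVENTS = [
    "device is unreachable",
    "device unreachable warning",
    "end note unreachable warning",
    "device is reachable",
]


def matches(filter_text, event_text):
    # Assumed semantics: case-insensitive substring match (not confirmed for PCM).
    return filter_text.lower() in event_text.lower()


def classify(event_text):
    # Test "unreachable" first, because "reachable" is a substring of it.
    if matches("unreachable", event_text):
        return "down"
    if matches("reachable", event_text):
        return "up"
    return None


# The original filter text catches only one of the three "down" variants.
print([e for e in EVENTS if matches("device is unreachable", e)])
# A bare "reachable" filter matches every event, including the "down" ones.
print([e for e in EVENTS if matches("reachable", e)])
print([classify(e) for e in EVENTS])
```

If PCM's filter really does behave like a substring match, an "up" alert keyed on plain "reachable" would also fire on every "unreachable" event, so that is worth checking on a test switch before rolling it out.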
numberaut
Occasional Advisor

Re: PCM Alerts

I worked at length today with ProCurve support on this. Despite the fact that an alert was indeed sent in August from one of the sites, support tells me the problem is due to SNMP authentication on the branch office router. I pointed out that it had never authenticated before, and that I didn't mind as long as the alerts worked, which they did indeed when a switch at the site went down in August. They went on to say that even if SNMP authentication is resolved, alerts could not be guaranteed until we purchase an agent per branch. We currently have one agent. That would mean purchasing at least six more agents, just to get alerts.

We can use PRTG alerts, which rely on ICMP, I believe, and work great, so that's the route we'll probably go.
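For anyone wondering what probe-based alerting boils down to, here is a minimal sketch of the state tracking involved. It is not how PRTG or PCM is implemented; the class name, the `failures_before_down` threshold, and the alert wording are all assumptions, and the probe itself (for example an ICMP ping) is left out so that only the result is fed in.

```python
class ReachabilityTracker:
    """Turns a stream of probe results into one alert per state change."""

    def __init__(self, name, failures_before_down=2):
        self.name = name
        # Hypothetical threshold: consecutive missed probes before alerting.
        self.failures_before_down = failures_before_down
        self.consecutive_failures = 0
        self.is_down = False

    def record(self, replied):
        """Record one probe result; return an alert string on a state change, else None."""
        if replied:
            self.consecutive_failures = 0
            if self.is_down:
                self.is_down = False
                return f"{self.name} is reachable again"
            return None
        self.consecutive_failures += 1
        if not self.is_down and self.consecutive_failures >= self.failures_before_down:
            self.is_down = True
            return f"{self.name} is unreachable"
        return None


tracker = ReachabilityTracker("branch-router")
alerts = [tracker.record(r) for r in (True, False, False, False, True)]
print(alerts)
# [None, None, 'branch-router is unreachable', None, 'branch-router is reachable again']
```

Alerting only on transitions, and only after a couple of consecutive misses, keeps the volume down to one message when a site drops and one when it returns, which addresses the spam concern raised earlier in the thread.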