Server Management - Systems Insight Manager

Re: Multiple Email messages - where to start troubleshooting?

 
Mike Kanakos_1
Advisor

Multiple Email messages - where to start troubleshooting?

I have HP SIM running on an ML570 server (4 CPUs), monitoring 300 servers. I am receiving multiple bogus pages (the Insight events are dated April and May).

Where do I start troubleshooting?

I already checked SQLServer.exe in Services and it is not at a high CPU level, but pages are being generated right now.

Any ideas?

Dan Lynch
Advisor

Re: Multiple Email messages - where to start troubleshooting?

I've had that problem and it's driving me nuts. I haven't found a root cause, but someone else in the forum suggested assigning all events to a 'workaround' user, then filtering out that user in the notification task. This has worked for me without incident thus far. I have spent hours troubleshooting this and am now resigned to using the workaround...
I did get this in the SQL event log after this happened:

2004-05-28 01:44:51.97 spid10 WARNING: EC 56df23c0, 0 waited 300 sec. on latch 4e1b5db0. Not a BUF latch.
2004-05-28 01:44:51.97 spid10 Waiting for type 0x4, current count 0xa, current owning EC 0x271FD538.
2004-05-28 01:45:00.26 spid82 Time out occurred while waiting for buffer latch type 2,bp 0x170ac80, page 1:7748), stat 0xb, object ID 17:2:0, EC 0x62BAB538 : 0, waittime 300. Not continuing to wait.
2004-05-28 01:45:00.26 spid82 Waiting for type 0x2, current count 0x80002a, current owning EC 0x56DF23C0.

Mike Kanakos_1
Advisor

Re: Multiple Email messages - where to start troubleshooting?

I spoke with HP support today and they feel that the problem has to do with old events in the log. They suggested that I create a task to delete the old events after XX days.

Since I am on 4.0 and I am running MSDE on the same box, we decided that since it's broken, we might as well start fresh.

I will be installing 4.1 tonight.

PS - HP says the problem still exists in 4.1 and the only fix is to delete old events. I'll let the group know how I make out.
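
For anyone wondering what "delete the old events after XX days" amounts to, here is a minimal sketch of the idea in Python. It is not HP SIM code and does not touch the SIM database. The event records, the `prune_old_events` name, and the 30-day window are all made up for illustration; in SIM itself this would be done with a scheduled delete-events task, as support suggested.

```python
from datetime import datetime, timedelta

def prune_old_events(events, now, max_age_days):
    """Keep only events newer than the retention window (hypothetical helper)."""
    cutoff = now - timedelta(days=max_age_days)
    return [e for e in events if e["time"] >= cutoff]

# Hypothetical event list; the real events live in the SIM database.
events = [
    {"id": 1, "time": datetime(2004, 4, 12)},
    {"id": 2, "time": datetime(2004, 5, 3)},
    {"id": 3, "time": datetime(2004, 6, 20)},
]

kept = prune_old_events(events, now=datetime(2004, 6, 21), max_age_days=30)
print([e["id"] for e in kept])  # only the June event survives a 30-day window
```

The point is simply that once the April and May events are gone, there is nothing stale left to be re-sent.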
Jeff Westwood
Frequent Advisor

Re: Multiple Email messages - where to start troubleshooting?

Do you use Microsoft Exchange?

We had similar problems that mysteriously stopped after we upgraded to Exchange Server 2003.

Jeff
Mike Kanakos_1
Advisor

Re: Multiple Email messages - where to start troubleshooting?

No Exchange... we use Notes.
Mike Kanakos_1
Advisor

Re: Multiple Email messages - where to start troubleshooting?

Just here to give everyone an update. This issue is still live for me.

I am working with HP senior level support. They have been able to reproduce the problem from debug on my server.

They are writing some new code to patch the system which I guess would be included in an upcoming build of HP SIM.

This problem happened for me in SIM 4.0 and 4.1.
Jeff_335
Occasional Advisor

Re: Multiple Email messages - where to start troubleshooting?

Has there been an update to this? We have the same problem, but we also get multiple pages. A pager is annoying enough without getting a hundred or so irrelevant pages.
Rob Buxton
Honored Contributor

Re: Multiple Email messages - where to start troubleshooting?

It would be interesting to know what the conditions are that trigger this.
I've not seen it, so I'm wondering what's different between the sites that see it and those that do not.
Jeff_335
Occasional Advisor

Re: Multiple Email messages - where to start troubleshooting?

We are running SIM 4.1 with Windows 2K and SQL 2k. Our development and stage servers alert through email and production servers page us. The automated tasks are based on the Server Role field. The times when we have been inundated with emails or pages (the pages being much more annoying) are as follows.

1) A server's role changes. If a server was dev and changes to production, we get paged with every non-informational event (we only receive critical, major, and minor) that SIM has for that server in the database. The same happens with email if it was listed as production but is downgraded to dev or stage.

2) The rules for the automated task change. If someone is added to the list for alerting, they receive all the alerts the database has for all the servers that match that rule. For example, a new person who joins the group and needs to be notified if something happens will receive all the notifications for all the servers that match that rule, prod or dev.

It's as though SIM reevaluates the rule against the database and decides that it missed sending out a bunch of alerts, so it sends them at that time. It's extremely costly, and more annoying than you can imagine, to get paged on every event for every production server since SIM was up and running. We have no problems as long as the alerts are current and relevant, but whenever a change is made SIM is trying to send out over 100 alerts for a server that took place a month
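
To make the behaviour described above concrete, here is a rough Python sketch of the kind of guard that would stop a rule re-evaluation from replaying history: skip anything older than a cutoff, and never send the same event to the same recipient twice. This is purely an illustration of the idea and not how SIM works internally; `AlertGate`, its parameters, and the one-hour cutoff are all invented for the example.

```python
from datetime import datetime, timedelta

class AlertGate:
    """Hypothetical filter: pass only recent, not-yet-sent alerts."""

    def __init__(self, max_age):
        self.max_age = max_age   # how old an event may be and still alert
        self.sent = set()        # (event_id, recipient) pairs already sent

    def should_send(self, event_id, recipient, event_time, now):
        if now - event_time > self.max_age:
            return False         # stale event, e.g. from a month ago
        key = (event_id, recipient)
        if key in self.sent:
            return False         # this recipient already got this event
        self.sent.add(key)
        return True

gate = AlertGate(max_age=timedelta(hours=1))
now = datetime(2004, 7, 1, 12, 0)

fresh = gate.should_send(101, "pager-a", now - timedelta(minutes=10), now)
repeat = gate.should_send(101, "pager-a", now - timedelta(minutes=10), now)
stale = gate.should_send(57, "new-admin", now - timedelta(days=30), now)
print(fresh, repeat, stale)
```

With a guard like this, adding a new person to a rule or changing a server's role would only ever produce alerts for events inside the cutoff window.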
Mike Kanakos_1
Advisor

Re: Multiple Email messages - where to start troubleshooting?

There is an update to this issue... I posted a response to a question that was similar to this one.. here's the link to the thread:

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=687690

Quick answer is that HP was eventually able to reproduce the problem in the lab. They wrote a software patch for Java. The patch was put in place about 2 weeks ago and so far everything seems fine. I would guess that the patch will make it into a future release of SIM.

Some action would trigger the problem (like a change to one of the Auto Event handler's properties or disabling an alert group). Server events would trigger the alerts as well. In the middle of the night, the server would go nuts. Somehow SIM would get caught in a loop and start sending out old alerts.

To clarify, an admin would change a SIM setting or an event would happen (like server unreachable) and the software would get caught in a loop and send out a barrage of alerts. Alert dates went back as far as two months. We would get anywhere from 150 to 1000 alerts, which were forwarded to about 15 different two-way pagers that the admin staff uses. Do the quick math and you'll realize how annoying, frustrating & expensive it was. The only fix was to shut down the SIM service.
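
For anyone who wants the quick math spelled out, here it is as a few lines of Python. It assumes every alert went to every one of the 15 pagers, which is how I read the numbers above; the function name is just for illustration.

```python
PAGERS = 15  # approximate number of two-way pagers on the admin staff

def total_pages(alerts, pagers=PAGERS):
    """Pages delivered if each alert is forwarded to every pager (assumption)."""
    return alerts * pagers

low = total_pages(150)    # smallest barrage reported
high = total_pages(1000)  # largest barrage reported
print(low, high)  # 2250 15000
```

So a single incident meant somewhere between 2,250 and 15,000 individual pages.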