Server Management - Systems Insight Manager


 
Andrew Kaplan
Super Advisor

System Unreachable False Positives Reported

Hello --

 

We are running the 7.2.0 distribution with the following hotfixes on the server:

 

HOTFIX72_001.jar  Tuesday, 5/7/2013, 2:54 PM EDT
HOTFIX72_002.jar  Tuesday, 5/7/2013, 2:55 PM EDT
HOTFIX72_003.jar  Tuesday, 5/7/2013, 2:57 PM EDT
HOTFIX72_004.jar  Wednesday, 5/8/2013, 2:37 PM EDT
HOTFIX72_005.jar  Tuesday, 5/7/2013, 3:01 PM EDT
HOTFIX72_006.jar  Tuesday, 5/7/2013, 3:02 PM EDT
HOTFIX72_007.jar  Tuesday, 5/7/2013, 3:06 PM EDT
HOTFIX72_008.jar  Tuesday, 5/7/2013, 3:12 PM EDT
HOTFIX72_009.jar  Friday, 4/26/2013, 9:36 AM EDT

 


Starting earlier this week, a number of false positives of systems being unreachable started to occur every morning. According to the Insight Manager logs, the systems in question would be unavailable at 2:30 in the morning, and then become available five minutes later.

 

I checked the uptime of the systems in question, and none of them indicated a reboot had occurred during this past week. I checked with the facilities staff to see if any work had been done on the network infrastructure, and they reported that no work was being done at the time indicated.

 

I checked the log files on the client systems, and there were no apparent error messages.

 

The client systems are all running the CentOS 5.3 64-bit distribution. Has anyone an idea as to why the false positives are occurring, and how they can be corrected?

 

Thanks.

A Journey In The Quest Of Knowledge
5 REPLIES 5
Andrew_Haak
Honored Contributor

Re: System Unreachable False Positives Reported

Hello Andrew,

What is the exact event? Could it be that the agents on the servers crash and restart, so it is not the server itself but the agent? Do you get these events from all of your servers or just a few?

Kind regards,

Andrew
Andrew Kaplan
Super Advisor

Re: System Unreachable False Positives Reported

Hi Andrew --

 

Thank you for your reply, and my apologies for not responding sooner. The same event occurred this morning. I checked which servers reported the problem: it is a majority of the systems, but not all of them. I checked the cma.log file on several of the systems, and none of them had entries from the past week indicating the agents had crashed at that time. The same is true of the hpasmd.log file.

 

Are there any other log files that I can check for entries indicating an agent crash? If not, how can I determine if the agent did crash on the systems?
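
For whichever log files turn out to be relevant, a rough way to narrow the search is to pull out only the lines stamped near 2:30. This is a minimal sketch, not a description of the real cma.log or hpasmd.log layout: it assumes each line contains an HH:MM:SS time of day (as syslog-style lines do), and the function name `lines_near` and the sample lines are made up for illustration.

```python
import re

# Assumption: each log line carries an HH:MM:SS time of day somewhere in it.
# The actual agent log layout may differ, so adjust the pattern as needed.
TIME_RE = re.compile(r"\b(\d{2}):(\d{2}):(\d{2})\b")


def lines_near(lines, hour=2, minute=30, slack_minutes=10):
    """Return lines whose first HH:MM:SS stamp is within slack_minutes of hour:minute."""
    target = hour * 60 + minute
    hits = []
    for line in lines:
        match = TIME_RE.search(line)
        if not match:
            continue  # no recognisable time of day on this line
        h, m, s = (int(g) for g in match.groups())
        if h > 23 or m > 59 or s > 59:
            continue  # looked like a time but is not a valid one
        minutes = h * 60 + m
        # Distance on a 24-hour clock, so 23:55 counts as close to 00:05.
        diff = abs(minutes - target)
        diff = min(diff, 24 * 60 - diff)
        if diff <= slack_minutes:
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # Made-up sample lines, only to show the filtering.
    sample = [
        "May 14 02:31:07 node1 agent: example message",
        "May 14 14:30:00 node1 agent: example message",
        "a line with no timestamp",
    ]
    for hit in lines_near(sample):
        print(hit)
```

To run it against a real file, pass the open file object as `lines`. An empty result around 2:30 on several systems would point away from an agent restart on the clients.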

A Journey In The Quest Of Knowledge
Andrew_Haak
Honored Contributor

Re: System Unreachable False Positives Reported

Hello Andrew,

 

Did the error occur on all of the servers at the same time? Are they all on the same subnet or software version? As you can tell, I'm looking for some common cause. Can you attach a screenshot of the events you get for one example server?

 

kind regards,

 

Andrew

Andrew Kaplan
Super Advisor

Re: System Unreachable False Positives Reported

Hi Andrew --

 

The error did occur on all of the servers at the same time, and they are all on the same subnet. I went through my e-mail this morning, and there were no e-mails indicating the condition that I reported in my original posting. I will keep an eye on the systems, and if and when the next event occurs, I will post it here.

 

 

A Journey In The Quest Of Knowledge
jim goodman
Trusted Contributor

Re: System Unreachable False Positives Reported

Same subnet, same time sounds like something is happening to that segment. You may want to consider whether a Layer-2/3 device in that path is the culprit. Typically, false positives are random occurrences if they happen.
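
The "same subnet, same time" check can be made concrete by grouping the unreachable events by network and time window. The following is only a sketch under stated assumptions: the events have already been exported from the management server as (IP address, timestamp) pairs, a /24 prefix is a guess at the subnet size, and the addresses shown are documentation-range placeholders rather than real data. The function name `cluster_events` is hypothetical.

```python
import ipaddress
from collections import defaultdict
from datetime import datetime


def cluster_events(events, prefix_len=24, bucket_minutes=5):
    """Group (ip, timestamp) events by (subnet, start of time bucket)."""
    groups = defaultdict(set)
    for ip, ts in events:
        # strict=False lets a host address stand in for its containing network.
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        # Round the timestamp down to the start of its bucket.
        bucket = ts.replace(
            minute=ts.minute - ts.minute % bucket_minutes,
            second=0,
            microsecond=0,
        )
        groups[(str(net), bucket)].add(ip)
    return dict(groups)


if __name__ == "__main__":
    # Placeholder events; real ones would come from the event log.
    events = [
        ("192.0.2.10", datetime(2013, 5, 14, 2, 30, 12)),
        ("192.0.2.11", datetime(2013, 5, 14, 2, 32, 40)),
        ("198.51.100.5", datetime(2013, 5, 14, 14, 7, 0)),
    ]
    for (net, bucket), hosts in sorted(cluster_events(events).items()):
        print(net, bucket, sorted(hosts))
```

If most events land in a single (subnet, bucket) group, that supports looking at the shared network path rather than the individual hosts. If they scatter across groups, the cause is more likely on the hosts themselves.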