
EMS lanmond monitoring lan900 failures

Mark Fisher_4
Frequent Advisor


For over 3 years we have been getting EMS lanmond monitoring failures. We are running HP-UX 11.23 with EMS A.04.20.23.

We get the following errors in the syslog. lanmond fails to get the status of the lan900 device, and then the Serviceguard package (which monitors the status of lan900) fails. There are absolutely NO network problems. I also ping the servers with a separate monitoring job and it never fails.

This seems to be an SNMP trap problem with EMS and not a network failure. We even modified the lanmond dictionary file and added "-t 15 -r 15" to enable a 15-second timeout and 15 retries, and it still failed 1 day later.

This is killing us, as our software package keeps failing its checks and failing over.

Any help anyone can provide would be greatly appreciated.

Feb 25 08:48:26 ndctfa9 cmcld[12582]: Resource monitor for resource /net/interfaces/lan/status/lan900 is having problems.

Feb 25 08:48:26 ndctfa9 cmcld[12582]: Resource /net/interfaces/lan/status/lan900 is assumed to be unavailable.

Feb 25 08:48:26 ndctfa9 cmcld[12582]: Resource /net/interfaces/lan/status/lan900 does not meet package RESOURCE_UP_VALUE for package ndctprfe09.

Feb 25 08:48:26 ndctfa9 cmcld[12582]: Resource /net/interfaces/lan/status/lan900 in package ndctprfe09 does not meet RESOURCE_UP_VALUE.

Feb 25 08:48:26 ndctfa9 cmcld[12582]: Executing '/opt/vzb/share/cmcluster/service.ctl stop' for package ndctprfe09, as service PKG*53769.
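To see how often this happens and whether the failures cluster at particular times, it may help to count the "assumed to be unavailable" messages per day. The following is only a rough sketch: the function name count_resource_failures is made up for illustration, and the log path in the usage comment is an assumption that should be adjusted for your system.

```shell
#!/bin/sh
# Count cmcld "assumed to be unavailable" events per day for one EMS resource.
# Usage: count_resource_failures <syslog file> <resource path>
# (count_resource_failures is a hypothetical helper, not part of EMS or Serviceguard.)
count_resource_failures() {
    logfile=$1
    resource=$2
    # Keep only cmcld lines that report this resource as assumed unavailable,
    # then group by the month and day fields at the start of each syslog line.
    grep 'cmcld\[' "$logfile" \
        | grep -F "Resource $resource is assumed to be unavailable" \
        | awk '{ count[$1 " " $2]++ } END { for (d in count) print d, count[d] }'
}

# Example call (the log path is an assumption):
# count_resource_failures /var/adm/syslog/syslog.log /net/interfaces/lan/status/lan900
```

The per-day counts can then be compared against backup windows, cron jobs, or other periodic load on the node.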


2 REPLIES
Mark McDonald_2
Trusted Contributor

Re: EMS lanmond monitoring lan900 failures

Mark

Sorry, I cannot offer any suggestions. But

-> Feb 25 08:48:26 ndctfa9 cmcld[12582]: Resource /net/interfaces/lan/status/lan900 is assumed to be unavailable.

I don't like the word "assumed" in that error. Surely the engineering team could come up with something better?

Rgs
Mark
z930405
Occasional Visitor

Re: EMS lanmond monitoring lan900 failures

I have been hit with exactly the same problem and am running the same version of HP-UX 11.21 and EMS 04.20.23.

I have also confirmed that there was no issue with the network and that lan900 was definitely up, as shown by

netfmt -v -f /var/adm/nettl.LOG000 

 

I am interested to know whether there was a resolution to this issue. The Serviceguard package is restarting and it is causing grief.

It appears to be an issue with EMS intermittently giving a false value for resls /net/interfaces/lan/status/lan900
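One way to catch the intermittent bad value is to sample the resource repeatedly and keep a timestamped record of what resls returns. This is a minimal sketch only: poll_resource is a made-up helper, /opt/resmon/bin/resls is an assumed install path, and the output format of resls is not assumed here, it is simply logged as-is.

```shell
#!/bin/sh
# Poll an EMS resource and print every sample with a timestamp.
# RESLS is the command to run; /opt/resmon/bin/resls is an assumed install path.
RESLS=${RESLS:-/opt/resmon/bin/resls}

# poll_resource is a hypothetical helper, not part of EMS.
poll_resource() {
    resource=$1   # e.g. /net/interfaces/lan/status/lan900
    samples=$2    # number of samples to take
    interval=$3   # seconds to wait between samples
    i=0
    while [ "$i" -lt "$samples" ]; do
        ts=$(date '+%Y-%m-%d %H:%M:%S')
        # Capture stdout and stderr so any error text from the monitor is kept.
        out=$("$RESLS" "$resource" 2>&1)
        rc=$?
        # Flatten multi-line output so each sample is one log line.
        flat=$(printf '%s' "$out" | tr '\n' ' ')
        echo "$ts rc=$rc $flat"
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}

# Example call: 720 samples, 5 seconds apart, appended to a log file.
# poll_resource /net/interfaces/lan/status/lan900 720 5 >> /tmp/lan900_poll.log
```

Lining the timestamps up against the cmcld entries in the syslog should show whether resls itself reports the interface as down at the moment the package fails, or whether the monitor simply stops answering.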

 

I would appreciate your input.