Operating System - HP-UX

lan failure, service guard did not react, link problem

Klaus Bauer
Occasional Advisor


Hi,

I had the following problem:

On lan1 (the primary LAN) it was not possible to send any IP traffic, but MC/ServiceGuard OPS Edition did not switch from lan1 to lan2. In the log of the Cisco switch I can see the message "LINK DOWN" for the lan1 port at the time the problem occurred, but on the server side everything appeared to be fine.
I am using the btlan driver with an EISA card (INP0500).

Any ideas what could have caused the problem and how to solve it?

Thanks in advance,
klaus
Unix rocks
Christopher McCray_1
Honored Contributor

Re: lan failure, service guard did not react, link problem

Hello,

What is the output of syslog during that time?

What is in your cluster.ascii file?

Output of cmviewcl -v?

More info, please.


Chris
It wasn't me!!!!
Klaus Bauer
Occasional Advisor

Re: lan failure, service guard did not react, link problem

Hi Chris,

There were no interesting entries in syslog, lanadmin showed lan1 as fine and there were no messages in nettl.log, but pinging another host failed. The HP support team checked the system (log files, patches, ...) and told me the system is fine...

bye,
klaus

Re: lan failure, service guard did not react, link problem

Klaus,

I experienced this problem once, not on an EISA 100BT card but on a PCI 100FX card - HP were able to provide me with a LAN diagnostic tool (I think it was called landiag?) which gave them a low-level view of the LAN card (buffers, registers etc.) - the tool was run when the problem re-occurred to give the HP guys something to go on... You should request this tool.

I assume HP have already recommended all the latest LAN/ARPA/STREAMS patches etc.?

HTH

Duncan


I am an HPE Employee
Frank Slootweg
Honored Contributor

Re: lan failure, service guard did not react, link problem

It is important to realize that MC/ServiceGuard (et al.) only monitors
the (primary) LAN status *within* the cluster, i.e. if, as seems to be
the case here, the nodes in the cluster can still communicate with
*each other* via the primary LAN, the LAN will *not* fail over. See the
MC/ServiceGuard documentation for details.

If you want to monitor *external* connectivity, i.e. from the cluster
to the outside and vice versa, you have to do that 'yourself', which
can be done, for example, with an MC/SG package.
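A minimal sketch of such an external check, assuming you run it as a long-lived package service process: it pings a host outside the cluster and exits after a few consecutive failures, on the assumption that Serviceguard treats the exit of a service process as a service failure (what happens next depends on your package's restart and failover settings). The names `make_ping_probe`, `watch`, the HP-UX-style `ping host 64 1` argument list and all thresholds are hypothetical and should be adapted.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence

# Hypothetical default: HP-UX-style "ping host packetsize count".
# This argument list is an assumption; adjust it for your platform.
DEFAULT_PING = ("ping", "{host}", "64", "1")


def make_ping_probe(host: str,
                    cmd: Sequence[str] = DEFAULT_PING) -> Callable[[], bool]:
    """Build a probe that sends one ping to `host` and reports success."""
    argv = [part.format(host=host) for part in cmd]

    def probe() -> bool:
        try:
            done = subprocess.run(argv, stdout=subprocess.DEVNULL,
                                  stderr=subprocess.DEVNULL, timeout=10)
        except (OSError, subprocess.TimeoutExpired):
            # A missing binary or a hung ping counts as a failed probe.
            return False
        return done.returncode == 0

    return probe


def watch(probe: Callable[[], bool], max_failures: int = 3,
          interval: float = 5.0,
          sleep: Callable[[float], None] = time.sleep,
          max_checks: Optional[int] = None) -> int:
    """Probe repeatedly; return 1 after `max_failures` consecutive failures.

    With max_checks=None this loops forever, as a service process would.
    Returning 0 only happens when a finite max_checks is used (testing).
    """
    failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if probe():
            failures = 0  # any success resets the failure streak
        else:
            failures += 1
            if failures >= max_failures:
                return 1  # caller exits non-zero -> service considered failed
        sleep(interval)
    return 0
```

As the service command you would run something like `sys.exit(watch(make_ping_probe("192.0.2.1")))`, where 192.0.2.1 is a placeholder for a router or host outside the cluster. Requiring several consecutive failures avoids failing over on a single lost ping.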
Klaus Bauer
Occasional Advisor

Re: lan failure, service guard did not react, link problem

Hello Duncan,

The tool is now called lanadmin, and I used it when the problem occurred, but it said everything was fine. The server has a reasonable patch level and HP could not tell me whether there is a patch that solves this problem.

regards,
klaus
Klaus Bauer
Occasional Advisor

Re: lan failure, service guard did not react, link problem

Hi Frank,

Well, the Cisco switch detected a link down for the lan1 interface (and the interfaces lan1 and lan2 are monitored by ServiceGuard), and the status of the lan1 port on the switch was "not connected". So no communication was possible on lan1, but the LAN driver did not detect the problem, and neither did ServiceGuard.

bye,
klaus

Re: lan failure, service guard did not react, link problem

Klaus,

No, the tool I'm talking about wasn't lanadmin - I am familiar with lanadmin. The tool I am thinking of isn't a standard executable that is part of HP-UX; I had to get a copy of it e-mailed to me by HP. Now that I've thought about it a bit longer, I seem to recall the tool may have been called 'lanshow'. Speak to your Response Centre engineer - they should be able to find it for you...

Cheers,

Duncan

Frank Slootweg
Honored Contributor

Re: lan failure, service guard did not react, link problem

> Hi Frank,
>
> well the Cisco switch detected a link down
> for the lan1 interface (and the interfaces
> lan1 and lan2 are monitored by Service
> Guard) and the status for port lan1 in the
> switch was not connected, so there was no
> communication possible on lan1 but the lan
> driver did not detect the problem and so did
> Service Guard.

Since you do not say *where* the Cisco switch is located, it is impossible to tell if what you experienced is normal or not.

As I mentioned, as long as the nodes *in* the cluster can communicate with *each other*, MC/SG will think everything is OK and will *not* fail over the primary LAN to the standby.
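To make the "where is the switch" question concrete, here is a toy reachability model (not Serviceguard code) of the simplified rule above: a broken link only matters to the cluster if the cluster nodes can no longer reach each other over that LAN. The topology names (`node1`, `swA`, `swB`, `router`) and the functions `reachable` and `cluster_notices` are hypothetical illustrations.

```python
from collections import deque
from typing import Dict, Sequence, Set, Tuple

Link = Tuple[str, str]


def reachable(links: Sequence[Link], start: str, down: Set[Link]) -> Set[str]:
    """Breadth-first search over undirected links, skipping failed ones."""
    adj: Dict[str, Set[str]] = {}
    for a, b in links:
        if (a, b) in down or (b, a) in down:
            continue  # this cable or switch port is broken
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        here = queue.popleft()
        for nxt in adj.get(here, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def cluster_notices(links: Sequence[Link], nodes: Sequence[str],
                    down: Set[Link]) -> bool:
    """Simplified rule: a failure is visible to the cluster only if the
    cluster nodes can no longer reach each other over this LAN."""
    seen = reachable(links, nodes[0], down)
    return not all(n in seen for n in nodes[1:])
```

With both nodes on switch `swA` and an uplink `swA`-`swB` towards the router, a broken uplink cuts off the outside world while the model says the cluster notices nothing; a broken `node1`-`swA` port, by contrast, separates the nodes and would be noticed.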

If you want more help, then please give some more details about your configuration, i.e. for example a 'picture' of the hardware (+ network) layout, and a description of exactly which component 'failed'.

In general, see the "Managing MC/ServiceGuard" manual, Chapter 3 "Understanding MC/ServiceGuard Software Components" -> "How the Network Manager Works" -> "Monitoring LAN Interfaces and Detecting Failure" (for example at <>).



Steve Lewis
Honored Contributor

Re: lan failure, service guard did not react, link problem

You said it was not possible to send any IP traffic over the primary lan, but that does not necessarily mean that the primary lan was down at the link layer (sorry, I am not an expert on Cisco switches).

Did you test the lan switchover with 'ifconfig lanN down'? Or (even better) by unplugging the main lan cable from the back of the server? If not, then try these methods.

I think that Serviceguard doesn't monitor lan interfaces at the IP level using ping, but at a lower level (linkloop?). This is the reason your cluster hosts have to be on the same lan segment/subnet. Serviceguard has a lot of low-level lan monitoring code for failover negotiation, failure detection and so on; IP-level monitoring and negotiation are not guaranteed to produce the desired results.

I think that may be the reason your lan didn't switch over.
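To illustrate the "lower level" idea, here is a toy model, not Serviceguard's actual algorithm, assuming the network manager judges an interface by whether its inbound packet counter keeps increasing between polls rather than by IP pings (later Serviceguard releases, as far as I know, expose a NETWORK_FAILURE_DETECTION parameter for this kind of counter-based detection; check the manual section Frank cites for your version). The function name `stalled_interfaces` and the sample counter values are hypothetical. The point: such a monitor only knows what the driver's counters report, so a failure the driver never registers is invisible to it.

```python
from typing import Dict, List, Set


def stalled_interfaces(samples: Dict[str, List[int]],
                       polls_to_fail: int) -> Set[str]:
    """Flag interfaces whose inbound packet counter stopped increasing.

    samples[iface] holds the counter value read at each polling interval.
    A reading that does not exceed the previous one counts as a stalled
    poll; `polls_to_fail` consecutive stalled polls mark the interface
    as failed. Counter wrap-around is ignored for simplicity.
    """
    failed: Set[str] = set()
    for iface, counts in samples.items():
        stalled = 0
        for prev, cur in zip(counts, counts[1:]):
            stalled = stalled + 1 if cur <= prev else 0
            if stalled >= polls_to_fail:
                failed.add(iface)
                break
    return failed
```

For example, with lan1 readings `[100, 100, 100, 100]` and lan2 readings `[50, 60, 70, 80]`, three stalled polls flag lan1 only. If a driver kept reporting a moving counter (or link up) while traffic was actually lost, this kind of check would never fire.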

Another reason could be the configuration. Maybe post the output of cmviewcl -v, cmquerycl or the lan portions of your cluster and package definition files.



Citibank - HP (Unni)
Occasional Advisor

Re: lan failure, service guard did not react, link problem

Hi,

As I understand your problem, there are a few possible reasons why MC/SG did not switch the LAN:

01. Net switching may be disabled in the MC/SG configuration.

02. The polling interval set for the cluster may be too high.
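Both points can be checked in the ASCII configuration files. A minimal sketch, assuming the legacy parameter names NETWORK_POLLING_INTERVAL (cluster file, in microseconds) and NET_SWITCHING_ENABLED (package file), which may differ in your Serviceguard version; the 2,000,000 µs threshold is an assumption (what I believe is the shipped default), and `parse_ascii_config` / `lan_failover_warnings` are hypothetical helpers.

```python
from typing import Dict, List


def parse_ascii_config(text: str) -> Dict[str, List[str]]:
    """Parse 'KEYWORD value' lines, dropping '#' comments; keywords may repeat."""
    params: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        params.setdefault(parts[0], []).append(value)
    return params


def lan_failover_warnings(cluster_text: str, package_text: str,
                          max_interval_us: int = 2_000_000) -> List[str]:
    """Report the two misconfigurations suggested above (assumed keywords)."""
    warnings: List[str] = []
    cluster = parse_ascii_config(cluster_text)
    package = parse_ascii_config(package_text)
    for value in cluster.get("NETWORK_POLLING_INTERVAL", []):
        if int(value) > max_interval_us:
            warnings.append(
                f"NETWORK_POLLING_INTERVAL {value} us > {max_interval_us} us")
    for value in package.get("NET_SWITCHING_ENABLED", []):
        if value.upper() != "YES":
            warnings.append(f"NET_SWITCHING_ENABLED is {value}")
    return warnings
```

You would feed it the text of your cluster.ascii and package configuration files; an empty list means neither of the two suspected settings looks wrong.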

Please test by unplugging the cable; please don't test with 'ifconfig lanX down', as that may not work.
Unni