System Administration

Lan getting failed in the cluster

SUDHAKAR_18
Trusted Contributor

I have an MC/ServiceGuard cluster with two RX2660 servers.
One of the LANs fails every day and recovers after a few minutes. We changed the LAN cable and also checked connectivity,
but the problem still occurs every day.

--------------------------------------------
Jul 6 04:32:17 easiapp1 cmnetd[8074]: lan2 failed
Jul 6 04:32:17 easiapp1 cmnetd[8074]: lan2 switching to lan0
Jul 6 04:32:17 easiapp1 cmnetd[8074]: Subnet 192.168.1.64 switching from lan2 to lan0
Jul 6 04:32:17 easiapp1 cmnetd[8074]: Subnet 192.168.1.64 switched from lan2 to lan0
Jul 6 04:32:17 easiapp1 cmnetd[8074]: lan2 switched to lan0
Jul 6 04:32:17 easiapp1 cmcld[8066]: Local switch has occurred since net_id 0x3 was not found on subnet 192.168.1.64.
Jul 6 04:32:21 easiapp1 inetd[1679]: registrar/tcp: Connection from easiapp1 (192.168.1.81) at Mon Jul 6 04:32:21 2009
Jul 6 04:32:31 easiapp1 cmnetd[8074]: lan2 recovered
Jul 6 04:32:31 easiapp1 cmnetd[8074]: Subnet 192.168.1.64 switching from lan0 to lan2
Jul 6 04:32:31 easiapp1 cmnetd[8074]: Subnet 192.168.1.64 switched from lan0 to lan2
Jul 6 04:32:31 easiapp1 cmnetd[8074]: lan0 switched to lan2
Jul 6 04:32:37 easiapp1 cmnetd[8074]: lan2 failed
Jul 6 04:32:37 easiapp1 cmnetd[8074]: lan2 switching to lan0
Jul 6 04:32:37 easiapp1 cmnetd[8074]: Subnet 192.168.1.64 switching from lan2 to lan0
Jul 6 04:32:37 easiapp1 cmnetd[8074]: Subnet 192.168.1.64 switched from lan2 to lan0
Jul 6 04:32:37 easiapp1 cmnetd[8074]: lan2 switched to lan0
Jul 6 04:32:37 easiapp1 cmcld[8066]: Local switch has occurred since net_id 0x3 was not found on subnet 192.168.1.64.
Jul 6 04:32:38 easiapp1 inetd[1756]: registrar/tcp: Connection from localhost (127.0.0.1) at Mon Jul 6 04:32:38 2009
Jul 6 04:32:44 easiapp1 inetd[1867]: hacl-cfg/udp: Connection from localhost (127.0.0.1) at Mon Jul 6 04:32:44 2009
Jul 6 04:32:44 easiapp1 inetd[1868]: hacl-cfg/tcp: Connection from localhost (127.0.0.1) at Mon Jul 6 04:32:44 2009
Jul 6 04:32:44 easiapp1 inetd[1869]: auth/tcp: Connection from localhost (127.0.0.1) at Mon Jul 6 04:32:44 2009
Jul 6 04:32:48 easiapp1 inetd[1911]: hacl-cfg/tcp: Connection from localhost (127.0.0.1) at Mon Jul 6 04:32:48 2009
Jul 6 04:32:48 easiapp1 inetd[1912]: auth/tcp: Connection from localhost (127.0.0.1) at Mon Jul 6 04:32:48 2009
Jul 6 04:32:51 easiapp1 cmnetd[8074]: lan2 recovered
Jul 6 04:32:51 easiapp1 cmnetd[8074]: Subnet 192.168.1.64 switching from lan0 to lan2
Jul 6 04:32:51 easiapp1 cmnetd[8074]: Subnet 192.168.1.64 switched from lan0 to lan2
Jul 6 04:32:51 easiapp1 cmnetd[8074]: lan0 switched to lan2
Jul 6 04:33:39 easiapp1 cmnetd[8074]: lan2 failed
Jul 6 04:33:39 easiapp1 cmnetd[8074]: lan2 switching to lan0
Jul 6 04:33:39 easiapp1 cmnetd[8074]: Subnet 192.168.1.64 switching from lan2 to lan0
Jul 6 04:33:39 easiapp1 cmnetd[8074]: Subnet 192.168.1.64 switched from lan2 to lan0
Jul 6 04:33:39 easiapp1 cmnetd[8074]: lan2 switched to lan0
Jul 6 04:33:39 easiapp1 cmcld[8066]: Local switch has occurred since net_id 0x3 was not found on subnet 192.168.1.64.
Jul 6 04:33:42 easiapp1 inetd[2321]: hacl-cfg/udp: Connection from localhost (127.0.0.1) at Mon Jul 6 04:33:42 2009
Jul 6 04:33:42 easiapp1 inetd[2322]: hacl-cfg/tcp: Connection from localhost (127.0.0.1) at Mon Jul 6 04:33:42 2009
Jul 6 04:33:42 easiapp1 inetd[2323]: auth/tcp: Connection from localhost (127.0.0.1) at Mon Jul 6 04:33:42 2009
Jul 6 04:33:48 easiapp1 inetd[2387]: hacl-cfg/tcp: Connection from localhost (127.0.0.1) at Mon Jul 6 04:33:48 2009
Jul 6 04:33:48 easiapp1 inetd[2388]: auth/tcp: Connection from localhost (127.0.0.1) at Mon Jul 6 04:33:48 2009
Jul 6 04:33:53 easiapp1 cmnetd[8074]: lan2 recovered
Jul 6 04:33:53 easiapp1 cmnetd[8074]: Subnet 192.168.1.64 switching from lan0 to lan2
Jul 6 04:33:53 easiapp1 cmnetd[8074]: Subnet 192.168.1.64 switched from lan0 to lan2
Jul 6 04:33:53 easiapp1 cmnetd[8074]: lan0 switched to lan2
Jul 6 04:34:21 easiapp1 cmnetd[8074]: lan2 failed
Jul 6 04:34:21 easiapp1 cmnetd[8074]: lan2 switching to lan0
Jul 6 04:34:21 easiapp1 cmnetd[8074]: Subnet 192.168.1.64 switching from lan2 to lan0
Jul 6 04:34:21 easiapp1 cmnetd[8074]: Subnet 192.168.1.64 switched from lan2 to lan0
Jul 6 04:34:21 easiapp1 cmnetd[8074]: lan2 switched to lan0
Jul 6 04:34:21 easiapp1 cmcld[8066]: Local switch has occurred since net_id 0x3 was not found on subnet 192.168.1.64.
Jul 6 04:34:21 easiapp1 inetd[2641]: registrar/tcp: Connection from easiapp1 (192.168.1.81) at Mon Jul 6 04:34:21 2009
Jul 6 04:34:27 easiapp1 inetd[2666]: registrar/tcp: Connection from localhost (127.0.0.1) at Mon Jul 6 04:34:27 2009
Jul 6 04:34:37 easiapp1 cmnetd[8074]: lan2 recovered
Jul 6 04:34:37 easiapp1 cmnetd[8074]: Subnet 192.168.1.64 switching from lan0 to lan2
Jul 6 04:34:37 easiapp1 cmnetd[8074]: Subnet 192.168.1.64 switched from lan0 to lan2
Jul 6 04:34:37 easiapp1 cmnetd[8074]: lan0 switched to lan2
Jul 6 04:34:40 easiapp1 inetd[2779]: hacl-cfg/udp: Connection from localhost (127.0.0.1) at Mon Jul 6 04:34:40 2009
Jul 6 04:34:40 easiapp1 inetd[2780]: hacl-cfg/tcp: Connection from localhost (127.0.0.1) at Mon Jul 6 04:34:40 2009
Jul 6 04:34:40 easiapp1 inetd[2781]: auth/tcp: Connection from localhost (127.0.0.1) at Mon Jul 6 04:34:40 2009
----------------------------------------------
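To see how long each outage actually lasts, the cmnetd "failed"/"recovered" pairs in the syslog above can be measured. A minimal Python sketch, assuming the syslog line format shown in this post; syslog omits the year, so 2009 is supplied by hand, and outages() is a hypothetical helper name, not part of Serviceguard:

```python
import re
from datetime import datetime

# Excerpt copied from the syslog above; one "switching" line and one inetd
# line are kept to show that the parser ignores them.
SYSLOG_EXCERPT = """\
Jul 6 04:32:17 easiapp1 cmnetd[8074]: lan2 failed
Jul 6 04:32:17 easiapp1 cmnetd[8074]: lan2 switching to lan0
Jul 6 04:32:21 easiapp1 inetd[1679]: registrar/tcp: Connection from easiapp1 (192.168.1.81) at Mon Jul 6 04:32:21 2009
Jul 6 04:32:31 easiapp1 cmnetd[8074]: lan2 recovered
Jul 6 04:32:37 easiapp1 cmnetd[8074]: lan2 failed
Jul 6 04:32:51 easiapp1 cmnetd[8074]: lan2 recovered
Jul 6 04:33:39 easiapp1 cmnetd[8074]: lan2 failed
Jul 6 04:33:53 easiapp1 cmnetd[8074]: lan2 recovered
Jul 6 04:34:21 easiapp1 cmnetd[8074]: lan2 failed
Jul 6 04:34:37 easiapp1 cmnetd[8074]: lan2 recovered
"""

# "Mon D HH:MM:SS host cmnetd[pid]: lanN failed|recovered"
EVENT_RE = re.compile(
    r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ cmnetd\[\d+\]: (lan\d+) (failed|recovered)$"
)


def outages(text, year=2009):
    """Pair each 'lanN failed' line with the next 'lanN recovered' line.

    Returns a list of (interface, failed_at, recovered_at, seconds_down).
    """
    down_since = {}  # interface -> time it was reported failed
    result = []
    for line in text.splitlines():
        m = EVENT_RE.match(line.strip())
        if not m:
            continue  # not a cmnetd failed/recovered event
        stamp, iface, state = m.groups()
        # syslog omits the year, so it has to be supplied explicitly
        ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if state == "failed":
            down_since[iface] = ts
        elif iface in down_since:
            start = down_since.pop(iface)
            result.append((iface, start, ts, (ts - start).total_seconds()))
    return result


if __name__ == "__main__":
    for iface, start, end, secs in outages(SYSLOG_EXCERPT):
        print(f"{iface} down {start:%H:%M:%S} -> {end:%H:%M:%S} ({secs:.0f} s)")
```

On this excerpt it reports four lan2 outages of 14, 14, 14 and 16 seconds. The same function could be fed the contents of /var/adm/syslog/syslog.log to get the full history.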
15 REPLIES
Steven E. Protter
Exalted Contributor

Re: Lan getting failed in the cluster

Shalom,

The LAN connection on one of the nodes in the cluster has gone offline.

Possible causes:
* LAN hardware (the NIC)
* Cable
* Network switch port
* Network switch port configuration

The only thing to do is attack the problem, identify it and correct it.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
SUDHAKAR_18
Trusted Contributor

Re: Lan getting failed in the cluster

Hi,

We have checked
* Cable
* Network switch port
* Network switch port configuration
but the problem still exists. How can I check the health of the server's LAN port?
Mel Burslan
Honored Contributor

Re: Lan getting failed in the cluster

If the problem occurs at about the same time every day, contact your network team and have them check the routers, switches and, more than anything else, the firewalls between these two servers for possible daily equipment reboots. Some older firmware versions on some Cisco equipment were known to require daily reboots to keep functioning. Someone might have forgotten to update the firmware, or left the daily reboot in place without much thought.

________________________________
UNIX because I majored in cryptology...
Sunny123_1
Esteemed Contributor

Re: Lan getting failed in the cluster

Hi

The log shows that your lan2 is failing and recovering.

You have already changed the cable. Did you check your NIC?

Also, use lanadmin to check the LAN status for more details.


Regards
Sunny
SUDHAKAR_18
Trusted Contributor

Re: Lan getting failed in the cluster

No, it's not happening at the same time every day.

I suspect the EMS services.
Mel Burslan
Honored Contributor

Re: Lan getting failed in the cluster

If you checked the NICs on both sides as well as the cabling between the servers and everything checked out fine, I'd insist on checking with the network folks to see whether their equipment is causing disconnects due to a bug or something similar. They should have detailed logs of their equipment's operation and should be able to pinpoint events at a given time, correlating them with the timestamps you see when this event takes place.
________________________________
UNIX because I majored in cryptology...
Vishu
Trusted Contributor

Re: Lan getting failed in the cluster

Hi,

It seems from the logs that lan2 is failing very frequently. It may be caused by a loose cable connection.

* You have checked the cable. Have you tried replacing it?

* Involve your network team and try moving the connection to a new port on the switch.

* Run lanadmin to see the Hardware and Operational Status of lan2.

* If these are done already, then the network team can help you further by checking some things at their end.

Thanks
SUDHAKAR_18
Trusted Contributor

Re: Lan getting failed in the cluster

We have changed the cable and checked with the networking team. The LAN card has not been changed yet.
Jitesh purohit_1
Regular Advisor

Re: Lan getting failed in the cluster

Hi Sudhakar ,

Did you check the /var/adm/nettl.LOG000 log? You can read it with the netfmt command.

Jitesh

SUDHAKAR_18
Trusted Contributor

Re: Lan getting failed in the cluster

-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Timestamp : Thu Jul 02 IST 2009 17:39:09.925945
Process ID : [ICS] Subsystem : IETHER
User ID ( UID ) : -1 Log Class : ERROR
Device ID : 2 Path ID : 0
Connection ID : 0 Log Instance : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<2000> 1000Base-T in path 0/2/1/0/6/0
Detected a faulty or disconnected cable.

-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Timestamp : Thu Jul 02 IST 2009 17:48:42.969381
Process ID : [ICS] Subsystem : IETHER
User ID ( UID ) : -1 Log Class : ERROR
Device ID : 2 Path ID : 0
Connection ID : 0 Log Instance : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<2000> 1000Base-T in path 0/2/1/0/6/0
Detected a faulty or disconnected cable.

-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Timestamp : Thu Jul 02 IST 2009 20:11:25.236822
Process ID : [ICS] Subsystem : IETHER
User ID ( UID ) : -1 Log Class : ERROR
Device ID : 2 Path ID : 0
Connection ID : 0 Log Instance : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<2000> 1000Base-T in path 0/2/1/0/6/0
Detected a faulty or disconnected cable.

-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Timestamp : Thu Jul 02 IST 2009 20:26:38.507654
Process ID : [ICS] Subsystem : IETHER
User ID ( UID ) : -1 Log Class : ERROR
Device ID : 2 Path ID : 0
Connection ID : 0 Log Instance : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<2000> 1000Base-T in path 0/2/1/0/6/0
Detected a faulty or disconnected cable.

-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Timestamp : Thu Jul 02 IST 2009 20:27:04.577416
Process ID : [ICS] Subsystem : IETHER
User ID ( UID ) : -1 Log Class : ERROR
Device ID : 2 Path ID : 0
Connection ID : 0 Log Instance : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<2000> 1000Base-T in path 0/2/1/0/6/0
Detected a faulty or disconnected cable.

-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Timestamp : Thu Jul 02 IST 2009 20:27:15.996185
Process ID : [ICS] Subsystem : IETHER
User ID ( UID ) : -1 Log Class : ERROR
Device ID : 2 Path ID : 0
Connection ID : 0 Log Instance : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<2000> 1000Base-T in path 0/2/1/0/6/0
Detected a faulty or disconnected cable.

-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Timestamp : Thu Jul 02 IST 2009 20:27:44.076472
Process ID : [ICS] Subsystem : IETHER
User ID ( UID ) : -1 Log Class : ERROR
Device ID : 2 Path ID : 0
Connection ID : 0 Log Instance : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<2000> 1000Base-T in path 0/2/1/0/6/0
Detected a faulty or disconnected cable.

-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Timestamp : Thu Jul 02 IST 2009 21:12:46.853906
Process ID : [ICS] Subsystem : IETHER
User ID ( UID ) : -1 Log Class : ERROR
Device ID : 2 Path ID : 0
Connection ID : 0 Log Instance : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<2000> 1000Base-T in path 0/2/1/0/6/0
Detected a faulty or disconnected cable.

-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Timestamp : Fri Jul 03 IST 2009 00:54:41.409399
Process ID : [ICS] Subsystem : IETHER
User ID ( UID ) : -1 Log Class : ERROR
Device ID : 2 Path ID : 0
Connection ID : 0 Log Instance : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<2000> 1000Base-T in path 0/2/1/0/6/0
Detected a faulty or disconnected cable.

-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Timestamp : Mon Jul 06 IST 2009 04:32:15.519897
Process ID : [ICS] Subsystem : IETHER
User ID ( UID ) : -1 Log Class : ERROR
Device ID : 2 Path ID : 0
Connection ID : 0 Log Instance : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<2000> 1000Base-T in path 0/2/1/0/6/0
Detected a faulty or disconnected cable.

-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Timestamp : Mon Jul 06 IST 2009 04:32:36.019125
Process ID : [ICS] Subsystem : IETHER
User ID ( UID ) : -1 Log Class : ERROR
Device ID : 2 Path ID : 0
Connection ID : 0 Log Instance : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<2000> 1000Base-T in path 0/2/1/0/6/0
Detected a faulty or disconnected cable.

-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Timestamp : Mon Jul 06 IST 2009 04:33:38.240252
Process ID : [ICS] Subsystem : IETHER
User ID ( UID ) : -1 Log Class : ERROR
Device ID : 2 Path ID : 0
Connection ID : 0 Log Instance : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<2000> 1000Base-T in path 0/2/1/0/6/0
Detected a faulty or disconnected cable.

-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Timestamp : Mon Jul 06 IST 2009 04:34:21.023888
Process ID : [ICS] Subsystem : IETHER
User ID ( UID ) : -1 Log Class : ERROR
Device ID : 2 Path ID : 0
Connection ID : 0 Log Instance : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<2000> 1000Base-T in path 0/2/1/0/6/0
Detected a faulty or disconnected cable.

-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Timestamp : Tue Jul 07 IST 2009 13:12:33.257609
Process ID : [ICS] Subsystem : IETHER
User ID ( UID ) : -1 Log Class : ERROR
Device ID : 2 Path ID : 0
Connection ID : 0 Log Instance : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<2000> 1000Base-T in path 0/2/1/0/6/0
Detected a faulty or disconnected cable.
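Since syslog and nettl both carry timestamps, the two logs can be lined up. A minimal Python sketch, assuming the netfmt layout shown in this post, an English/C locale for month names, and that both logs use the same local clock (the IST zone name is ignored). The entries are condensed to the header, Timestamp and message lines, and cable_events() and lag_seconds() are hypothetical helper names:

```python
import re
from collections import Counter
from datetime import datetime

# Condensed copy of the netfmt output above: only the header, Timestamp and
# message lines are kept per entry (Process ID, Device ID, etc. dropped).
HEADER = "-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%"
STAMPS = [
    "Thu Jul 02 IST 2009 17:39:09.925945", "Thu Jul 02 IST 2009 17:48:42.969381",
    "Thu Jul 02 IST 2009 20:11:25.236822", "Thu Jul 02 IST 2009 20:26:38.507654",
    "Thu Jul 02 IST 2009 20:27:04.577416", "Thu Jul 02 IST 2009 20:27:15.996185",
    "Thu Jul 02 IST 2009 20:27:44.076472", "Thu Jul 02 IST 2009 21:12:46.853906",
    "Fri Jul 03 IST 2009 00:54:41.409399", "Mon Jul 06 IST 2009 04:32:15.519897",
    "Mon Jul 06 IST 2009 04:32:36.019125", "Mon Jul 06 IST 2009 04:33:38.240252",
    "Mon Jul 06 IST 2009 04:34:21.023888", "Tue Jul 07 IST 2009 13:12:33.257609",
]
NETFMT_TEXT = "\n".join(
    f"{HEADER}\nTimestamp : {s}\n<2000> 1000Base-T in path 0/2/1/0/6/0\n"
    "Detected a faulty or disconnected cable.\n"
    for s in STAMPS
)

# "Timestamp : Thu Jul 02 IST 2009 17:39:09.925945" (zone name skipped by \S+)
TS_RE = re.compile(
    r"Timestamp\s*:\s*\w{3} (\w{3} \d{1,2}) \S+ (\d{4}) (\d{2}:\d{2}:\d{2}\.\d+)"
)


def cable_events(text):
    """Timestamps of netfmt entries reporting a faulty or disconnected cable."""
    events = []
    for entry in text.split("@#%"):  # each entry header ends with @#%
        if "faulty or disconnected cable" not in entry:
            continue
        m = TS_RE.search(entry)
        if m:
            md, year, hms = m.groups()
            events.append(datetime.strptime(f"{md} {year} {hms}", "%b %d %Y %H:%M:%S.%f"))
    return events


def lag_seconds(events, failures):
    """For each cmnetd 'failed' time, seconds since the nearest driver event.

    Driver timestamps are truncated to whole seconds because the syslog
    lines only have one-second resolution.
    """
    lags = []
    for f in failures:
        nearest = min(events, key=lambda e: abs(e - f))
        lags.append((f - nearest.replace(microsecond=0)).total_seconds())
    return lags


# "lan2 failed" times taken from the cmnetd syslog lines of Jul 6, 2009
CMNETD_FAILED = [datetime(2009, 7, 6, 4, 32, 17), datetime(2009, 7, 6, 4, 32, 37),
                 datetime(2009, 7, 6, 4, 33, 39), datetime(2009, 7, 6, 4, 34, 21)]

if __name__ == "__main__":
    events = cable_events(NETFMT_TEXT)
    for day, n in sorted(Counter(e.date() for e in events).items()):
        print(day, n, "cable events")
    print("cmnetd lag (s):", lag_seconds(events, CMNETD_FAILED))
```

On this data it finds 14 cable events spread over Jul 2, 3, 6 and 7 at unrelated times of day, and each of the four cmnetd "lan2 failed" messages on Jul 6 follows a driver event by 0 to 2 seconds, which is consistent with the driver detecting the link loss first and cmnetd reacting to it.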
likid0
Honored Contributor

Re: Lan getting failed in the cluster

You can also check the network polling configuration in your cluster.ascii file:

NETWORK_POLLING_INTERVAL

You could give it a higher value (not too high) and see whether your LAN failures continue.
Windows?, no thanks
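In the cluster ASCII file this parameter is expressed in microseconds (the default is 2000000, i.e. 2 seconds). A minimal Python sketch for reading and raising it in a copy of that file; the file excerpt, the cluster name and the helpers get_param()/set_param() are hypothetical, and the edited file would still have to be checked and applied with cmcheckconf -C and cmapplyconf -C:

```python
import re

# Hypothetical excerpt of a cluster ASCII file; timing values are in microseconds.
CLUSTER_ASCII = """\
CLUSTER_NAME              easi_cluster
# NETWORK_POLLING_INTERVAL is in microseconds
NETWORK_POLLING_INTERVAL  2000000
"""


def get_param(text, name):
    """Return the value of a 'NAME value' line, ignoring comments."""
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # strip comment, then split
        if len(fields) >= 2 and fields[0] == name:
            return fields[1]
    return None


def set_param(text, name, value):
    """Replace the value on an existing, uncommented 'NAME value' line."""
    pattern = re.compile(rf"^([ \t]*{re.escape(name)}[ \t]+)\S+", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + str(value), text)
    if count == 0:
        raise KeyError(f"{name} not found")
    return new_text


if __name__ == "__main__":
    usec = int(get_param(CLUSTER_ASCII, "NETWORK_POLLING_INTERVAL"))
    print("current polling interval:", usec / 1_000_000, "s")
    # doubled only as an example; how high is "not too high" depends on the cluster
    print(set_param(CLUSTER_ASCII, "NETWORK_POLLING_INTERVAL", 4_000_000))
```

Editing a copy and diffing it before running cmcheckconf keeps the change reviewable.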
Jitesh purohit_1
Regular Advisor

Re: Lan getting failed in the cluster

Hi Sudhakar,

Please check with the network team. The nettl log clearly says it could be a loose network cable connection:

<2000> 1000Base-T in path 0/2/1/0/6/0
Detected a faulty or disconnected cable.

Thanks
Jitesh
Mel Burslan
Honored Contributor

Re: Lan getting failed in the cluster

If your network team says their equipment is not rebooting or doing anything funky at the times these errors get logged to the console, there is only one way to explain this phenomenon: a NIC or network switch port that is failing intermittently. Since the problem is not periodic and does not follow any pattern as far as timing is concerned, my best suggestion is to start by using two different, known-good switch ports and plugging the cables coming from the two servers into the new switch ports. If the problem does not go away, then I hope you have a spare NIC on each of these cluster members. You will need to recompile your cluster binaries with a different set of NICs, replacing the ones you are using now.

This problem sounds anything but trivial. Good luck.
________________________________
UNIX because I majored in cryptology...
melvyn burnard
Honored Contributor

Re: Lan getting failed in the cluster

Take the cable out of lan2 and plug it into lan0, then plug the cable that was in lan0 into lan2.
This should give a better indication of where the issue lies.
If the fault follows the cable, it is a cable/switch issue. If it stays with lan2, you have a faulty NIC.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
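The reasoning of the swap test can be written down as a small decision function. This is only a sketch: the function name, the Verdict values and the idea of feeding it the set of NICs that cmnetd reports as failed after the swap are assumptions for illustration, not part of Serviceguard:

```python
from enum import Enum


class Verdict(Enum):
    CABLE_OR_SWITCH = "fault follows the cable: check the cable and switch port"
    NIC = "fault stays with the port: suspect the NIC"
    INCONCLUSIVE = "no clear pattern yet: keep monitoring"


def swap_test(original_port, swapped_to, failing_after):
    """Interpret a cable swap between two NICs.

    original_port: NIC that logged failures before the swap (e.g. "lan2").
    swapped_to:    NIC that now carries the cable from original_port (e.g. "lan0").
    failing_after: set of NICs that logged failures after the swap.
    """
    if swapped_to in failing_after and original_port not in failing_after:
        return Verdict.CABLE_OR_SWITCH  # the problem moved with the cable
    if original_port in failing_after and swapped_to not in failing_after:
        return Verdict.NIC  # the problem stayed on the same port
    # both ports failing, or no failures yet: the swap alone does not decide it
    return Verdict.INCONCLUSIVE


if __name__ == "__main__":
    print(swap_test("lan2", "lan0", {"lan0"}).value)
```

Treating "both ports fail" as inconclusive is a deliberate choice: that pattern points away from a single cable or NIC and back toward the switch side.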
SUDHAKAR_18
Trusted Contributor

Re: Lan getting failed in the cluster

After changing the LAN cables (active and standby), the problem is resolved.