Operating System - HP-UX

syslog - cmcld: lan0 failed - intermittent messages

 
SOLVED
Omar Alvi_1
Super Advisor

Hi,

I have an issue with an otherwise fully functional two-node Serviceguard configuration. Since the beginning, we have been getting the following messages on one of the nodes, relating to one of its three subnets. This subnet is configured as a heartbeat subnet.

The utilization on this subnet is insignificant so that shouldn't be an issue.

We note that the time between each failure and the LAN coming back up again is usually about 52 seconds.

Our NETWORK_POLLING_INTERVAL is the default 2 seconds.

Jul 24 02:30:00 afgprd1 cmcld: Subnet 172.16.3.0 up
Jul 24 02:36:41 afgprd1 cmcld: lan0 failed
Jul 24 02:36:41 afgprd1 cmcld: Subnet 172.16.3.0 down
Jul 24 02:38:17 afgprd1 cmcld: lan0 recovered
Jul 24 02:38:17 afgprd1 cmcld: Subnet 172.16.3.0 up
Jul 24 02:38:43 afgprd1 cmcld: HB connection to 172.16.3.46 not responding, closing
Jul 24 02:38:43 afgprd1 cmcld: GS connection to 172.16.3.46 not responding, closing
Jul 24 02:42:43 afgprd1 cmcld: HB connection to 172.16.3.46 is responding
Jul 24 02:42:43 afgprd1 cmcld: GS connection to 172.16.3.46 is responding
Jul 24 02:45:26 afgprd1 cmcld: lan0 failed
Jul 24 02:45:26 afgprd1 cmcld: Subnet 172.16.3.0 down
Jul 24 02:46:18 afgprd1 cmcld: lan0 recovered
Jul 24 02:46:18 afgprd1 cmcld: Subnet 172.16.3.0 up
Jul 24 02:49:30 afgprd1 cmcld: lan0 failed
Jul 24 02:49:30 afgprd1 cmcld: Subnet 172.16.3.0 down
Jul 24 02:50:24 afgprd1 cmcld: lan0 recovered
Jul 24 02:50:24 afgprd1 cmcld: Subnet 172.16.3.0 up
Jul 24 02:54:21 afgprd1 cmcld: lan0 failed
Jul 24 02:56:43 afgprd1 cmcld: GS connection to 172.16.3.46 not responding, closing
Jul 24 02:54:21 afgprd1 cmcld: Subnet 172.16.3.0 down
Jul 24 03:00:43 afgprd1 cmcld: GS connection to 172.16.3.46 is responding
Jul 24 02:56:07 afgprd1 cmcld: Subnet 172.16.3.0 up
Jul 24 02:56:07 afgprd1 cmcld: lan0 recovered
Jul 24 03:08:24 afgprd1 cmcld: lan0 failed
Jul 24 03:08:24 afgprd1 cmcld: Subnet 172.16.3.0 down
Jul 24 03:09:18 afgprd1 cmcld: lan0 recovered
Jul 24 03:09:18 afgprd1 cmcld: Subnet 172.16.3.0 up
Jul 24 03:44:51 afgprd1 cmcld: lan0 failed
Jul 24 03:44:51 afgprd1 cmcld: Subnet 172.16.3.0 down
Jul 24 03:45:42 afgprd1 cmcld: lan0 recovered
Jul 24 03:45:42 afgprd1 cmcld: Subnet 172.16.3.0 up
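
To check the 52-second observation against the excerpt above, the failed/recovered gaps can be computed with a short awk filter. This is only a sketch: the function name lan_outages is made up for illustration, and it assumes the line layout shown above (time in the third field, interface name in the sixth) and that no outage spans midnight.

```shell
#!/bin/sh
# lan_outages NIC: read syslog lines on stdin and print, for every
# "failed"/"recovered" pair logged by cmcld for NIC, the time the
# failure was logged and the outage length in seconds.
# Assumes the layout of the excerpt (field 3 = HH:MM:SS, field 6 = NIC)
# and that no outage spans midnight.
lan_outages() {
    awk -v nic="$1" '
        function secs(t,    p) {    # HH:MM:SS to seconds since midnight
            split(t, p, ":")
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        $5 ~ /^cmcld/ && $6 == nic && $7 == "failed" {
            start = $3; down = 1
        }
        $5 ~ /^cmcld/ && $6 == nic && $7 == "recovered" && down {
            print start, secs($3) - secs(start)
            down = 0
        }
    '
}

# Example with the first two outages from the excerpt.
lan_outages lan0 <<'EOF'
Jul 24 02:36:41 afgprd1 cmcld: lan0 failed
Jul 24 02:38:17 afgprd1 cmcld: lan0 recovered
Jul 24 02:45:26 afgprd1 cmcld: lan0 failed
Jul 24 02:46:18 afgprd1 cmcld: lan0 recovered
EOF
```

Run over the whole excerpt, the six outages come out at 96, 52, 54, 106, 54 and 51 seconds, so most of them, but not all, are close to 52 seconds. On a live node the same function could be fed from the syslog file, for example lan_outages lan0 < /var/adm/syslog/syslog.log, provided the lines there have the same layout.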

The complaint is coming from the Serviceguard daemon only. Is it of any significance? How can it be corrected?

Appreciate any assistance

Thanks and Regards,

-Alvi
Mel Burslan
Honored Contributor

Re: syslog - cmcld: lan0 failed - intermittent messages

How are these two nodes connected to each other on this heartbeat LAN? Crossover cable or hub/switch?
________________________________
UNIX because I majored in cryptology...
Mohanasundaram_1
Honored Contributor
Solution

Re: syslog - cmcld: lan0 failed - intermittent messages

Hi Omar,

As Mel suspects, this problem could be happening if it is connected using a cross-over cable.

Format the /var/adm/nettl.LOG000 file and check it for any cable/port disconnection messages.

To format the log, use:

# netfmt -Nlf /var/adm/nettl.LOG000 > /tmp/nettl.log

Then view /tmp/nettl.log.

If you find excessive disconnection errors, the cable, the switch port, or the LAN card may have a problem.

You can also check the duplex settings of the LAN card and of the switch port it connects to for any mismatch.

With regards,
MOhan.
Attitude, Not aptitude, determines your altitude
Omar Alvi_1
Super Advisor

Re: syslog - cmcld: lan0 failed - intermittent messages

Thanks for the replies,

The formatted nettl log gives the following error message

**********************Gigabit Ethernet LAN/9000 Networking******************@#%
Timestamp            : Fri Aug 12 WAT 2005 08:09:18.250107
Process ID           : [ICS]               Subsystem        : IGELAN
User ID ( UID )      : -1                  Log Class        : ERROR
Device ID            : 0                   Path ID          : 0
Connection ID        : 0                   Log Instance     : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<2004> 1000Base-T in path 1/0/1/1/0/4/0
Detected a faulty or disconnected cable.
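
To see how often this record occurs, and when, the formatted log can be filtered for the message text. The following is a rough sketch, not a standard tool: the function name cable_fault_times is invented here, and it assumes every record has a "Timestamp" line ahead of the message, as in the record above.

```shell
#!/bin/sh
# cable_fault_times [FILE]: print the timestamp of every record in a
# netfmt-formatted log that reports a faulty or disconnected cable.
# Reads FILE if given, otherwise standard input.
# Assumes a "Timestamp : ..." line precedes each message, as shown above.
cable_fault_times() {
    awk '
        /^[ \t]*Timestamp/ {
            ts = $0
            sub(/^[ \t]*Timestamp[ \t]*:[ \t]*/, "", ts)   # keep only the value
        }
        /Detected a faulty or disconnected cable/ { print ts }
    ' "$@"
}
```

For example, cable_fault_times /tmp/nettl.log | wc -l gives the number of such records, and the printed times can be compared by hand with the cmcld "lan0 failed" times in syslog.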

So we'll now investigate the LAN subsystem.

I have one query, though: although the error messages are reported in syslog, they are generated by the cmcld daemon. Why and how is cmcld logging these messages to syslog? What is the intelligence involved?

-Alvi
Thayanidhi
Honored Contributor

Re: syslog - cmcld: lan0 failed - intermittent messages

Hi,
Once the cluster is configured, cmcld monitors the subnets/interfaces. That's why cmcld writes these messages to syslog. If you halt the cluster, cmcld won't monitor the LAN interfaces. The cause is possibly a poor cable or wrong settings. Check/fix the speed/duplex on the switch and the card.

Regds
TT
Attitude (not aptitude) determines altitude.
Stephen Doud
Honored Contributor

Re: syslog - cmcld: lan0 failed - intermittent messages

Serviceguard performs a DLPI-based (layer 2 link-level) packet transmission test between NICs listed in the cluster binary every 2 seconds (default NETWORK_POLLING_INTERVAL as listed in the cluster configuration file - See cmviewconf).
If any monitored NIC fails to complete a transmission, it is marked DOWN and logged as such in syslog.log by cmcld.
The NIC is marked up when another test completes successfully.
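
The down/up marking described above can be pictured as a two-state toggle driven by the result of each poll. The sketch below is purely illustrative and is not Serviceguard code: the function name monitor_sketch and the "ok"/"fail" input format are made up, and real cmcld sends DLPI test packets rather than reading results from a pipe.

```shell
#!/bin/sh
# monitor_sketch NIC: read one poll result per line ("ok" or "fail")
# and print a message only when the NIC changes state, mimicking the
# "failed"/"recovered" pairs that cmcld writes to syslog.
# Illustration only; the input format is an assumption.
monitor_sketch() {
    nic=$1
    state=up
    while read -r result; do
        case "$state:$result" in
            up:fail) state=down; echo "$nic failed" ;;     # first failed poll
            down:ok) state=up;   echo "$nic recovered" ;;  # next successful poll
        esac
    done
}

# Five polls: one good, two failed, two good.
printf '%s\n' ok fail fail ok ok | monitor_sketch lan0
```

Only the transitions are printed, which matches the pattern in syslog: one "failed" line when a poll first fails and one "recovered" line when a later poll succeeds, regardless of how many polls fall in between.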