
Tom S. Michalek
Occasional Contributor

Service Guard anomaly

I have a two-node Serviceguard cluster running on HP-UX 11.31. Two subnets are configured, each with a standby NIC defined for failover.

lan0 fails over to lan5
lan3 fails over to lan6

When unplugging and replugging cables, I get odd results. smrs252a fails over both interfaces and recovers both properly when the cables are plugged back in. On smrs253a, lan0 fails over and recovers properly, but lan3 only fails over properly. When the cable for lan3 is plugged back in, the interface does not recover automatically.

Running cmhaltnode/cmrunnode seems to reset the lan3 interface to up, but cmrunnode generates the following messages:

smrs253a.root /etc/cmcluster$ cmrunnode -v smrs253a
cmrunnode: Validating network configuration...
Gathering network information
Beginning network probing (this may take a while)
Completed network probing
smrs252a lan3 can communicate with smrs253a lan3 over subnet 10.42.52.0
on the IP level, but not on the DLPI level.
There is possibly a network component between the two interfaces
that does not allow any data link level traffic through, which violates
a Serviceguard requirement.
smrs253a lan3 can communicate with smrs252a lan3 over subnet 10.42.52.0
on the IP level, but not on the DLPI level.
There is possibly a network component between the two interfaces
that does not allow any data link level traffic through, which violates
a Serviceguard requirement.
Failed to evaluate network
cmrunnode: Failed to validate the network configuration as reported above but will try to start the nodes anyway.
cmrunnode: Network validation complete
cmrunnode: Validating cluster lock disk .... Done
Waiting for nodes to join ....... done


Details of the config:
smrs253a.root /etc/rc.config.d$ cmviewcl -v

CLUSTER STATUS
hrit_blade_cluster up

NODE STATUS STATE
smrs253a up running

Cluster_Lock_LUN:
DEVICE STATUS
/dev/disk/disk6011 up

Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 0/1/1/0 lan0
PRIMARY up 0/2/2/1 lan3
STANDBY up 0/3/0/0/0/0/2/0/0/ lan5
STANDBY up 0/3/0/0/0/0/4/0/0/ lan6

PACKAGE STATUS STATE AUTO_RUN NODE
hcm91prd up running enabled smrs253a

Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual

Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Subnet up 130.247.204.0
Subnet up 10.42.52.0

Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled smrs253a (current)
Alternate up enabled smrs252a


NODE STATUS STATE
smrs252a up running

Cluster_Lock_LUN:
DEVICE STATUS
/dev/disk/disk6131 up

Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 0/1/1/0 lan0
PRIMARY up 0/2/2/1 lan3
STANDBY up 0/3/0/0/0/0/2/0/0/ lan5
STANDBY up 0/3/0/0/0/0/4/0/0/ lan6

PACKAGE STATUS STATE AUTO_RUN NODE
hcm9prd up running enabled smrs252a

Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual

Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Subnet up 130.247.204.0
Subnet up 10.42.52.0

Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled smrs252a (current)
Alternate up enabled smrs253a
SoorajCleris
Honored Contributor

Re: Service Guard anomaly

Hi,

Could you please run cmquerycl for both nodes and check whether the network interface lan3 is shown under both hosts?


Regards,
Sooraj
"UNIX is basically a simple operating system, but you have to be a genius to understand the simplicity" - Dennis Ritchie
SoorajCleris
Honored Contributor

Re: Service Guard anomaly

What is the status of lan3 in the output of:

1. ioscan
2. lanscan
3. ifconfig
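
The three checks above could be collected roughly as follows. This is only a sketch: the helper name run_if_present is made up for illustration, lan3 is taken from the thread, and the options shown (ioscan -fnC lan) are one common way to list LAN cards rather than anything prescribed here.

```shell
#!/bin/sh
# Interface under suspicion; lan3 comes from the thread, override via IFACE.
IFACE=${IFACE:-lan3}

# Hypothetical helper: run a command only if it exists on this host,
# since ioscan and lanscan are HP-UX tools.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "### $*"
        "$@" || echo "### non-zero exit from $1"
    else
        echo "### skipped (not found): $1"
    fi
}

run_if_present ioscan -fnC lan     # hardware/driver state of the LAN cards
run_if_present lanscan             # hardware state and station address per PPA
run_if_present ifconfig "$IFACE"   # IP-level state and flags of the interface
```

Running it on both nodes and comparing the lan3 lines side by side should make any difference between smrs252a and smrs253a easier to spot.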

Regards,
Sooraj
"UNIX is basically a simple operating system, but you have to be a genius to understand the simplicity" - Dennis Ritchie
Vijaykumar_1
Valued Contributor

Re: Service Guard anomaly

Looking at the following error, I would suggest comparing the LAN interface that is working with the one that is having the issue.

"smrs253a lan3 can communicate with smrs252a lan3 over subnet 10.42.52.0
on the IP level, but not on the DLPI level.
There is possibly a network component between the two interfaces
that does not allow any data link level traffic through, which violates
a Serviceguard requirement."

You would have to check the IP/subnet assignments for lan0 and lan3. There may be some differences between them that can help resolve the issue.
melvyn burnard
Honored Contributor

Re: Service Guard anomaly

This is usually caused by an intermediate piece of hardware, such as a switch, not allowing DLPI traffic through.
As suggested, you can use cmquerycl as a troubleshooting tool.
From each node run:
cmquerycl -v -C test.ascii -n smrs252a -n smrs253a
Then review what you see on the screen, and look at the test.ascii files to see if they show any issues.
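
The verbose output can be long, so a small filter may help when reviewing it. The sketch below assumes the output was saved to a file (query.out is a made-up name) and that any DLPI complaint uses the same wording as the cmrunnode messages quoted earlier in the thread. The function name dlpi_failures is also just for illustration.

```shell
#!/bin/sh
# Print the "X lanN can communicate with Y lanM ..." line that precedes
# each "but not on the DLPI level" message read from standard input.
dlpi_failures() {
    awk '/but not on the DLPI level/ { print prev } { prev = $0 }'
}

# Assumed file name; create it first by redirecting the command output.
if [ -r query.out ]; then
    dlpi_failures < query.out
fi
```

For example, after running cmquerycl -v -C test.ascii -n smrs252a -n smrs253a > query.out 2>&1, the script lists only the interface pairs that failed the link-level probe.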

I suggest you also contact your networking people and have them verify that all the relevant switch ports are set correctly.
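
Before or alongside that, a link-level test with linkloop between the two lan3 interfaces can show whether frames pass below the IP layer. This is a sketch under assumptions: PPA 3 is assumed to correspond to lan3, REMOTE_MAC is a placeholder for the station address that lanscan reports for lan3 on the other node, and linkloop_cmd is a hypothetical helper that only builds the command line.

```shell
#!/bin/sh
# Hypothetical helper: build the linkloop command line for a local PPA
# and a remote station address.
linkloop_cmd() {
    ppa=$1
    remote_mac=$2
    echo "linkloop -i $ppa $remote_mac"
}

# Placeholder value; replace with the station address of lan3 on the peer node.
REMOTE_MAC=${REMOTE_MAC:-0x000000000000}
cmd=$(linkloop_cmd 3 "$REMOTE_MAC")
echo "would run: $cmd"

# Execute only where linkloop exists (HP-UX) and a real address was supplied.
if command -v linkloop >/dev/null 2>&1 && [ "$REMOTE_MAC" != 0x000000000000 ]; then
    $cmd
fi
```

If the same test works between the lan0 interfaces but not between the lan3 interfaces, that points toward the switch ports or VLAN path used by lan3 rather than the cluster configuration.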
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!