Operating System - HP-UX


 
yc_2
Regular Advisor

failed to run cmapplyconf

Hi,

We have a two-node MCSG cluster running HP-UX 11.11 on rp4440 servers. I need to change the cluster lock information, but after I modified the cluster ASCII file, cmapplyconf failed.

Attached is the scancl.out file.

Any help is very much appreciated.
smatador
Honored Contributor

Re: failed to run cmapplyconf

Hi
Could you post the cmcheckconf and cmapplyconf error messages?
yc_2
Regular Advisor

Re: failed to run cmapplyconf

# cmcheckconf -C cluster.ascii

Begin cluster verification...

Error: infxprd lan1 did not receive DLPI probe from itself.
Error: infxprd lan1 should not be included in configuration.
Failed to probe network
Error: infxprd lan1 can communicate with orafinp lan1 over subnet 152.226.69.32
on the IP level, but not on the DLPI level.
There is possibly a network component between the two interfaces
that does not allow any data link level traffic through, which violates
a Serviceguard requirement.
Error: Non-uniform connections detected,
infxprd lan6 successfully received from infxprd lan1
but infxprd lan1 did not receive from infxprd lan6.
This could be due to heavy network traffic, or heavy load on infxprd.
Error: Non-uniform connections detected,
orafinp lan1 successfully received from infxprd lan1
but infxprd lan1 did not receive from orafinp lan1.
This could be due to heavy network traffic, or heavy load on orafinp.
Error: Non-uniform connections detected,
orafinp lan6 successfully received from infxprd lan1
but infxprd lan1 did not receive from orafinp lan6.
This could be due to heavy network traffic, or heavy load on orafinp.
Failed to evaluate network
cmcheckconf : Unable to reconcile configuration file cluster.ascii
with discovered configuration information.

# cmapplyconf -v -C /etc/cmcluster/cluster.ascii

Checking cluster file: /etc/cmcluster/cluster.ascii
Checking nodes ... Done
Checking existing configuration ... Done
Gathering configuration information ... Done
Gathering configuration information ... Done
Gathering configuration information ..
Gathering storage information ..
Found 40 devices on node orafinp
Found 37 devices on node infxprd
Analysis of 77 devices should take approximately 9 seconds
0%----10%----20%----30%----40%----50%----60%----70%----80%----90%----100%
Found 14 volume groups on node orafinp
Found 13 volume groups on node infxprd
Analysis of 27 volume groups should take approximately 1 seconds
0%----10%----20%----30%----40%----50%----60%----70%----80%----90%----100%
.....
Gathering Network Configuration ................. Done

Error: infxprd lan1 did not receive DLPI probe from itself.
Error: infxprd lan1 should not be included in configuration.
Failed to probe network
Error: infxprd lan1 can communicate with orafinp lan1 over subnet 152.226.69.32 on the IP level, but not on the DLPI level.
There is possibly a network component between the two interfaces that does not allow any data link level traffic through, which violates a Serviceguard requirement.
Error: Non-uniform connections detected, infxprd lan6 successfully received from infxprd lan1 but infxprd lan1 did not receive from infxprd lan6.
This could be due to heavy network traffic, or heavy load on infxprd.
Error: Non-uniform connections detected, orafinp lan1 successfully received from infxprd lan1 but infxprd lan1 did not receive from orafinp lan1.
This could be due to heavy network traffic, or heavy load on orafinp.
Error: Non-uniform connections detected, orafinp lan6 successfully received from infxprd lan1 but infxprd lan1 did not receive from orafinp lan6.
This could be due to heavy network traffic, or heavy load on orafinp.
Error: Non-uniform connections detected:
infxprd lan1 is assigned to bridged net 1 but it should be assigned to bridged net 2 to which infxprd lan6 belongs.
Error: Non-uniform connections detected:
orafinp lan1 is assigned to bridged net 1 but it should be assigned to bridged net 2 to which infxprd lan6 belongs.
Error: lan6 on node infxprd cannot be configured in the cluster because it does not have an IP address, and it is not a standby lan for any other lan.
Error: lan6 on node orafinp cannot be configured in the cluster because it does not have an IP address, and it is not a standby lan for any other lan.
Failed to evaluate network
cmapplyconf : Unable to reconcile configuration file /etc/cmcluster/cluster.ascii with discovered configuration information.
smatador
Honored Contributor

Re: failed to run cmapplyconf

Hi,
In the scancl output, is vg04 the first cluster VG?
If so, do you have the first lock PV set as follows in the cluster.ascii file?
infxprd
/dev/dsk/c14t0d4
/dev/dsk/c19t0d2
orafinp
/dev/dsk/c14t1d0
/dev/dsk/c19t0d4
Matti_Kurkela
Honored Contributor

Re: failed to run cmapplyconf

According to the scancl.out, you are running ServiceGuard version A.11.16.

For that MCSG version, it is a requirement that the cluster is halted when changing the cluster lock parameters.

Please see the paragraph "Reconfiguring a Running Cluster" in the chapter "Cluster and Package Maintenance" in the "Managing ServiceGuard" manual. The link below points to the correct version of the manual for ServiceGuard A.11.16.

http://docs.hp.com/en/B3936-90079/ch07s04.html#ciibdgcg

Did you try to modify the cluster lock while the cluster was running? If so, you just discovered the reason why halting the cluster is required.


The scancl.out file is useful in describing the structure of your cluster, but it does not tell us what happened when you ran cmapplyconf.

The cmcheckconf and/or cmapplyconf output would be more helpful in analyzing the current problem. The syslog messages would be important too: if the system rebooted, the messages generated before the reboot would be moved to /var/adm/syslog/OLDsyslog.log.

---------

I also note some strange behaviour in your network connections: for example, the lan6 of orafinp can send linkloop packets to lan1 of orafinp, but not vice versa.

If lan6 is supposed to be the backup of lan1, the test should be successful in both directions.

If lan1 and lan6 are supposed to be separate networks, neither direction should work.

You should think about increasing the redundancy of your network connections, as you seem to have plenty of unused NICs and your current configuration seems to be a bit weak on NIC/LAN fault tolerance.

However, this LAN configuration probably *isn't* related to the cmapplyconf failure.

MK
Matti_Kurkela
Honored Contributor

Re: failed to run cmapplyconf

Oh, you posted more information while I was writing my answer... and it seems I was wrong: the network configuration strangeness is *definitely* the reason why cmapplyconf is failing.

You should definitely examine your network connections and the configuration of network switches (or any other network hardware) connecting your cluster nodes: the current network configuration is clearly in violation of ServiceGuard requirements.

When ServiceGuard is configured to use a network interface, it identifies the network segment the interface is connected to. All interfaces connected to a given segment *must* be able to reach each other, even if the interfaces are on the same node.

MK
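
A rough sketch of how the "every interface must reach every other interface" rule could be checked on one node, in both directions. It assumes the `linkloop -i <PPA> <MAC>` form and the "-- OK" result line shown elsewhere in this thread; the function name `check_pairs` and the `PPA:MAC` argument format are made up for this example, and the MAC addresses have to come from your own systems (for example from lanscan).

```shell
#!/bin/sh
# check_pairs: hypothetical helper, not part of ServiceGuard.
# Each argument is PPA:MAC for one local interface, e.g. 1:0x001122334455.
# For every ordered pair, send linkloop frames from the first interface
# to the MAC address of the second and report the result.
check_pairs() {
    for src in "$@"; do
        for dst in "$@"; do
            if [ "$src" = "$dst" ]; then
                continue
            fi
            ppa=${src%%:*}   # PPA of the sending interface
            mac=${dst#*:}    # MAC address of the receiving interface
            # Assumption: a successful test prints a line containing "-- OK".
            if linkloop -i "$ppa" "$mac" 2>&1 | grep -q -e '-- OK'; then
                echo "lan$ppa -> $mac OK"
            else
                echo "lan$ppa -> $mac FAILED"
            fi
        done
    done
}
```

Called as `check_pairs 1:<MAC of lan1> 6:<MAC of lan6>`, it prints one line per direction. If lan6 is the standby for lan1, both lines should say OK; one OK and one FAILED is the kind of non-uniform connection cmcheckconf complains about.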
yc_2
Regular Advisor

Re: failed to run cmapplyconf

I halted the cluster before running cmapplyconf. Below is an extract from the cluster ASCII file:
:
NODE_NAME orafinp
NETWORK_INTERFACE lan1
HEARTBEAT_IP 152.226.69.44
NETWORK_INTERFACE lan6 #Standby LAN
NETWORK_INTERFACE lan4
HEARTBEAT_IP 1.1.1.2
# Changed on 28 Jan09
# FIRST_CLUSTER_LOCK_PV /dev/dsk/c14t0d4
# SECOND_CLUSTER_LOCK_PV /dev/dsk/c19t0d2
FIRST_CLUSTER_LOCK_PV /dev/dsk/c14t1d0
SECOND_CLUSTER_LOCK_PV /dev/dsk/c19t0d4
# List of serial device file names
# For example:
# SERIAL_DEVICE_FILE /dev/tty0p0

# Warning: There are no standby network interfaces for lan4.
# Possible standby Network Interfaces for lan1: lan6.

NODE_NAME infxprd
NETWORK_INTERFACE lan1
HEARTBEAT_IP 152.226.69.46
NETWORK_INTERFACE lan6 #standby LAN
NETWORK_INTERFACE lan4
HEARTBEAT_IP 1.1.1.1
# Changed on 28 Jan09
FIRST_CLUSTER_LOCK_PV /dev/dsk/c19t0d2
SECOND_CLUSTER_LOCK_PV /dev/dsk/c14t0d4
# List of serial device file names
# For example:
:
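
For comparing the lock PV entries of the two nodes, a small awk filter can list the active (uncommented) entries per node. This is only a sketch: the function name `list_lock_pvs` is invented here, and it assumes the keyword layout shown in the extract above (NODE_NAME, FIRST_CLUSTER_LOCK_PV, SECOND_CLUSTER_LOCK_PV, one per line).

```shell
#!/bin/sh
# list_lock_pvs: hypothetical helper, not a ServiceGuard command.
# Reads a cluster ASCII file (or stdin) and prints one line per active
# lock PV entry: <node> <keyword> <device file>.
list_lock_pvs() {
    awk '
        $1 ~ /^#/ { next }                  # skip commented-out lines
        $1 == "NODE_NAME" { node = $2 }     # remember the current node
        $1 == "FIRST_CLUSTER_LOCK_PV" || $1 == "SECOND_CLUSTER_LOCK_PV" {
            print node, $1, $2
        }
    ' "$@"
}
```

Usage: `list_lock_pvs /etc/cmcluster/cluster.ascii`. Each node should show its own device file names for the same physical lock disks.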
smatador
Honored Contributor

Re: failed to run cmapplyconf

Look at these two threads and check your network configuration. Even if the linkloop results in cmscancl are correct, something has made the connections between the LANs inconsistent.
http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1056478
http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1106455
yc_2
Regular Advisor

Re: failed to run cmapplyconf

In the last linkloop command, what does the error mean?

root@orafinp [/etc/cmcluster]
# linkloop -i 1 0x00306E5D5844
Link connectivity to LAN station: 0x00306E5D5844
-- OK

root@orafinp [/etc/cmcluster]
# linkloop -i 4 0x0010837B7D6F
Link connectivity to LAN station: 0x0010837B7D6F
-- OK

root@orafinp [/etc/cmcluster]
# linkloop -i 6 0x00306E5DC6FF 6
Link connectivity to LAN station: 0x00306E5DC6FF
-- OK
Link connectivity to LAN station: 6
error: get_msg2 getmsg failed, errno = 4
-- FAILED
frames sent : 1
frames received correctly : 0
reads that timed out : 1
smatador
Honored Contributor

Re: failed to run cmapplyconf

It is a typo:
# linkloop -i 6 0x00306E5DC6FF 6<==
6 is not a MAC address.
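
To catch this kind of slip before running the command, the station addresses can be checked first. The sketch below only accepts the form used in this thread, "0x" followed by 12 hex digits; that restriction is an assumption of the example, not a statement of everything linkloop accepts, and both function names are made up.

```shell
#!/bin/sh
# is_mac_address: succeed if $1 is "0x" followed by exactly 12 hex digits,
# e.g. 0x00306E5DC6FF. Hypothetical helper for illustration only.
is_mac_address() {
    case "$1" in
        0x*) hex=${1#0x} ;;
        *)   return 1 ;;
    esac
    # Exactly 12 characters after the 0x prefix...
    [ "${#hex}" -eq 12 ] || return 1
    # ...and none of them may be a non-hex character.
    case "$hex" in
        *[!0-9A-Fa-f]*) return 1 ;;
    esac
    return 0
}

# check_linkloop_args: report every argument that fails the check above.
# Returns non-zero if at least one argument was rejected.
check_linkloop_args() {
    bad=0
    for addr in "$@"; do
        if ! is_mac_address "$addr"; then
            echo "not a MAC address: $addr"
            bad=1
        fi
    done
    return $bad
}
```

With the arguments from the failing command, `check_linkloop_args 0x00306E5DC6FF 6` flags the stray 6 and leaves the real address alone.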