Operating System - HP-UX

Re: Failed to evaluate network

 
Aliben Mendoza
Occasional Contributor

Failed to evaluate network

I get the following errors when running cmcheckconf. This is a Metrocluster configuration with VMs and multiple subnets.

crpvm01:/etc/cmcluster>cmcheckconf -C cmclconfig.ascii
Defaulting MAX_CONFIGURED_PACKAGES to 300.
Defaulting MAX_CONFIGURED_PACKAGES to 300.
Non-uniform connections detected,
crpvm01 lan0 10.20.0.22 successfully received from crpvm02 lan1 10.20.0.21
but crpvm02 lan1 10.20.0.21 did not receive from crpvm01 lan0 10.20.0.22.
This could be due to heavy network traffic, or heavy load or routing configuration on crpvm01.
Non-uniform connections detected,
crpvm01 lan0 10.20.0.22 successfully received from crpvm03 lan1 10.20.1.20
but crpvm03 lan1 10.20.1.20 did not receive from crpvm01 lan0 10.20.0.22.
This could be due to heavy network traffic, or heavy load or routing configuration on crpvm01.
Non-uniform connections detected,
crpvm01 lan0 10.20.0.22 successfully received from crpvm04 lan1 10.20.1.21
but crpvm04 lan1 10.20.1.21 did not receive from crpvm01 lan0 10.20.0.22.
This could be due to heavy network traffic, or heavy load or routing configuration on crpvm01.
Non-uniform connections detected,
crpvm02 lan0 10.20.0.23 successfully received from crpvm01 lan1 10.20.0.20
but crpvm01 lan1 10.20.0.20 did not receive from crpvm02 lan0 10.20.0.23.
This could be due to heavy network traffic, or heavy load or routing configuration on crpvm02.
Non-uniform connections detected,
crpvm02 lan0 10.20.0.23 successfully received from crpvm03 lan1 10.20.1.20
but crpvm03 lan1 10.20.1.20 did not receive from crpvm02 lan0 10.20.0.23.
This could be due to heavy network traffic, or heavy load or routing configuration on crpvm02.
Non-uniform connections detected,
crpvm02 lan0 10.20.0.23 successfully received from crpvm04 lan1 10.20.1.21
but crpvm04 lan1 10.20.1.21 did not receive from crpvm02 lan0 10.20.0.23.
This could be due to heavy network traffic, or heavy load or routing configuration on crpvm02.
Non-uniform connections detected,
crpvm03 lan0 10.20.1.22 successfully received from crpvm04 lan1 10.20.1.21
but crpvm04 lan1 10.20.1.21 did not receive from crpvm03 lan0 10.20.1.22.
This could be due to heavy network traffic, or heavy load or routing configuration on crpvm03.
Non-uniform connections detected,
crpvm03 lan0 10.20.1.22 successfully received from crpvm02 lan1 10.20.0.21
but crpvm02 lan1 10.20.0.21 did not receive from crpvm03 lan0 10.20.1.22.
This could be due to heavy network traffic, or heavy load or routing configuration on crpvm03.
Non-uniform connections detected,
crpvm03 lan0 10.20.1.22 successfully received from crpvm01 lan1 10.20.0.20
but crpvm01 lan1 10.20.0.20 did not receive from crpvm03 lan0 10.20.1.22.
This could be due to heavy network traffic, or heavy load or routing configuration on crpvm03.
Non-uniform connections detected,
crpvm04 lan0 10.20.1.23 successfully received from crpvm03 lan1 10.20.1.20
but crpvm03 lan1 10.20.1.20 did not receive from crpvm04 lan0 10.20.1.23.
This could be due to heavy network traffic, or heavy load or routing configuration on crpvm04.
Non-uniform connections detected,
crpvm04 lan0 10.20.1.23 successfully received from crpvm02 lan1 10.20.0.21
but crpvm02 lan1 10.20.0.21 did not receive from crpvm04 lan0 10.20.1.23.
This could be due to heavy network traffic, or heavy load or routing configuration on crpvm04.
Non-uniform connections detected,
crpvm04 lan0 10.20.1.23 successfully received from crpvm01 lan1 10.20.0.20
but crpvm01 lan1 10.20.0.20 did not receive from crpvm04 lan0 10.20.1.23.
This could be due to heavy network traffic, or heavy load or routing configuration on crpvm04.
lan4 on node crpvm01 cannot be configured in the cluster
because it does not have an IP address, and it is not a standby lan for any other lan.
lan4 on node crpvm02 cannot be configured in the cluster
because it does not have an IP address, and it is not a standby lan for any other lan.
lan4 on node crpvm03 cannot be configured in the cluster
because it does not have an IP address, and it is not a standby lan for any other lan.
lan4 on node crpvm04 cannot be configured in the cluster
because it does not have an IP address, and it is not a standby lan for any other lan.
Failed to evaluate network
Node crpvm01 did not receive an ICMP REPLY message on 10.20.0.20
from the polling target 10.20.0.1
Node crpvm02 did not receive an ICMP REPLY message on 10.20.0.21
from the polling target 10.20.0.1
Node crpvm03 did not receive an ICMP REPLY message on 10.20.1.20
from the polling target 10.20.1.1
Node crpvm04 did not receive an ICMP REPLY message on 10.20.1.21
from the polling target 10.20.1.1
Failed to evaluate polling targets
cmcheckconf: Unable to reconcile configuration file cmclconfig.ascii
with discovered configuration information.
Matti_Kurkela
Honored Contributor

Re: Failed to evaluate network

Looks like all your lan1s can connect to all lan0s, but not vice versa. Also, all your lan1s seem to have problems reaching their respective polling targets...
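The direction of the failures can be tallied from the cmcheckconf output rather than by eye. Below is a minimal Python sketch (not a Serviceguard tool) that assumes the two-line "successfully received from ... but ... did not receive from ..." wording shown in the original post; other Serviceguard versions may word the message differently.

```python
import re
from collections import Counter

# Matches the two-line "Non-uniform connections detected" message body as it
# appears in this thread. Groups 1-3 are the receiving node/lan/IP that worked,
# groups 4-6 the peer; the backreferences require the "but ..." line to name
# the same two endpoints in reverse.
PAIR_RE = re.compile(
    r"(\S+) (lan\d+) (\S+) successfully received from (\S+) (lan\d+) (\S+)\s+"
    r"but \4 \5 \6 did not receive from \1 \2 \3\."
)

def failed_directions(log_text):
    """Count one-way failures keyed by (sending lan, receiving lan)."""
    failures = Counter()
    for m in PAIR_RE.finditer(log_text):
        sender_lan, receiver_lan = m.group(2), m.group(5)
        # Traffic receiver_lan -> sender_lan worked; sender_lan -> receiver_lan failed.
        failures[(sender_lan, receiver_lan)] += 1
    return failures
```

Fed the complete cmcheckconf output from the first post, every one of the twelve messages should land under the key ('lan0', 'lan1'), i.e. only the lan0 to lan1 direction fails.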

Too bad you did not fully describe your network configuration. The netmasks in particular would have been important to know.

Please run "netstat -rnv" on each node and pipe or copy/paste the outputs to a text file. Then attach the file to this thread.
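Once you have that output, the netmasks can be pulled out per interface. This is a minimal Python sketch, assuming the HP-UX `netstat -rnv` column layout "Dest/Netmask Gateway Flags Refs Interface Pmtu"; other releases may print different columns, so treat it as illustrative only.

```python
import ipaddress

def interface_networks(netstat_text):
    """Map each non-loopback interface to its directly connected IPv4 networks.

    Assumes the six-column `netstat -rnv` layout:
    Dest/Netmask Gateway Flags Refs Interface Pmtu
    """
    nets = {}
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip titles, the column header and "default/..." routes.
        if len(fields) != 6 or not fields[0][0].isdigit():
            continue
        dest, _gateway, flags, _refs, iface, _pmtu = fields
        # Connected-network routes carry "U" without "H" (host) or "G" (gateway).
        if "H" in flags or "G" in flags or iface.startswith("lo"):
            continue
        # ipaddress accepts a dotted-quad netmask, e.g. "10.20.0.0/255.255.255.0".
        nets.setdefault(iface, []).append(ipaddress.IPv4Network(dest))
    return nets
```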

A physical network diagram that indicates all switches, gateways, firewalls etc. on each route between the nodes would also be very useful, if you're allowed to post one.

----guesswork-based deduction beyond this line ----

The polling targets in the "did not receive an ICMP REPLY" error messages look suspiciously like gateway addresses. If 10.20.0.1 and 10.20.1.1 are your gateways in the respective 10.20.0.x and 10.20.1.x network segments, I think your netmasks must be /27 or wider (i.e. 27 or fewer "1" bits in the netmask).
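The /27 bound comes from requiring the presumed gateway and all node addresses to fall in one subnet. A minimal Python sketch using the standard `ipaddress` module, assuming (as guessed above) that 10.20.0.1 and 10.20.1.1 are the gateways:

```python
import ipaddress

def longest_common_prefix(addresses):
    """Return the longest prefix length (narrowest netmask) whose network
    still contains every address in the list."""
    first = addresses[0]
    for prefixlen in range(32, -1, -1):
        net = ipaddress.ip_network(f"{first}/{prefixlen}", strict=False)
        if all(ipaddress.ip_address(a) in net for a in addresses):
            return prefixlen
    return 0  # not reached: /0 contains every IPv4 address

# Polling target (assumed gateway) plus the node addresses from the error messages.
zero_net = ["10.20.0.1", "10.20.0.20", "10.20.0.21", "10.20.0.22", "10.20.0.23"]
one_net = ["10.20.1.1", "10.20.1.20", "10.20.1.21", "10.20.1.22", "10.20.1.23"]
print(longest_common_prefix(zero_net))  # 27
print(longest_common_prefix(one_net))   # 27
```

Any netmask with more than 27 one-bits would put the gateway and the .20-.23 addresses in different subnets.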

If that is true, you seem to have either connected two separate NICs of each node to the same network segment, or chosen a very misleading IP addressing scheme.

Connecting two separate physical HP-UX NICs of one host to the same network segment is usually not recommended: it tends to cause asymmetric routing, which may trigger security features in your switches or routers, or otherwise behave in ways you might not expect.

The recommended solutions would usually be:

- if you only need two (or more) IP addresses, use only one physical NIC and assign the extra IP address(es) to it as IP aliases (lan0:1, lan0:2 etc.)

- if you need fault-tolerance and/or more bandwidth, use APA to join the necessary number of physical NICs into a single aggregate interface, then assign the extra IPs onto the aggregate as IP aliases.

If you really must use separate (non-aggregated) NICs in the same IP network segment, you *can* do it using the ndd parameter "ip_strong_es_model" and routes with the "source" option... but in my opinion, that makes your network configuration more complex and thus more prone to mistakes. With HA clusters, you normally want to Keep It Simple as much as possible.

This document may be useful in understanding HP-UX routing and ip_strong_es_model. (Note that on page 16 it suggests there might be a problem in using ip_strong_es_model with Metrocluster.)
http://www.mayoxide.com/presentations/Understanding_hpux_routing.pdf

MK
Aliben Mendoza
Occasional Contributor

Re: Failed to evaluate network

Hi MK,

The network configuration I am using is the one in the document "Technical Considerations for a Serviceguard Cluster that Spans Multiple IP Subnets", page 7, Figure 3.

Matti_Kurkela
Honored Contributor

Re: Failed to evaluate network

A bit of googling told me you probably meant this whitepaper document, right?

http://bizsupport2.austin.hp.com/bc/docs/support/SupportManual/c02056225/c02056225.pdf

Note that in the Figure 3 of the whitepaper, each node has one NIC in the red network with 5.*.1.* addresses, and another in the blue network with 5.*.2.* addresses.

NodeA lanX 5.1.1.1 (red)
NodeA lanY 5.1.2.1 (blue)

NodeB lanX 5.1.1.2 (red)
NodeB lanY 5.1.2.2 (blue)

NodeC lanX 5.2.1.1 (red)
NodeC lanY 5.2.2.1 (blue)

NodeD lanX 5.2.1.2 (red)
NodeD lanY 5.2.2.2 (blue)

Based on the error messages you posted, your configuration looks like this:

crpvm01 lan0 10.20.0.22 ("zero")
crpvm01 lan1 10.20.0.20 ("zero")

crpvm02 lan0 10.20.0.23 ("zero")
crpvm02 lan1 10.20.0.21 ("zero")

crpvm03 lan0 10.20.1.22 ("one")
crpvm03 lan1 10.20.1.20 ("one")

crpvm04 lan0 10.20.1.23 ("one")
crpvm04 lan1 10.20.1.21 ("one")

In other words, if the physical cabling matches the IP addresses, your crpvm01 and crpvm02 nodes have both NICs in the 10.20.0.* network, and crpvm03 and crpvm04 have both NICs in the 10.20.1.* network.

Since some traffic apparently can travel from your "zero" network to the "one" network and vice versa, there must be some sort of interconnect between the two networks. This is *not* required in the configuration described in the whitepaper.

Such an interconnect should not be directly harmful... but it might be a good idea to isolate the heartbeat network (the red subnets in the whitepaper) from all other traffic if possible.

Your attachment indicates all your netmasks are 255.255.255.0, or /24. This means you have only two separate network segments, not four as in Figure 3 of the whitepaper.
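The segment count can be checked by grouping each NIC address by its /24 network. A minimal Python sketch, using the addresses reconstructed above for the crpvm nodes; the /24 mask applied to the Figure 3 addresses is an assumption made only for the comparison, as the whitepaper's masks are not quoted here.

```python
import ipaddress
from collections import defaultdict

def segments(nics, prefixlen):
    """Group (node, lan, ip) tuples by the IPv4 network they fall in."""
    groups = defaultdict(list)
    for node, lan, ip in nics:
        net = ipaddress.ip_network(f"{ip}/{prefixlen}", strict=False)
        groups[net].append((node, lan))
    return dict(groups)

def nodes_with_shared_segment(nics, prefixlen):
    """Return the nodes that have two or more NICs in the same segment."""
    seen = defaultdict(set)
    shared = set()
    for node, _lan, ip in nics:
        net = ipaddress.ip_network(f"{ip}/{prefixlen}", strict=False)
        if net in seen[node]:
            shared.add(node)
        seen[node].add(net)
    return shared

# Addresses as reconstructed from the cmcheckconf messages.
CRPVM = [
    ("crpvm01", "lan0", "10.20.0.22"), ("crpvm01", "lan1", "10.20.0.20"),
    ("crpvm02", "lan0", "10.20.0.23"), ("crpvm02", "lan1", "10.20.0.21"),
    ("crpvm03", "lan0", "10.20.1.22"), ("crpvm03", "lan1", "10.20.1.20"),
    ("crpvm04", "lan0", "10.20.1.23"), ("crpvm04", "lan1", "10.20.1.21"),
]
# Figure 3 of the whitepaper; the /24 mask used below is an assumption.
FIGURE3 = [
    ("NodeA", "lanX", "5.1.1.1"), ("NodeA", "lanY", "5.1.2.1"),
    ("NodeB", "lanX", "5.1.1.2"), ("NodeB", "lanY", "5.1.2.2"),
    ("NodeC", "lanX", "5.2.1.1"), ("NodeC", "lanY", "5.2.2.1"),
    ("NodeD", "lanX", "5.2.1.2"), ("NodeD", "lanY", "5.2.2.2"),
]

print(len(segments(CRPVM, 24)))             # 2
print(len(segments(FIGURE3, 24)))           # 4
print(sorted(nodes_with_shared_segment(CRPVM, 24)))
# ['crpvm01', 'crpvm02', 'crpvm03', 'crpvm04']
```

Every crpvm node ends up with both NICs in one segment, while no Figure 3 node does.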

Your IP address configuration clearly does not match the configuration in Figure 3 of the whitepaper.

I do not have enough information to tell if your actual physical cabling matches the configuration in the whitepaper or not... but this should be something you (or someone else in your organization) can verify.

MK
Aliben Mendoza
Occasional Contributor

Re: Failed to evaluate network

Attached netstat output:

crpvm01:/etc/cmcluster>netstat -rnv
Routing tables
Dest/Netmask Gateway Flags Refs Interface Pmtu
127.0.0.1/255.255.255.255 127.0.0.1 UH 0 lo0 32808
10.20.0.22/255.255.255.255 10.20.0.22 UH 0 lan0 32808
170.179.88.205/255.255.255.255 170.179.88.205 UH 0 lan1 32808
10.20.0.0/255.255.255.0 10.20.0.22 U 2 lan0 1500
170.179.88.0/255.255.255.0 170.179.88.205 U 2 lan1 1500
127.0.0.0/255.0.0.0 127.0.0.1 U 0 lo0 32808
default/0.0.0.0 170.179.88.1 UG 0 lan1 1500
default/0.0.0.0 10.20.0.1 UG 0 lan0 1500

crpvm02:/tmp/hp>netstat -rnv
Routing tables
Dest/Netmask Gateway Flags Refs Interface Pmtu
127.0.0.1/255.255.255.255 127.0.0.1 UH 0 lo0 32808
10.20.0.23/255.255.255.255 10.20.0.23 UH 0 lan0 32808
170.179.88.207/255.255.255.255 170.179.88.207 UH 0 lan1 32808
10.20.0.0/255.255.255.0 10.20.0.23 U 2 lan0 1500
170.179.88.0/255.255.255.0 170.179.88.207 U 2 lan1 1500
127.0.0.0/255.0.0.0 127.0.0.1 U 0 lo0 32808
default/0.0.0.0 170.179.88.1 UG 0 lan1 1500
default/0.0.0.0 10.20.0.1 UG 0 lan0 1500

crpvm03:/tmp/hp>netstat -rnv
Routing tables
Dest/Netmask Gateway Flags Refs Interface Pmtu
127.0.0.1/255.255.255.255 127.0.0.1 UH 0 lo0 32808
10.20.1.22/255.255.255.255 10.20.1.22 UH 0 lan0 32808
170.179.70.20/255.255.255.255 170.179.70.20 UH 0 lan1 32808
10.20.1.0/255.255.255.0 10.20.1.22 U 2 lan0 1500
170.179.70.0/255.255.255.0 170.179.70.20 U 2 lan1 1500
127.0.0.0/255.0.0.0 127.0.0.1 U 0 lo0 32808
default/0.0.0.0 170.179.70.1 UG 0 lan1 1500
default/0.0.0.0 10.20.1.1 UG 0 lan0 1500

crpvm04:/tmp/hp>netstat -rnv
Routing tables
Dest/Netmask Gateway Flags Refs Interface Pmtu
127.0.0.1/255.255.255.255 127.0.0.1 UH 0 lo0 32808
10.20.1.23/255.255.255.255 10.20.1.23 UH 0 lan0 32808
170.179.70.22/255.255.255.255 170.179.70.22 UH 0 lan1 32808
10.20.1.0/255.255.255.0 10.20.1.23 U 2 lan0 1500
170.179.70.0/255.255.255.0 170.179.70.22 U 2 lan1 1500
127.0.0.0/255.0.0.0 127.0.0.1 U 0 lo0 32808
default/0.0.0.0 170.179.70.1 UG 0 lan1 1500
default/0.0.0.0 10.20.1.1 UG 0 lan0 1500