
角谷寿哉
Advisor

ServiceGuard Mistery

I use HP-UX 11.23 on an HP server and run Serviceguard (11.18) for the LAN only.

config ...

NODE_NAME hogehoge
NETWORK_INTERFACE lan2
HEARTBEAT_IP 10.160.18.129
NETWORK_INTERFACE lan8
NETWORK_INTERFACE lan3
HEARTBEAT_IP 10.160.2.65
NETWORK_INTERFACE lan9
NETWORK_INTERFACE lan4
HEARTBEAT_IP 10.160.3.65
NETWORK_INTERFACE lan10
NETWORK_INTERFACE lan5
HEARTBEAT_IP 10.160.8.65
NETWORK_INTERFACE lan11
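
As a reading aid, the node section above can be split into interfaces that carry an IP address and interfaces that do not. The sketch below is a minimal, hypothetical helper (the function name and the embedded sample text are assumptions, not part of Serviceguard). It relies only on the layout shown in this post, where a HEARTBEAT_IP or STATIONARY_IP line belongs to the NETWORK_INTERFACE line before it. Note that the order of lines in the file does not by itself say which standby backs which primary; Serviceguard decides that from the bridged networks it detects.

```python
def classify_interfaces(config_text):
    """Split NETWORK_INTERFACE entries into those with and without an IP.

    Assumption: an IP line (HEARTBEAT_IP or STATIONARY_IP) belongs to the
    most recent NETWORK_INTERFACE line, as in the config quoted in this post.
    """
    with_ip = {}
    without_ip = []
    last = None
    for raw in config_text.splitlines():
        parts = raw.split()
        if len(parts) < 2 or parts[0].startswith("#"):
            continue  # skip blank lines, comments, and malformed lines
        key, value = parts[0], parts[1]
        if key == "NETWORK_INTERFACE":
            last = value
            without_ip.append(value)
        elif key in ("HEARTBEAT_IP", "STATIONARY_IP") and last is not None:
            with_ip[last] = value
            if last in without_ip:
                without_ip.remove(last)
    return with_ip, without_ip


# Node section copied from the post above.
SAMPLE_CONFIG = """\
NODE_NAME hogehoge
NETWORK_INTERFACE lan2
HEARTBEAT_IP 10.160.18.129
NETWORK_INTERFACE lan8
NETWORK_INTERFACE lan3
HEARTBEAT_IP 10.160.2.65
NETWORK_INTERFACE lan9
NETWORK_INTERFACE lan4
HEARTBEAT_IP 10.160.3.65
NETWORK_INTERFACE lan10
NETWORK_INTERFACE lan5
HEARTBEAT_IP 10.160.8.65
NETWORK_INTERFACE lan11
"""

if __name__ == "__main__":
    primaries, standby_candidates = classify_interfaces(SAMPLE_CONFIG)
    print("with IP:", primaries)
    print("without IP:", standby_candidates)
```

With this config, lan2, lan3, lan4, and lan5 carry heartbeat addresses, and lan8 through lan11 have none, so they are the standby candidates.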

I ran a LAN failure test.

No.1 lan2 up -> down
syslog shows lan2 switching to lan8. That is OK.
No.2 lan2 down -> up
No message in syslog. Why?
No.3 lan3 up -> down
syslog shows lan3 switching to lan9. That is OK.
No.4 lan3 down -> up
No message in syslog. Why?
No.5 lan8 up -> down
syslog shows lan8 switching to lan10. Oh boy!

I got panic.

Why did lan2 and lan3 not recover?

Why did lan8 switch to lan10?

Help me!! Of course, the LAN is normally OK.
9 REPLIES
likid0
Honored Contributor

Re: ServiceGuard Mistery

The best place to check network connectivity is the /var/adm/nettl.LOG000 file.

You can read it with:

netfmt -f /var/adm/nettl.LOG000

Windows?, no thanks

Re: ServiceGuard Mistery

How are you doing the LAN failure test? Physically pulling cables, or doing something with your switches?

What do you have the parameters NETWORK_FAILURE_DETECTION and NETWORK_AUTO_FAILBACK set to?

It would also be interesting to see the output of "cmquerycl -k -l net -c "

HTH

Duncan

I am an HPE Employee
角谷寿哉
Advisor

Re: ServiceGuard Mistery

Hi Daniel

Thanks a lot for the quick reply.

I checked
netfmt -f /var/adm/nettl.LOG000

and found the LAN failure timestamps.

The timestamps coincide with the moments I pulled the LAN cables.

But I still don't understand the LAN switching behavior.
角谷寿哉
Advisor

Re: ServiceGuard Mistery

Hi Duncan.

I physically pulled cables.

Output of "cmquerycl -k -l net -c ":

Node Names: hogehoge

Bridged networks(local node information only - full probing was not performed):

1    lan2     (hogehoge)
     lan8     (hogehoge)
2    lan3     (hogehoge)
     lan9     (hogehoge)
3    lan4     (hogehoge)
     lan10    (hogehoge)
4    lan5     (hogehoge)
     lan11    (hogehoge)

IP subnets:

IPv4:

10.160.18.0 lan2 (hogehoge)

10.160.2.0 lan3 (hogehoge)

10.160.3.0 lan3 (hogehoge)

10.160.8.0 lan4 (hogehoge)

IPv6:

Possible Heartbeat IPs:

10.160.18.0 10.160.18.129 (hogehoge)

10.160.2.0 10.160.2.65 (hogehoge)

10.160.3.0 10.160.3.65 (hogehoge)

10.160.8.0 10.160.8.65 (hogehoge)

Possible Cluster Lock Devices:


角谷寿哉
Advisor

Re: ServiceGuard Mistery

One more piece of information.


NETWORK_FAILURE_DETECTION INOUT

NETWORK_AUTO_FAILBACK not set.

角谷寿哉
Advisor

Re: ServiceGuard Mistery

Everybody,

I'm leaving now.

I'll be back in 14 hours.

Thank you.
melvyn burnard
Honored Contributor

Re: ServiceGuard Mistery

Well, this should be set to YES:
NETWORK_AUTO_FAILBACK not set.
Also, you have not supplied specifics of the network topology.
What does the cmquerycl output show?
Or even better, cmscancl.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Stephen Doud
Honored Contributor

Re: ServiceGuard Mistery

The lan failure testing method may be invalid. Please indicate how you are testing.

The fact that lan8 switched to lan10 may indicate that either the network configuration has changed after the cluster binary file was created, or your expectation of which NICs provide standby LAN failover is incorrect.
Use cmgetconf /etc/cmcluster/cluster.ascii and inspect cluster.ascii.
Each node section in the file also contains a list of comments identifying which standby NICs are associated with each primary NIC.
They should match your expectation.
You can use cmviewconf to check the 'bridged network' identification of each NIC, which shows which NICs the original cmapplyconf saw on the same physical network.
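
The comparison described above (expected primary/standby pairs versus bridged-network membership) can be sketched as a small check. This is only an illustration: the function name is hypothetical, the expected pairs are inferred from the order of lines in the original poster's config, and the bridged groups are typed in by hand from the cmquerycl output posted earlier in the thread rather than read from a live cluster.

```python
def find_mismatches(expected_pairs, bridged_groups):
    """Return the (primary, standby) pairs that do not share a bridged network."""
    nic_to_group = {
        nic: number
        for number, nics in bridged_groups.items()
        for nic in nics
    }
    mismatches = []
    for primary, standby in expected_pairs:
        primary_group = nic_to_group.get(primary)
        standby_group = nic_to_group.get(standby)
        # A NIC missing from every group also counts as a mismatch.
        if primary_group is None or primary_group != standby_group:
            mismatches.append((primary, standby))
    return mismatches


# Assumption: pairs inferred from the line order of the poster's config.
EXPECTED = [
    ("lan2", "lan8"),
    ("lan3", "lan9"),
    ("lan4", "lan10"),
    ("lan5", "lan11"),
]

# Bridged networks as listed in the cmquerycl output earlier in the thread.
BRIDGED = {
    1: ["lan2", "lan8"],
    2: ["lan3", "lan9"],
    3: ["lan4", "lan10"],
    4: ["lan5", "lan11"],
}

if __name__ == "__main__":
    print(find_mismatches(EXPECTED, BRIDGED))
```

An empty result means the configuration-time view agrees with the expectation. If the running cluster behaves differently, as with lan8 switching to lan10, the same check can be repeated with the bridged networks reported by cmviewconf for the applied binary configuration.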
角谷寿哉
Advisor

Re: ServiceGuard Mistery

Hi melvyn, Stephen,

I think NETWORK_AUTO_FAILBACK is supported only from MC/SG 11.19 onward (I am on 11.18).

I will now check the local network.

See you later.