BENABBOU
Advisor

Cluster crash

Hi guys
I have two clusters; all the machines are in the same nPar.

Aug 9 04:26:34 SUP350 cmcld: lan1 failed
Aug 9 04:26:34 SUP350 cmcld: Subnet 10.31.228.0 switching from lan1 to lan5
Aug 9 04:26:34 SUP350 cmcld: lan1 switching to lan5
Aug 9 04:26:34 SUP350 cmcld: lan3 failed
Aug 9 04:26:34 SUP350 cmcld: Subnet 10.31.226.0 switching from lan3 to lan7
Aug 9 04:26:34 SUP350 cmcld: lan3 switching to lan7
Aug 9 04:26:34 SUP350 cmcld: Subnet 10.31.228.0 switched from lan1 to lan5
Aug 9 04:26:34 SUP350 cmcld: lan5 failed
Aug 9 04:26:34 SUP350 cmcld: Subnet 10.31.228.0 down
Aug 9 04:26:34 SUP350 cmcld: Local switch has occurred since net_id 0x2 was not found on subnet 10.31.228.0.
Aug 9 04:26:34 SUP350 cmcld: lan1 switched to lan5
Aug 9 04:26:34 SUP350 cmcld: Subnet 10.31.226.0 switched from lan3 to lan7
Aug 9 04:26:34 SUP350 cmcld: lan7 failed
Aug 9 04:26:34 SUP350 cmcld: Subnet 10.31.226.0 down
Aug 9 04:26:34 SUP350 cmcld: Local switch has occurred since net_id 0x2 was not found on subnet 10.31.228.0.
Aug 9 04:26:34 SUP350 above message repeats 4 times
Aug 9 04:26:34 SUP350 cmcld: Subnet 10.31.228.0 in package MACluster is down.
Aug 9 04:26:34 SUP350 cmcld: lan3 switched to lan7
Aug 9 04:26:34 SUP350 cmcld: Executing '/etc/cmcluster/MACluster/MACluster_control.sh stop' for package MACluster, as service PKG*19201.
Aug 9 04:26:34 SUP350 cmcld: Subnet 10.31.228.0 in package XACluster is down.
Aug 9 04:26:34 SUP350 cmcld: Executing '/etc/cmcluster/XACluster/XACluster_control.sh stop' for package XACluster, as service PKG*19202.
Aug 9 04:26:34 SUP350 cmcld: Subnet 10.31.228.0 in package PKG1 is down.
Aug 9 04:26:34 SUP350 cmcld: Executing '/etc/cmcluster/PKG1/PKG1_control.sh stop' for package PKG1, as service PKG*19203.
Aug 9 04:26:34 SUP350 cmcld: Subnet 10.31.228.0 in package PKG2 is down.
Aug 9 04:26:34 SUP350 cmcld: Executing '/etc/cmcluster/PKG2/PKG2_control.sh stop' for package PKG2, as service PKG*19204.
Aug 9 04:26:34 SUP350 cmcld: Subnet 10.31.228.0 in package PKG3 is down.
Aug 9 04:26:34 SUP350 cmcld: Executing '/etc/cmcluster/PKG3/PKG3_control.sh stop' for package PKG3, as service PKG*19205.
Aug 9 04:26:34 SUP350 cmcld: All cluster monitoring LAN interfaces have failed
Aug 9 04:26:34 SUP350 CM-MACluster[11562]: cmhaltpkg XACluster

I want to know the root cause so that I can avoid this problem in the future.

Thanks for help.

Tingli
Esteemed Contributor

Re: Cluster crash

It seems to be a network issue: the network connection between the two members was lost.
BENABBOU
Advisor

Re: Cluster crash

Perhaps, but I'm not sure.
I want to be certain that there is no hardware or software problem in the system.
Note that the cluster is single-node.

Thanks for the help.

Re: Cluster crash

Well, lan1, lan5, lan3 and lan7 all failed at the same time, which suggests some sort of catastrophic networking failure (such as the power to all your LAN switches failing -OR- all your LAN ports being plugged into a single switch which failed, not a very sensible configuration).
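A quick way to confirm that the interfaces really dropped in the same second is to group the cmcld "failed" entries by timestamp. The following is only a sketch: the function name `failed_lans` is made up for illustration, and it assumes the syslog line layout shown in the original post (month, day, time, host, `cmcld:`, interface, `failed`).

```shell
#!/bin/sh
# failed_lans (hypothetical helper): read syslog text on stdin and print,
# for each timestamp, how many interfaces cmcld reported as failed and which.
failed_lans() {
    awk '
        # Match lines such as: "Aug 9 04:26:34 SUP350 cmcld: lan1 failed"
        $5 == "cmcld:" && $7 == "failed" && NF == 7 {
            ts = $1 " " $2 " " $3
            if (!(ts in lans)) {
                order[++n] = ts          # remember first-seen order
                lans[ts] = $6
            } else {
                lans[ts] = lans[ts] " " $6
            }
            count[ts]++
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s: %d failed (%s)\n", order[i], count[order[i]], lans[order[i]]
        }
    '
}
```

It can be fed the system log, for example `failed_lans < /var/adm/syslog/syslog.log` (the usual syslog location on HP-UX; adjust if yours differs). For the log posted above it would report all four interfaces under the single timestamp Aug 9 04:26:34, which is what points to a common upstream cause rather than four independent port faults.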

HTH

Duncan

I am an HPE Employee
BENABBOU
Advisor

Re: Cluster crash

The estate contains more than 100 machines connected to LAN switches.
Frankly, I don't have more details on how the machines are connected to the network.


But I suspect the nPar that contains the two clusters.

Re: Cluster crash

The logs indicate a LAN failure, but you think it's an nPar problem??

Can you explain your logic there?

Maybe look at more detail on what happened with the network by looking at the last 50 messages in the network logging binary file:

netfmt -t 50 -f /var/adm/nettl.LOG000
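Alongside the nettl output, the cmcld entries in syslog can be summarized to show which subnet took which packages down. This is a rough sketch only: `packages_by_subnet` is a hypothetical helper name, and it assumes the "Subnet ... in package ... is down." line layout seen in the log posted at the top of this thread.

```shell
#!/bin/sh
# packages_by_subnet (hypothetical helper): read syslog text on stdin and
# list, per subnet, the packages cmcld reported as affected when it went down.
packages_by_subnet() {
    awk '
        # Match: "... cmcld: Subnet 10.31.228.0 in package MACluster is down."
        $5 == "cmcld:" && $6 == "Subnet" && $8 == "in" && $9 == "package" {
            subnet = $7
            if (!(subnet in pkgs)) {
                order[++n] = subnet      # remember first-seen order
                pkgs[subnet] = $10
            } else {
                pkgs[subnet] = pkgs[subnet] " " $10
            }
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s: %s\n", order[i], pkgs[order[i]]
        }
    '
}
```

Run it as `packages_by_subnet < /var/adm/syslog/syslog.log` (the usual HP-UX syslog path; adjust if needed). Knowing which monitored subnet each package depends on makes it easier to tell the network team exactly which segment to look at.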

HTH

Duncan

I am an HPE Employee
BENABBOU
Advisor

Re: Cluster crash

Mr Duncan,
I suspect the network equipment too, but I want to be sure that there is no problem in the nPar (the LAN interfaces, for example) or in the OS (perhaps it needs patches).
I have forwarded my request to the network team and am waiting for their update.
Tingli
Esteemed Contributor

Re: Cluster crash

You can use other network commands such as ping to check the network.