
Cluster crash

BENABBOU
Advisor


Hi guys,
I have two clusters; all the machines are in the same nPar.

Aug 9 04:26:34 SUP350 cmcld: lan1 failed
Aug 9 04:26:34 SUP350 cmcld: Subnet 10.31.228.0 switching from lan1 to lan5
Aug 9 04:26:34 SUP350 cmcld: lan1 switching to lan5
Aug 9 04:26:34 SUP350 cmcld: lan3 failed
Aug 9 04:26:34 SUP350 cmcld: Subnet 10.31.226.0 switching from lan3 to lan7
Aug 9 04:26:34 SUP350 cmcld: lan3 switching to lan7
Aug 9 04:26:34 SUP350 cmcld: Subnet 10.31.228.0 switched from lan1 to lan5
Aug 9 04:26:34 SUP350 cmcld: lan5 failed
Aug 9 04:26:34 SUP350 cmcld: Subnet 10.31.228.0 down
Aug 9 04:26:34 SUP350 cmcld: Local switch has occurred since net_id 0x2 was not found on subnet 10.31.228.0.
Aug 9 04:26:34 SUP350 cmcld: lan1 switched to lan5
Aug 9 04:26:34 SUP350 cmcld: Subnet 10.31.226.0 switched from lan3 to lan7
Aug 9 04:26:34 SUP350 cmcld: lan7 failed
Aug 9 04:26:34 SUP350 cmcld: Subnet 10.31.226.0 down
Aug 9 04:26:34 SUP350 cmcld: Local switch has occurred since net_id 0x2 was not found on subnet 10.31.228.0.
Aug 9 04:26:34 SUP350 above message repeats 4 times
Aug 9 04:26:34 SUP350 cmcld: Subnet 10.31.228.0 in package MACluster is down.
Aug 9 04:26:34 SUP350 cmcld: lan3 switched to lan7
Aug 9 04:26:34 SUP350 cmcld: Executing '/etc/cmcluster/MACluster/MACluster_control.sh stop' for package MACluster, as service PKG*19201.
Aug 9 04:26:34 SUP350 cmcld: Subnet 10.31.228.0 in package XACluster is down.
Aug 9 04:26:34 SUP350 cmcld: Executing '/etc/cmcluster/XACluster/XACluster_control.sh stop' for package XACluster, as service PKG*19202.
Aug 9 04:26:34 SUP350 cmcld: Subnet 10.31.228.0 in package PKG1 is down.
Aug 9 04:26:34 SUP350 cmcld: Executing '/etc/cmcluster/PKG1/PKG1_control.sh stop' for package PKG1, as service PKG*19203.
Aug 9 04:26:34 SUP350 cmcld: Subnet 10.31.228.0 in package PKG2 is down.
Aug 9 04:26:34 SUP350 cmcld: Executing '/etc/cmcluster/PKG2/PKG2_control.sh stop' for package PKG2, as service PKG*19204.
Aug 9 04:26:34 SUP350 cmcld: Subnet 10.31.228.0 in package PKG3 is down.
Aug 9 04:26:34 SUP350 cmcld: Executing '/etc/cmcluster/PKG3/PKG3_control.sh stop' for package PKG3, as service PKG*19205.
Aug 9 04:26:34 SUP350 cmcld: All cluster monitoring LAN interfaces have failed
Aug 9 04:26:34 SUP350 CM-MACluster[11562]: cmhaltpkg XACluster

I want to know the root cause so that I can avoid this problem in the future.

Thanks for your help.

7 REPLIES
Tingli
Esteemed Contributor

Re: Cluster crash

It seems to be a network issue: the network connection between the two members was lost.
BENABBOU
Advisor

Re: Cluster crash

Perhaps, but I'm not sure.
I want to be sure that there is no hardware or software problem in the system.
Note that the cluster is a single-node cluster.

Thanks for your help.
Duncan Edmonstone
Honored Contributor

Re: Cluster crash

Well, lan1, lan5, lan3 and lan7 all failed at the same time, which suggests some sort of catastrophic networking failure (for example, the power to all your LAN switches failed, or you have all your LAN ports plugged into a single switch that failed, which is not a very sensible configuration).
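To make the "all failed at the same time" observation easy to check against a longer syslog, here is a minimal Python sketch that groups the `cmcld: lanN failed` lines by timestamp and host. The sample lines are copied from the log posted above; the function name `failures_by_second` and the regular expression are just illustrative assumptions about the line layout, not part of Serviceguard.

```python
import re
from collections import defaultdict

# Assumed layout, based on the posted log:
#   "Aug 9 04:26:34 SUP350 cmcld: lan1 failed"
FAILED = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+cmcld:\s+(?P<lan>lan\d+) failed$"
)

def failures_by_second(lines):
    """Group failed LAN interfaces by (timestamp, host)."""
    grouped = defaultdict(list)
    for line in lines:
        m = FAILED.match(line.strip())
        if m:
            grouped[(m.group("ts"), m.group("host"))].append(m.group("lan"))
    return dict(grouped)

# Sample lines taken from the syslog excerpt in the first post.
sample = [
    "Aug 9 04:26:34 SUP350 cmcld: lan1 failed",
    "Aug 9 04:26:34 SUP350 cmcld: Subnet 10.31.228.0 switching from lan1 to lan5",
    "Aug 9 04:26:34 SUP350 cmcld: lan3 failed",
    "Aug 9 04:26:34 SUP350 cmcld: lan5 failed",
    "Aug 9 04:26:34 SUP350 cmcld: lan7 failed",
]

for (ts, host), lans in failures_by_second(sample).items():
    print(f"{ts} {host}: {len(lans)} interfaces failed: {', '.join(lans)}")
```

If several interfaces on different cards show up under the same second, as they do here, a shared upstream cause (switch, power, cabling path) is a more likely suspect than four independent NIC faults.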

HTH

Duncan

BENABBOU
Advisor

Re: Cluster crash

The site has more than 100 machines connected to the LAN switches.
Frankly, I don't have more details on how the machines are connected to the network.

But I suspect the nPar that contains the two clusters.
Duncan Edmonstone
Honored Contributor

Re: Cluster crash

The logs indicate a LAN failure, but you think it's an nPar problem?

Can you explain your logic there?

You could get more detail on what happened with the network by looking at the last 50 messages in the network logging binary file:

netfmt -t 50 -f /var/adm/nettl.LOG000

HTH

Duncan

BENABBOU
Advisor

Re: Cluster crash

Mr. Duncan,
I suspect the network equipment too, but I want to be sure that there is no problem in the nPar (the LAN interfaces, for example) or in the OS (perhaps it needs patches).
I have forwarded my request to the network team and am waiting for their update.
Tingli
Esteemed Contributor

Re: Cluster crash

You can use other network commands such as ping to check the network.