10-01-2004 01:58 AM
Service guard lan switching
I have been testing my two-node cluster with ServiceGuard 11.15 these past few days. This node has two fibre network cards configured with APA, which I am not using for the cluster at all because I do not have fibre connectivity. Apart from these I have a 100 Mb/s LAN, which is the one I use for the heartbeat and the packages.
Athos:/opt/sgmgr/bin> lanscan
Hardware Station        Crd Hdw   Net-Interface  NM  MAC   HP-DLPI DLPI
Path     Address        In# State NamePPA        ID  Type  Support Mjr#
0/0/0/0  0x00306EC3B263 0   UP    lan0 snap0     1   ETHER Yes     119
0/10/0/0 0x00306EF2B72A 1   UP    lan1 snap1     2   ETHER Yes     119
0/12/0/0 0x00306EF2B719 2   UP    lan2 snap2     3   ETHER Yes     119
LinkAgg0 0x000000000000 900 DOWN  lan900 snap900 4   ETHER Yes     119
LinkAgg1 0x000000000000 901 DOWN  lan901 snap901 5   ETHER Yes     119
LinkAgg2 0x000000000000 902 DOWN  lan902 snap902 6   ETHER Yes     119
LinkAgg3 0x000000000000 903 DOWN  lan903 snap903 7   ETHER Yes     119
LinkAgg4 0x000000000000 904 DOWN  lan904 snap904 8   ETHER Yes     119
Athos:/opt/sgmgr/bin> ifconfig lan0
lan0: flags=843
inet 174.1.10.13 netmask ffffff00 broadcast 174.1.10.255
Athos:/opt/sgmgr/bin> ifconfig lan1
ifconfig: no such interface
Athos:/opt/sgmgr/bin> ifconfig lan2
ifconfig: no such interface
Athos:/opt/sgmgr/bin> ifconfig lan900
lan900: flags=1843
inet 174.1.51.33 netmask ffffff00 broadcast 174.1.51.255
This is the cluster configuration.
NODE_NAME Athos
NETWORK_INTERFACE lan0
HEARTBEAT_IP 174.1.10.13
FIRST_CLUSTER_LOCK_PV /dev/dsk/c12t0d1
SECOND_CLUSTER_LOCK_PV /dev/dsk/c9t0d2
NODE_NAME Porthos
NETWORK_INTERFACE lan0
HEARTBEAT_IP 174.1.10.14
FIRST_CLUSTER_LOCK_PV /dev/dsk/c8t0d1
SECOND_CLUSTER_LOCK_PV /dev/dsk/c14t0d2
I have also configured two packages with the IPs 174.1.10.15 and 174.1.10.16. Both packages monitor the subnet 174.1.10.0.
This is the test scenario:
- Two packages are running on the same node, Athos, which is connected to the second node, Porthos, by just a single LAN.
- Then I pull the LAN cable out of the Athos NIC, and Athos loses all network connectivity.
- Athos then forms a single-node cluster, even though it has no network, and Porthos does a TOC.
Is this behaviour normal? In my opinion, since Athos has no network at all, it should stop both packages and then do a TOC. Both packages should then be started on Porthos.
I include more information...
NODE_NAME Athos
NETWORK_INTERFACE lan0
HEARTBEAT_IP 174.1.10.13
# NETWORK_INTERFACE lan900
# HEARTBEAT_IP 174.1.51.33
FIRST_CLUSTER_LOCK_PV /dev/dsk/c12t0d1
SECOND_CLUSTER_LOCK_PV /dev/dsk/c9t0d2
# List of serial device file names
# For example:
# SERIAL_DEVICE_FILE /dev/tty0p0
# Warning: There are no standby network interfaces for lan0.
# Link Aggregate lan900 contains the following port(s): lan2
# Warning: There are no standby network interfaces for lan900.
#NODE_NAME dartanan
# NETWORK_INTERFACE lan0
# HEARTBEAT_IP 174.1.10.11
# NETWORK_INTERFACE lan900
# HEARTBEAT_IP 174.1.51.11
# FIRST_CLUSTER_LOCK_PV /dev/dsk/c29t0d1
# SECOND_CLUSTER_LOCK_PV /dev/dsk/c33t0d2
# List of serial device file names
# For example:
# SERIAL_DEVICE_FILE /dev/tty0p0
# Warning: There are no standby network interfaces for lan0.
# Link Aggregate lan900 contains the following port(s): lan1,lan2
# Warning: There are no standby network interfaces for lan900.
NODE_NAME Porthos
NETWORK_INTERFACE lan0
HEARTBEAT_IP 174.1.10.14
# NETWORK_INTERFACE lan900
# HEARTBEAT_IP 174.1.51.44
FIRST_CLUSTER_LOCK_PV /dev/dsk/c8t0d1
SECOND_CLUSTER_LOCK_PV /dev/dsk/c14t0d2
# List of serial device file names
# For example:
# SERIAL_DEVICE_FILE /dev/tty0p0
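The configuration above gives each node exactly one active heartbeat interface, since the lan900 entries are commented out. A minimal Python sketch for spotting that in a cluster ASCII file follows. It is not a ServiceGuard tool; the function names are hypothetical, and it assumes every HEARTBEAT_IP line follows the NETWORK_INTERFACE line it belongs to, as in the file shown here.

```python
def heartbeat_interfaces(config_text):
    """Map each NODE_NAME to its list of (interface, heartbeat_ip) pairs.

    Lines that are blank or start with '#' are skipped.
    """
    nodes = {}
    current_node = None
    current_iface = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        key, value = parts[0], parts[1]
        if key == "NODE_NAME":
            current_node = value
            nodes[current_node] = []
            current_iface = None
        elif key == "NETWORK_INTERFACE" and current_node is not None:
            current_iface = value
        elif key == "HEARTBEAT_IP" and current_iface is not None:
            nodes[current_node].append((current_iface, value))
    return nodes


def single_heartbeat_nodes(config_text):
    """Return the sorted names of nodes with fewer than two heartbeat interfaces."""
    found = heartbeat_interfaces(config_text)
    return sorted(name for name, hb in found.items() if len(hb) < 2)


# Excerpt of the configuration posted above.
SAMPLE = """\
NODE_NAME Athos
NETWORK_INTERFACE lan0
HEARTBEAT_IP 174.1.10.13
# NETWORK_INTERFACE lan900
# HEARTBEAT_IP 174.1.51.33
NODE_NAME Porthos
NETWORK_INTERFACE lan0
HEARTBEAT_IP 174.1.10.14
# NETWORK_INTERFACE lan900
# HEARTBEAT_IP 174.1.51.44
"""

if __name__ == "__main__":
    print(single_heartbeat_nodes(SAMPLE))
```

Both nodes are reported, which matches the "no standby network interfaces for lan0" warnings in the file itself.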
ATHOS
Oct 1 12:48:43 Athos cmcld: Timed out node Porthos. It may have failed.
Oct 1 12:46:22 Athos nmbd[28631]: find_response_record: response packet id 26898 received with no matching record.
Oct 1 12:48:43 Athos cmcld: Attempting to adjust cluster membership
Oct 1 12:48:44 Athos cmclconfd[21091]: Updated file /var/adm/cmcluster/frdump.cmcld.2 for node Athos (length = 123862).
Oct 1 12:48:44 Athos cmcld: lan0 failed
Oct 1 12:46:22 Athos nmbd[28631]: [2004/10/01 12:46:22, 0] nmbd/nmbd_responserecordsdb.c:(234)
Oct 1 12:48:44 Athos above message repeats 4 times
Oct 1 12:48:44 Athos cmcld: Subnet 174.1.10.0 down
Oct 1 12:46:22 Athos nmbd[28631]: find_response_record: response packet id 26899 received with no matching record.
Oct 1 12:48:44 Athos above message repeats 2 times
Oct 1 12:48:44 Athos cmcld: Subnet 174.1.10.0 in package pkg-oracle is down.
Oct 1 12:48:44 Athos cmcld: Executing '/etc/cmcluster/pkg-oracle/pkg-oracle.cntl stop' for package pkg-oracle, as service PKG*10241.
Oct 1 12:48:44 Athos cmcld: Subnet 174.1.10.0 in package pkg-bhs is down.
Oct 1 12:48:44 Athos cmcld: Executing '/etc/cmcluster/pkg-bhs/pkg-bhs.cntl stop' for package pkg-bhs, as service PKG*14082.
Oct 1 12:48:44 Athos cmcld: All cluster monitoring LAN interfaces have failed
Oct 1 12:48:45 Athos CM-pkg-oracle[21656]: cmhaltserv oracle-monitor
Oct 1 12:48:45 Athos su: + tty?? root-mad
Oct 1 12:48:46 Athos cmcld: Obtaining First Dual Cluster Lock
Oct 1 12:48:47 Athos cmcld: Obtaining Second Dual Cluster Lock
Oct 1 12:48:48 Athos cmcld: Turning off safety time protection since the cluster
Oct 1 12:48:46 Athos su: + tty?? root-mad
Oct 1 12:48:48 Athos cmcld: may now consist of a single node. If ServiceGuard
Oct 1 12:48:48 Athos cmcld: fails, this node will not automatically halt
Oct 1 12:49:49 Athos cmcld: 1 nodes have formed a new cluster, sequence #2
Oct 1 12:49:49 Athos cmcld: The new active cluster membership is: Athos(id=1)
Oct 1 12:49:49 Athos cmcld: Package pkg-oracle cannot run on this node because subnet 174.1.10.0 is not up
Oct 1 12:49:49 Athos cmcld: Package pkg-bhs cannot run on this node because subnet 174.1.10.0 is not up
Oct 1 12:53:04 Athos su: + tty?? root-mad
Oct 1 12:53:05 Athos CM-pkg-oracle[22098]: cmmodnet -r -i 174.1.10.15 174.1.10.0
Oct 1 12:53:06 Athos cmcld: Service scsupv terminated due to an exit(1).
Oct 1 12:53:06 Athos LVM[22129]: vgchange -a n vg01
Oct 1 12:53:06 Athos LVM[22137]: vgchange -a n vg04
Oct 1 12:53:06 Athos cmcld: Service PKG*10241 terminated due to an exit(0).
Oct 1 12:53:06 Athos cmcld: Halted package pkg-oracle on node Athos.
Oct 1 12:53:06 Athos cmcld: Package pkg-oracle cannot run on this node because subnet 174.1.10.0 is not up
Oct 1 12:53:09 Athos CM-pkg-bhs[22145]: cmhaltserv scsupv
PORTHOS
Oct 1 12:48:44 Porthos cmcld: Timed out node Athos. It may have failed.
Oct 1 12:48:44 Porthos cmcld: Attempting to form a new cluster
Oct 1 12:48:45 Porthos cmclconfd[4668]: Updated file /var/adm/cmcluster/frdump.cmcld.7 for node Porthos (length = 80688).
Oct 1 12:48:48 Porthos cmcld: Obtaining First Dual Cluster Lock
Oct 1 12:48:49 Porthos cmcld: First Cluster lock was denied. Lock was obtained by another node.
Oct 1 12:48:52 Porthos inetd[4851]: registrar/tcp: Connection from Porthos (174.1.10.14) at Fri Oct 1 12:48:52 2004
Oct 1 12:48:52 Porthos cmcld: Cluster lock has been denied
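The two syslog excerpts are easier to follow as one timeline. A small Python sketch that keeps only the cmcld lines from both nodes and orders them by time follows. It assumes all entries are from the same day, so the HH:MM:SS string alone is used as the sort key, and it uses a few lines copied from the logs above as sample input. The function name is hypothetical.

```python
import re

# Matches e.g. "Oct 1 12:48:46 Athos cmcld: Obtaining First Dual Cluster Lock"
# Groups: time, host, process name (without the optional [pid]), message.
SYSLOG_RE = re.compile(
    r"^\w{3}\s+\d+\s+(\d{2}:\d{2}:\d{2})\s+(\S+)\s+(\S+?)(?:\[\d+\])?:\s+(.*)$"
)


def cmcld_timeline(*logs):
    """Merge syslog excerpts and return cmcld events as (time, host, message)."""
    events = []
    for log in logs:
        for line in log.splitlines():
            match = SYSLOG_RE.match(line.strip())
            if match and match.group(3) == "cmcld":
                events.append((match.group(1), match.group(2), match.group(4)))
    # Python's sort is stable, so events with equal timestamps keep input order.
    events.sort(key=lambda event: event[0])
    return events


ATHOS_LOG = """\
Oct 1 12:48:46 Athos cmcld: Obtaining First Dual Cluster Lock
Oct 1 12:48:47 Athos cmcld: Obtaining Second Dual Cluster Lock
Oct 1 12:48:46 Athos su: + tty?? root-mad
"""

PORTHOS_LOG = """\
Oct 1 12:48:48 Porthos cmcld: Obtaining First Dual Cluster Lock
Oct 1 12:48:49 Porthos cmcld: First Cluster lock was denied. Lock was obtained by another node.
"""

if __name__ == "__main__":
    for time, host, message in cmcld_timeline(ATHOS_LOG, PORTHOS_LOG):
        print(time, host, message)
```

On this sample the merged view shows Athos reaching the cluster lock two seconds before Porthos tries and is denied.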
10-01-2004 02:01 AM
Re: Service guard lan switching
In ServiceGuard, if a two-node cluster loses the network between the two nodes, whichever node gets hold of the cluster lock disk first stays up and the other node does a TOC. It is very difficult to predict which node will TOC; it all depends on which node grabs the cluster lock disk first.
Hope this helps.
Regds
10-01-2004 02:05 AM
Re: Service guard lan switching
Thanks for the answer, but I do not agree with you. When a node in a ServiceGuard cluster has lost its LAN link, it should not go on running the cluster service...
I read this a long time ago, for previous releases.
10-01-2004 02:05 AM
Re: Service guard lan switching
Yes. That's the normal behaviour. In your case ServiceGuard treats it not really as a 'network failure' but as a 'heartbeat failure', because you configured the LANs as heartbeat LANs.
In such a situation, whichever node acquires the cluster lock stays up and the other gets TOC'ed.
Try configuring a second, private heartbeat and run the same test.
-Sri
10-01-2004 02:08 AM
Re: Service guard lan switching
So what you are seeing is exactly right. The packages are running on Athos; it notices a network problem and keeps the packages. The other node does a TOC.
I am sure that if you start the packages on the other node and pull out the cable, Athos will do a TOC and the other node will form a single-node cluster and keep the packages running on it (if it owns the disk lock).
Anil
10-01-2004 02:12 AM
Re: Service guard lan switching
By removing the cable from Athos, you have severed all communication (as far as SG is concerned) between the nodes, and hence they will both go for the cluster lock disk.
Whoever gets the cluster lock disk first will stay up, and the other node will TOC.
This is arbitrary, although it is usually the node that is the cluster coordinator.
To prevent this, you need at least one further LAN: either a standby for the primary, or another heartbeat LAN, or preferably both.
The bottom line is that SG has behaved as expected.
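The lock race described above can be pictured with a toy Python model. This is only an illustration of the tie-break rule, not how cmcld is implemented; the function name and the outcome strings are made up, and the sample times are the seconds at which each node logged "Obtaining First Dual Cluster Lock" in the original post (12:48:46 on Athos, 12:48:48 on Porthos).

```python
def arbitrate(lock_attempt_times):
    """Decide each node's fate after all heartbeat communication is lost.

    lock_attempt_times maps node name -> time (in seconds) at which the node
    reaches the cluster lock disk. The earliest node wins the lock and
    re-forms the cluster; every other node does a TOC. If two times are
    equal, min() picks whichever node comes first in the dict, which is an
    arbitrary choice made by this sketch.
    """
    if not lock_attempt_times:
        raise ValueError("at least one node is required")
    winner = min(lock_attempt_times, key=lock_attempt_times.get)
    return {
        node: "re-forms cluster" if node == winner else "TOC"
        for node in lock_attempt_times
    }


if __name__ == "__main__":
    # Seconds past 12:48:00, taken from the syslog excerpts in the first post.
    print(arbitrate({"Athos": 46, "Porthos": 48}))
```

Swapping the two times flips the result, which is why the surviving node is hard to predict in advance.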
10-01-2004 02:16 AM
Re: Service guard lan switching
All my life I had thought that a node with no network at all could not keep the cluster running, and that it would stop the packages and then do a TOC.
I will take your word for it...
Thanks a lot for your help.
10-01-2004 02:53 AM
Re: Service guard lan switching
"HEARTBEAT" is the key here. If you have a heartbeat net and a production net, and the production net fails on the node where the package is running while the heartbeat is still present, then the package will fail over. There won't be a TOC. So I suggest you configure your lan0 as STATIONARY_IP instead of HEARTBEAT_IP and use another interface as the heartbeat. If you then pull out lan0, the behaviour will be different.
-Sri
10-01-2004 03:24 AM
Re: Service guard lan switching
10-01-2004 03:41 AM
Re: Service guard lan switching
But yes, since you have only one LAN card configured under MC-SG, if this LAN card goes down, both nodes think that the other node is down and go for the cluster lock disk.
Whichever node acquires the cluster lock will re-form the cluster. On the other node, the safety timer will expire and the node will be TOCed.
Here, your LAN is the single point of failure.
My suggestion would be to add one more LAN card in the same subnet and configure it as a standby to the primary LAN card.
In this case, if you unplug lan0, both HEARTBEAT and DATA will fail over to the standby LAN card.
If you unplug the secondary LAN card too, then both nodes think the other node is down, and the node that manages to grab the cluster lock disk will re-form the cluster.
I don't know if a cross-over connection using fibre cards is possible :-).
But that is what I have: a dedicated cross-over Ethernet private network from node1 to node2 that serves as the HEARTBEAT LAN.
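The standby suggestion can be walked through with a short Python sketch. It only models the three cases described in this post and is not ServiceGuard code; the function name is hypothetical, and "lan3" is an invented name for the extra standby card, which does not exist in the original poster's lanscan output.

```python
def heartbeat_state(primary, standby, unplugged):
    """Describe where the heartbeat runs after some interfaces are unplugged.

    primary   -- name of the primary heartbeat interface, e.g. "lan0"
    standby   -- name of the standby interface, or None if there is none
    unplugged -- set of interface names whose cable has been pulled
    """
    if primary not in unplugged:
        return f"heartbeat on {primary}"
    if standby is not None and standby not in unplugged:
        return f"heartbeat failed over to {standby}"
    return "heartbeat lost: cluster lock race"


if __name__ == "__main__":
    # Original setup: no standby, so pulling lan0 starts the lock race.
    print(heartbeat_state("lan0", None, {"lan0"}))
    # With a (hypothetical) standby lan3, pulling lan0 is survivable.
    print(heartbeat_state("lan0", "lan3", {"lan0"}))
    # Pulling both cables brings the lock race back.
    print(heartbeat_state("lan0", "lan3", {"lan0", "lan3"}))
```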
10-01-2004 03:48 AM
Re: Service guard lan switching
That was intended to differentiate between a heartbeat failure and a simple data network failure. In the first case, if *all* the heartbeats fail (his original issue), then the node that cannot acquire the lock disk will TOC itself. If the heartbeat is still there and there is a network failure on the subnet monitored by the package, on the node running the package, then the package will simply fail over and there will be no TOC.
-Sri
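The distinction between the two failure types can be summarised as a small decision function. This is a Python sketch of the rules as stated in this thread, not of ServiceGuard internals; the function name, parameter names and outcome strings are all made up for illustration.

```python
def node_outcome(heartbeat_ok, package_subnet_ok, wins_lock=None):
    """Return what happens on a node that is running a package.

    heartbeat_ok      -- True if at least one heartbeat path still works
    package_subnet_ok -- True if the subnet monitored by the package is up
    wins_lock         -- only consulted when all heartbeats have failed:
                         True if this node acquires the cluster lock disk
    """
    if heartbeat_ok:
        if package_subnet_ok:
            return "no action"
        # Data network failure only: the cluster stays formed.
        return "package fails over, no TOC"
    if wins_lock is None:
        raise ValueError("wins_lock is required when all heartbeats have failed")
    return "re-forms cluster" if wins_lock else "TOC"


if __name__ == "__main__":
    # Separate heartbeat LAN still up, package subnet down.
    print(node_outcome(heartbeat_ok=True, package_subnet_ok=False))
    # Original test: the only LAN carried the heartbeat, and Athos won the lock.
    print(node_outcome(heartbeat_ok=False, package_subnet_ok=False, wins_lock=True))
```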