04-26-2004 02:32 AM
Service guard Network
I have a Serviceguard (SG) environment with three nodes, each with three LAN interfaces. Two of the interfaces on each node are aggregated with HP APA:
0/0/0/0 0x00306EC3B259 0 UP lan0 snap0 1 ETHER Yes 119
LinkAgg0 0x00306EF2B715 900 UP lan900 snap900 4 ETHER Yes 119
NODE_NAME Athos
NETWORK_INTERFACE lan0
HEARTBEAT_IP 15.15.15.33
NETWORK_INTERFACE lan900
HEARTBEAT_IP 174.1.51.33
Two questions:
- Today I ran some tests by pulling network cables from the LAN cards. I started with a three-node cluster, and when I removed all network links from one node I was left with a two-node cluster. From that state I did the same again: I removed all network links from one of the two remaining nodes. As I expected, communication failed and I ended up with a single-node cluster. However, the node that took over the cluster was the one with no network connectivity at all. How is this possible? I thought a node that loses its network does a TOC. When I ran cmviewcl at that point, SG had not noticed that all the network cards were down.
- How can I have one of my network cards used only for heartbeat and not for data? I want the cluster to fail over to another node if the APA aggregate goes down, regardless of the state of the other network card.
Thanks
04-26-2004 03:29 AM
Re: Service guard Network
As for your issue: do you have a cluster lock disk? Did the cluster re-form as a cluster of one? If so, then the server you pulled all the network cables from got the cluster lock.
Rgds...Geoff
04-26-2004 03:42 AM
Re: Service guard Network
Apr 26 09:56:55 Athos cmcld: lan0 failed
Apr 26 09:56:55 Athos cmcld: Subnet 15.15.15.0 down
Apr 26 09:58:34 Athos cmcld: Timed out node Porthos. It may have failed.
Apr 26 09:58:34 Athos cmcld: Attempting to adjust cluster membership
Apr 26 09:58:35 Athos cmclconfd[1866]: Updated file /var/adm/cmcluster/frdump.cmcld.9 for node Athos (length = 402537).
Apr 26 09:58:36 Athos cmcld: Link level address on network interface lan900 has been updated from 0x00306ef2b719 to 0x000000000000.
Apr 26 09:58:37 Athos cmcld: Obtaining First Dual Cluster Lock
Apr 26 09:58:38 Athos cmcld: Obtaining Second Dual Cluster Lock
Apr 26 09:58:39 Athos cmcld: Turning off safety time protection since the cluster
Apr 26 09:58:39 Athos cmcld: may now consist of a single node. If ServiceGuard
Apr 26 09:58:39 Athos cmcld: fails, this node will not automatically halt
Apr 26 09:58:57 Athos cmcld: GS connection to 15.15.15.44 not responding, closing
It seems the system knows that lan0 has failed, but if you run cmviewcl at that moment it still shows the interface as up. Communication is lost, but because SG considers the interface up, the node without network gets the cluster.
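When comparing what syslog reports against what cmviewcl shows, it can help to reduce the cmcld messages to a short timeline. The following is only a rough Python sketch, not a Serviceguard tool: the event labels and the regular expressions are my own assumptions, written against the message wording in the excerpt above, and other Serviceguard versions may word these messages differently.

```python
import re

# A few cmcld lines copied from the syslog excerpt in this post.
LOG = """\
Apr 26 09:56:55 Athos cmcld: lan0 failed
Apr 26 09:56:55 Athos cmcld: Subnet 15.15.15.0 down
Apr 26 09:58:34 Athos cmcld: Timed out node Porthos. It may have failed.
Apr 26 09:58:34 Athos cmcld: Attempting to adjust cluster membership
Apr 26 09:58:37 Athos cmcld: Obtaining First Dual Cluster Lock
Apr 26 09:58:38 Athos cmcld: Obtaining Second Dual Cluster Lock
"""

# Timestamp, host name, then the cmcld message text.
LINE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) (\S+) cmcld: (.*)$")

# Hypothetical event labels; the patterns only cover the wording seen above.
PATTERNS = [
    ("interface_failed", re.compile(r"^(\S+) failed$")),
    ("subnet_down", re.compile(r"^Subnet (\S+) down$")),
    ("node_timed_out", re.compile(r"^Timed out node (\S+)\.")),
    ("lock_attempt", re.compile(r"^Obtaining .*Cluster Lock$")),
]


def classify(log_text):
    """Return (timestamp, host, label, detail) for each recognised cmcld line."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # not a cmcld line
        stamp, host, message = m.groups()
        for label, pattern in PATTERNS:
            hit = pattern.match(message)
            if hit:
                # Use the captured name if the pattern has one, else the whole message.
                detail = hit.group(1) if hit.groups() else message
                events.append((stamp, host, label, detail))
                break
    return events


if __name__ == "__main__":
    for event in classify(LOG):
        print(*event, sep=" | ")
```

Read this way, the excerpt shows cmcld reporting lan0 and its subnet as failed well before the lock attempts, which is the mismatch with cmviewcl described here.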
04-26-2004 03:53 AM
Re: Service guard Network
How to make one of your NIC cards heartbeat only:
Get a hub, make sure it has reliable electrical power, plug into that. Use those IP addresses exclusively for SG heartbeat. Don't assign host names, don't run any data through there.
SG requires link level connectivity for a heartbeat. No routers allowed.
Based on the loss of heartbeat in a three-node cluster, I would have expected two of the nodes to TOC if you removed all of the network cables. I guess you didn't pull them all at the same time, so heartbeat was maintained.
Perhaps a quorum server would be in order.
SEP
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
04-26-2004 04:38 AM
Re: Service guard Network
http://docs.hp.com/cgi-bin/fsearch/framedisplay?top=/hpux/onlinedocs/J4240-90021/J4240-90021_top.html&con=/hpux/onlinedocs/J4240-90021/00/00/50-con.html&toc=/hpux/onlinedocs/J4240-90021/00/00/50-toc.html&searchterms=apa&queryid=20040426-103102
That is good news for me.
I still say that the node getting the cluster lock
Apr 26 09:58:37 Athos cmcld: Obtaining First Dual Cluster Lock
Apr 26 09:58:38 Athos cmcld: Obtaining Second Dual Cluster Lock
is the reason it stays up.
Now as to why a cmviewcl -v says lan0 is up when there are no cables - not too sure...
What version of MC/SG are you running?
Rgds...Geoff
04-26-2004 06:57 PM
Re: Service guard Network
04-27-2004 09:32 AM
Re: Service guard Network
Now you have a two-node cluster, and that's always a special situation, as each node has no one but the other to arbitrate with. So you pull all the connections from one node: now the nodes can't talk to each other, and neither can form a quorum on its own (1 of 2 = 50%, which is not a quorum). Neither node has less than 50% either, so there is no immediate TOC, and since neither node can talk to the other, how can either *know* that the other doesn't still have network access? (The failure could have been in a network component between the two nodes.) In this situation, simply forming a 1-node cluster just because you can still see the network could lead to two 1-node clusters and a split-brain situation, and that means corrupt data! So the only safe thing to do is use the cluster lock, which in this situation means going for the cluster lock disk. According to your posted logs, it looks like Athos got there before Porthos, and Porthos was TOC'd. If Porthos was the node with 'good' network connections, that's just bad luck, I'm afraid; you have simulated multiple points of failure, after all.
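The rule described above can be written down as a small decision function. This is a simplified Python sketch of the quorum arithmetic only, not Serviceguard code: the function name, its parameters and the returned strings are all made up for illustration, and the real cluster manager also deals with timing and with how the lock itself is acquired.

```python
def reformation_outcome(reachable, previous, won_lock=None):
    """Decide what one group of nodes does after losing contact with the rest.

    reachable -- nodes this group can still talk to, including itself
    previous  -- nodes that were in the cluster before the failure
    won_lock  -- None if the tie-breaker has not been tried yet,
                 otherwise True/False for winning/losing the cluster lock
    """
    if reachable * 2 > previous:
        # More than 50% of the old membership: quorum, re-form the cluster.
        return "form cluster"
    if reachable * 2 < previous:
        # Less than 50%: this group can never win, so it halts.
        return "halt (TOC)"
    # Exactly 50%: no quorum, but no automatic loss either.
    if won_lock is None:
        return "race for cluster lock"
    return "form cluster" if won_lock else "halt (TOC)"


if __name__ == "__main__":
    # Three-node cluster, one node isolated.
    print(reformation_outcome(2, 3))  # the two that still see each other
    print(reformation_outcome(1, 3))  # the isolated node
    # Two-node cluster, nodes cannot see each other.
    print(reformation_outcome(1, 2))
    print(reformation_outcome(1, 2, won_lock=True))
    print(reformation_outcome(1, 2, won_lock=False))
```

Note that network state never appears in the 50% branch: whichever node gets the lock first wins, which matches Athos taking the cluster with no cables attached.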
Now on to those cluster lock disks: are you using two disk arrays in some sort of stretch cluster? There are only very special situations where you should use dual cluster lock disks; in many standard scenarios this can actually reduce availability. Review what the manual has to say about dual cluster lock disks here:
http://docs.hp.com/hpux/onlinedocs/B3936-90073/B3936-90073.html
See Chapter 3, the section on how the Cluster Manager works.
As suggested above a quorum server may work better for you - it would certainly prevent the situation you had when losing network connectivity in a 2 node cluster.
How can you be sure that a NIC is used for heartbeat only? Don't use it! By which I mean don't bind your application to that IP, and don't allow your clients to connect to it (by not advertising it in DNS or configuring it on the clients).
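A first step in checking this is to list which addresses carry heartbeat and compare them with the addresses your clients actually use. Here is a rough Python sketch that reads the keyword/value layout shown in the cluster configuration excerpt at the top of the thread. The function names are hypothetical, and the "advertised" set is an assumption standing in for whatever you publish in DNS or configure on clients.

```python
# Excerpt of the cluster configuration as posted at the top of the thread.
CONFIG = """\
NODE_NAME Athos
NETWORK_INTERFACE lan0
HEARTBEAT_IP 15.15.15.33
NETWORK_INTERFACE lan900
HEARTBEAT_IP 174.1.51.33
"""


def heartbeat_addresses(config_text):
    """Return {node: {interface: [heartbeat IPs]}} from keyword/value lines."""
    result = {}
    node = interface = None
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        parts = line.split()
        if len(parts) != 2:
            continue
        key, value = parts
        if key == "NODE_NAME":
            node, interface = value, None
            result[node] = {}
        elif key == "NETWORK_INTERFACE" and node is not None:
            interface = value
            result[node][interface] = []
        elif key == "HEARTBEAT_IP" and interface is not None:
            result[node][interface].append(value)
    return result


def shared_with_data(heartbeats, advertised):
    """Heartbeat IPs that also appear in the set of client-facing addresses."""
    return sorted(
        ip
        for interfaces in heartbeats.values()
        for ips in interfaces.values()
        for ip in ips
        if ip in advertised
    )


if __name__ == "__main__":
    hb = heartbeat_addresses(CONFIG)
    print(hb)
    # Assumption for illustration: clients connect to the APA address.
    print(shared_with_data(hb, {"174.1.51.33"}))
```

In the posted excerpt both lan0 and lan900 carry a HEARTBEAT_IP, so under that assumption only lan0 would count as heartbeat-only.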
HTH
Duncan
I am an HPE Employee

04-27-2004 06:33 PM
Re: Service guard Network
I hope HP Labs will solve this problem as soon as possible.
04-27-2004 06:41 PM
Re: Service guard Network
I had the same problem. The server that gets the cluster lock disk first forms the single-node cluster, but it doesn't check whether the network is available or not.
The server is in production now, so we couldn't do much testing on it.