Two node SG cluster with one network
01-29-2009 11:23 PM
I have a two node cluster running HP-UX 11.31 and SG 11.18.
They're connected with only one VLAN, which of course also carries the heartbeats.
I'm trying to test failover by unplugging the LAN cable on the node running the package, expecting the package to go down and start on the second node.
The only thing that happens is that the second node reboots.
Can anyone help me with this config?
Regards
01-29-2009 11:33 PM
Solution
You have not properly configured SG. SG needs a separate network for heartbeat, or it cannot respond normally to network problems. A hub between two non-primary NIC cards is enough.
The second node rebooting is called a TOC (Transfer of Control). This is the normal response to loss of heartbeat: the two nodes race for control of the lock device, the second node loses the race, and it gets rebooted to avoid data corruption.
Take a look at the logs and you will see the response is normal. Your configuration is not robust; it is unreliable by design.
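For reference, a second heartbeat network is declared in the cluster ASCII configuration file by listing an additional interface per node. This is only a minimal sketch, with hypothetical node names, interface names, and addresses; the file is then checked and applied with cmcheckconf and cmapplyconf:

  NODE_NAME node1
    NETWORK_INTERFACE lan0
      HEARTBEAT_IP 192.168.1.11     # primary VLAN, also carries heartbeat
    NETWORK_INTERFACE lan1
      HEARTBEAT_IP 10.0.0.11        # dedicated heartbeat LAN (crossover cable or hub)
  NODE_NAME node2
    NETWORK_INTERFACE lan0
      HEARTBEAT_IP 192.168.1.12
    NETWORK_INTERFACE lan1
      HEARTBEAT_IP 10.0.0.12

  # verify, then apply:
  # cmcheckconf -C /etc/cmcluster/cluster.ascii
  # cmapplyconf -C /etc/cmcluster/cluster.ascii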
SEP
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
01-29-2009 11:33 PM
Re: Two node SG cluster with one network
What you are seeing indicates to me that you are using a cluster lock disk rather than a quorum server, in which case this is the normal outcome with only one heartbeat network connection.
The server that gets the cluster lock stays up, and the other node is forced to TOC.
I also guess that the cluster lock disk is in a VG that the package uses, so the node running the package has the VG activated and therefore faster access to the lock.
Consider using more than one network for a standby or additional heartbeat, or use a QS rather than a cluster lock disk.
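As a sketch of the standby option: in the cluster ASCII file a standby interface is simply listed with no IP address, on the same bridged net as the primary it backs up. The interface names here are hypothetical:

  NODE_NAME node1
    NETWORK_INTERFACE lan0
      HEARTBEAT_IP 192.168.1.11
    NETWORK_INTERFACE lan2          # standby for lan0: same bridged net, no IP assigned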
01-29-2009 11:35 PM
Re: Two node SG cluster with one network
I guess this happens because there is no LAN failover configuration.
01-29-2009 11:35 PM
Re: Two node SG cluster with one network
By telling you it's not a supported configuration:
http://docs.hp.com/en/B3936-90122/ch02s02.html
How on earth do you expect this sort of configuration to work? With only one network connection and a 2-node cluster, if the network connection is broken, the other node doesn't "know" the state of the first node. I'm assuming you're using a cluster lock disk, so in this case you'll get a race for the cluster lock as the only way to determine cluster membership. Unfortunately the node that *you* know is good loses the race (of course the cluster nodes have no way of knowing who is good, or at least no way of knowing who is *better*).
This sort of config can be made to work a little better if you use a quorum server on a third node somewhere instead of a cluster lock disk; that way, only a node with a surviving network connection can win the race for the cluster lock.
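If a third system is available to run the quorum server, the cluster configuration can be generated against it with cmquerycl. The hostname below is hypothetical, and the quorum server daemon must already be running on that third system:

  cmquerycl -q qshost.example.com -n node1 -n node2 -C /etc/cmcluster/cluster.ascii

  # the generated ASCII file then carries entries along these lines:
  QS_HOST                 qshost.example.com
  QS_POLLING_INTERVAL     300000000       # microseconds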
HTH
Duncan
I am an HPE Employee

01-29-2009 11:38 PM
Re: Two node SG cluster with one network
Somehow I thought this would be a logical setup.
I will try to get a dedicated heartbeat LAN connected.
Thank you, all.
01-29-2009 11:41 PM
Re: Two node SG cluster with one network
You already got stern warnings that your ServiceGuard configuration is not designed according to best practices. In fact, you are running an unsupported setup.
By the way, the HP ACSL lab has created a tool which can be used by HP Support and Consulting to optimize the Node Timeout and Heartbeat Interval values used in Serviceguard clusters.
The HELM (Heartbeat Exchange Latency Monitor) tool runs on HP-UX 11iv1 (11.11), 11iv2 (11.23), and 11iv3 (11.31), and measures latency for the cluster nodes (which might be caused by network delays or heavy system loads) over a user-defined period of time. When the HELM run is complete, the tool outputs the measured latencies and, based on these measurements, suggests optimized values for the NODE_TIMEOUT and HEARTBEAT_INTERVAL cluster configuration parameters, for both standard Serviceguard clusters and clusters utilizing the Serviceguard Extension for Faster Failover product.
When I teach ServiceGuard (coincidentally, I am teaching the HP H6487 course next week here in Australia), I always mention HELM too. Pity not many people are aware of it.
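For context, both parameters live in the cluster ASCII file and are expressed in microseconds. The values below are only the commonly cited defaults, not HELM output; treat them as placeholders until measured:

  HEARTBEAT_INTERVAL   1000000    # 1 second between heartbeats
  NODE_TIMEOUT         2000000    # 2 seconds without heartbeat before re-formation starts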
Cheers,
VK2COT
01-29-2009 11:42 PM
Re: Two node SG cluster with one network
The syslog can be consulted to confirm that the node on which the package was running happened to be the cluster coordinator during the cluster re-formation.
When the heartbeat cable is pulled from the primary node, a cluster re-formation occurs; the active node on which the cluster manager had been running becomes the coordinator, and it TOCs the other node because it can no longer receive heartbeats from it.
This is, I think, what should normally happen.
You can refer to the syslog on both nodes for this event.
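A quick way to pull those entries from syslog on each node; the grep pattern is only a suggestion, since the exact wording of the messages varies by Serviceguard release:

  grep -iE 'cmcld|reform|toc' /var/adm/syslog/syslog.log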
Regards
Sujit
01-30-2009 04:35 PM
Re: Two node SG cluster with one network
So this is how it works. If a node cannot stay in the cluster, it races for the lock disk (in a two-node cluster). The node that gets the lock disk re-forms the cluster and continues; the node that does not get the lock disk cannot form a cluster. The reason the losing node cannot simply shut down the package is that it has no way to tell the other node the package is down before the other node starts it, so the only way the failed node can guarantee it is no longer writing to the disk is to panic.
The node that re-formed the cluster knows it is the only survivor, but it cannot know when the other node would have finished the stop script. The assumption that the other node panicked, if it was still alive, is what allows the surviving node to just start the package.
I hope this makes sense and helps
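After such a test, the surviving node's view can be confirmed with the standard status command; the output layout differs between releases, so this is just the obvious check:

  cmviewcl -v
  # shows cluster status, which node is up, and where the package is now running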