cluster reforming time
05-16-2006 01:57 AM
I'm running SG 11.16 on HP-UX 11i with a 2-node cluster.
I want to reduce the cluster re-formation time after a server failure.
I would like it to finish within 30 seconds.
Is that possible?
Thanks in advance
Rgds
zungwon
May 11 18:30:05 apollobk cmclconfd[5251]: Updated file /etc/cmcluster/cmclconfi.
May 11 18:33:31 apollobk cmcld: Communication to node apollo has been interruptd
May 11 18:33:31 apollobk cmcld: Node apollo may have died
May 11 18:33:31 apollobk cmcld: Attempting to form a new cluster
May 11 18:33:31 apollobk cmcld: Beginning standard election
May 11 18:33:32 apollobk cmclconfd[5251]: Updated file /var/adm/cmcluster/frdum.
May 11 18:33:44 apollobk cmcld: Obtaining Cluster Lock
May 11 18:33:45 apollobk cmcld: Turning off safety time protection since the clr
May 11 18:33:45 apollobk cmcld: may now consist of a single node. If Servicegud
May 11 18:33:45 apollobk cmcld: fails, this node will not automatically halt
May 11 18:33:45 apollobk cmcld: This will not affect the behavior of Package Fat
May 11 18:33:45 apollobk cmcld: or Service Failfast. If such a package or servi,
May 11 18:33:45 apollobk cmcld: safety timer will be re-enabled and this node l
May 11 18:33:45 apollobk cmcld: automatically halt.
May 11 18:35:00 apollobk cmcld: Link level address on network interface lan900 .
May 11 18:35:10 apollobk cmcld: Link level address on network interface lan900 .
May 11 18:35:49 apollobk cmcld: 1 nodes have formed a new cluster, sequence #2
May 11 18:35:49 apollobk cmcld: The new active cluster membership is: apollobk()
May 11 18:35:49 apollobk cmcld: One of the nodes is down.
May 11 18:35:49 apollobk cmcld: One or more packages may not be currently runni.
May 11 18:35:50 apollobk cmclconfd[5257]: Updated file /etc/cmcluster/cmclconfi.
May 11 18:35:50 apollobk cmclconfd[5257]: Updated file /etc/cmcluster/cmclconfi.
May 11 18:35:50 apollobk cmclconfd[5251]: Updated file /etc/cmcluster/cmclconfi.
May 11 18:36:34 apollobk cmcld: Link level address on network interface lan900 .
May 11 18:36:37 apollobk cmcld: Link level address on network interface lan900 .
May 11 18:36:52 apollobk su: + 2 root-tuxedo
May 11 18:39:11 apollobk cmcld: Link level address on network interface lan900 .
May 11 18:41:34 apollobk cmcld: New node apollo is joining the cluster
May 11 18:41:34 apollobk cmcld: Attempting to adjust cluster membership
May 11 18:39:09 apollobk cmcld: Link level address on network interface lan900 .
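To put a number on the re-formation in the log above, the timestamps of the "may have died" and "formed a new cluster" messages can be subtracted. This is only a minimal Python sketch: the two lines are copied from the excerpt, the helper names are made up for illustration, and the year is an assumption (syslog omits it, so 2006 is taken from the post date).

```python
from datetime import datetime

# Two lines copied from the syslog excerpt above.
LOG = """\
May 11 18:33:31 apollobk cmcld: Node apollo may have died
May 11 18:35:49 apollobk cmcld: 1 nodes have formed a new cluster, sequence #2
"""

def syslog_time(line, year=2006):
    # syslog timestamps carry no year; 2006 is an assumption from the post date.
    stamp = " ".join(line.split()[:3])  # e.g. "May 11 18:33:31"
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")

def reformation_seconds(log_text):
    # Time from the first "may have died" message to the new cluster forming.
    lines = log_text.splitlines()
    start = next(l for l in lines if "may have died" in l)
    end = next(l for l in lines if "formed a new cluster" in l)
    return (syslog_time(end) - syslog_time(start)).total_seconds()

print(reformation_seconds(LOG))  # 138.0
```

In this log the re-formation therefore took about 2 minutes 18 seconds, which is the figure the 30-second target should be compared against.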
05-16-2006 01:42 PM
Re: cluster reforming time
The cluster configuration file controls the node timeout and therefore how quickly the cluster re-forms.
Look for the following section. Your NODE_TIMEOUT may be set to 30000000 (30 seconds).
I have reduced it to 5 seconds on the cluster I manage.
HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 30000000
Configuration/Reconfiguration Timing Parameters (microseconds).
AUTO_START_TIMEOUT 600000000
NETWORK_POLLING_INTERVAL 2000000
Thanks Darren
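Since the values in that section are in microseconds, it is easy to misread them. As a rough sketch (the function name is hypothetical, and the input is just the excerpt quoted above rather than a full cluster file), the parameters can be converted to seconds like this:

```python
# Excerpt as quoted in the post above; values are in microseconds.
EXCERPT = """\
HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 30000000
AUTO_START_TIMEOUT 600000000
NETWORK_POLLING_INTERVAL 2000000
"""

def timing_seconds(text):
    """Map each 'NAME value' line to its value in seconds."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blanks, comments, and anything that is not 'NAME <integer>'.
        if len(parts) == 2 and parts[1].isdigit():
            result[parts[0]] = int(parts[1]) / 1_000_000
    return result

for name, secs in timing_seconds(EXCERPT).items():
    print(f"{name}: {secs:g} s")
```

For the excerpt this reports a 1 s heartbeat interval and a 30 s node timeout.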
05-16-2006 01:59 PM
Re: cluster reforming time
From the SG manual:
The cmquerycl command supplies default cluster timing parameters for HEARTBEAT_INTERVAL and NODE_TIMEOUT. Changing these parameters will directly impact the cluster's reformation and failover times. It is useful to modify these parameters if the cluster is reforming occasionally due to heavy system load or heavy network traffic.
The default value of 2 seconds for NODE_TIMEOUT leads to a best case failover time of 30 seconds. If NODE_TIMEOUT is changed to 10 seconds, which means that the cluster manager waits 5 times longer to timeout a node, the failover time is increased by 5, to approximately 150 seconds.
NODE_TIMEOUT must be at least 2*HEARTBEAT_INTERVAL. A good rule of thumb is to have at least two or three heartbeats within one NODE_TIMEOUT.
GOOD LUCK!!
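The rule quoted from the manual can be checked mechanically before editing the cluster file. A small sketch, assuming both values are given in microseconds as in the configuration file (the function name is made up for illustration):

```python
def check_timing(heartbeat_us, node_timeout_us):
    """Return (ok, heartbeats_per_timeout) for the rule quoted above."""
    # NODE_TIMEOUT must be at least 2 * HEARTBEAT_INTERVAL.
    ok = node_timeout_us >= 2 * heartbeat_us
    return ok, node_timeout_us / heartbeat_us

print(check_timing(1_000_000, 2_000_000))  # (True, 2.0)
print(check_timing(1_000_000, 1_500_000))  # (False, 1.5)
```

The second element shows how many heartbeats fit into one NODE_TIMEOUT, which the manual suggests should be at least two or three.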
05-16-2006 09:28 PM
The other parameters mentioned, such as the network polling interval, DO NOT affect the failover time.
The other things which affect failover time are the cluster lock type, with a quorum server providing the fastest failover, and whether standby LAN cards are used. Having a standby LAN increases the failover time, since we have to factor in the time to allow a LAN failover should the only remaining heartbeat LAN fail. The type of LAN configured also matters if there is a standby, since for example LAN failover on FDDI is faster than on Ethernet.
To minimise failover time you should reduce the node timeout and heartbeat interval, use a quorum server and live without standby LANs.
However, there is a cost to this. Without standby LANs you risk subnet outages after a single failure, so your environment needs to tolerate that, and with a low node timeout you risk false failovers should you experience short hangs or network outages.
Doing this, you should be able to get the failover time below 30 seconds with a node timeout of 2 seconds and a heartbeat interval of 0.5 seconds.
As for changing the heartbeat interval, and the rule of thumb given earlier that there should be at least 2 or 3 heartbeats per node timeout, I would not suggest a heartbeat interval larger than 1 second. Since heartbeats are used to communicate between cluster nodes, increasing the interval can have adverse effects during cluster re-formations and can delay other operations.
Lastly, if failover time is paramount, you should consider SGeFF (Serviceguard Extension for Faster Failover), which could give you a failover time as low as 6 seconds for the same configuration.
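For reference, the 2 second node timeout and 0.5 second heartbeat interval suggested above translate into microsecond values as follows. This is only an illustrative sketch with a hypothetical helper name; it formats the two lines and does not touch a real cluster file:

```python
def timing_lines(heartbeat_s, node_timeout_s):
    """Format the two parameters in microseconds, as used in the cluster file."""
    hb = round(heartbeat_s * 1_000_000)
    nt = round(node_timeout_s * 1_000_000)
    if nt < 2 * hb:
        raise ValueError("NODE_TIMEOUT must be at least 2*HEARTBEAT_INTERVAL")
    return f"HEARTBEAT_INTERVAL {hb}\nNODE_TIMEOUT {nt}"

print(timing_lines(0.5, 2))
# HEARTBEAT_INTERVAL 500000
# NODE_TIMEOUT 2000000
```

An edited cluster file would normally be verified with cmcheckconf and applied with cmapplyconf; check the manual for your Serviceguard version for the exact procedure.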
05-16-2006 09:35 PM
Re: cluster reforming time
So, for example, with a node timeout of 2 seconds, a heartbeat interval of 1 second, a GSC SCSI cluster lock and no standby LANs, you get a failover time of around 30 seconds. If you increase the node timeout from 2 to 10 seconds, the failover time increases not to 150 seconds but to around 120 seconds.
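The difference between the two figures is simple arithmetic. The sketch below only restates the numbers quoted in this thread (the variable names are illustrative); it is not a model of how Serviceguard computes failover time:

```python
BASE_TIMEOUT_S, BASE_FAILOVER_S = 2, 30     # figures quoted in the thread
NEW_TIMEOUT_S, QUOTED_FAILOVER_S = 10, 120  # figure quoted in this post

# Strictly proportional scaling, as in the manual's example.
proportional = BASE_FAILOVER_S * NEW_TIMEOUT_S / BASE_TIMEOUT_S
print(proportional)                         # 150.0

# Increase implied by the figure quoted in this post.
print(QUOTED_FAILOVER_S / BASE_FAILOVER_S)  # 4.0
```

So a five-fold increase in node timeout corresponds to roughly a four-fold increase in failover time with the figures given here.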
05-17-2006 02:27 AM