01-27-2005 03:23 AM
RHEL3AS u3 and SG cmrunnode failure
Let's say that nodes hostA and hostB are both halted. I start node hostA; it boots OK, but it won't start a one-node cluster. cmviewcl reports the cluster status as 'unknown' and node hostA as 'down' (node hostB's status is unknown). cmviewcl also reports that it can't talk to all nodes.
I then start node hostB; it boots OK, but when it tries to join the cluster it times out after about 10 minutes. AUTOSTART_CMCLD is set to 1 in $SGCONF/cmcluster.rc on both nodes. During that 10-minute period, cmviewcl reports the cluster status as 'starting'.
On both nodes, the OS loads the deadman module successfully before trying to start the cluster.
After that, I run cmruncl from the command line and, woohoo, the cluster starts up OK.
The attachment contains the cmviewcl outputs and syslog entries.
01-27-2005 03:27 AM
Re: RHEL3AS u3 and SG cmrunnode failure
01-27-2005 03:40 AM
Re: RHEL3AS u3 and SG cmrunnode failure
The problem is that SG might not like it.
A few things to remember (you may already know this):
SG is a high-availability system. You cannot have a volume group activated on two nodes at the same time, and you cannot have a package running on two nodes at the same time.
Packages and volume groups pass from node to node when a node goes down.
My guess, based on the information provided, is that nodeB is trying to activate a volume group that nodeA has already activated.
Or the formation of the NIC bonding is confusing SG, or is happening at the wrong point in the boot sequence.
I suggest you investigate these two possibilities. Please report back if you solve the issue without further assistance.
SEP
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
01-27-2005 06:03 AM
Re: RHEL3AS u3 and SG cmrunnode failure
That works fine: if I set AUTOSTART_CMCLD to 0 and boot the nodes, the shared volume group gets scanned (to get its entry into /etc/lvmtab) but is not activated on either node, which is the correct behavior.
I also have a lock LUN defined on the MSA1000, which should decide which node activates the shared VG. That works fine too; I haven't seen any case where both nodes try to activate the shared VG concurrently.
In fact, on Red Hat you can activate the shared VG manually on nodeB even if SG has already activated it on nodeA. But as far as I can see, the lock LUN works correctly: whichever node acquires the lock LUN first starts the package, and the other node stays on standby. (Linux vgchange lacks the '-c' switch of HP-UX vgchange, which marks each specified volume group as a member of a high-availability cluster so that it can't be activated, even manually, on e.g. nodeA if nodeB has already activated it.)
Also, before the nodes try to form a cluster at boot, SG reports on the console that network verification is OK, so I don't think this is a bonding issue. (I'll check tomorrow whether all bonds are up before SG tries to form the cluster, but as far as I remember, the bonds are up before the SG commands run.)
And I don't think this is a shared VG issue either: if nodeB is halted while nodeA is trying to form a single-node cluster and activate the shared VG, there is no way nodeB can have that VG activated.
It's now 9:00 PM here in Finland and I'd like to get a little sleep...
01-27-2005 06:30 PM
Re: RHEL3AS u3 and SG cmrunnode failure
I believe you need to run the cmruncl command (either with -f or with -n nodename).
As I understand it, SG doesn't want to start without all nodes. The assumption is that the user is starting all the nodes and will therefore notice that the cluster is not up and take appropriate action.
01-27-2005 07:22 PM
Re: RHEL3AS u3 and SG cmrunnode failure
01-27-2005 11:37 PM
Re: RHEL3AS u3 and SG cmrunnode failure
02-01-2005 01:26 AM
Re: RHEL3AS u3 and SG cmrunnode failure
Serviceguard REQUIRES 100% of the nodes configured in the cluster to be available when the cluster is trying to form.
This is by design!
If you wish to start the cluster after booting only one node, you must intervene manually and run:
cmruncl -n nodename
And bonding is supported within SG/Linux provided you do not use the load-balancing mode. It must be set to failover mode in the ifcfg-bond0 file.
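The formation rule described above can be captured in a few lines. This is a toy model under stated assumptions, not Serviceguard code: it only encodes "an automatic start needs every configured node; an operator running cmruncl -n may start on a subset". The function name `can_form_new_cluster` and its parameters are hypothetical, and treating a bare cmrunnode with no running cluster as an "automatic" start is an assumption based on the behavior reported in this thread.

```python
from typing import AbstractSet, Optional


def can_form_new_cluster(
    configured: AbstractSet[str],
    reachable: AbstractSet[str],
    manual_nodes: Optional[AbstractSet[str]] = None,
) -> bool:
    """Toy model (not Serviceguard code) of forming a brand-new cluster.

    configured:   every node listed in the cluster configuration
    reachable:    nodes that are up and can talk to each other
    manual_nodes: hypothetical stand-in for the nodes named with `cmruncl -n`;
                  None means an automatic start (boot-time autostart or cmrunnode)
    """
    if manual_nodes is None:
        # Automatic start: 100% of the configured nodes must be present.
        return set(configured) <= set(reachable)
    # Operator override: every named node must be configured and reachable.
    named = set(manual_nodes)
    return bool(named) and named <= set(configured) and named <= set(reachable)


if __name__ == "__main__":
    nodes = {"hostA", "hostB"}
    # hostB halted, hostA boots with autostart: no cluster forms.
    print(can_form_new_cluster(nodes, {"hostA"}))  # False
    # Same situation, operator runs `cmruncl -n hostA`: a one-node cluster forms.
    print(can_form_new_cluster(nodes, {"hostA"}, manual_nodes={"hostA"}))  # True
    # Both nodes up at boot: automatic formation succeeds.
    print(can_form_new_cluster(nodes, {"hostA", "hostB"}))  # True
```

The manual path is deliberately separate from the automatic one: the operator, not the daemon, takes responsibility for knowing the missing node is really down.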
04-04-2005 09:51 PM
Re: RHEL3AS u3 and SG cmrunnode failure
The Linux Serviceguard manual says:
"Although a cluster quorum of more than 50% is generally required, exactly 50% of the previously running nodes may re-form as a new cluster provided that the other 50% of the previously running nodes do not also re-form. This is guaranteed by the use of a tie-breaker to choose between the two equal-sized node groups, allowing one group to form the cluster and forcing the other group to shut down. This tie-breaker is known as a cluster lock. The cluster lock is implemented either by means of a lock LUN or a quorum server. A cluster lock is required on two-node clusters."
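The quoted re-formation rule can be sketched as a small decision function. This is a toy model, not Serviceguard code: `may_reform` and its parameters are hypothetical, and it models only what the quote says (strict majority re-forms; an exact 50% group re-forms only if it wins the cluster lock). Note that it describes a running cluster losing nodes, not the initial formation at boot.

```python
from typing import Optional


def may_reform(previously_running: int, survivors: int,
               won_cluster_lock: Optional[bool] = None) -> bool:
    """Toy model (not Serviceguard code) of the quoted re-formation rule.

    previously_running: nodes in the cluster just before the failure
    survivors:          nodes in this group that can still talk to each other
    won_cluster_lock:   whether this group acquired the cluster lock
                        (lock LUN or quorum server); only consulted at exactly 50%
    """
    if previously_running <= 0 or not 0 <= survivors <= previously_running:
        raise ValueError("need previously_running > 0 and 0 <= survivors <= previously_running")
    doubled = 2 * survivors  # compare 2*survivors with the total to avoid float math
    if doubled > previously_running:
        return True  # strict majority (> 50%): re-form
    if doubled == previously_running:
        return bool(won_cluster_lock)  # exactly 50%: the tie-breaker decides
    return False  # minority: this group must shut down


if __name__ == "__main__":
    # Two-node cluster AAA/BBB is running; BBB fails.
    print(may_reform(2, 1, won_cluster_lock=True))   # True: AAA got the lock and re-forms
    print(may_reform(2, 1, won_cluster_lock=False))  # False: lost the tie-breaker, shuts down
    print(may_reform(5, 3))                          # True: 3 of 5 is a strict majority
```

Because only one group can hold the lock, at most one of two equal halves returns True, which is why the manual requires a cluster lock on two-node clusters.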
So I doubt that it is impossible to start a one-node cluster.
Here again is a simple test sequence that I tried:
- package running on node AAA
- node BBB: shutdown -h now
- after node BBB has halted, node AAA: shutdown -r now
RESULT: the cluster won't start automatically as a single-node cluster on node AAA.
It won't start even manually with the cmrunnode command. It can be started by issuing cmruncl -n AAA.
Node BBB joins the running cluster automatically after it has been powered up.
04-04-2005 10:53 PM
Re: RHEL3AS u3 and SG cmrunnode failure
The section you quoted ONLY applies when the cluster is already running and one or more nodes fail. This is a completely different situation.
As stated, you reboot node A while node B is unavailable. Node A tries to form a cluster automatically, but since the 100% quorum (which is MANDATORY) is not met because node B is down, the attempt fails after the default 10 minutes.
If you then issue cmruncl -n nodeA manually, you will get your single-node cluster running, as you have seen, and nodeB will join the running cluster once you reboot it.
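The boot-time behavior described above (keep waiting for every configured node, give up after the default 10 minutes) can be simulated as a polling loop. This is a toy simulation under stated assumptions, not Serviceguard code: `autostart`, the `reachable_at` probe, and the 5-second poll interval are all hypothetical; only the 10-minute default comes from this thread.

```python
from typing import AbstractSet, Callable

DEFAULT_TIMEOUT_S = 10 * 60  # the "default 10 minutes" mentioned in this thread


def autostart(
    configured: AbstractSet[str],
    reachable_at: Callable[[int], AbstractSet[str]],
    timeout_s: int = DEFAULT_TIMEOUT_S,
    poll_s: int = 5,  # hypothetical poll interval
) -> str:
    """Toy simulation (not Serviceguard code) of a boot-time automatic start.

    reachable_at(t): hypothetical probe returning the nodes reachable
                     t seconds after boot
    Returns "formed" once every configured node is reachable, else "timed out".
    """
    t = 0
    while t <= timeout_s:
        # The 100% rule: formation needs every configured node.
        if set(configured) <= set(reachable_at(t)):
            return "formed"
        t += poll_s  # simulated time; a real daemon would sleep here
    return "timed out"


if __name__ == "__main__":
    nodes = {"nodeA", "nodeB"}
    # nodeB stays halted: nodeA gives up after the timeout (the observed behavior).
    print(autostart(nodes, lambda t: {"nodeA"}))  # timed out
    # nodeB finishes booting two minutes after nodeA: the cluster forms.
    print(autostart(nodes, lambda t: nodes if t >= 120 else {"nodeA"}))  # formed
```

In the "timed out" case the recovery is the manual step from the thread, cmruncl -n nodeA, after which the other node joins the running cluster when it boots.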