Operating System - HP-UX

Re: Turning off Quorum Server

 
James Bowman_3
Occasional Advisor

Turning off Quorum Server

Hi,

We need to physically relocate our ServiceGuard Quorum Server in the near future.

Will shutting it down affect any of our running ServiceGuard clusters?

We had an issue recently where a cluster tried to reform while the Quorum Service was off the air; the cluster then shut itself down when it couldn't make contact. Is there any way to prevent this from happening for the duration of the server move?

Any advice or assistance would be much appreciated,

Many Thanks
James
17 REPLIES
g3jza
Esteemed Contributor

Re: Turning off Quorum Server

Hi,
How long will the quorum server outage last? If it's going to take a few days, you can set up a new temporary QS within a few minutes (it can also be installed on Linux).

Since some version of SG (11.18+ or 11.19+), the cluster quorum type can be reconfigured online. Alternatively, you can swap the QS for an LVM cluster lock disk for the duration of the move.



James Bowman_3
Occasional Advisor

Re: Turning off Quorum Server

It's only due to be off the air for an hour.

We're not overly concerned with ServiceGuard not failing the Application over in the event of an issue, more so that the Quorum Service being unavailable might cause an issue itself.

I've been told that the SG Version we're running doesn't allow us to change the Quorum Service without taking down the Cluster which is unfortunate as we do have a second Quorum Server.

Thanks
James
g3jza
Esteemed Contributor

Re: Turning off Quorum Server

Only an hour should not be a problem, but you can never say what's gonna happen to the cluster during that one damn hour :) .

We had an issue in our environment once:
After the weekend outage, the cluster, although it did start up after the outage, went down on Monday morning. The cause was simple:
one node of the 2-node cluster was a vPar that had processors dynamically allocated / deallocated every morning. During this 'vparmodify' command, the node just froze for a few seconds and the cmcld daemon was not running, which caused a cluster reformation, and since the quorum server was down after the outage, the cluster couldn't reform :)

James Bowman_3
Occasional Advisor

Re: Turning off Quorum Server

Yeah, that's kind of our fear, that something will happen that would cause it to try and reform, and that without the Quorum Server present the Cluster would shut down.

Any ideas how to stop it trying to reform or are we stuck?

Thanks for the responses so far!

James
Michael Steele_2
Honored Contributor

Re: Turning off Quorum Server

Please describe your environment. You can have a 2 node cluster where the master is also the quorum server and you just failover. Are you describing a 3 node cluster with a 3rd node doing nothing? I mean, most quorum servers are doing something.
Support Fatherhood - Stop Family Law
g3jza
Esteemed Contributor

Re: Turning off Quorum Server

If you have 2 nodes in the cluster, you could probably also add another node to the cluster to act as an arbitrator node (doing nothing else). This reduces the risk that a reformation leaves 50% or fewer of the nodes that were running beforehand. As long as more than 50% of them remain, the cluster lock, which is a quorum server in your case, is not going to be used.

But that's just a bit too much work to configure another node to a running cluster :) .
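
To make the 50% rule concrete, here is a minimal Python sketch of the decision a surviving group of nodes faces at reformation. It is a simplified model for illustration only, not Serviceguard code: the function name `reformation_outcome` and its parameters are made up here, and it ignores timing, heartbeat networks and everything else cmcld actually looks at.

```python
def reformation_outcome(previous_nodes: int, surviving_nodes: int,
                        tie_breaker_available: bool) -> str:
    """Classify what a surviving group of nodes may do at reformation.

    Simplified model (an assumption for illustration):
      - more than 50% of the previous members: reform, no cluster lock needed
      - exactly 50%: reform only if the cluster lock / quorum server answers
      - less than 50%: halt
    """
    if not 0 < surviving_nodes <= previous_nodes:
        raise ValueError("surviving_nodes must be between 1 and previous_nodes")
    if 2 * surviving_nodes > previous_nodes:
        return "reform"
    if 2 * surviving_nodes == previous_nodes:
        return "reform" if tie_breaker_available else "halt"
    return "halt"


if __name__ == "__main__":
    # 2-node cluster, one node lost, quorum server switched off
    print(reformation_outcome(2, 1, tie_breaker_available=False))
    # 3 nodes (two plus an arbitrator), one node lost, quorum server switched off
    print(reformation_outcome(3, 2, tie_breaker_available=False))
```

In the 2-node case the survivor holds exactly half and depends on the quorum server; with a third arbitrator node, losing any single node still leaves a majority.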
James Bowman_3
Occasional Advisor

Re: Turning off Quorum Server

Sorry, we've two Servers in the Cluster. The Quorum server is separate to the two, and also serves a different unrelated function.

We plan to shut it down and leave both Servers in the Cluster up and running.

And, naturally, we're hoping for a solution that wouldn't require an outage on the cluster to implement :)

Thanks again,
James
g3jza
Esteemed Contributor

Re: Turning off Quorum Server

You should be able to temporarily add a new node to the cluster online, acting as the arbitrator node. But as I was saying, the question is whether it is worth the amount of work.

Michael Steele_2
Honored Contributor

Re: Turning off Quorum Server

Need to know your version of MC/SG
Michael Steele_2
Honored Contributor

Re: Turning off Quorum Server

Is there a reason why you can't fail everything over and then disable the failover?

If you create a single node cluster then you no longer have a quorum server.
James Bowman_3
Occasional Advisor

Re: Turning off Quorum Server

We're running a few different versions across different clusters... One A.11.17.00, three A.11.18.00 and four A.11.20.00.01 clusters.

The failover server in some cases wouldn't be as powerful (or might shut down less important services), so our preference would be to stay as is.

If we removed the other node, would the cluster reform, or attempt to talk to the Quorum Server? Is that possible with the cluster up and running?

Thanks
James

Re: Turning off Quorum Server

James,

So the quorum server is only contacted during a cluster reformation event, so as long as that doesn't happen whilst your quorum server is down, you are fine.

Are you changing the IP address of the quorum server? If not, then your clusters should just pick it up again when the quorum server comes back online - if that whole operation only takes an hour I would just go ahead and do it.

In a 2 node cluster there is no way to disable the quorum function, nor should there be, as a split brain situation would be much worse than just a cluster failing completely (it could result in corrupted data).

You could do as you suggested in your last post and shut down one of your cluster nodes in a controlled manner (using cmhaltnode). That would mean that you wouldn't have the remaining node TOC'ing itself, but even then if a failure took out that one remaining node the cluster would not come back up as it would be waiting for the other node to form the cluster.

You could of course re-instate a cluster lock disk, but as you have already indicated, you don't want to take a cluster outage.

HTH

Duncan

I am an HPE Employee
Michael Steele_2
Honored Contributor

Re: Turning off Quorum Server

AGAIN - you avoid the problem of both nodes going down when the quorum server is pulled by failing over and then disabling the failover switch.

CLUSTER      STATUS
example      up

  NODE       STATUS   STATE
  ftsys9     up       running

    PACKAGE  STATUS   STATE     AUTO_RUN   NODE
    pkg1     up       running   enabled    ftsys9
    pkg2     up       running   enabled    ftsys9

  NODE       STATUS   STATE
  ftsys10    up       running

Do you see where AUTO_RUN is enabled / disabled?

http://h30499.www3.hp.com/t5/Serviceguard/Package-AUTO-RUN-disabled/m-p/2908130#M5083
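
If you want to check this across several clusters, something like the following rough Python sketch could pull the package names and their AUTO_RUN state out of text shaped like the cmviewcl output above. The function `packages_with_auto_run` and the `SAMPLE` text are illustrative assumptions (pkg2 is shown as disabled purely for the example), and real cmviewcl output, especially with -v, contains more sections than this handles.

```python
def packages_with_auto_run(cmviewcl_text: str) -> dict[str, str]:
    """Map package name -> AUTO_RUN value from cmviewcl-style text."""
    result: dict[str, str] = {}
    auto_run_col = -1
    in_package_table = False
    for line in cmviewcl_text.splitlines():
        fields = line.split()
        if not fields:
            # A blank line ends the current table.
            in_package_table = False
            continue
        if fields[0] == "PACKAGE" and "AUTO_RUN" in fields:
            # Header row: remember which column holds AUTO_RUN.
            auto_run_col = fields.index("AUTO_RUN")
            in_package_table = True
            continue
        if in_package_table and len(fields) > auto_run_col:
            result[fields[0]] = fields[auto_run_col]
    return result


# Hypothetical sample, modelled on the output quoted in this post.
SAMPLE = """\
CLUSTER      STATUS
example      up

  NODE       STATUS   STATE
  ftsys9     up       running

    PACKAGE  STATUS   STATE     AUTO_RUN   NODE
    pkg1     up       running   enabled    ftsys9
    pkg2     up       running   disabled   ftsys9
"""

if __name__ == "__main__":
    for name, state in packages_with_auto_run(SAMPLE).items():
        print(name, state)
```

On the cluster itself the switch is normally toggled with cmmodpkg (-d to disable, -e to enable); the sketch only reads the status text.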

Stephen Doud
Honored Contributor

Re: Turning off Quorum Server

In the event of a loss of heartbeat communication with one or more nodes in the cluster, cluster reformation is performed. The reformation requires a 2-node cluster to seek the arbitration device (quorum server, lock VG etc).
If you can configure the system to avoid this heartbeat outage, the cluster will not need to contact the quorum server.
Check whether any shared networks on your cluster are configured to not pass heartbeat messages (cmgetconf | grep STATIONARY_IP). If any show up, change those references in the cluster ASCII configuration file to HEARTBEAT_IP and cmapplyconf the file (A.11.18 and earlier require the cluster be halted prior to the cmapplyconf).
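
The grep tells you that a STATIONARY_IP exists but not which node and interface it belongs to. As a rough sketch, assuming you have saved the cmgetconf output to a file and read it into a string, the following Python walks the text and reports each STATIONARY_IP with its node and LAN interface. The function name and the sample configuration (addresses, lan names) are made up for illustration; a real cluster ASCII file has many more parameters.

```python
def stationary_networks(config_text: str) -> list[tuple[str, str, str]]:
    """Return (node, interface, ip) for every STATIONARY_IP entry."""
    found = []
    node = interface = ""
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        parts = line.split()
        if len(parts) < 2:
            continue
        key, value = parts[0], parts[1]
        if key == "NODE_NAME":
            node, interface = value, ""
        elif key == "NETWORK_INTERFACE":
            interface = value
        elif key == "STATIONARY_IP":
            found.append((node, interface, value))
    return found


# Hypothetical excerpt in the style of a cluster ASCII configuration file.
SAMPLE_CONFIG = """\
CLUSTER_NAME example
NODE_NAME ftsys9
  NETWORK_INTERFACE lan0
    HEARTBEAT_IP 192.168.1.1
  NETWORK_INTERFACE lan1
    STATIONARY_IP 10.0.0.1   # candidate for a second heartbeat
NODE_NAME ftsys10
  NETWORK_INTERFACE lan0
    HEARTBEAT_IP 192.168.1.2
  NETWORK_INTERFACE lan1
    STATIONARY_IP 10.0.0.2
"""

if __name__ == "__main__":
    for node, lan, ip in stationary_networks(SAMPLE_CONFIG):
        print(f"{node} {lan} {ip}")
```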

Also, perhaps your clusters can afford a shared LUN to create a cluster lock VG on, and replace the quorum server functionality with a lock VG temporarily?
Michael Steele_2
Honored Contributor

Re: Turning off Quorum Server

Don't forget to assign points to all of the responses
James Bowman_3
Occasional Advisor

Re: Turning off Quorum Server

Thanks for the responses.

After some discussion we've decided to live with the risk of a Cluster Reform while we move the Quorum Server.

We've two Quorum Servers, but due to the ServiceGuard versions we can't switch Quorum Servers with the cluster up - We've an action out to upgrade our older Clusters as soon as possible.

Failing over and disabling the failover switch (as Michael recommended) was considered; however, the risk of a reformation during the move wasn't considered great enough to warrant the associated outage.

Thanks again, I'll assign out points now,
James

James Bowman_3
Occasional Advisor

Re: Turning off Quorum Server

Closed - Thanks for the responses