Operating System - OpenVMS
QF_VOTES = No (Can I fix this?)

 
PhilHowes
Advisor

Hi,
We've got two two-node VMS clusters on our network. They usually belong on different systems, but due to some network config work yesterday, which I won't go into, they managed to temporarily form a single cluster of 4 nodes. I've sorted this out now, and the clusters are separate again, but it's taken a while to sort out the resulting problems. My last issue is that one of my clusters now has a QF_VOTES value of 'NO'. It always used to be 'YES'. I presume the quorum disk has got into some sort of state from when it formed a 4-node cluster. I've rebooted, but no joy. How can I make it go back to voting?
Thanks in advance.
Phil
11 REPLIES
Heinz W Genhart
Honored Contributor

Re: QF_VOTES = No (Can I fix this?)

Hi Phil

How did the 2-node cluster come up?
What are the SYSGEN parameters expected_votes, votes and qdskvotes set to?
Is the SYSGEN parameter disk_quorum set correctly?

Normally you have to do a conversational boot of one machine with expected_votes = 1. Then, after the quorum disk is mounted, the quorum file should be created.

Regards

geni
PhilHowes
Advisor

Re: QF_VOTES = No (Can I fix this?)

Thanks,
The values are:
expected_votes = 3
votes = 1
qdskvotes = 1
disk_quorum = $1$dga13

From your reply, am I right in thinking that I should boot a node to conversational mode, re-mount the quorum disk and this will re-create the files that it needs to operate as normal?
FYI - The quorum disk is mounted currently, it just does not seem to have a vote.
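
For reference, here is a minimal sketch of what those values imply, assuming the documented rule that quorum is (EXPECTED_VOTES + 2) / 2, rounded down. The function names are hypothetical, and the real connection manager considers more than this (for example, the votes of members actually present), so treat it as arithmetic only:

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES: (EXPECTED_VOTES + 2) // 2."""
    return (expected_votes + 2) // 2


def has_quorum(node_votes, qdsk_votes, expected_votes):
    """True if the votes currently present meet or exceed quorum.

    Simplified: sums the VOTES of the nodes that are up plus whatever
    the quorum disk is contributing (0 when QF_VOTES is NO).
    """
    return sum(node_votes) + qdsk_votes >= quorum(expected_votes)


EXPECTED_VOTES = 3  # value from the post above

# Both nodes up: quorum is met with or without the disk's vote.
both_up_disk_voting = has_quorum([1, 1], 1, EXPECTED_VOTES)
both_up_disk_silent = has_quorum([1, 1], 0, EXPECTED_VOTES)

# One node down: the survivor keeps quorum only if the disk votes.
one_up_disk_voting = has_quorum([1], 1, EXPECTED_VOTES)
one_up_disk_silent = has_quorum([1], 0, EXPECTED_VOTES)

print(quorum(EXPECTED_VOTES))
print(both_up_disk_voting, both_up_disk_silent)
print(one_up_disk_voting, one_up_disk_silent)
```

On these numbers the cluster runs normally while both nodes are up, but with the quorum disk not voting, losing either node would drop the survivor below quorum.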
Robert Gezelter
Honored Contributor

Re: QF_VOTES = No (Can I fix this?)

Mark,

As a guard against future incidents, check the Cluster Group Numbers in both systems.

If the systems comprising both clusters had had different Cluster Group Numbers, they would not have attempted to fuse.

- Bob Gezelter, http://www.rlgsc.com
Steven Schweda
Honored Contributor

Re: QF_VOTES = No (Can I fix this?)

> [...] due to some network config work [...]

I thought that cluster group numbers existed
to allow separate clusters to coexist on a
network without annoying each other. I
wouldn't expect any change in network
connectivity to cause distinct (well
configured) clusters to merge.
PhilHowes
Advisor

Re: QF_VOTES = No (Can I fix this?)

Yes - it was annoying. I'd changed the cluster numbers in one cluster, but forgot to reboot it before I booted the second cluster. The changes hadn't taken effect, so it formed one big cluster - until I managed to shut down and restart the first cluster. Won't do that again.
Hoff
Honored Contributor

Re: QF_VOTES = No (Can I fix this?)

Presuming your storage hasn't become corrupted here, then you should be able to shut down, resolve the matter of the cluster group number collision, reboot, and be back on your way.

I'd reset the cluster group number and cluster password as the last step before shutting the cluster down. And because I'm professionally paranoid, I'd change the group number and password on both clusters to unique (new) values.

Barring cases where somebody has been in and has been adjusting the cluster system parameters, the quorum disk and quorum file should still be present and should be detected and accessed once the two clusters are rebooted.

An OpenVMS cluster very intentionally does not appreciate having two quorum disks present, which is very likely at the core of the current issue.
PhilHowes
Advisor

Re: QF_VOTES = No (Can I fix this?)

Thanks for the advice. I've sorted the clustering issues now, using the methods you have all suggested, but despite the fact that my quorum disk looks and smells like a quorum disk, it doesn't seem to be voting.

I was thinking that I could shut down one node completely, then dismount the quorum disk from the remaining node and 'init' it. Then use cluster config to add it back in as the quorum disk and mount it. The node that was shut down should pick it up as normal once it's switched back on. This should sort out the oddness. Any thoughts?
Robert Gezelter
Honored Contributor

Re: QF_VOTES = No (Can I fix this?)

Phil,

A clarification on my earlier posting in this thread.

The OpenVMS cluster Group Number and Password are managed using SYSMAN (see CONFIGURATION [SET,SHOW] CLUSTER_AUTHORIZATION, etc. in the HELP text and documentation set).

Steve,

re: "... allow separate clusters to coexist on a network without annoying each other."

You are indeed correct. However, I have seen sites where a lack of coordination between different groups, and a forgotten LAN bridge made systems mutually visible to each other in an unanticipated fashion. Luckily, the cluster passwords did not match, so the annoyance was limited to annoying console messages.

- Bob Gezelter, http://www.rlgsc.com
Hoff
Honored Contributor

Re: QF_VOTES = No (Can I fix this?)

The underlying cluster configuration error was exposed by the networking folks, and that's fairly typical of these sorts of cluster errors. I've seen similar sorts of errors cause clusters with incorrect VOTES or EXPECTED_VOTES values to trigger massive data corruptions.

If the core cluster attributes are not correct, then implement the corrections, and reboot the cluster configuration at your earliest convenience. Or sooner.

There are no "in-flight repairs" for certain classes of errors with core settings within an OpenVMS cluster.

Given you have some control over this cluster environment, you can likely keep both hosts operating and schedule the reboot for a less-critical time or, as a work-around for the quorum disk that is the concern here, bring a third (voting) node into the cluster and adjust and ensure that the running quorum here is 2.

After the cluster reboot, work through how this case happened, and what steps can be performed to avoid a reoccurrence of this or of similar errors and (far more important) also to avoid cases where these sorts of core cluster configuration errors can lead to truly massive disk data corruptions.
Hoff
Honored Contributor

Re: QF_VOTES = No (Can I fix this?)

> Luckily, the cluster passwords did not match, so the annoyance was limited to annoying console messages.

Typical practice here is twenty or more and completely random characters as the cluster password.

Literally mashing the alphanumeric keys as the input within SYSMAN.

Not bothering to record the string, either.

Then replicate CLUSTER_AUTHORIZE.DAT to all system disks in the cluster, for cases where more than one system disk is present.
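
If mashing the keys isn't random enough for your taste, a minimal sketch of generating such a string on some other machine might look like the following. The function name is hypothetical, and the 31-character ceiling and the letters-and-digits alphabet are assumptions recalled from the cluster documentation, so check them against your version before relying on this:

```python
import secrets
import string

# Assumption: uppercase letters and digits are accepted in a cluster password.
ALPHABET = string.ascii_uppercase + string.digits

# Assumption: the cluster password may be at most 31 characters long.
MAX_LENGTH = 31


def random_cluster_password(length: int = 24) -> str:
    """Return a random alphanumeric string to type into SYSMAN."""
    if not 1 <= length <= MAX_LENGTH:
        raise ValueError(f"length must be between 1 and {MAX_LENGTH}")
    # secrets.choice draws from the OS's cryptographic random source.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


password = random_cluster_password()
print(password)
```

The string would still have to be entered through SYSMAN on the cluster itself; this only replaces the keyboard mashing.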
Steven Schweda
Honored Contributor

Re: QF_VOTES = No (Can I fix this?)

> Not bothering to record the string, either.

I mastered that technique many years ago.
Perhaps not intentionally, but very reliably.

ALP $ dire /date sys$system:CLUSTER_AUTHORIZE.DAT

Directory SYS$COMMON:[SYSEXE]

CLUSTER_AUTHORIZE.DAT;1
27-JUN-1996 15:05:27.00