Partition for OpenVMS Cluster

 
Song, Charles
Frequent Advisor

Partition for OpenVMS Cluster

At my customer's site, two ES40s and one RA3000 form a SCSI-interconnected cluster; the two members are connected to each other through their NICs. Yesterday someone powered off the Ethernet switch, and I found that the cluster had partitioned: the two ES40s were no longer together, and each could see only itself.
On the Oracle side, one rollback segment was in conflict and we could not use SQL*Plus.
My question:
When a cluster partition occurs, should one member be forced to crash?

Charles
Working and enjoying life
18 REPLIES
Ian Miller.
Honored Contributor

Re: Partition for OpenVMS Cluster

Do you mean the cluster hung, or did both nodes continue?
Is there a quorum disk? How many votes per node?
____________________
Purely Personal Opinion
Volker Halle
Honored Contributor

Re: Partition for OpenVMS Cluster

Charles,

if you had a correctly configured 2-node cluster running, you cannot get a partitioned cluster just by interrupting the SCA network connection. Each node will time out the other node and remove it from the cluster after RECNXINTERVAL seconds. What happens then depends on your VOTES and EXPECTED_VOTES setup. If both nodes have VOTES=1 and EXPECTED_VOTES=2, then both nodes will hang indefinitely...

The dangerous moment could come if you HALT and BOOT one (or both) of the systems. If both of them have VOTES=1 and EXPECTED_VOTES=1, a partitioned cluster will be created and your data on the shared disks is in real danger of getting corrupted.
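
The arithmetic behind these two cases can be sketched in a few lines of Python. This is only an illustration: it assumes the usual OpenVMS rule that quorum is (EXPECTED_VOTES + 2) / 2 rounded down, and that a node keeps running only while the votes it can see reach quorum. The function names are made up for this sketch and are not part of any OpenVMS tool.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES: (EXPECTED_VOTES + 2) / 2, rounded down."""
    return (expected_votes + 2) // 2


def keeps_running(visible_votes: int, expected_votes: int) -> bool:
    """A node continues only while the votes it can see reach quorum."""
    return visible_votes >= quorum(expected_votes)


# Two nodes with VOTES=1 each, no quorum disk, network link lost:
# each node can see only its own single vote.
for expected in (2, 1):
    state = "continues" if keeps_running(1, expected) else "hangs"
    print(f"EXPECTED_VOTES={expected}: quorum={quorum(expected)}, isolated node {state}")
```

With EXPECTED_VOTES=2 each isolated node falls below quorum and hangs; with EXPECTED_VOTES=1 each isolated node still has quorum on its own, which is exactly the partitioned case.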

Please describe your cluster config and provide the SYSGEN values for:

VOTES, EXPECTED_VOTES, DISK_QUORUM, QDSKVOTES

Volker.
comarow
Trusted Contributor

Re: Partition for OpenVMS Cluster

Charles, the whole point of votes, quorum and expected votes is to prevent this from happening.

If you have a two-node cluster, give each node 1 vote, set expected votes to 3, and give the quorum disk 1 vote.

Then if a node is gone long enough, it will CLUEXIT.
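
As a rough check of this configuration, the sketch below adds up the votes each side can see after the network link is lost. It assumes quorum is (EXPECTED_VOTES + 2) / 2 rounded down and that only one side ends up counting the quorum disk's vote; the member names NODE_A, NODE_B and QDISK are hypothetical.

```python
# Votes per cluster member; the names are placeholders for this sketch.
VOTES = {"NODE_A": 1, "NODE_B": 1, "QDISK": 1}
EXPECTED_VOTES = 3


def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES: (EXPECTED_VOTES + 2) / 2, rounded down."""
    return (expected_votes + 2) // 2


def has_quorum(members) -> bool:
    """True if the listed members together hold at least quorum votes."""
    return sum(VOTES[m] for m in members) >= quorum(EXPECTED_VOTES)


print("quorum:", quorum(EXPECTED_VOTES))
print("node with quorum disk continues:", has_quorum(["NODE_A", "QDISK"]))
print("node without quorum disk continues:", has_quorum(["NODE_B"]))
```

One node plus the quorum disk holds 2 of 3 votes and keeps quorum, while the node left without the quorum disk holds 1 vote and cannot continue on its own.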
Song, Charles
Frequent Advisor

Re: Partition for OpenVMS Cluster

Hi,
To configure the 2-node cluster, I just used the CLUSTER_CONFIG utility and specified:
- SCSI connection: YES
- cluster number and password
- SYS$SYSDEVICE as the quorum disk
- SCSI port allocation and cluster allocation
without:
boot server and disk server.
Then I ran AUTOGEN and rebooted.
At another customer, I saw one member crash when the NIC connection was broken.
I will check the cluster parameters.

T H A N K S
Charles
Working and enjoying life
comarow
Trusted Contributor

Re: Partition for OpenVMS Cluster

You can make the changes directly in SYSGEN:

$ mcr sysgen
SYSGEN> use current
SYSGEN> set votes 1
SYSGEN> set expected_votes 3
SYSGEN> set qdskvotes 1
SYSGEN> write current
SYSGEN> exit
$
Also, make sure the values for
DISK_QUORUM
QDSKVOTES
VOTES

are set in MODPARAMS.DAT. Otherwise, you will lose the values when you run AUTOGEN. Upgrades, for example, will automatically execute AUTOGEN.

How do you know you are using the quorum disk?

Issue the command

$ show cluster/continuous

then, while it is displaying the results, type
add qf_vote

It will add a little box with a yes or no.
For that matter, so will
add cluster
and then it will show the votes as well as the quorum disk vote.

There is no reason you should ever have a partitioned cluster. It will not happen if EXPECTED_VOTES = 3.

Bob
Eberhard Wacker
Valued Contributor

Re: Partition for OpenVMS Cluster

Hi Charles,
if I understand it correctly, the source of the problem is that you use the SYSTEM disk as the quorum disk. In that case there is no way to prevent a cluster partition.
A valid minimum configuration consists of one of the following: 2 nodes and an extra quorum disk; 3 nodes, with the 3rd one taking the place of the quorum disk; or 2 nodes without any quorum disk, where 1 node is the primary one and is the only one allowed to keep running in case of connectivity problems.
Cheers,
EW
Volker Halle
Honored Contributor
Solution

Re: Partition for OpenVMS Cluster

Charles,

there is nothing wrong with using the system disk as the quorum disk in a 2-node cluster. But you cannot specify SYS$SYSDEVICE as the DISK_QUORUM name; you MUST specify a physical disk device name, and you must specify the SAME name on all cluster members (which have direct access to that disk).


"At another customer, I saw one member crash when the NIC connection was broken."


This sounds like a possible reason for the partitioned cluster. If the cluster parameters are NOT correctly set up for this system, the crashed system can form its own cluster when rebooting after the crash. So it is very likely that the parameters of this system are WRONG.

Please note that there is also a strong recommendation to configure a separate (point-to-point) additional network link in this kind of configuration, to prevent cluster connectivity problems if the main network is disrupted.

Volker.

comarow
Trusted Contributor

Re: Partition for OpenVMS Cluster

I'm sorry, but we do need to add the reason we don't recommend that the quorum disk be the system disk. During heavy activity on the quorum disk, especially backup, you will not receive the acknowledgement messages in time. Then you will get "lost connection to quorum disk" messages. This is expected behavior.

There is a special problem if the quorum disk is on a SAN in a large cluster: it will respond before messages from the nodes on the network, and it is possible for a small part of the cluster to remain and the rest to CLUEXIT.

See the article
Overview And Concerns Of Quorum Disks In A Cluster
Song, Charles
Frequent Advisor

Re: Partition for OpenVMS Cluster

Hi,
- The quorum disk must be specified by its physical name, so I would change SYS$SYSDEVICE to DKB0. But each member's system disk is its own local disk; perhaps member 1 uses DKB0 and member 2 uses DKC0. Could I assign DKB0 on one member and DKC0 on the other?

- Each Alpha has 2 NICs, one for TCP/IP and the other perhaps for DECnet. Which NIC should be used for cluster connectivity? And if I had a third NIC, how should it be set up? Should an IP or DECnet address be assigned to it?

T H A N K S A L L
Charles
Working and enjoying life
Bojan Nemec
Honored Contributor

Re: Partition for OpenVMS Cluster

Charles,

You must use a disk that is connected to both systems as the quorum disk. So if the system disks are two different local disks, you cannot use them as the quorum disk.

The cluster will choose the best NIC for its communication. If one goes bad or slows down, it will switch to the second.

Bojan
Anton van Ruitenbeek
Trusted Contributor

Re: Partition for OpenVMS Cluster

Charles,

In this situation the only way to avoid a split cluster is to add an extra NIC dedicated to SCS, connecting the nodes to each other directly without any switch or hub (crossover Ethernet cable). The nodes cannot be too far from each other because you are using SCSI. The reason is: the nodes always see the disks (using SCSI and the RA3000), and SCS isn't going over SCSI. So this cannot be helped by quorum disks or other options.
If you don't add the extra NIC, the best option is to give each node 1 vote and get rid of the quorum disk. In this case the cluster hangs until your network is working properly again (in this case: power on the switch).
Another option is to (still) get rid of the quorum disk, and give one node 2 votes and the other 1 vote. Then if the network disappears, one node goes on and the other dies.
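
The two options without a quorum disk can be compared with a small calculation. This sketch assumes EXPECTED_VOTES is set to the sum of the node votes and that quorum is (EXPECTED_VOTES + 2) / 2 rounded down; NODE_A and NODE_B are hypothetical names, and "loses quorum" stands for the node that cannot go on.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES: (EXPECTED_VOTES + 2) / 2, rounded down."""
    return (expected_votes + 2) // 2


def outcome_when_isolated(node_votes: dict) -> dict:
    """For each node, report whether its own votes alone still reach quorum."""
    needed = quorum(sum(node_votes.values()))  # EXPECTED_VOTES = total votes (assumption)
    return {
        name: "continues" if votes >= needed else "loses quorum"
        for name, votes in node_votes.items()
    }


print("1 vote each:", outcome_when_isolated({"NODE_A": 1, "NODE_B": 1}))
print("2 votes + 1:", outcome_when_isolated({"NODE_A": 2, "NODE_B": 1}))
```

With equal votes neither node can continue alone, so the whole cluster waits for the network; with 2 votes on one node, that node alone keeps quorum and the other does not.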

AvR
Measurement is knowledge, but you need to know how to measure!
Jan van den Ende
Honored Contributor

Re: Partition for OpenVMS Cluster

Charles,

as I read this so far,
(ignore if I read wrong!!!)
you specify SYS$SYSDEVICE as the quorum disk
_AND_
you are using different disks as the system disk for different systems.

THIS COMBINATION IS ____DANGEROUS____!!!

During normal operation, the different systems will __NOT__ see each other's quorum disks, but they WILL each see their own.
THOSE are _NOT_ synchronised!
But, as long as the nodes see each other, the cluster will keep going, and synchronised.
Now, if the nodes lose connection, _EACH_ node will find that "_THE_" quorum disk is operational, while not seeing the other node either.
So, obviously, the other node is gone, and "I" (local node perspective) can continue.
But, if both nodes can still connect to (some of) the disks, _THAT_ access is allowed but uncoordinated.
THAT is the classic recipe for corrupted data, and THAT is why __ALL__ nodes __MUST__ specify __THE SAME__ __PHYSICAL__ disk!!
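
A small vote count makes the danger visible. The sketch assumes VOTES=1 per node, QDSKVOTES=1, EXPECTED_VOTES=3, and quorum computed as (EXPECTED_VOTES + 2) / 2 rounded down; it also assumes that with one truly shared quorum disk only one isolated node ends up counting the disk's vote. The names are illustrative only.

```python
EXPECTED_VOTES = 3
NODE_VOTES = 1
QDSKVOTES = 1


def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES: (EXPECTED_VOTES + 2) / 2, rounded down."""
    return (expected_votes + 2) // 2


def isolated_node_continues(counts_quorum_disk: bool) -> bool:
    """An isolated node has its own vote, plus the quorum disk vote if it counts it."""
    votes = NODE_VOTES + (QDSKVOTES if counts_quorum_disk else 0)
    return votes >= quorum(EXPECTED_VOTES)


# Each node has its own private "quorum disk": both count a disk vote.
private_disks = [isolated_node_continues(True), isolated_node_continues(True)]
# One shared physical quorum disk: only one node counts the disk vote.
shared_disk = [isolated_node_continues(True), isolated_node_continues(False)]

print("private disks, nodes continuing:", private_disks)
print("shared disk, nodes continuing:", shared_disk)
```

With private disks both isolated nodes reach quorum at the same time, which is the partition; with one shared physical disk at most one of them does.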

SYS$SYSDEVICE is a REAL danger!!

hth,

Cheers.

Have one on me.

jp
Don't rust yours pelled jacker to fine doll missed aches.
comarow
Trusted Contributor

Re: Partition for OpenVMS Cluster

Charles, if you have an HP license I can send you some great papers on quorum disks.


You can send your request privately to robert.comarow@hp.com
Please include your access number.

People often think they are using their quorum disk but aren't.

As the previous poster correctly pointed out, all nodes must specify the same disk. Access to the quorum disk will prevent partitioned clusters.

We highly recommend not using a system disk, but a less heavily used disk, to prevent "lost connection to quorum disk" messages.

Adding qf_vote to show cluster/cont will prove it. It will not use the quorum disk on the first boot, however. I'll be glad to send you the white papers.


Proper use will prevent partitioned clusters. Partitioned clusters will corrupt your disks.
Song, Charles
Frequent Advisor

Re: Partition for OpenVMS Cluster

Hi,
- If we use the extra NIC with the crossover Ethernet cable for SCS and, for some reason, the cable breaks, which member will crash? I think the best way is to connect the cable to an Ethernet switch or hub.
- I will change the quorum disk to the shared disk $1$dka0: on the RA3000, with 1 vote on each member. Is that right?

T H A N K S
Char
Working and enjoying life
Uwe Zessin
Honored Contributor

Re: Partition for OpenVMS Cluster

Charles,
if your server has two NICs, then you can run VMScluster traffic on both of them.

You give both cluster members and the quorum disk one vote each and set EXPECTED_VOTES=3. Bob Comarow already explained this a few days ago, please scroll up a bit ;-)
comarow
Trusted Contributor

Re: Partition for OpenVMS Cluster

The crossover cable is a great strategy. It will help cluster performance. If it breaks, cluster traffic will use the regular network.

Using the utility
$ mcr scacp

you can set a higher priority for a specific network card.

This was introduced in VMS 7.3

You should use a 132-column display. A great place to start is
$ mcr scacp
SCACP> show channel


You can even tell your cluster not to use a port for SCS communication, if you have a busy network. That of course eliminates failover. Thus setting a higher priority for preferred paths is usually the best course of action.
SCACP will show the paths and how well the channels are working. Don't be afraid of some errors, they are normal.

Bob
Jan van den Ende
Honored Contributor

Re: Partition for OpenVMS Cluster

Charles,

several previous posts implied this for those who already know, but no one said it explicitly:

As long as the systems are "somehow" connected, cluster communication WILL continue.
Cluster communication will continue over ANY available path!

Of course, there may be performance degradation if you lose the high-capacity path and all you are left with is, e.g., a 10 Mb Ethernet, but as long as there IS ANY connection, you WILL continue.

-- there ARE ways to exclude certain pathways, but that has to be done explicitly, and you need very special circumstances for that to be advantageous!

hth,

Cheers.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Song, Charles
Frequent Advisor

Re: Partition for OpenVMS Cluster

Yes,

cluster communication will continue over ANY available path, but performance will degrade while the crossover Ethernet cable is broken.

Many thanks

Charles
Working and enjoying life