Operating System - OpenVMS

Song_Charles
Frequent Advisor

Partition for OpenVMS Cluster

At my customer's site, two ES40s and one RA3000 form a SCSI-interconnected cluster; the two members are connected to each other through their NICs. Yesterday someone powered off the Ethernet switch, and I found that a cluster partition had occurred: the two ES40s were no longer in the same cluster, and each could see only itself.
On the Oracle side, one rollback segment was in conflict, and we couldn't use SQL*Plus.
My question:
when a cluster partition occurs, should one member be forced to crash?

Charles
Working and enjoying life
18 REPLIES
Ian Miller.
Honored Contributor

Re: Partition for OpenVMS Cluster

Do you mean the cluster hung, or did both nodes continue?
Is there a quorum disk? How many votes per node?
____________________
Purely Personal Opinion
Volker Halle
Honored Contributor

Re: Partition for OpenVMS Cluster

Charles,

if you had a correctly configured 2-node cluster running, you cannot get a partitioned cluster just by interrupting the SCA network connection. Each node will time out the other node and remove it from the cluster after RECNXINTERVAL seconds. What happens then depends on your VOTES and EXPECTED_VOTES setup. If both nodes have VOTES=1 and EXPECTED_VOTES=2, then both nodes will hang indefinitely...

The dangerous moment could come if you HALT and BOOT one (or both) of the systems. If both of them had VOTES=1 and EXPECTED_VOTES=1, a partitioned cluster would be created, and your data on the shared disks would be in real danger of getting corrupted.
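
Both cases follow from the quorum formula documented for OpenVMS Cluster systems, quorum = (EXPECTED_VOTES + 2) / 2 with the result rounded down. The Python sketch below only illustrates that arithmetic; the function names are hypothetical, and it is a simplification, since a running cluster also adjusts its quorum value dynamically as members join.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES: (EXPECTED_VOTES + 2) / 2, rounded down."""
    return (expected_votes + 2) // 2


def keeps_running(votes_visible: int, expected_votes: int) -> bool:
    """A node (or partition) continues only while the votes it sees reach quorum."""
    return votes_visible >= quorum(expected_votes)


# Case 1: VOTES=1 on each node, EXPECTED_VOTES=2, link broken.
# Each node sees only its own vote, so neither reaches quorum and both hang.
print("quorum:", quorum(2), "node continues:", keeps_running(1, 2))

# Case 2: VOTES=1 on each node, EXPECTED_VOTES=1, nodes booted separately.
# Each node reaches quorum alone, so two independent clusters can form.
print("quorum:", quorum(1), "node continues:", keeps_running(1, 1))
```

In the second case each node believes it is a valid cluster, which is exactly the partition that puts the shared disks at risk.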

Please describe your cluster config and provide the SYSGEN values for:

VOTES, EXPECTED_VOTES, DISK_QUORUM, QDSKVOTES

Volker.
comarow
Trusted Contributor

Re: Partition for OpenVMS Cluster

Charles, the entire concept of votes, quorum and expected votes exists to prevent this from happening.

If you have a two-node cluster, give each node 1 vote, set expected votes to 3, and give the quorum disk 1 vote.

Then, if a node is gone long enough, it will CLUEXIT.
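
To see why this setup cannot split, here is a rough Python sketch of the vote count after the network link breaks. It assumes the documented quorum formula (EXPECTED_VOTES + 2) / 2, rounded down, and assumes that only one side can count the quorum disk's vote; the names are hypothetical and the model ignores timing details such as RECNXINTERVAL.

```python
NODE_VOTES = 1        # VOTES on each ES40
QDSKVOTES = 1         # votes contributed by the quorum disk
EXPECTED_VOTES = 3    # two nodes plus the quorum disk


def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES: (EXPECTED_VOTES + 2) / 2, rounded down."""
    return (expected_votes + 2) // 2


def partition_votes(nodes: int, counts_quorum_disk: bool) -> int:
    """Votes visible to one side of a broken cluster interconnect."""
    return nodes * NODE_VOTES + (QDSKVOTES if counts_quorum_disk else 0)


needed = quorum(EXPECTED_VOTES)
side_with_disk = partition_votes(1, counts_quorum_disk=True)
side_without_disk = partition_votes(1, counts_quorum_disk=False)

print("quorum needed:", needed)
print("node counting the quorum disk continues:", side_with_disk >= needed)
print("other node continues:", side_without_disk >= needed)
```

Only the side that counts the quorum disk reaches quorum, so at most one of the two nodes keeps working on the shared disks.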
Song_Charles
Frequent Advisor

Re: Partition for OpenVMS Cluster

Hi,
to configure the 2-node cluster, I just used the "CLUSTER_CONFIG" utility and
assigned:
- SCSI connection: YES
- CLUSTER NUMBER & PASSWORD
- SYS$SYSDEVICE used as the quorum disk
- SCSI PORT ALLOCATION & CLUSTER ALLOCATION
without:
BOOT SERVER and DISK SERVER.
Then I ran AUTOGEN and rebooted.
At another customer, I found that one member crashed when the NIC connection was broken.
I will check the cluster parameters.

T H A N K S
Charles
Working and enjoying life
comarow
Trusted Contributor

Re: Partition for OpenVMS Cluster

You can make the changes directly in SYSGEN:

$ mcr sysgen
SYSGEN> use current
SYSGEN> set votes 1
SYSGEN> set expected_votes 3
SYSGEN> set qdskvotes 1
SYSGEN> write current
SYSGEN> exit
$
Also, make sure the values for
disk_quorum
qdskvotes
votes

are set in modparams.dat. Otherwise, you will lose the values when you run AUTOGEN. Upgrades, for example, run AUTOGEN automatically.
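
As a quick sanity check, a small script can confirm that the cluster parameters really appear in MODPARAMS.DAT before AUTOGEN is run. The Python sketch below assumes the usual `NAME = value` lines with `!` starting a comment; the sample contents, the device name "$1$DKB0", and the function name are hypothetical, EXPECTED_VOTES is included because it is set in the SYSGEN session above, and prefixed entries such as MIN_ or ADD_ are not handled.

```python
import re

REQUIRED = ("VOTES", "EXPECTED_VOTES", "DISK_QUORUM", "QDSKVOTES")


def missing_params(modparams_text: str, required=REQUIRED) -> list:
    """Return the required parameter names that have no assignment in the text."""
    found = set()
    for line in modparams_text.splitlines():
        line = line.split("!", 1)[0].strip()          # drop "!" comments
        match = re.match(r"([A-Za-z0-9_$]+)\s*=", line)
        if match:
            found.add(match.group(1).upper())
    return [name for name in required if name not in found]


# Hypothetical MODPARAMS.DAT contents for illustration only.
sample = '''
! cluster settings
VOTES = 1
EXPECTED_VOTES = 3
DISK_QUORUM = "$1$DKB0"   ! physical device name, same on both nodes
'''

print(missing_params(sample))
```

With the sample above, the check reports that QDSKVOTES is missing, which is the kind of omission that AUTOGEN would silently turn back into a default.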

How do you know you are using the quorum disk?

Issue the command

$ show cluster/continuous

then, while it is displaying the results, type
add qf_vote

It will add a little box with a yes or no.
For that matter, so will add cluster,
which will also show the votes as well as the quorum disk vote.

There is no reason you should ever have a partitioned cluster. It will not happen if expected_votes = 3.

Bob
Eberhard Wacker
Valued Contributor

Re: Partition for OpenVMS Cluster

Hi Charles,
if I understand it correctly, the source of the problem is that you use the SYSTEM disk as the quorum disk. In this case there is no way to prevent cluster partitioning.
A valid minimum configuration consists of 2 nodes and an extra quorum disk; or 3 nodes, with the third one taking over the quorum disk's role; or 2 nodes without any quorum disk, where one node is the primary one and is the only one allowed to keep running in case of connectivity problems.
Cheers,
EW
Volker Halle
Honored Contributor
Solution

Re: Partition for OpenVMS Cluster

Charles,

there is nothing wrong with using the system disk as the quorum disk in a 2-node cluster. But you cannot specify SYS$SYSDEVICE as the DISK_QUORUM name; you MUST specify a physical disk device name, and you must specify the SAME name on all cluster members (which have direct access to that disk).


"At another customer, I found that one member crashed when the NIC connection was broken."


This sounds like a possible reason for the partitioned cluster. If the cluster parameters are NOT set up correctly for this system, the crashed system can form its own cluster when it reboots after the crash. So it is very likely that the parameters of this system are WRONG.

Please note that it is also strongly advised to configure a separate (point-to-point) additional network link in this kind of configuration, to prevent cluster connectivity problems if the main network is disrupted.

Volker.

comarow
Trusted Contributor

Re: Partition for OpenVMS Cluster

I'm sorry, but we do need to add the reason we don't recommend that the quorum disk be the system disk. During heavy activity on the quorum disk, especially backup, you will not receive the acknowledgement messages in time. Then you will get "lost connection to quorum disk" messages. This is expected behavior.

There is a special problem if the quorum disk is on a SAN in a large cluster: it will respond before messages from the nodes on the network, and it is possible for a small part of the cluster to remain and the rest to CLUEXIT.

See the article
Overview And Concerns Of Quorum Disks In A Cluster
Song_Charles
Frequent Advisor

Re: Partition for OpenVMS Cluster

Hi,
- The quorum disk must be assigned by its physical name, so I will change
SYS$SYSDEVICE to DKB0. But each member's system disk is its own local
disk; maybe member 1 uses DKB0 but member 2 uses DKC0. Could I assign
DKB0 and DKC0 to the respective members?

- On each Alpha there are 2 NICs, one for TCP/IP and the other maybe for
DECnet. Which NIC should be used for cluster connectivity? Or, if I had
a third NIC, how should it be set up? Should an IP or DECnet address be
assigned to the third one?

T H A N K S A L L
Charles
Working and enjoying life