Operating System - OpenVMS

Swap Out or Replace Failed Quorum Disk

SOLVED

I have a two-node SCSI cluster running OpenVMS 7.2-1 with a quorum disk that logged some unrecoverable errors and went into a mounted, MountVerify state. We are planning to swap the device out for a new one, but I was not sure how to proceed. I just got off the phone with HP and it does not seem like a straightforward process. I favor replacing the disk in its current location. I need help with the following questions:
Do I have to shut down the cluster?
Can I hot swap out the device, swap in the new one, and initialize it without shutting down the systems?
How do I create a new QUORUM.DAT?
HP thinks I need to be concerned about whether the new replacement drive is good, so I suggested that I dismount a shadow member from another set, initialize it as QUORUM_DISK, hot swap it out, and then use it to replace the bad quorum disk. They also think I need to shut down the cluster to make this work. I have about 3 hours until HP is on site. Thanks, Susan
12 REPLIES
Ian Miller.
Honored Contributor
Solution

Re: Swap Out or Replace Failed Quorum Disk

QUORUM.DAT will be created for you.

The problem is that you can't dismount the quorum disk because QUORUM.DAT is open, so I think you have to shut down the cluster.

Your suggested plan sounds reasonable to me.
____________________
Purely Personal Opinion

Re: Swap Out or Replace Failed Quorum Disk

Ian,

Thanks for the information and vote (no pun intended) of confidence. I will do the shutdown and let the file get created automatically.

Regards,

Susan
Arch_Muthiah
Honored Contributor

Re: Swap Out or Replace Failed Quorum Disk

Susan,

A cluster reboot is required to move the quorum disk to another disk device.

When a quorum disk is specified correctly, the QUORUM.DAT file is created automatically when OpenVMS is booted, as long as the cluster can form without also needing the votes from the quorum disk.

In a two-node VMScluster with a shared storage interconnect, each node typically has one vote and the quorum disk also has one vote, so make sure EXPECTED_VOTES is set to three.


Archunan
Regards
Archie

Re: Swap Out or Replace Failed Quorum Disk

Archunan,

Thanks. I found the article for moving it to another disk, but what if I am trying to leave it on the same device, by dismounting it and swapping out the bad device for a new good one? I am not changing the name, physical location, or device number. When I mount it, will the two nodes that are up create the QUORUM.DAT file at that point?...if this will even work at all. Right now the disk cannot be dismounted because a DIR command issued to the device at the onset of the problem is in RWAST. I currently have only the two votes from the two nodes: QF_VOTES is NO, expected votes is 3, and quorum and votes are both at 2.

Regards,

Susan
Andy Bustamante
Honored Contributor

Re: Swap Out or Replace Failed Quorum Disk


I have to disagree with Archunan. In this situation, you will need to boot one node with enough votes available (EXPECTED_VOTES=1 or VOTES=3) to create QUORUM.DAT on the new disk. You'll need to inflate the votes on one node temporarily.

You should not copy QUORUM.DAT from one disk to another. If you are removing a "known good" disk from a shadow set, you can use

BACKUP/IMAGE failing_disk: new_disk:

and then relocate the new_disk to be the quorum disk. The replacement disk can be installed into the shadow set. This assumes the old disk will stay available long enough.

Traditionally this is addressed by inflating the votes on a node to ensure quorum, booting once to create QUORUM.DAT, restoring the default votes and returning to normal operation.
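
Why inflating the votes works can be checked with a little arithmetic. The following is a minimal sketch, assuming the usual OpenVMS rule that quorum is (EXPECTED_VOTES + 2) / 2 with the fraction dropped; the helper names `quorum_for` and `can_boot_alone` are made up for illustration and are not part of any OpenVMS utility. It ignores the dynamic quorum adjustments a running cluster makes.

```python
def quorum_for(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES: (EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2


def can_boot_alone(node_votes: int, expected_votes: int) -> bool:
    """True if a single node's own votes are enough to reach quorum."""
    return node_votes >= quorum_for(expected_votes)


# Normal settings from this thread: VOTES=1, EXPECTED_VOTES=3 (quorum is 2).
print(can_boot_alone(node_votes=1, expected_votes=3))  # False: the node waits

# Temporary option 1: lower EXPECTED_VOTES to 1 (quorum becomes 1).
print(can_boot_alone(node_votes=1, expected_votes=1))  # True

# Temporary option 2: raise this node's VOTES to 3 (quorum stays 2).
print(can_boot_alone(node_votes=3, expected_votes=3))  # True
```

Either temporary setting lets one node reach quorum without the quorum disk's vote, which is why the defaults need to be restored afterwards.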


Andy
If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net

Re: Swap Out or Replace Failed Quorum Disk

Andy,

Thanks, but I have already lost my quorum disk and its votes. So, if I attempt to mess with one node, the other will hang. I cannot get the image backup, either. The controller had the device status as misconfigured and now the unit cannot be added without first getting a good device.

Is there a soft way to get the QUORUM.DAT file in place?

Regards,

Susan
Ian Miller.
Honored Contributor

Re: Swap Out or Replace Failed Quorum Disk

If you do what you said you will have a disk labelled QUORUM_DISK with no files.

You shut down the cluster and install this disk in place of the failed disk.

You boot the first node. It waits, as it has one vote with expected votes = 3 and therefore quorum = 2. You boot the second node; now the total number of votes is 2 and the cluster forms (without the vote from the quorum disk).

At this point the quorum disk will get mounted and QUORUM.DAT is created.
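
The boot sequence above can be traced numerically. This is a minimal sketch, assuming the standard rule that quorum is (EXPECTED_VOTES + 2) / 2 truncated to an integer; the function names `quorum` and `cluster_forms` are hypothetical and exist only for this illustration.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES: (EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2


def cluster_forms(votes_present: int, expected_votes: int) -> bool:
    """The cluster can form once the votes present reach quorum."""
    return votes_present >= quorum(expected_votes)


EXPECTED_VOTES = 3  # two nodes with one vote each, plus one quorum disk vote

print(quorum(EXPECTED_VOTES))            # 2
print(cluster_forms(1, EXPECTED_VOTES))  # False: first node alone waits
print(cluster_forms(2, EXPECTED_VOTES))  # True: both nodes up, no disk vote needed
```

With both nodes contributing their votes, quorum is met before the quorum disk counts for anything, so the empty disk can be mounted and QUORUM.DAT written afterwards.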
____________________
Purely Personal Opinion
Arch_Muthiah
Honored Contributor

Re: Swap Out or Replace Failed Quorum Disk

Susan,

> When I mount it, will the two nodes that are up create the QUORUM.DAT file at that point?...if this will even work at all.

No, Susan, the quorum file won't get created this way. I would feel better if you shut down the system. Since your device is tied up in RWAST (and so cannot be initialized either), that problem will also be solved this way.

Archunan

Regards
Archie

Re: Swap Out or Replace Failed Quorum Disk

We swapped in the new disk and shut down both nodes. The first one hung at "enabled automatic tape serving" while waiting to join a cluster, so I booted the other and it hung just after the "Half Duplex" message. Am I doing it wrong?

Node 1:

%PKD0, Copyright (c) 1998 IntraServer Technology Inc. PKW V2.1.21 ROM V2.0
%PKD0, SCSI Chip is SYM53C895, Operating mode is SE Ultra SCSI
%CNXMAN, Using remote access method for quorum disk
%SMP-I-SECMSG, CPU #01 message: P01>>>START
%SMP-I-CPUTRN, CPU #01 has joined the active set.
%VMScluster-I-LOADSECDB, loading the cluster security database
%EIA0, Twisted-Pair mode set by console
%EIA0, Half Duplex 10BaseT connection selected
%EIB0, Twisted-Pair mode set by console
%EIB0, Half Duplex 10BaseT connection selected
%SYSINIT-I- waiting to form or join an OpenVMS Cluster
%TMSCPLOAD-I-CONFIGSCAN, enabled automatic tape serving

Node 2:
%PKE0, Copyright (c) 1998 IntraServer Technology Inc. PKW V2.1.21 ROM V2.0
%PKE0, SCSI Chip is SYM53C895, Operating mode is LVD Ultra2 SCSI
%CNXMAN, Using remote access method for quorum disk
%SMP-I-SECMSG, CPU #01 message: P01>>>START
%SMP-I-CPUTRN, CPU #01 has joined the active set.
%VMScluster-I-LOADSECDB, loading the cluster security database
%EIA0, Twisted-Pair mode set by console
%EIA0, Half Duplex 10BaseT connection selected
%EIB0, Twisted-Pair mode set by console
%EIB0, Half Duplex 10BaseT connection selected

Re: Swap Out or Replace Failed Quorum Disk

Never mind...I am good. A miscommunication occurred and node2 was powered off without my knowledge.

Susan

Re: Swap Out or Replace Failed Quorum Disk

Okay. All fixed. I went with your advice, of course, and did the shutdown. Without the one power issue, we would have done great. Actually, with the distance and the language barrier, we did great! I am in Arizona and the systems are in Japan! We still met our SLA for restoration time, so I wanted to thank you all very much for your support. I would have been lost without you all! Now I just need to figure out how to close this thread :)

Regards,

Susan

Re: Swap Out or Replace Failed Quorum Disk

Found it :)

Susan