Operating System - OpenVMS

cluster error

 
shehan_1
Occasional Visitor

cluster error

Hi,
Can anybody help with this issue?
I'm using OpenVMS 7.3-2 on Alpha DS10 servers, in a cluster with two nodes. When I reboot the cluster it gives me the following message and does not load any further; it gets stuck:
"%SYSINIT-I- waiting to form or join an OpenVMS Cluster"
Could anybody help me?
13 REPLIES 13
Kris Clippeleyr
Honored Contributor

Re: cluster error

shehan,
Welcome to the VMS forum.
A number of things can be wrong here.
The system parameters, VOTES, EXPECTED_VOTES, DISK_QUORUM, et alia can have the wrong value(s).
The file SYS$COMMON:[SYSEXE]CLUSTER_AUTHORIZE.DAT can be corrupt.
Someone may have changed the cluster number and/or password.
Furthermore, since only 2 nodes are involved here, the connectivity between the two can be an issue.
So, a lot of guesses, and no real answer yet.
Regards,
Kris (aka Qkcl)
I'm gonna hit the highway like a battering ram on a silver-black phantom bike...
shehan_1
Occasional Visitor

Re: cluster error

OK, the cluster won't boot up. Do I have to install the OS?
Wim Van den Wyngaert
Honored Contributor

Re: cluster error

How can you boot if you don't have the OS installed?

Your options: 1) boot both members, 2) boot conversationally and set VOTES to a high value, or 3) replace the quorum disk and retry.

I suspect that your quorum disk is dead and 1 node alone will not be able to form the cluster.

Wim
Willem Grooters
Honored Contributor

Re: cluster error

To form a cluster, you'll need to adjust VOTES and EXPECTED_VOTES; these two define whether a system will actually boot or wait until sufficient quorum is available. If improperly set, you may encounter problems like this.

First of all: give it some time; there are a few system parameters involved, actually time-out values in this respect (sorry, I don't know them by heart).
In your case (assuming you have just these two nodes), either start one machine and, when it gets to this point, start the other; if everything is set up properly, both systems should then continue the boot sequence.
Or perform a conversational boot on one; as others have said, you can change the parameters that way.

There is a way to force quorum from the console: Do CTRL-P when the system is waiting, so you get to the console. Then enter

>>> dep sirr c
>>> cont
IPC> q
IPC> ^z

but this is VERY LOW LEVEL and I don't know if this will work in your situation.

In a two-node cluster, if you want either system to continue working while the other is down, you'll need an extra vote: a third machine, or a disk that is directly accessible by both nodes. It's NOT a requirement, but without it the cluster will freeze when one node is down. Unless you fiddle with VOTES and EXPECTED_VOTES, but then one system prevails over the other; if that one fails, the other will freeze.
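
The vote arithmetic behind this can be sketched in a few lines of Python. This is only an illustration, not anything that runs on VMS: it assumes the usual quorum rule, quorum = (EXPECTED_VOTES + 2) / 2 with the fraction dropped, and the three layouts are hypothetical examples rather than the poster's actual settings.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES: (EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2


def can_run(votes_present: int, expected_votes: int) -> bool:
    """True if the votes currently present reach quorum."""
    return votes_present >= quorum(expected_votes)


# Hypothetical layout 1: two nodes, VOTES=1 each, no quorum disk, EXPECTED_VOTES=2.
# One node alone holds 1 vote but quorum is 2, so the survivor freezes.
print(quorum(2), can_run(1, 2))   # 2 False

# Hypothetical layout 2: two nodes plus a quorum disk (QDSKVOTES=1), EXPECTED_VOTES=3.
# One node plus the quorum disk holds 2 votes, which reaches quorum.
print(quorum(3), can_run(2, 3))   # 2 True

# Hypothetical layout 3: node A has VOTES=1, node B has VOTES=0, EXPECTED_VOTES=1.
# A alone can run; B alone cannot, so A "prevails" over B.
print(can_run(1, 1), can_run(0, 1))   # True False
```

Layout 3 is the "fiddle with VOTES and EXPECTED_VOTES" case: it avoids the extra voter but makes one node a single point of failure.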
Willem Grooters
OpenVMS Developer & System Manager
shehan_1
Occasional Visitor

Re: cluster error

The IPC> ^Z command is not working.
Volker Halle
Honored Contributor

Re: cluster error

shehan,

the IPC interrupt does NOT recalculate quorum if the node was not a cluster member before. It does NOT help if the node wants to join a cluster and fails.

Consider booting the system conversationally and checking VOTES, EXPECTED_VOTES and QDSKVOTES.

Consider obtaining help from an experienced OpenVMS system manager.

Volker.
Jon Pinkley
Honored Contributor

Re: cluster error

shehan,

Has this cluster worked in the past or are you in the process of creating a cluster for the first time? If it worked in the past, what has changed?

Have you at least scanned "Guidelines for OpenVMS Cluster Configurations"?

http://h71000.www7.hp.com/doc/732FINAL/DOCUMENTATION/PDF/aa-q28lg-tk.PDF

and Chapter 8 of "HP OpenVMS System Manager's Manual, Volume 2: Tuning, Monitoring, and Complex Systems"?

ftp://ftp.hp.com/pub/openvms/doc/AA-PV5NH-TK.PDF

How are the DS10 systems connected to each other, and to the system disk (if it is a shared disk) or disks (if each has a different system disk)?

Can you find a representative configuration that is similar to yours in the "Guidelines for OpenVMS Cluster Configurations"?

Do you have a quorum disk? Do you know what a quorum disk is?

It is hard to know what your level of experience is since this is your first post.

Jon
it depends
shehan_1
Occasional Visitor

Re: cluster error

Hi Jon,
I configured this system myself earlier. It is in our test bed. The DS10s are connected to each other by SCSI cables, and there is shared storage. Yes, there is a quorum disk. Now the system won't go any further.
marsh_1
Honored Contributor

Re: cluster error

shehan,

Do these systems have any other communication medium attached, such as Ethernet, and what is your shared storage? Are your boot disks local or on the shared storage?
Kris Clippeleyr
Honored Contributor

Re: cluster error

shehan,
Is the only connection between the systems formed by the SCSI-cables?
If so, and one node is up, then the other node will not join the cluster, unless you have for instance a LAN connection between the 2 nodes.
Regards,
Kris (aka Qkcl)
I'm gonna hit the highway like a battering ram on a silver-black phantom bike...
Willem Grooters
Honored Contributor

Re: cluster error

Shared SCSI means port allocation class. This is also covered by the CLUSTER_CONFIG procedure.

If set up properly, the console should be able to locate your disks. Including the quorum disk:

>>> SHO DEV

Booting a cluster node on its own should work without a network, as long as the rest of the requirements (VOTES, EXPECTED_VOTES, QDSKVOTES, DISK_QUORUM) are set properly.
In the log of your startup sequence, it should be mentioned that access to this disk has been attempted.
Booting a next node into the cluster _requires_ a network connection: cluster traffic will not travel over SCSI (AFAIK).

If you boot conversationally, you will be able to examine the following SYSGEN parameters:

VAXCLUSTER
EXPECTED_VOTES
VOTES
DISK_QUORUM
QDSKVOTES

What are the values for these?
(SHOW /CLUSTER once booted. At the SYSBOOT> prompt you may have to display them one by one.)
Willem Grooters
OpenVMS Developer & System Manager
Andy Bustamante
Honored Contributor

Re: cluster error

>>>I configured this system myself earlier. . . . Yes, there is a quorum disk. Now the system won't go any further.

Is this the first boot since you configured the systems? A quorum disk needs the file QUORUM.DAT. This is created automatically when you start a cluster with enough votes.

For the first boot, you need to inflate the votes so that this file is created, typically with a conversational boot. Set VOTES on one node to 2 (assuming VOTES=1 and EXPECTED_VOTES=3), boot that node, reset VOTES to 1, and reboot.
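
The reason the inflated vote helps can be checked with a little arithmetic, sketched here in Python purely as an illustration. It assumes the usual rule quorum = (EXPECTED_VOTES + 2) / 2 (truncated), QDSKVOTES=1, and a simplified model in which the quorum disk contributes its vote only once QUORUM.DAT exists. The function names are made up for this example.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES: (EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2


def can_form_cluster(node_votes: int, expected_votes: int,
                     qdskvotes: int = 1, quorum_file_exists: bool = False) -> bool:
    """Simplified model: the quorum disk only votes once QUORUM.DAT exists."""
    present = node_votes + (qdskvotes if quorum_file_exists else 0)
    return present >= quorum(expected_votes)


# First boot, VOTES=1, EXPECTED_VOTES=3, no QUORUM.DAT yet: 1 vote < quorum 2, node waits.
print(can_form_cluster(1, 3))                            # False

# First boot with VOTES inflated to 2: 2 votes >= quorum 2, cluster forms
# and QUORUM.DAT can be created.
print(can_form_cluster(2, 3))                            # True

# Later boots with VOTES back at 1 and QUORUM.DAT present: node + quorum disk = 2 votes.
print(can_form_cluster(1, 3, quorum_file_exists=True))   # True
```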

Andy

If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net
Hoff
Honored Contributor

Re: cluster error

Would both AlphaServer DS10 boxes be booting from the same system root on the same system disk on the same shared SCSI bus?

If so, check the SRM environment variables for the boot flag settings. Specifically, the BOOT_OSFLAGS boot root setting; the leftmost value in the pair.

If you're booting from the same disk using the same system root, you'd see exactly this symptom -- when OpenVMS prevents you from corrupting your disks.
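
As a rough illustration of that check, here is a small Python sketch that compares the root part of two BOOT_OSFLAGS values. It is not an OpenVMS or SRM tool: the function names and the sample values are hypothetical, and it assumes the value has the form "root,flags" with both fields written in hexadecimal.

```python
def parse_boot_osflags(value: str) -> tuple[int, int]:
    """Split a 'root,flags' string into (root, flags), reading both as hex."""
    root_text, flags_text = value.split(",")
    return int(root_text.strip(), 16), int(flags_text.strip(), 16)


def same_system_root(osflags_a: str, osflags_b: str) -> bool:
    """True if both values name the same system root (the leftmost field)."""
    return parse_boot_osflags(osflags_a)[0] == parse_boot_osflags(osflags_b)[0]


# Hypothetical values as they might be read from each console.
node_a = "0,0"
node_b = "0,0"
if same_system_root(node_a, node_b):
    print("Both nodes point at the same root; on a shared system disk this is a conflict.")

# With distinct roots (for example 0 and 1) the check passes.
print(same_system_root("0,0", "1,0"))   # False
```

The comparison only matters when both nodes boot from the same shared system disk; with separate local system disks, identical roots are not a problem.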