Operating System - OpenVMS

Re: Open VMS 7.3-2, a problem in the quorum disk

 
khalid al-temimy
Occasional Contributor

Open VMS 7.3-2, a problem in the quorum disk

Two-node cluster: 2 x Alpha DS25, connected to an MSA1000 through 2 HBAs per server. The log:

 

 

%%%%%%%%%%%  OPCOM   5-SEP-2012 08:38:39.68  %%%%%%%%%%%    (from node SMIQ12 at  5-SEP-2012 08:38:13.09)

08:38:13.09 Node SMIQ12 (csid 00010002) timed-out operation to quorum disk

 

%%%%%%%%%%%  OPCOM   5-SEP-2012 08:38:39.68  %%%%%%%%%%%    (from node SMIQ12 at  5-SEP-2012 08:38:13.09)

08:38:13.09 Node SMIQ12 (csid 00010002) lost "connection" to quorum disk

 

%%%%%%%%%%%  OPCOM   5-SEP-2012 08:38:40.18  %%%%%%%%%%%

08:38:40.18 Node SMIQ11 (csid 00010001) timed-out operation to quorum disk

 

%%%%%%%%%%%  OPCOM   5-SEP-2012 08:38:40.18  %%%%%%%%%%%

08:38:40.18 Node SMIQ11 (csid 00010001) lost "connection" to quorum disk

 

%%%%%%%%%%%  OPCOM   5-SEP-2012 08:38:40.18  %%%%%%%%%%%

08:38:40.18 Node SMIQ11 (csid 00010001) proposed modification of quorum or quorum disk membership

 

%%%%%%%%%%%  OPCOM   5-SEP-2012 08:38:40.18  %%%%%%%%%%%

08:38:40.18 Node SMIQ11 (csid 00010001) completed VMScluster state transition

 

 

Please, what is behind this?

Thanks in advance.

B Claremont
Frequent Advisor

Re: Open VMS 7.3-2, a problem in the quorum disk

Looks like you lost the quorum disk and the cluster did a proper state transition.  Is the actual disk drive still visible/available to the system?

www.MigrationSpecialties.com
John Gillings
Honored Contributor

Re: Open VMS 7.3-2, a problem in the quorum disk

Both your nodes are "Quorum disk watchers". That means they poll the quorum disk every QDSKINTERVAL seconds by sending a WRITE I/O. If enough I/Os time out, the quorum disk connection is declared "lost".
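
To make the polling idea concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not the actual OpenVMS algorithm: the threshold `max_timeouts` and the rule that a successful poll resets the counter are assumptions made for illustration, and the function name `watch` is hypothetical.

```python
def watch(poll_results, max_timeouts=4):
    """Toy quorum disk watcher.

    poll_results: iterable of bools, one per polling interval,
                  True if that poll I/O completed in time.
    max_timeouts: hypothetical number of consecutive timed-out polls
                  after which the connection is declared lost.
    Returns the final state and a list of (poll_index, event) tuples.
    """
    state = "connected"
    misses = 0
    events = []
    for i, ok in enumerate(poll_results):
        if ok:
            misses = 0  # assumption: a good poll clears the count
            if state == "lost":
                state = "connected"
                events.append((i, "regained"))
        else:
            misses += 1
            if state == "connected" and misses >= max_timeouts:
                state = "lost"
                events.append((i, "lost"))
    return state, events


# A short stall (for example a busy controller) that recovers:
print(watch([True, False, False, True], max_timeouts=2))
# A disk that never answers again:
print(watch([False, False, False], max_timeouts=2))
```

The first call models a temporary stall where the connection comes back. The second models what the log in this thread looks like: the polls keep timing out and the connection stays lost.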

 

There are many possible reasons for this. The most common is a BACKUP involving the quorum disk or something it shares (adapter, bus, controller, etc.), which saturates the resource and blocks the quorum disk polls. In that case the connection is typically reestablished. In your case the connection appears NOT to have been reestablished: the cluster has completed a state transition, kicking out the quorum disk. That suggests the disk is no longer visible to the nodes. Check the I/O paths to the disk and any physical hardware involved.

 

The cluster will continue to run, but if one of the nodes is lost, the remaining node will hang, waiting for more votes (which is presumably undesirable, since you have a quorum disk).
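
The vote arithmetic behind that can be worked through in a few lines of Python. OpenVMS derives quorum as (EXPECTED_VOTES + 2) / 2, rounded down. The configuration values below are assumptions, since the thread does not give them: one vote per node, one vote for the quorum disk (QDSKVOTES = 1) and EXPECTED_VOTES = 3. The sketch also assumes the quorum value is not lowered after the quorum disk drops out.

```python
def quorum(expected_votes: int) -> int:
    """Quorum as OpenVMS derives it: (EXPECTED_VOTES + 2) / 2, rounded down."""
    return (expected_votes + 2) // 2


def has_quorum(current_votes: int, expected_votes: int) -> bool:
    """The cluster keeps running only while current votes >= quorum."""
    return current_votes >= quorum(expected_votes)


# Assumed configuration (not stated in the thread):
NODE_VOTES = 1       # votes per node
QDSK_VOTES = 1       # votes contributed by the quorum disk
EXPECTED_VOTES = 3   # 2 nodes + quorum disk

scenarios = {
    "both nodes + quorum disk": 2 * NODE_VOTES + QDSK_VOTES,
    "both nodes, quorum disk lost": 2 * NODE_VOTES,
    "one node, quorum disk lost": 1 * NODE_VOTES,
}

for name, votes in scenarios.items():
    status = "runs" if has_quorum(votes, EXPECTED_VOTES) else "hangs"
    print(f"{name}: votes={votes}, quorum={quorum(EXPECTED_VOTES)} -> {status}")
```

With these assumed values the cluster still has 2 votes against a quorum of 2 after losing the disk, so it keeps running, but a single surviving node would have only 1 vote and would hang.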

 

Recovery will depend on the results of the investigation into what has happened to the disk.

A crucible of informative mistakes