Operating System - OpenVMS

Open VMS 7.3-2, a problem in the quorum disk

 
khalid al-temimy
Occasional Contributor


Two-node cluster: 2 x AlphaServer DS25, MSA1000 storage, two HBAs per server. The log shows:

 

 

%%%%%%%%%%%  OPCOM   5-SEP-2012 08:38:39.68  %%%%%%%%%%%    (from node SMIQ12 at  5-SEP-2012 08:38:13.09)

08:38:13.09 Node SMIQ12 (csid 00010002) timed-out operation to quorum disk

 

%%%%%%%%%%%  OPCOM   5-SEP-2012 08:38:39.68  %%%%%%%%%%%    (from node SMIQ12 at  5-SEP-2012 08:38:13.09)

08:38:13.09 Node SMIQ12 (csid 00010002) lost "connection" to quorum disk

 

%%%%%%%%%%%  OPCOM   5-SEP-2012 08:38:40.18  %%%%%%%%%%%

08:38:40.18 Node SMIQ11 (csid 00010001) timed-out operation to quorum disk

 

%%%%%%%%%%%  OPCOM   5-SEP-2012 08:38:40.18  %%%%%%%%%%%

08:38:40.18 Node SMIQ11 (csid 00010001) lost "connection" to quorum disk

 

%%%%%%%%%%%  OPCOM   5-SEP-2012 08:38:40.18  %%%%%%%%%%%

08:38:40.18 Node SMIQ11 (csid 00010001) proposed modification of quorum or quorum disk membership

 

%%%%%%%%%%%  OPCOM   5-SEP-2012 08:38:40.18  %%%%%%%%%%%

08:38:40.18 Node SMIQ11 (csid 00010001) completed VMScluster state transition

 

 

Please, what's behind these messages?

Thanks in advance.

2 REPLIES
B Claremont
Frequent Advisor

Re: Open VMS 7.3-2, a problem in the quorum disk

Looks like you lost the quorum disk and the cluster did a proper state transition.  Is the actual disk drive still visible/available to the system?

www.MigrationSpecialties.com
John Gillings
Honored Contributor

Re: Open VMS 7.3-2, a problem in the quorum disk

Both your nodes are "Quorum disk watchers". That means they poll the quorum disk every QDSKINTERVAL seconds by sending a WRITE I/O. If enough I/Os time out, the quorum disk connection is declared "lost".
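The watcher behavior described above can be sketched as follows. This is an illustrative model, not OpenVMS source: the poll cadence is governed by the real SYSGEN parameter QDSKINTERVAL, but the consecutive-timeout threshold used here is an assumption for the example.

```python
def quorum_disk_state(poll_results, max_timeouts=4):
    """Model of a quorum-disk watcher's poll loop.

    poll_results: sequence of outcomes, one per QDSKINTERVAL tick,
    where True means the write I/O to the quorum disk completed and
    False means it timed out. Once max_timeouts consecutive timeouts
    accumulate (threshold is hypothetical), the connection is declared
    lost -- the point at which OPCOM logs 'timed-out operation' and
    'lost "connection" to quorum disk'.
    """
    consecutive = 0
    for ok in poll_results:
        if ok:
            consecutive = 0          # any successful poll resets the count
        else:
            consecutive += 1
            if consecutive >= max_timeouts:
                return "lost"
    return "connected"


# A transient stall (e.g. a BACKUP saturating the path) recovers;
# a sustained loss of the disk does not.
print(quorum_disk_state([True, False, False, True, False]))        # connected
print(quorum_disk_state([True, False, False, False, False]))       # lost
```

The key point for diagnosis is the reset on success: a busy controller produces scattered timeouts that never reach the threshold, while a disk that has genuinely vanished from the I/O path times out every poll and crosses it quickly.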

 

There are many possible reasons for this. The most common is a BACKUP involving the quorum disk or something common to it (adapter, bus, controller, etc.), saturating the resource and blocking the quorum disk polls. In that case the connection is typically reestablished. In your case the connection appears NOT to have been reestablished: the cluster has completed a state transition, kicking out the quorum disk. That suggests the disk is no longer visible to the nodes. Check the I/O paths to the disk and any physical hardware involved.

 

The cluster will continue to run, but if one of the nodes is lost, the remaining node will hang, waiting for more votes (which is presumably undesirable, since you have a quorum disk).
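The arithmetic behind that hang can be shown with the documented OpenVMS quorum formula, quorum = (EXPECTED_VOTES + 2) / 2 (integer division). The vote settings below are assumptions for a typical configuration like this one: each node contributes VOTES=1 and the quorum disk QDSKVOTES=1.

```python
def quorum(expected_votes):
    # OpenVMS derives quorum as (EXPECTED_VOTES + 2) / 2, truncated.
    return (expected_votes + 2) // 2

# Assumed settings: two nodes with VOTES=1 each, quorum disk with
# QDSKVOTES=1, giving EXPECTED_VOTES=3 and a quorum of 2.
EXPECTED_VOTES = 3
q = quorum(EXPECTED_VOTES)

# Quorum disk lost but both nodes up: 2 votes present, cluster runs.
# Lose a node as well: 1 vote remaining, below quorum, survivor hangs.
print(q)         # 2
print(2 >= q)    # True  -- cluster keeps running
print(1 >= q)    # False -- remaining node hangs awaiting votes
```

This is exactly why the quorum disk exists in a two-node cluster: its vote is what lets a lone surviving node keep running, so losing the disk silently removes that protection even while both nodes are still up.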

 

Recovery will depend on the results of the investigation into what's happened to the disk.

A crucible of informative mistakes