Operating System - OpenVMS

How to Get Disk Out of MntVerifyTimeout State

SOLVED
Jeffery D. Urmann
Regular Advisor

Hey gurus,

We have an OpenVMS v7.3-1 DS20 with two SWXCR RAID-0 sets that are mounted as follows...

$ Mount /System DSA8: /Shadow = ($1$DRA0:, $1$DRB0:) Arch1 Arch1

Several files on DSA8: are held open by an InterSystems Mumps database. Here's the scenario:

DRB0: was dismounted from the shadow
A disk in DRB0: failed (bad luck)
A disk in DRA0: failed (really bad luck)
Mount verification has aborted for device DSA8:
DSA8: contains zero working members.
Both disk drives were repaired, and both were made optimal with SWXCRMGR.

Disk DSA8:, device type SWXCR, is online, mounted, mount verification timed out,
file-oriented device, shareable, available to cluster, error logging is
enabled, device supports bitmaps (bitmaps active).
Volume Status: ODS-2, subject to mount verification, write-back caching
enabled.

How do I get the device out of the MntVerifyTimeout state so I can remount the shadow with both members without rebooting OpenVMS?

Please advise.

Thanx in advance.

Enjoy,

--Jeff
Uwe Zessin
Honored Contributor
Solution

I assume that DRA0: and DRB0: were JBODs? Otherwise, it can be *V-E-R-Y* dangerous to set a disk to 'optimal'. I had a customer who did this on the recommendation of his service provider (not me), and it resulted in RAID-1 inconsistencies and a corrupted system disk.

The usual way to get out of this is to dismount the disk. E.g.:
$ DISMOUNT /ABORT /OVERRIDE=CHECKS DSA8:

but there might still be things like an installed pagefile that require a reboot.
Robert_Boyd
Respected Contributor

$ DISMOUNT/ABORT/OVERRIDE=CHECKS DSA8: probably isn't going to work with zero members in the shadow set -- but you might give it a try.

Robert
Lawrence Czlapinski
Trusted Contributor

Jeffery, it sounds like you needed hot spares. When a drive fails, "the controller will rebuild the data from the failed drive onto one of the hot spares".
First, don't use a multidisk set for a system disk. If any disk of a multidisk system disk set fails, your recovery chances are poor.
Second, having reviewed the RAID documentation: when a disk in a non-system multidisk RAID set fails, try failing only the failed disk. While dismounting the drive seems like the logical thing to do, there are risks if a drive in the other set fails. Do not dismount the DR device, and do not fail a second failed disk. Replace the first failed disk, then type: swxcr rebuild dra chan_number target_number, where dra is the controller device name. Then swxcr mkopt the drive.
Third, recovering from a mount verify with no shadow set members: I don't think a DISMOUNT/ABORT with zero shadow set members will work either. That should have been done before removing the second member of the set.
Fourth, you'll probably have to reboot to get DSA8: out of mount verification. Then you will have to reinitialize a shadow set member and restore from your most recent backup.
Lawrence
Uwe Zessin
Honored Contributor

On a re-read of the initial message, it looks like neither disk is protected by RAID. A single disk is sometimes called a JBOD, not a RAID-0.

So, a spare disk does not help if the data is on a JBOD / RAID-0.

It doesn't look like he is using a 'multivolume set' as a system disk. That is problematic not only when a member fails; it is just as bad if you attempt an upgrade and some of the operating system files end up on a non-root member.
Jeffery D. Urmann
Regular Advisor

Thanks Ewe,

DRA0: and DRB0: are RAID-0 sets consisting of three members each across three channels of their own SWXCR.

>it can be *V-E-R-Y* dangerous to set a disk to 'optimal'.

I see what you are trying to say, but once a disk has been replaced in a RAID-0 set, a restore will be necessary. That's why RAID-0+1, right? Unfortunately for me, I lost both shadow members. Maybe I should add a third shadow member. Are you suggesting initializing the RAID-0 set before the mount and restore?
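For reference, adding a third member to an existing shadow set is a single MOUNT command (Volume Shadowing allows up to three members per set on this version). A minimal sketch, assuming a hypothetical third RAID-0 set on another SWXCR that shows up as $1$DRC0:; the new member becomes a copy target and is filled from the current members.

```dcl
$! $1$DRC0: is a hypothetical third RAID-0 set (not in the original setup)
$! Adding it to the mounted shadow set starts a full copy onto it
$ MOUNT /SYSTEM DSA8: /SHADOW = $1$DRC0: ARCH1
```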

I had already tried $ DISMOUNT /ABORT /OVERRIDE=CHECKS, but it failed. Then I went into Mumps, killed all of the jobs with a file open on DSA8:, and reissued the dismount command, which succeeded.
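The sequence above can be sketched in DCL, assuming the shadow set virtual unit is DSA8: as in the original post and that SHOW DEVICE /FILES still responds for the timed-out volume. It lists the processes that still hold files open; here those jobs were stopped from inside Mumps, which lets the database shut them down itself rather than having them killed from DCL.

```dcl
$! Show which processes still hold files open on the shadow set
$ SHOW DEVICE /FILES DSA8:
$!
$! After those jobs have been stopped (here, from within Mumps), retry
$ DISMOUNT /ABORT /OVERRIDE=CHECKS DSA8:
```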

Thanx for the help.

Enjoy,

--Jeff
Jeffery D. Urmann
Regular Advisor

Robert, Lawrence, and Uwe,

Wow!! I am a really s l o w t y p e r. While I was responding to Uwe's first message (sorry about misspelling your name, Uwe), more responses came in; thank you all. And there will likely be more great responses as I type this.

Sorry for not providing clear enough information. This is not a system disk, and I do have hot spares. I'm not sure yet (investigation under way) why they didn't work. The hot spares have more blocks than the failed disks, but that is a topic for another thread.

For further clarification: DRB0: was dismounted to perform a backup before the disk failures, and then the two failures occurred all by themselves (hardware failures, I suspect, from overheating after our A/C quit). DRA0: was never dismounted. I am fully aware of the risks of dismounting a member of a shadow set, whether it consists of RAID-0, JBOD, or anything else. It's all about risk management.

Thanx again for your responses. My apologies again for making you all speculate. I will strive to provide better information.

Enjoy,

--Jeff
Uwe Zessin
Honored Contributor

(Hm, ITRC didn't like me, yesterday - could not write a message for a long time.)

It is *dangerous* if you have a 'real' RAID level that offers data protection, lose one disk, and then make it 'optimal' instead of doing a 'rebuild'. In that case the controller takes the disk as-is, but its contents are outdated, so an access can return data that is no longer valid.

I do remember that I have seen a warning / description in the product documentation, but the event I spoke about was in June 1997, so I don't recall any details.
comarow
Trusted Contributor

There is a risky procedure.
The entire procedure must be done very quickly, or the node will CLUEXIT.

If you absolutely can't reboot, it's an option.

This cancels mount verification on a disk:


^P first (or halt the system)
D SIRR C
CONT
C {disk in MV}
IPC> EXIT


A reboot protects your data.

Bob C
Uwe Zessin
Honored Contributor

Bob,
I think you have missed that the mount verification has already timed out.
Jeffery D. Urmann
Regular Advisor

Thank you for the replies everyone.

fyi...
I did attempt a rebuild with SWXCR and SWXCRMGR. Both failed.

$ swxcr rebuild dra 2 1
%SWXCR-E-DRVOPTIMAL, can't rebuild an OPTIMAL drive

Huh? Not sure why. The lights on the drive show failed, and SWXCRMGR shows them as FLD. I've seen this before with SWXCRs. Very strange.

Then I made the disk drive optimal with SWXCRMGR. I then attempted a Dismount /Abort /Override = Checks, which failed. Then I asked here for advice (thank you). I then killed the processes with open files and reissued the Dismount /Abort /Override = Checks, which succeeded. Finally, I restored the data from off-line media with Backup /Image.
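A sketch of the restore-and-remount steps in DCL, assuming the BACKUP /IMAGE save set is on tape. The tape drive MKA500: and save-set name ARCH1.BCK are hypothetical placeholders. BACKUP /IMAGE needs both the tape and the output disk mounted /FOREIGN. Re-forming the shadow set with only the restored member first, then adding $1$DRB0:, is meant to make it unambiguous which member is the copy target; check the Volume Shadowing documentation before relying on this order.

```dcl
$! Hypothetical tape drive and save-set name; substitute your own
$ MOUNT /FOREIGN MKA500:
$ MOUNT /FOREIGN $1$DRA0:
$ BACKUP /IMAGE MKA500:ARCH1.BCK /SAVE_SET $1$DRA0:
$ DISMOUNT $1$DRA0:
$ DISMOUNT MKA500:
$!
$! Re-form the shadow set with the restored member only
$ MOUNT /SYSTEM DSA8: /SHADOW = $1$DRA0: ARCH1 ARCH1
$! Add the second member; shadowing copies $1$DRA0: onto it
$ MOUNT /SYSTEM DSA8: /SHADOW = $1$DRB0: ARCH1
```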

All appears to be good now.

Thanks again, I appreciate your assistance.

Enjoy,

--Jeff