Operating System - OpenVMS

Ziggy Filek
Frequent Advisor

Free block count set to 0 on an almost empty drive

Under VMS V7.3-2 in a 3-node Fibre Channel ES80 cluster, I have a bunch of drives on an EVA that are almost empty. They are not shadowed. All of a sudden, the free block counts on two of them started to read 0 (zero) for no apparent reason.
Unfortunately, a misguided soul immediately ran $ANAL/DISK/REPAIR on them before I could save them for posterity. The ANAL/DISK/REPAIR worked, i.e. it discovered the wrong free block count and fixed it.
Has anybody seen anything like that before?
The volumes were initialized with the /LIMIT qualifier, which allows them to be extended with $SET VOLUME/SIZE.
Can one screw it up like that by using $SET VOLUME/SIZE improperly? Are there any other possible reasons for this scary behaviour?

Thanks for any insights!

Ziggy
Martin Hughes
Regular Advisor

Re: Free block count set to 0 on an almost empty drive

Was there anything in the operator log at the time of the problem? Also, do you have dynamic lock mastering enabled? The problem may have been triggered by a lock remastering event.

PS: SET VOLUME/REBUILD=FORCE is a less intrusive way to fix free block drift than ANAL/DISK/REPAIR.
For the fashion of Minas Tirith was such that it was built on seven levels, each delved into a hill, and about each was set a wall, and in each wall was a gate. (J.R.R. Tolkien). Quote stolen from VAX/VMS IDSM 5.2
John Gillings
Honored Contributor

Re: Free block count set to 0 on an almost empty drive

Ziggy,

Although the free block count is *usually* accurate, you can't depend on it being "perfect" at all times. You may even be able to capture times when the number reported for a particular disk is different between nodes.

In some senses the number displayed by SHOW DEVICE is merely cosmetic. It's not considered when attempting to allocate space on the drive. Try allocating some space. If it's there, you'll get it. Discrepancies are most *unlikely* to have anything to do with SET VOLUME. The number itself is stored in the volume allocation lock value block, with nodes adding or subtracting any allocations or deallocations when they take out the lock.

One potential scenario: if a node has just made an allocation or deallocation and then crashes, the lock value block can be lost, leaving the other nodes with a stale value. This could have happened months or years ago, because few people ever check what SHOW DEVICE says. So if the reported number happened to be too low, you might reach 0 early. THEN you notice.
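To make the mechanism concrete, here is a toy Python model. It is hypothetical, not VMS code, and the class and method names are invented for illustration. It assumes only what is described above: the displayed count lives in a lock value block (LVB) that nodes adjust by deltas, allocation is checked against the storage bitmap rather than against the displayed count, and a crash simply loses one node's delta.

```python
class Volume:
    """Toy model (hypothetical, not VMS code) of a disk whose displayed free
    count is kept in a lock value block (LVB) that nodes adjust by deltas."""

    def __init__(self, total_blocks: int) -> None:
        self.total = total_blocks
        self.used = 0                 # the truth, as recorded in the storage bitmap
        self.lvb_free = total_blocks  # the value SHOW DEVICE would display

    def true_free(self) -> int:
        return self.total - self.used

    def change(self, delta_used: int, lvb_update_lost: bool = False) -> None:
        """Allocate (delta_used > 0) or deallocate (delta_used < 0) blocks.

        The request is checked against the bitmap, never against lvb_free.
        If lvb_update_lost is True, the node is assumed to crash before its
        delta reaches the LVB, so the other nodes keep the stale value."""
        if delta_used > self.true_free():
            raise RuntimeError("device full")
        if -delta_used > self.used:
            raise ValueError("cannot free more blocks than are in use")
        self.used += delta_used
        if not lvb_update_lost:
            self.lvb_free -= delta_used

    def drift(self) -> int:
        """Positive: displayed count too high; negative: too low."""
        return self.lvb_free - self.true_free()


if __name__ == "__main__":
    v = Volume(1000)
    v.change(300)                          # normal allocation, LVB updated
    v.change(-300, lvb_update_lost=True)   # deallocation lost in a crash
    print(v.lvb_free, v.true_free(), v.drift())   # 700 1000 -300
    v.change(700)                          # displayed count reaches 0 ...
    print(v.lvb_free, v.true_free())       # 0 300: ... yet space remains
```

The design choice to check allocations only against `used` mirrors the point that the displayed number is not consulted when space is allocated: in the model, SHOW DEVICE would say 0 while 300 blocks can still be allocated.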

To fix the free block count use SET VOLUME/REBUILD. If that fails, try SET VOLUME/REBUILD=FORCE, or ANALYZE/DISK/REPAIR.
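The idea behind recomputing a drifted count can be sketched as recounting the free clusters in the storage bitmap and comparing the result with the cached value. This is a simplified illustration, not how SET VOLUME/REBUILD or ANALYZE/DISK is actually implemented. It assumes that a set bit marks a free cluster and that clusters are numbered from the least significant bit of each byte; the function names are hypothetical, and the message text only mimics the FREESPADRIFT warning quoted in this thread.

```python
from typing import Optional


def free_blocks_from_bitmap(bitmap: bytes, n_clusters: int, cluster_size: int) -> int:
    """Recount free space from a storage bitmap.

    Assumed layout: cluster i is bit (i % 8) of byte (i // 8), and a set
    bit means the cluster is free."""
    free_clusters = 0
    for i in range(n_clusters):
        byte_index, bit_index = divmod(i, 8)
        if (bitmap[byte_index] >> bit_index) & 1:
            free_clusters += 1
    return free_clusters * cluster_size


def check_free_count(cached: int, bitmap: bytes, n_clusters: int,
                     cluster_size: int, rvn: int = 1) -> Optional[str]:
    """Compare the cached (displayed) count with a fresh recount.

    Returns None if they agree, otherwise a drift message."""
    correct = free_blocks_from_bitmap(bitmap, n_clusters, cluster_size)
    if correct == cached:
        return None
    return (f"free block count of {cached} is incorrect (RVN {rvn}); "
            f"the correct value is {correct}")


if __name__ == "__main__":
    # 10 clusters of 3 blocks each; clusters 0-3 and 8 are free.
    bitmap = bytes([0b0000_1111, 0b0000_0001])
    print(check_free_count(0, bitmap, n_clusters=10, cluster_size=3))
    # -> free block count of 0 is incorrect (RVN 1); the correct value is 15
```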

Back in the early V5 days, drifting free space was fairly common (customer pressure to be able to correct the drift is what produced /REBUILD=FORCE). Drift could result from lock tree migration between nodes and various other events. Although many (most?) of these causes have been fixed, guaranteeing that the number is 100% accurate at all times would be far more expensive than it's worth. Consider that, most of the time, all you care about is that there is *some* space and that the number is within an order of magnitude.
A crucible of informative mistakes
Wim Van den Wyngaert
Honored Contributor

Re: Free block count set to 0 on an almost empty drive

An example you can test yourself.

Copy a file of substantial size to wim.lis on disk d, then:

$ open/read x wim.lis
$ sh dev d          ! note how many blocks are free
$ del wim.lis.*
$ sh dev d          ! same free count as the previous command
$ close x
$ sh dev d          ! now the free block count has increased

While the file is still open, ANAL/DISK/REPAIR reports it as "wim.lis marked for delete" and repairs nothing. If something happens to the system before the file is closed, the space stays allocated until you run ANAL/DISK/REPAIR.
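Wim's experiment can be modelled in a few lines. The sketch below is a hypothetical Python model (the class and method names are invented), assuming only what the example shows: deleting a file that still has an open channel merely marks it for delete, and its blocks return to the free count when the last channel is closed.

```python
class Disk:
    """Toy model (hypothetical names) of delete-while-open semantics."""

    def __init__(self, free_blocks: int) -> None:
        self.free = free_blocks
        self.files = {}   # name -> {"blocks": int, "channels": int, "marked": bool}

    def create(self, name: str, blocks: int) -> None:
        self.free -= blocks
        self.files[name] = {"blocks": blocks, "channels": 0, "marked": False}

    def open(self, name: str) -> None:
        self.files[name]["channels"] += 1

    def delete(self, name: str) -> None:
        self.files[name]["marked"] = True   # "marked for delete"
        self._release_if_done(name)

    def close(self, name: str) -> None:
        self.files[name]["channels"] -= 1
        self._release_if_done(name)

    def _release_if_done(self, name: str) -> None:
        f = self.files[name]
        if f["marked"] and f["channels"] == 0:
            self.free += f["blocks"]        # space comes back only now
            del self.files[name]


if __name__ == "__main__":
    d = Disk(free_blocks=10_000)
    d.create("WIM.LIS", 4_000)
    d.open("WIM.LIS")          # $ open/read x wim.lis
    print(d.free)              # 6000
    d.delete("WIM.LIS")        # $ del wim.lis.*
    print(d.free)              # still 6000: the file is open, only marked
    d.close("WIM.LIS")         # $ close x
    print(d.free)              # 10000: blocks returned on the last close
```

In this model a crash corresponds to never calling `close`, which is why the space stays allocated until something like ANAL/DISK/REPAIR cleans it up.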

Wim
Ziggy Filek
Frequent Advisor

Re: Free block count set to 0 on an almost empty drive

Thanks for your responses.
Martin: How do I check whether dynamic lock mastering is enabled? BTW, what is it? :-)
The operator log does not show anything unusual except some Oracle DBAs doing some work (they swear they did not try to extend anything).

John: There was no crash of any kind, nor was there a cluster state transition. The block count was correct, or close to correct, just minutes before it suddenly went down to zero. My system monitoring tool (BMC Patrol) reported the disk as 2% full at ten uneventful 5-minute intervals and then suddenly as 100% full, at which point BMC issued an alarm. So it was not a slow, creeping change or something we had failed to notice before.
Hein van den Heuvel
Honored Contributor

Re: Free block count set to 0 on an almost empty drive

I would strongly suspect a bad device driver. Not a software driver, but a human 'ooops'. Did ANAL/DISK/REPAIR report anything as fixed that would explain the discrepancy, or was it perhaps a matter of timing, where the problem would have resolved itself? I would send out the nicest possible message asking folks to help explain the glitch in usage at that specific time.

If I really wanted to drill down on this, and those devices were very lightly used, I would mount the problem devices privately and DUMP the headers in INDEXF.SYS to 'see' whether a file had been there. Or maybe try a DFU UNDELETE? But I suspect it is way too late for that.

fwiw,
Hein.
Ziggy Filek
Frequent Advisor

Re: Free block count set to 0 on an almost empty drive

If it's a human error, it's also a VMS bug, because the $ANAL/DISK/REPAIR gave the following message:

%ANALDISK-W-FREESPADRIFT, free block count of 0 is incorrect (RVN 1);
the correct value is 63342624

I'd say a count of 0 is incorrect... 31 GB disappeared in a second! This was not a slow drift: just a couple of minutes before the event the count was correct (I know this from the monitoring software). Nor was it a momentary glitch: the zero count went undiscovered for 18 hours!
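As a quick check of the "31 GB" figure, assuming the standard 512-byte OpenVMS disk block, the 63342624 free blocks reported by ANAL/DISK/REPAIR convert as follows:

```python
from typing import Tuple

BLOCK_BYTES = 512  # one OpenVMS disk block


def blocks_to_sizes(blocks: int) -> Tuple[int, float, float]:
    """Return (bytes, decimal GB, binary GiB) for a count of disk blocks."""
    nbytes = blocks * BLOCK_BYTES
    return nbytes, nbytes / 10**9, nbytes / 2**30


nbytes, gb, gib = blocks_to_sizes(63342624)
print(f"{nbytes} bytes = {gb:.2f} GB = {gib:.2f} GiB")
```

This prints 32431423488 bytes = 32.43 GB = 30.20 GiB, so "31 GB" is a fair round figure whichever unit is meant.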

There were no interactive users on the cluster at the time of the event.

Ziggy
GuentherF
Trusted Contributor

Re: Free block count set to 0 on an almost empty drive

This is a bug, and it has been fixed in remedial kits for V7.3-2 through V8.3. Sorry, I don't have the ID of the remedial kit that includes it. But the fix was done last summer, so any recent remedial kit for (I guess) SYS should include it.

/Guenther
Ziggy Filek
Frequent Advisor

Re: Free block count set to 0 on an almost empty drive

Thanks.
I do have a fairly recent update installed (UPDATE 7.0 plus a whole bunch of individual fixes that add up more or less to UPDATE 8.0), but I will go to the HP support site and look for anything newer.

Ziggy
Peter Zeiszler
Trusted Contributor

Re: Free block count set to 0 on an almost empty drive

We have seen this issue when something like a backup fails without shutting down properly, or when a backup job is forced to stop (e.g. stop proc/id=xxxxxx). It also shows up after a crash. That's one reason we take the time, after the system has booted, to analyze the disks and force a rebuild.

We are still on V7.3-1 with the very last set of patches available, if that makes any difference.