
Corruption....

 
SOLVED
Willem Grooters
Honored Contributor

I have a big file (>1,280,000 blocks) that seems to be corrupted: a program accessing it starts to loop.
ANA/RMS revealed a big problem:

*** VBN 1272006: Record at offset %X'25E0' has a missing or illegal RRV.

(on key 0, about 140 pages of these, with different VBNs (though mostly the one above) and different offsets)

and finally

*** VBN 1260806: The bucket chains for key #0 contain a loop.
Unrecoverable error encountered in structure of file.

The analysis uncovered 7697 errors.

The loop has been found in the output mentioned above: the same combination VBN and offset was found several times.

Simple CONVERT fails - at least, it seems there will be missing records:

%CONVERT-I-SEQ, record not in order

Thousands of them...

The file was saved to tape with BACKUP and restored to disk at different times; I don't know the exact qualifiers for either of them, nor do I know of any errors that were reported.
Both restored versions revealed errors on ANALYZE/RMS, but it is said that one could be converted. Again, I don't know of any errors.

I'm still working on this (CONVERT/FDL is running but seems to loop as well....), but would just like to have your opinion:

* What does the message actually mean ("invalid or missing RRV")?
* Could it be introduced by tape errors (writing to, or reading from)?
* Can the file contents be saved by CONVERT (I do have a proper FDL), or would I need to access the file by alternate key (to save as much as possible)?

Willem
Willem Grooters
OpenVMS Developer & System Manager
5 REPLIES 5
Bojan Nemec
Honored Contributor

Re: Corruption....

Willem,

RRVs are record reference vectors, used when a bucket split occurs. They are used for alternate keys.

Many years ago I had similar problems on a corrupted disk (hardware error). Some indexed files had holes (missing blocks). It was very long and hard work to restore as much as possible.

You can try to read by alternate key, but there is little chance of success because of the invalid RRVs. Another possibility is to write a program which reads the file by RFA (I used this in the disaster mentioned above). In this program you generate the RFAs sequentially and try to read with each one: if the read succeeds, save the record; otherwise generate the next RFA. On a big file this will be a long process. So try to save as much as possible with sequential access, access by alternate keys, reverse reading by keys, etc.
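
A rough sketch of that RFA scan, in Python rather than against real RMS. Everything here is an assumption made for illustration: an RFA is modelled as a (VBN, record ID) pair, buckets are assumed to start at VBN 1 and to follow each other at a fixed bucket size, and `read_by_rfa` is a hypothetical stand-in for whatever call actually fetches a record by RFA.

```python
from typing import Callable, Iterator, Optional

Rfa = tuple[int, int]  # (bucket start VBN, record ID): an assumed model of an RFA


def candidate_rfas(hiblock: int, bucket_size: int, max_id: int) -> Iterator[Rfa]:
    """Yield every (VBN, ID) pair worth trying, bucket by bucket."""
    for vbn in range(1, hiblock + 1, bucket_size):
        for rec_id in range(1, max_id + 1):
            yield (vbn, rec_id)


def salvage(read_by_rfa: Callable[[Rfa], Optional[bytes]],
            hiblock: int, bucket_size: int, max_id: int) -> list[bytes]:
    """Try each candidate RFA; keep what reads, skip what fails."""
    saved = []
    for rfa in candidate_rfas(hiblock, bucket_size, max_id):
        try:
            record = read_by_rfa(rfa)
        except OSError:
            continue  # unreadable RFA: move on to the next candidate
        if record is not None:
            saved.append(record)
    return saved


# Toy stand-in for the real file: only two RFAs hold a record.
fake_file = {(1, 1): b"alpha", (4, 2): b"beta"}
recovered = salvage(fake_file.get, hiblock=6, bucket_size=3, max_id=3)
print(recovered)
```

The number of candidates grows with the file size times the highest record ID tried, which is why this takes so long on a big file.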

Bojan
labadie_1
Honored Contributor

Re: Corruption....

Zap, on the Freeware, is the tool to help you recover a corrupted file, but you have a huge number of errors.

I think Zap is in RMS_TOOLS on the Freeware.

Good luck

Gerard
Hein van den Heuvel
Honored Contributor
Solution

Re: Corruption....

>> it has been found that a program accessing it starts to loop.

Then CONVERT is likely to loop also, as it uses vanilla RMS to read, and a private algorithm to write (fast load).

>> The loop has been found in the output mentioned above: the same combination VBN and offset was found several times.

A loop means that the 'next' pointer in a bucket header points back to a bucket earlier in the chain.
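
To make that concrete, here is a minimal sketch that walks a chain of 'next' pointers and reports the pointer that closes the loop. It is a toy model, not the RMS on-disk format: the chain is a plain dictionary of bucket VBN to next VBN, the VBNs are invented, and 0 is used as a made-up end-of-chain marker.

```python
def find_chain_loop(next_vbn: dict[int, int], start: int):
    """Follow 'next' pointers from start.

    Returns (from_vbn, to_vbn) for the pointer that leads back to a bucket
    already visited, or None if the chain ends cleanly.
    """
    seen = set()
    current = start
    while current != 0:  # 0 = end of chain in this toy model only
        seen.add(current)
        following = next_vbn.get(current, 0)
        if following in seen:
            return (current, following)
        current = following
    return None


# Invented chain: bucket 19 points back to bucket 13 instead of ending.
looping_chain = {10: 13, 13: 16, 16: 19, 19: 13}
healthy_chain = {10: 13, 13: 16, 16: 0}

loop = find_chain_loop(looping_chain, 10)
print(loop)
print(find_chain_loop(healthy_chain, 10))
```

The first element of the result is the bucket holding the bad pointer, which is the one to look at, rather than the bucket it points to.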

Check out my RMS_TUNING presentation on the VMS Freeware under RMS_TOOLS. It describes the basic indexed file internal structure, and the second part has the details for a bucket header and record header, plus some hints on how to deal with corruptions.

>> %CONVERT-I-SEQ, record not in order

Well, if you do loop back, they will be out of order, no?

>> The file has been created using BACKUP to tape and restored to disk different times; I don't know the exact qualifiers for either of them

The biggest culprit we know of is /IGNORE=INTERLOCK on the input, as it allows BACKUP to read inconsistent data and data 'on the move'. One BACKUP read may catch the beginning of a bucket before it is updated, and the next may catch the tail after the update, with an update I/O between them.
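
A small simulation of such a torn read, purely as an illustration. The two-block bucket, the `version` counter and the `update_bucket` helper are all invented for the sketch; real BACKUP and RMS I/O are not modelled.

```python
# A bucket spanning two disk blocks; 'version' counts updates applied so far.
bucket = [{"block": 1, "version": 1}, {"block": 2, "version": 1}]


def update_bucket(blocks):
    """Simulate one application update that rewrites the whole bucket."""
    for blk in blocks:
        blk["version"] += 1


saved_copy = []
saved_copy.append(dict(bucket[0]))  # the copy reads block 1 ...
update_bucket(bucket)               # ... an update lands in between ...
saved_copy.append(dict(bucket[1]))  # ... then the copy reads block 2

versions = [blk["version"] for blk in saved_copy]
torn = len(set(versions)) > 1       # True when the copy mixes two states
print(versions, torn)
```

The saved bucket never existed in that form on disk: its head is from before the update and its tail from after.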

>> Both restored versions revealed errors on ANALYZE/RMS but it is said that one could be converted.

Just use the one that can be converted!?
Sounds like tape read problems. Clean heads?

>> What does the message actually mean ("invalid or missing RRV")?

That the header of a record in a bucket is corrupted. Or that the record does not really start where it is believed to start.
If the record was not moved (split), then its RRV should point to itself; if not: badrrv. If the record was moved, then the remaining header, commonly referred to as 'the RRV', should point to a valid record; if not: badrrv. There is an early check that the VBN in each RRV (similar to an RFA) is 'reasonable', that is, lower than the HIBLOCK of the file. Random ASCII fails that test in general: badrrv.
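
Those three rules can be sketched as a small check. This is a toy model with assumptions throughout: an RRV is treated as a (VBN, record ID) pair, the set of valid records is simply given, HIBLOCK is taken as roughly the file size from the original post, the exact range boundaries are a guess, and all record IDs are made up.

```python
HIBLOCK = 1_280_000  # rough file size from the original post; exact value unknown


def check_rrv(own_rfa, rrv, valid_rfas, hiblock=HIBLOCK):
    """Classify one RRV using the three rules described above (toy model)."""
    vbn, _rec_id = rrv
    if not 0 < vbn < hiblock:          # early 'reasonable VBN' test
        return "badrrv: VBN not reasonable"
    if rrv == own_rfa:                 # unmoved record points to itself
        return "ok: record was not moved"
    if rrv in valid_rfas:              # moved record must lead somewhere valid
        return "ok: record was moved"
    return "badrrv: points to no valid record"


here = (1272006, 5)      # VBN from the ANALYZE output; the ID 5 is made up
moved_to = (1272100, 2)  # hypothetical new home of a moved record
valid = {here, moved_to}

# Four bytes of text read as a little-endian VBN give a huge number.
ascii_vbn = int.from_bytes(b"ABCD", "little")

results = [
    check_rrv(here, here, valid),
    check_rrv(here, moved_to, valid),
    check_rrv(here, (1272100, 9), valid),
    check_rrv(here, (ascii_vbn, 1), valid),
]
for line in results:
    print(line)
```

The last case shows why random ASCII tends to fail the early test: the four characters "ABCD" read as a VBN come out far above any plausible HIBLOCK.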

>> Could it be introduced by tape errors (writing to, or reading from).

Yes, and by bad procedures (/IGNORE=INTERLOCK).

>> Can the file-contents be saved by CONVERT (I do have a proper FDL) or would I need to access the file by alternate key (to save as much as possible)

There is no single answer.
CONVERT, like any SEQUENTIAL RMS read, does NOT use RRVs. That's the good news.
Reported bad RRVs suggest serious other corruption. That's the bad news.
Clearly CONVERT needs 'help'. Perhaps a single 'patch' (with my zap tool?) to fix up one 'next' pointer can bring order back.
Focus on the FIRST error reported.
Understand that the first VBN with a reported corruption may be just a 'victim'. That is, it should not have been pointed to.
So the VBN before it (not reported) may actually be the place to fix!
We have also seen bad (corrupted) records in a file causing RMS to look places where it should not.
Try the other tools in my directory?
DUMP the FIRST VBN reported, for the length of a bucket, and then DUMP the bucket before it, hoping it is logically before it. Just eyeballing that may reveal problems.

ANAL/RMS ... POS/BUC badvbn ... DOWN
NEXT NEXT... watch that offset!

Use the INDEX bucket above it, to determine the prior bucket.

Make a big pot of coffee, get some scratch paper (Excel), and start to build a mental picture of the file: which key value ranges live where, which VBN lives where, and what points to what.
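
The scratch-paper picture could look something like the sketch below: one row per bucket with its key range and 'next' pointer, sorted into key order, and every 'next' pointer compared with the bucket that should logically follow. The `Bucket` layout, the VBNs and the key ranges are all invented for illustration, and the sketch assumes the key ranges noted down from the dumps can be trusted.

```python
from typing import NamedTuple


class Bucket(NamedTuple):
    vbn: int        # starting VBN of the bucket
    low_key: str    # lowest key seen in the bucket
    high_key: str   # highest key seen in the bucket
    next_vbn: int   # 'next' pointer from the bucket header (0 = none, toy marker)


def suspect_links(buckets):
    """Return (vbn, actual_next, expected_next) where the chain breaks key order."""
    ordered = sorted(buckets, key=lambda b: b.low_key)
    suspects = []
    for here, expected in zip(ordered, ordered[1:]):
        if here.next_vbn != expected.vbn:
            suspects.append((here.vbn, here.next_vbn, expected.vbn))
    return suspects


# Notes as they might be collected from dumps, in VBN order.
notes = [
    Bucket(10, "A", "F", 31),
    Bucket(22, "M", "R", 31),  # should point to 40, loops back to 31 instead
    Bucket(31, "G", "L", 22),
    Bucket(40, "S", "Z", 0),
]

suspects = suspect_links(notes)
print(suspects)
```

In this made-up case the bucket to patch is 22, the logical predecessor holding the bad pointer, even though bucket 31 is where a reader would first see keys going backwards.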

Patience!
(Oh... and don't forget that cost/benefit analysis: What is it worth to restore some data, most data, one more record,...)
Good luck,
Hein




Hein van den Heuvel
Honored Contributor

Re: Corruption....

>> So the VBN before it (not reported) may actually be the place to fix!

That was a bit short, and may need clarification.
It is not 'the VBN', but the bucket identified by its starting VBN.
And it is not just 'before it', but LOGICALLY before it: in primary key order, not in VBN order.

Willem, I hope you like puzzles a bit... then this can become a nice exercise. If not, it will be an ordeal. Perhaps call in support?!

Cheers,
Hein.
Willem Grooters
Honored Contributor

Re: Corruption....

My customer is quite lucky (and she knows it!) that this happened on the test system, in preparation for consolidation. I already expected BACKUP to be the cause of the problem: I urged her NOT to use /IGNORE=INTERLOCK and, although it will take twice as much time, to use /VERIFY, just to be sure the tape is OK. A new backup has been taken and restored, and now the files are OK.
All explanations have given me a good insight. Hein, thanks for the extensive education. I think we all learned from that.

Hein - luckily I still have the file, so when there is some (personal) time it will indeed be a nice exercise. About time I did some of that again ;-)
We'll talk again.

(Sorry guys. Just a short chat between two Dutchmen).


Willem
Willem Grooters
OpenVMS Developer & System Manager