Operating System - OpenVMS

Willem Grooters
Honored Contributor

File corruption by CONVERT?

On behalf of a customer....
Given environment: VMS 7.1-2

CONVERT found that an indexed-sequential file was corrupted:

convert BEHAND.ISM;1 behand.msi
%CONV-F-READERR, error reading DRA0:[000000]BEHAND.ISM;1
-RMS-F-CHK, bucket format check failed for VBN = 5458

ANALYZE/RMS found this error but it could not be repaired. The corrupted bucket was found in a data area.
ANALYZE/DISK didn't reveal any errors, nor was anything found in errorlog or operator.log.

We found multiple files with a similar problem. Since it occurred on a production system, solving the problem had higher priority than finding out what caused it.
The first attempt was to make the file sequential.
Since CONVERT did not work, we tried to achieve this another way, but TYPE and EDIT failed as well. Restoring from backup wasn't a solution either, since some of the backed-up files carried the very same error!
However, we managed to recover the data, either from backup or by reading around the error spot (block by block).

Now that this is settled, we would still like to find out what could have caused the error. My suspicion is XQP; I know it caused errors in earlier versions and the advice then was to turn it off.
I would like to know whether this could have been the case and, if so, how to switch it off and whether that requires a reboot.

FYI, a lesson learned from this: although the documentation states that you have to recover from backup, this doesn't always solve the problem. The error could exist in the backup version as well... so beware.
Willem Grooters
OpenVMS Developer & System Manager
8 REPLIES
Gary Sachs
Advisor

Re: File corruption by CONVERT?

Willem,

we had a similar problem with indexed files. It turned out that CONVERT was the culprit and that a patch was issued for the problem... but it was on VAX VMS 7.2. The patch was for RMS: VAXRMS_072, and it required that VAXUPDATE01_072 be installed first.

gary
Martin P.J. Zinser
Honored Contributor

Re: File corruption by CONVERT?

Hello Willem,

one thing you might want to check is whether the FDL for these files still matches your current needs. If the index levels get too deep and/or the file has many extents, such files become more error-prone.

Greetings, Martin
John Gillings
Honored Contributor

Re: File corruption by CONVERT?

RMS buckets use the first and last byte as a sanity check. The check bytes are incremented before writing. This is a simple check for partial writes (due to power failure or system crash), or for other types of corruption.
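The check-byte idea can be illustrated with a minimal Python sketch. It is a toy model, not the real RMS bucket layout: the 512-byte bucket size, the `write_bucket` helper and the `torn_at` parameter are assumptions made up for the illustration. A write that stops halfway leaves the new check byte at the front and the old one at the end, so the mismatch is detectable.

```python
from typing import Optional


def check_bytes_match(bucket: bytes) -> bool:
    """Return True when the first and last byte of the bucket agree."""
    if len(bucket) < 2:
        raise ValueError("bucket too short to carry two check bytes")
    return bucket[0] == bucket[-1]


def write_bucket(old: bytes, payload: bytes, torn_at: Optional[int] = None) -> bytes:
    """Simulate rewriting a bucket (hypothetical helper, not an RMS call).

    The check byte is incremented and stored at both ends. If torn_at is
    given, only the first torn_at bytes of the new image reach the disk and
    the rest keeps its old contents, as after a crash in mid-write.
    """
    check = (old[0] + 1) % 256
    new = bytes([check]) + payload + bytes([check])
    if len(new) != len(old):
        raise ValueError("payload does not fit the bucket")
    if torn_at is None:
        return new
    return new[:torn_at] + old[torn_at:]


# Toy 512-byte bucket: check byte, 510 data bytes, check byte.
old_bucket = bytes([7]) + b"a" * 510 + bytes([7])
complete = write_bucket(old_bucket, b"b" * 510)
torn = write_bucket(old_bucket, b"b" * 510, torn_at=256)
```

Here `check_bytes_match(complete)` holds while `check_bytes_match(torn)` does not, which is the kind of mismatch a bucket format check can flag.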

Dumping the bad block (in your example VBN 5458) might shed some light on the source of corruption - for example if there is obvious data from another file.

Multiple files with the same type of error suggest a systemic cause. Find all the bad VBNs and map them to LBNs. If they're all in the same region, you may have a hard fault on the drive, or a single event that corrupted them all at the same time. Blocks scattered in space or time are more likely to point to a software problem.
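The VBN-to-LBN arithmetic can be sketched in Python as follows. This assumes you already have the file's extents as (starting LBN, block count) pairs in file order, for example copied by hand from the retrieval pointers in the file header; the extent values and the helper names below are made up for illustration.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (starting LBN, number of blocks), in file order


def vbn_to_lbn(vbn: int, extents: List[Extent]) -> int:
    """Map a virtual block number (VBNs start at 1) to a logical block number."""
    first_vbn = 1
    for start_lbn, count in extents:
        if first_vbn <= vbn < first_vbn + count:
            return start_lbn + (vbn - first_vbn)
        first_vbn += count
    raise ValueError(f"VBN {vbn} lies beyond the mapped extents")


def lbn_spread(lbns: List[int]) -> int:
    """Distance between the lowest and highest bad LBN on the volume."""
    return max(lbns) - min(lbns)


# Hypothetical extents: 5000 blocks at LBN 1000, then 1000 blocks at LBN 90000.
example_extents = [(1000, 5000), (90000, 1000)]
bad_lbn = vbn_to_lbn(5458, example_extents)
```

Collecting the bad LBNs of all affected files and looking at `lbn_spread` (or simply sorting them) shows whether the damage sits in one region of the disk or is scattered.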

Make sure you're up to date with all RMS, EXP and SYS ECOs for your versions. Also consider other likely candidates - defraggers, anything that caches or "optimises" disk I/O, or rogue users.

To recover, first try a "plain" CONVERT; if that fails, try CONVERT/KEY=n for each of the secondary keys (but beware of null keys: not all records may be present on all secondaries). If all of these fail, then restore backups or write a program to recover as much as you can (as you've already done).

Protect yourself from this type of error by taking frequent BACKUPs and running regular CONVERTs and/or ANALYZE/RMS. Perhaps check important files before the backup so you don't back up corruption. For especially valuable data, it might be worth doing CONVERTs to a sequential file as an emergency backup, since sequential files are less likely to suffer fatal structural corruption. This can be done "live" with CONVERT/SHARE.
A crucible of informative mistakes
Willem Grooters
Honored Contributor

Re: File corruption by CONVERT?

Thanks to all - it tells me what I wanted to know.
Willem Grooters
OpenVMS Developer & System Manager
Antoniov.
Honored Contributor

Re: File corruption by CONVERT?

Hi Willem,
some months ago I ran into the same problem.
During recovery I discovered that I could still read records directly by primary key.
Example:
AA First Record
BB Second Record
--- VBN Error
CC Third Record
Reading (and converting) the file sequentially hit the VBN error after record BB, but after a keyed read on CC I could resume the sequential read and convert the rest into sequential format.
Of course, it was very hard to find the next key after the error, and I can't find the program I used back then to help you further.
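The idea can be sketched as a small Python simulation. `ToyIndexedFile` is a made-up stand-in, not the RMS interface: it only mimics sequential reads, keyed reads with a greater-or-equal match, and a failure whenever a record in the damaged bucket is touched. The restart keys are assumed to come from somewhere else, such as guesses or another file.

```python
from bisect import bisect_left
from typing import Dict, Iterable, List, Optional, Set, Tuple

Record = Tuple[str, str]


class BucketError(Exception):
    """Stands in for a read that fails with a bucket format check error."""


class ToyIndexedFile:
    """Toy model of an indexed file (hypothetical, NOT the RMS API)."""

    def __init__(self, records: Dict[str, str], bad_keys: Set[str]) -> None:
        self._keys = sorted(records)
        self._records = records
        self._bad = bad_keys
        self._pos = 0

    def read_next(self) -> Optional[Record]:
        """Sequential read; returns None at end of file."""
        if self._pos >= len(self._keys):
            return None
        key = self._keys[self._pos]
        if key in self._bad:
            raise BucketError(key)
        self._pos += 1
        return key, self._records[key]

    def read_key_ge(self, key: str) -> Optional[Record]:
        """Keyed read: position at the first record with key >= key."""
        self._pos = bisect_left(self._keys, key)
        return self.read_next()


def salvage(f: ToyIndexedFile, restart_keys: Iterable[str]) -> List[Record]:
    """Read sequentially; after an error, restart with a keyed read."""
    saved: List[Record] = []
    candidates = iter(sorted(restart_keys))
    read = f.read_next
    while True:
        try:
            rec = read()
        except BucketError:
            last = saved[-1][0] if saved else None
            # Take the next candidate key beyond the last record we saved.
            nxt = next((k for k in candidates if last is None or k > last), None)
            if nxt is None:
                return saved
            read = lambda k=nxt: f.read_key_ge(k)
            continue
        if rec is None:
            return saved
        saved.append(rec)
        read = f.read_next
```

With records AA, BB, BC, CC, DD and BC sitting in the damaged bucket, `salvage` called with restart keys BC and CC skips the failing restart at BC and carries on from CC.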
Bye
Antoniov
Antonio Maria Vigliotti
Willem Grooters
Honored Contributor

Re: File corruption by CONVERT?

Antonio,

That's just the way it was done ;-)...
Willem Grooters
OpenVMS Developer & System Manager
Hein van den Heuvel
Honored Contributor

Re: File corruption by CONVERT?


Glad you got (most of) your data back.
Yes, silent corruption on a backup is scary.

Therefore I recommend CONVERTing the old production file instead of, or as a prelude to, backup.

To figure out the cause of the corruption you need to know details on the data found. DUMP is your friend... after a simple ANAL/DISK.

The easiest data recovery technique is indeed to find a new starting point for a sequential read. DCL READ/KEY can help. ANAL/RMS/INT can help to find the next key/RFA through the index, sometimes a parallel application file exists with the key values, and sometimes a simple binary search suffices.
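The binary search can be sketched in Python under some loud assumptions: keys are integers, the damage covers one contiguous key range, and `can_read(k)` is a hypothetical probe that reports whether a keyed read with a greater-or-equal match on `k` succeeds. The toy key list below only simulates such a file.

```python
from bisect import bisect_left
from typing import Callable


def first_readable_key(can_read: Callable[[int], bool], lo: int, hi: int) -> int:
    """Smallest key in (lo, hi] at which a keyed read succeeds.

    Preconditions: can_read(lo) is False, can_read(hi) is True, and every
    failing key lies below every succeeding key within this range.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if can_read(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Toy file: keys 0, 10, ..., 990; keys 200..290 sit in the damaged bucket.
KEYS = list(range(0, 1000, 10))
BAD = set(range(200, 300, 10))


def can_read(key: int) -> bool:
    """Simulated keyed read (>= match): fails if it lands on a damaged record."""
    i = bisect_left(KEYS, key)
    return i < len(KEYS) and KEYS[i] not in BAD


# The sequential read died after key 190, so 191 fails; 990 is known to work.
restart = first_readable_key(can_read, 191, 990)
```

In this toy case a keyed read at `restart` lands on record 300, the first intact record after the damage, and the sequential read can resume from there.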

Well tuned files tend to have fewer IOs and corrupt less, but poor tuning is no excuse for corruption!

For both tuning and patching advice, be sure to check out my freeware PowerPoint presentation: http://h71000.www7.hp.com/freeware/freeware50/rms_tools/rms_tuning.ppt
[feedback is appreciated: hein at hp dot com]

Cheers,
Hein.
Martin P.J. Zinser
Honored Contributor

Re: File corruption by CONVERT?

Hello Hein,

I fully agree that poor tuning is no >>excuse<< for corruption, and if you look in your internal problem tracking systems you might find that the company I work for takes these issues pretty seriously. I only mentioned it because I have encountered situations where files were corrupted after a multi-step file transfer, and tuning the FDL (besides generally being a good idea) made the problem go away. And yes, I do agree this is not a solution but a workaround, and one should make sure the problem is reported and escalated properly in any case. But a first and fast workaround is often what you look for when you have a problem on a production system ;-)

Greetings, Martin