Operating System - OpenVMS

file corruption with Vms 7.2-2

 
labadie_1
Honored Contributor

file corruption with Vms 7.2-2

Homogeneous cluster of two DS20E systems running VMS 7.2-2, with a disk DSA3: on a SAN (device type HSV110).

Either
$ ana/disk/norepair dsa3:
or
$ mc dfu verify dsa3:
reports numerous files created today with the message
dsa3:file.dat has no valid file header
Many of these files come from the application (COBOL programs creating files), but I also found some others, such as an FTP log belonging to an ordinary user, with the same message.

This site has all the critical patches (fibre scsi, rms, f11x, sys, lan...). The only patches missing are ECO 5 for TCP/IP 5.1 (no ECO at all is installed), ECO 6 for DECnet OSI, and UPDATE V2 (but as all other patches are applied, I think it is not relevant). There have been no device errors in the 60 days of uptime on both nodes. Any idea what may be causing this? The directory where many files are created was re-created today, and files with no valid file header appeared quickly afterwards. I noticed that disk DSA3 has highwater marking enabled and will try to convince my customer to remove it.

Thanks for any hint
32 REPLIES
Garry Fruth
Trusted Contributor

Re: file corruption with Vms 7.2-2

I'm not sure about DFU VERIFY, but analyze/disk/norepair does not prevent other disk activity. It could be that the directory was checked at around the same time that a file was being deleted; hence the directory entry would exist but the file would not. You could try analyze/disk/lock/norepair and see if you get the same error. The /lock will prevent updates to the disk until the analyze is complete, so you should do this when you don't mind interrupting user activity.
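As a rough sketch of that suggestion, the locked run could look like the following. It assumes /lock is an abbreviation of the ANALYZE/DISK_STRUCTURE qualifier /LOCK_VOLUME and reuses the dsa3: device from the original post.

```dcl
$ ! Read-only check; /LOCK_VOLUME holds off updates to the volume while it runs
$ ANALYZE/DISK_STRUCTURE/NOREPAIR/LOCK_VOLUME DSA3:
```

Because updates are held off for the whole run, this is best scheduled for a quiet period.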
Wim Van den Wyngaert
Honored Contributor

Re: file corruption with Vms 7.2-2

Where did you find doc about /lock?
It works but is not in help.

Wim
Wim
Volker Halle
Honored Contributor

Re: file corruption with Vms 7.2-2

Wim,

the new /LOCK_VOLUME qualifier is described in the V7.3-1 New Features manual.

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: file corruption with Vms 7.2-2

Volker,

Very well, but on my 7.3 it is already understood. Even on my 6.2 ...
But I would advise not to use it on active production disks.

Wim
Wim
Ian Miller.
Honored Contributor

Re: file corruption with Vms 7.2-2

DFU VERIFY has a /LOCK qualifier too - it's in the help.
____________________
Purely Personal Opinion
Ian Miller.
Honored Contributor

Re: file corruption with Vms 7.2-2

Can you post the error messages?
"directory where many files are created has been re-created today" - how did you do this?

Highwater marking is unlikely to be relevant; whether to keep it depends on your security requirements versus the overhead.
____________________
Purely Personal Opinion
labadie_1
Honored Contributor

Re: file corruption with Vms 7.2-2

Here is the output of
$ ana/disk/norepair dsa3:
labadie_1
Honored Contributor

Re: file corruption with Vms 7.2-2

Ian, it is the application that creates a bunch of files (about 3000 per day), and the files are then copied.
Andreas Fassl
Frequent Advisor

Re: file corruption with Vms 7.2-2

Hi,

Highwater marking doesn't matter (as mentioned in a previous reply).

Is it possible to remove one member of the shadow set and analyze it without any activity?

I have had a file corruption problem only once; it was caused by a defective controller. The defect was a very vicious one: it wasn't even detected by the onboard diagnostic utilities. Worse still, because the disk was in a shadow set, half the reads were correct.

Running analyze/disk on a heavily used disk, even with /lock, isn't very advisable.
If nothing else is possible, try to get a maintenance window.
Convince your customer that potential corruption of his data can get worse the longer he waits.

Another question: Has your customer had the magic "too many files in a directory, can't delete them, so I kill the directory in a non-VMS way with some of these nice tools" problem?

Regards

Andreas
Ian Miller.
Honored Contributor

Re: file corruption with Vms 7.2-2

See DFU DELETE for the fast way of deleting a directory containing lots of files.
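A minimal sketch of such a call, assuming DFU's DELETE/DIRECTORY form and a purely hypothetical directory file DSA3:[APPL]OUTPUT.DIR; check HELP DELETE inside DFU for the exact qualifiers in your version before running anything like it.

```dcl
$ ! Hypothetical path, for illustration only: fast delete of a directory holding many files
$ MCR DFU DELETE/DIRECTORY DSA3:[APPL]OUTPUT.DIR
```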
____________________
Purely Personal Opinion
Jess Goodman
Esteemed Contributor

Re: file corruption with Vms 7.2-2

Almost all the errors that show up in your ANALYZE/DISK output are normal and to be expected since you ran it on an active volume without using the /REPAIR or /LOCK qualifiers.

If you investigate, I'm sure you will find that the files mentioned in the BADDIRENT, BAD_FIDSEQ, and BAD_DIRFIDSEQ messages were being created and/or deleted while you were running ANAL/DISK. You might also get them for old files if the system crashed, etc. while those files were being created or deleted, but /REPAIR will fix those.

Likewise, the ALLOCCLR error, or its cousins ALLOCSET and DUALLOC, will occur on an active volume. I'm not sure about the ALLOCEXT message, but it might also be normal when multiple file headers are involved.

The DELHEADER message is normal even with the disk locked if a file has been marked for delete on close but is still open by an application.

On the other hand I'm not sure why you would get the BADNAMEFORM or BADHEADER messages unless those particular file headers were corrupted.

Bottom line: ANAL/DISK without /REPAIR or DFU VERIFY without /FIX (or /LOCK) are both pretty useless for detecting file corruption on a volume with file activity.

I've always wondered if it would be possible to first scan the volume without /LOCKing it, make a list of *potential* problems, and then do a quick lock/check/unlock for each problem in the list.

This approach wouldn't work for ALLOCCLR errors since you would have to rescan the entire index file to confirm the problem, but ALLOCCLR is just lost space, not real corruption anyway.

However it should allow a real quick check (in terms of how long the volume is locked) for whether ALLOCSET, DUALLOC, BADDIRENT, BAD_FIDSEQ and some other errors detected during an unlocked pass were real errors or not.

I realize this approach would give some false negatives, since a real problem with the storage bitmap can, on an active volume, migrate between various files and switch between being reported as ALLOCSET and as DUALLOC. Still, this option would be an extremely valuable tool for detecting real disk corruption without locking large active volumes for several minutes - which is the only option available now.
I have one, but it's personal.
Willem Grooters
Honored Contributor

Re: file corruption with Vms 7.2-2

Gerard,
Probably a long shot, but it may be of use.
A customer reported severe file corruption on a SAN, and investigation by HP (including DFU's author!) revealed a bug in DFU's defragmenter: in rare cases, bits in the bitmap were marked as free where the blocks were actually allocated to files. (It seems, as has been explained, that a set bit states only that a block MAY be allocated to a file, and DFU sometimes made the wrong assumption.)
This has surely been fixed, but I'm not sure whether the new version is available yet. I'm quite certain it will appear on the next freeware CD.
Willem Grooters
OpenVMS Developer & System Manager
Wim Van den Wyngaert
Honored Contributor

Re: file corruption with Vms 7.2-2

I wouldn't like to be the one that has to tell management that a corruption was caused by usage of freeware ...

Wim
Wim
Ian Miller.
Honored Contributor

Re: file corruption with Vms 7.2-2

DFU V3.1-1 is the latest and greatest. The problem was only when using the DFU DEFRAG command not VERIFY. V3.1-1 only runs on Alpha VMS V7.3-1 and later and I64 VMS V8.1 and later.
____________________
Purely Personal Opinion
labadie_1
Honored Contributor

Re: file corruption with Vms 7.2-2

Hello

Willem: DFU (V 2.7) has only been used here by me with dfu verify, so this is not an issue. Thanks for the information.

Fassl: the directories are not so big; the biggest is 7000 blocks, and a good number are between 500 and 3000 blocks (which is very bad, but I have already seen much worse).

Goodman: it seems you are right, the files shown by ana/disk/norepair today were created today...
Ian Miller.
Honored Contributor

Re: file corruption with Vms 7.2-2

small note on DFU VERIFY.
I have noticed using
VERIFY/DIRECTORY_SCAN/LOCK
gives the most meaningful results but takes the longest to run. The author of DFU allows you the choice of tradeoff. /NODIR is the default.
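For reference, that fuller check could be invoked along these lines, reusing the dsa3: device and the mc dfu form from the original post. This is only a sketch; the scan runs with the volume locked, so it needs a quiet period.

```dcl
$ ! Slowest but most meaningful variant: directory scan with the volume locked
$ MCR DFU VERIFY DSA3: /DIRECTORY_SCAN /LOCK
```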
____________________
Purely Personal Opinion
labadie_1
Honored Contributor

Re: file corruption with Vms 7.2-2

I will run a $ ana/disk/norepair with no activity soon.

Thanks for the info, Ian
Garry Fruth
Trusted Contributor

Re: file corruption with Vms 7.2-2

I am unable to open the .CWK attachments. What application creates these files?
Uwe Zessin
Honored Contributor

Re: file corruption with Vms 7.2-2

It's a ClarisWorks (or AppleWorks) document. Could be a text document, a spreadsheet or a database, ...

http://filext.com/detaillist.php?extdetail=CWK
labadie_1
Honored Contributor

Re: file corruption with Vms 7.2-2

To make a long story short: with Mac OS X and Mozilla, I can't reply in the ITRC forums unless I attach a document. So I always attach a marvelous file, created with AppleWorks, called empty.cwk, and that file is empty.
Ian Miller.
Honored Contributor

Re: file corruption with Vms 7.2-2

try firefox instead of mozilla
____________________
Purely Personal Opinion
labadie_1
Honored Contributor

Re: file corruption with Vms 7.2-2

Here is the output of
$ ana/disk/norepair with no activity on the disk.
Ian Miller.
Honored Contributor

Re: file corruption with Vms 7.2-2

(your .txt attachment appears to be a MS Word doc)

You appear to have lots of lost files and disk space which suggests a directory or two was lost as well as some improperly deleted files.
____________________
Purely Personal Opinion
labadie_1
Honored Contributor

Re: file corruption with Vms 7.2-2

Wim said:

"I wouldn't like to be the one that has to tell management that a corruption was caused by usage of freeware ..."

Well, without starting a war, what is the difference between a big bug in a big-company product (Microsoft, HP, Digital, Sun, IBM, Oracle...) and a big bug in freeware?
You use some software, and there is some risk in using it. You can prove that a program is bug-free only when it is a very simple program (100 lines or so). When you have 10,000 or 100,000 or more lines of code, you can't be sure. Of course, it is convenient to have somebody to blame
:-)