Operating System - OpenVMS

Re: Has backup/image/ignore=interlock become useless?

 
Jon Pinkley
Honored Contributor

Re: Has backup/image/ignore=interlock become useless?

Bill,

I haven't seen any responses that try to explain the behavior you saw. Specifically, why would a backup made with /FAST (which /IMAGE implies) and /IGNORE=INTERLOCK skip files? In fact, to me this sounds more like the effect of a BACKUP/IMAGE without /IGNORE=INTERLOCK on a disk with open files.

We still use 7.3-2 for production, and I have never seen the problem you describe.

I am not sure what the warning in the 8.3 documentation is really warning about. Can you provide a reference to the warning in the 8.3 documentation? I was unable to find it in the BACKUP chapter of the "HP OpenVMS System Management Utilities Reference Manual".

What does not make sense to me is that files would be missed in an image backup, since a /FAST file scan is implied, and this scans the INDEXF.SYS file and generates a list of FIDs to back up.

This is what the BACKUP chapter of the "HP OpenVMS System Management Utilities Reference Manual: A-L"

http://h71000.www7.hp.com/doc/83final/6048/ovms_83_sysman_util1.pdf

says about /IGNORE=INTERLOCK:

Command Qualifier
Specifies that a BACKUP save or copy operation will override restrictions placed on files or will not perform tape label processing checks.

Note

--------------------------------------------------------------------------------
File system interlocks are expressly designed to prevent data corruptions, and to allow applications to detect and report data access conflicts.
Use of the INTERLOCK keyword overrides these file data integrity interlocks. The data that BACKUP subsequently transfers can then contain corrupted data for open files. Also, all cases in which these data corruptions can occur in the data that BACKUP transfers are not reliably reported to you; in other words, silent data corruptions are possible within the transferred data.
--------------------------------------------------------------------------------

INTERLOCK Processes files that otherwise cannot be processed due to file access conflicts. Use this option to save or copy files currently open for writing. No synchronization is made with the process writing the file, so the file data that is copied might be inconsistent with the input file, depending on the circumstances (for example, if another user is editing the file, the contents might change). When a file open for writing is processed, BACKUP issues the following message:

%BACKUP-W-ACCONFLICT, 'filename' is open for write by another user.

The INTERLOCK option is especially useful if you have files that are open so much of the time that they might not otherwise be saved. The use of this option requires the user privilege SYSPRV, a system UIC, or ownership of the volume.
See the Note before this table for more information about this keyword

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Unfortunately, there is no information about what conditions are necessary for the "silent data corruptions" to occur. I have tried to create a case where a file is open for write while BACKUP is backing it up without getting a "%BACKUP-W-ACCONFLICT, 'filename' is open for write by another user." message, and I have not been successful. Just because I am not able to doesn't mean it isn't possible, but if it is, then why can't someone give us a reproducer, instead of just repeating the "silent corruption" dogma? I don't consider corruption of a file for which BACKUP issued a warning stating that the file is open for write by another user to be "silent corruption".

If you do a backup/list/out=files.lis of the backup saveset, does this list not have the files? In an image backup, directory files are copied as is, so it is theoretically possible that a directory file in the process of being modified could be copied in an inconsistent state, but I would still have expected the files that existed at the time of the initial file scan to have been copied to the saveset. These may show up as lost files if an image restore is done, and would possibly show up as being in the [] directory in a listing of the tape saveset.
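One way to answer that question mechanically is a simple set difference, assuming you have already reduced both a directory listing of the disk and the BACKUP/LIST output of the saveset to plain lists of file specs, one per line. That reduction step is not shown, and the helper name below is hypothetical. A minimal Python sketch:

```python
def missing_from_saveset(disk_files, saveset_files):
    """Return file specs that appear on disk but not in the saveset listing.

    Both arguments are iterables of file specs such as "[USERS.BILL]LOGIN.COM;1".
    Specs are compared case-insensitively (an assumption that suits ODS-2 volumes),
    and blank lines are ignored.
    """
    def normalize(spec):
        return spec.strip().upper()

    on_disk = {normalize(s) for s in disk_files if s.strip()}
    in_saveset = {normalize(s) for s in saveset_files if s.strip()}
    return sorted(on_disk - in_saveset)


disk = ["[USERS.BILL]LOGIN.COM;1", "[USERS.BILL]NOTES.TXT;3", "[DATA]A.DAT;1"]
saveset = ["[users.bill]login.com;1", "[DATA]A.DAT;1"]
print(missing_from_saveset(disk, saveset))  # ['[USERS.BILL]NOTES.TXT;3']
```

Treat the result as a list of candidates to investigate rather than proof of loss: a file whose directory was copied in an inconsistent state may still be in the saveset, just not under the directory spec you expect (for example, under []).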

Can you confirm that the account doing the backup has SYSPRV? I see that /ignore=interlock lists that as a requirement, although it isn't clear to me why this would be a requirement, i.e. why READALL would not be sufficient.

Here are other threads to read.

How to backup a shadowed system disk ?

http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1191154

Backup/Restore system disk

http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1209276

Restore System Disks

http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=802643

BACKUP/IMAGE

http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1028893

VAX/VMS image backup of system disk

http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1312636

taking backup of disks of a production system

http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=974326

Are this command the same ?

http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=910829

How Vms backup works ?

http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1094410

Process crashes while backup is activ

http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=964756

Jon
it depends
GuentherF
Trusted Contributor

Re: Has backup/image/ignore=interlock become useless?

Contrary to all the folklore, /IGNORE=INTERLOCK does not produce a corrupted save set.

It "CAN" create an inconsistent copy of a file in the save set.

BTW, if you use /IGNORE=INTERLOCK on one node in a cluster and a file is open for write on another node, you do not get any informational or warning message (that's bad).

The /IMAGE backup reads directory files directly off the disk, bypassing the file system. BACKUP does not lock down a directory file or synchronize this access with the file system. While BACKUP uses a copy of directory blocks, the actual directory on disk can be modified by the file system. Hence BACKUP may copy a file that had just been deleted, or miss a file that had just been created. This is more troublesome when directories themselves are created or deleted while BACKUP walks down the directory tree.

And another piece: BACKUP/IMAGE processes INDEXF.SYS before any other task. From the index file it creates an in-memory list of file IDs to save (=/FAST). Again, INDEXF.SYS may change while BACKUP/IMAGE is running.

Even without /IGNORE=INTERLOCK these problems still exist.
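To make the INDEXF.SYS point concrete, here is a deliberately simplified toy model in Python. It is not how BACKUP is implemented: the class and function names are hypothetical, and the model ignores directory blocks entirely. It only shows the two effects of taking the FID list once, up front: files deleted after the scan are in the list but no longer exist, and files created after the scan are never considered.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVolume:
    """Hypothetical, highly simplified volume: maps FID -> file name."""
    files: dict = field(default_factory=dict)
    next_fid: int = 1

    def create(self, name):
        fid = self.next_fid
        self.next_fid += 1
        self.files[fid] = name
        return fid

    def delete(self, fid):
        del self.files[fid]


def toy_image_backup(volume, activity_during_backup):
    """Model only the up-front FID scan, not BACKUP's real behavior."""
    # Take the FID list once, before copying anything (the /FAST-style scan).
    fid_list = sorted(volume.files)
    # Other users keep working on the volume while the copy proceeds.
    activity_during_backup(volume)
    saved, vanished = [], []
    for fid in fid_list:
        if fid in volume.files:
            saved.append(volume.files[fid])
        else:
            vanished.append(fid)  # in the list, but deleted after the scan
    # Files that exist now but were never in the list would not be copied.
    never_considered = sorted(set(volume.files) - set(fid_list))
    return saved, vanished, never_considered


vol = ToyVolume()
vol.create("A.DAT")        # FID 1
b = vol.create("B.DAT")    # FID 2


def activity(v):
    v.delete(b)            # deleted after the scan
    v.create("C.DAT")      # created after the scan, gets FID 3


print(toy_image_backup(vol, activity))  # (['A.DAT'], [2], [3])
```

Real BACKUP also copies directory blocks as it walks the tree, which this toy does not attempt to model.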

Baseline: DO NOT USE VMS BACKUP TO BACKUP AN ACTIVE DISK VOLUME.

/Guenther
Korendyk
Advisor

Re: Has backup/image/ignore=interlock become useless?

I wish to thank everyone who responded. I understand and agree with virtually everything everyone has said. I was perhaps not as clear as I could have been in expressing my concern. Let me try a different approach, beginning with a summary of the motivation.

The backup procedures in question are for recovery after some catastrophic event, and are intended for (hopefully) prompt recovery of the system and the disk volumes. The method involves performing the aforementioned image backup to establish the foundation for the recovery. The errors, warnings, and informational messages produced during saveset creation and by the verification pass serve to identify any parts (directories and files) that should be considered missing or corrupt in the resulting saveset. Special procedures (as appropriate for system files and for any applications) then ensure that those parts can be restored.
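As a rough illustration of the "identify from the messages" step, the sketch below scans a BACKUP log for the two message forms quoted elsewhere in this thread (%BACKUP-W-ACCONFLICT and %BACKUP-E-OPENIN) and collects the affected file specs. The function name and the assumption that each message sits on a single log line are mine; real logs may wrap long file specs.

```python
import re

# Message shapes copied from the BACKUP output quoted in this thread.
ACCONFLICT = re.compile(r"^%BACKUP-W-ACCONFLICT, (\S+) is open for write by another user")
OPENIN = re.compile(r"^%BACKUP-E-OPENIN, error opening (\S+) as input")


def files_needing_attention(log_lines):
    """Split file specs from a BACKUP log into 'copied while open' and 'not opened'."""
    copied_while_open, not_opened = [], []
    for raw in log_lines:
        line = raw.strip()
        match = ACCONFLICT.match(line)
        if match:
            # Copied under /IGNORE=INTERLOCK, but the copy may be inconsistent.
            copied_while_open.append(match.group(1))
            continue
        match = OPENIN.match(line)
        if match:
            # Could not be opened as input, so treat it as not saved.
            not_opened.append(match.group(1))
    return copied_while_open, not_opened


log = [
    "%BACKUP-E-OPENIN, error opening _DSA1:[FT]FTEND.LOG;7 as input",
    "-SYSTEM-W-ACCONFLICT, file access conflict",
    "%BACKUP-W-ACCONFLICT, _DSA1:[FT]FTEND.LOG;7 is open for write by another user",
    "%BACKUP-S-COPIED, copied _DSA1:[FT]FTEND.LOG;7",
]
print(files_needing_attention(log))
```

Note the limit of this approach: as discussed later in this thread, a file open for write on another cluster node produces no ACCONFLICT warning at all, so a clean scan does not prove that every file was copied consistently.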

In the event of the recovery, the image saveset restores the volume to the state at or shortly before the time of the backup. The special procedures then restore applications to the state saved shortly before then. This is the method I have suggested at a number of sites, and I have even taken some sites' "backup tapes" to a similarly configured box and performed the "disaster recovery" to confirm that it can be done :-]

What raised my concern, and resulted in this thread, was that a random, accidental examination of one of these backups showed that a large number of files were missing from the saveset. Knowing how and why these files became missing is necessary to determine what special procedures are needed to ensure recovery from a catastrophic event.

With no disrespect to others, and while I really do appreciate all the comments, as Jon Pinkley points out, I'm really only interested in why the files are missing from the backup, and why that fact apparently is not reported. Any comments on the risks of /ignore=interlock or using HBVS will be quietly ignored in an attempt to stay on topic. :-}

John Gillings's comment is the one that also scares me:

"There are many ways that changes in a directory could prune off large branches in the directory tree, with no way to guarantee it will even be detected."

So I have to ask: What are the many ways that a directory can be pruned off? And if it occurs without detection (even during the verification pass?), how can one hope to establish a valid foundation for recovery from a catastrophic event?

I considered Backup/Image to be the best (and only?) way to establish a foundation, from which a recovery method can be built that addresses any of the known limitations. What do you do if your foundation cannot be assured?

I'm hoping further investigation and testing will determine whether there is a flaw in the method or a peculiarity in the site configuration. I should point out that this all relates to a data disk. The system disk, which does remain "active" during the backup, uses the same process, and examination of those savesets has (so far) shown them to be complete.

Clues, suggestions, are always welcome!

\bill
Korendyk
Advisor

Re: Has backup/image/ignore=interlock become useless?


Responding to comments from Jon Pinkley.

>>>> We still use 7.3-2 for production, and I have never seen the problem you describe.
<<<<

Nor have I; this is the first. Part of my investigation is to see if it is specific to 8.3.

>>>> I am not sure what the warning in the 8.3 documentation is really warning about. Can you provide a reference to the warning in the 8.3 documentation? I was unable to find it in the BACKUP chapter of the "HP OpenVMS System Management Utilities Reference Manual"
<<<<

The warning occurs twice in the "System Manager's Manual, Volume 1: Essentials": in Section 11.15.1 (Backing Up User Disks) and again in Section 15.18.3 (Ensuring Data Integrity).

>>>> What does not make sense to me is that files would be missed in an image backup, since a /FAST file scan is implied, and this scans the INDEXF.SYS file and generates a list of FIDs to backup.
<<<<

Unless the fast file scan is no longer implied. Something else to check into...

>>>> If you do a backup/list/out=files.lis of the backup saveset, does this list not have the files?
<<<<

Sadly, the Site considered it sufficient to only retain the batch log (listing the problems and not the successes). I would have insisted on a journal file... which is what it does now. :-/

>>>>> Can you confirm that the account doing the backup has SYSPRV? I see that /ignore=interlock lists that as a requirement, although it isn't clear to me why this would be a requirement, i.e. why READALL would not be sufficient.
<<<<<

I'm also looking to see if there are issues around the process quotas.

thnx.
\bill
Korendyk
Advisor

Re: Has backup/image/ignore=interlock become useless?

Hi Guenther.

>>>> The /IMAGE backup reads directory files directly off the disk bypassing the file system. BACKUP does not lock down a directory file or synchronize this access with the file system. While BACKUP uses a copy of directory blocks the actual directory on disk can be modified by the file system. Hence BACKUP may copy a file that actually had just been deleted or, miss a file that has just been created. This is more troublesome when directories themselves are created/deleted while BACKUP walks down the directory tree.
<<<<

I understand this happens, seen it often, but can it occur in a way that the "difference" is not detected (and reported) either when the saveset is created or during the verification pass?

>>>>
And another piece: BACKUP/IMAGE processes INDEXF.SYS before any other task. From the index file it creates an in-memory list of file IDs to save (=/FAST). Again, INDEXF.SYS may change while BACKUP/IMAGE is running.

Even without /IGNORE=INTERLOCK these problems still exist.
<<<<

Again, can you think of how these changes might be undetected during the backup/verify process? I suppose that if the in-memory list is not refreshed prior to the verification, and changes are made just so...

It remains puzzling, since I have confirmed that the files and associated directories for those missing from the saveset still exist on the disk volume, in a state that appears to be "unchanged" in a long time.

>>>>
Baseline: DO NOT USE VMS BACKUP TO BACKUP AN ACTIVE DISK VOLUME.
<<<<<

There's a scary notion. Many an archiving solution uses VMS Backup as the underlying copy mechanism. So you're saying that Backup is only usable as standalone, or on private read-only volumes. In all other circumstances it may be unreliable.

I'll need to mull that over a bit...

thnx
\bill

/Guenther
AEFAEF
Advisor

Re: Has backup/image/ignore=interlock become useless?

Responding to Guenther:

>
Btw. if you use /IGNORE=INTERLOCK from one node in a cluster and a file is opened for write on another node you do not get any info or warning message (that's bad).
<

Despite the fact that it says this somewhere in the docs, it simply isn't true. When you open a file from another node, there is a FAL process on the local node that has the file open.

Example (merged and edited for clarity):

LOCAL> DIR FTEND.LOG;

Directory _DSA1:[FT]

FTEND.LOG;7 748/750 6-MAY-2009 23:07:00.48

Total of 1 file, 748/750 blocks.

LOCAL> SHOW DEV /FILES

Files accessed on device DSA1: on 11-MAY-2009 19:51:12.11

Process name PID File name
00000000 [000000]INDEXF.SYS;1

REMOTE> OPEN/WRITE/READ SPOOK node_x::FTTOP:FTEND.LOG

LOCAL> SHOW DEV /FILES

Files accessed on device DSA1: on 11-MAY-2009 19:52:07.43

Process name PID File name
00000000 [000000]INDEXF.SYS;1
FAL_16734 000004DB [FT-2-1-0]FTEND.LOG;7

LOCAL> BACK/LOG FTEND.LOG;7 NL:A.B/SAVE
%BACKUP-E-OPENIN, error opening _DSA1:[FT]FTEND.LOG;7 as input
-SYSTEM-W-ACCONFLICT, file access conflict
%BACKUP-W-NOFILES, no files selected from _DSA1:[FT]FTEND.LOG;7

LOCAL> BACK/LOG/IGNORE=INTERLOCK FTEND.LOG;7 NL:A.B/SAVE
%BACKUP-W-ACCONFLICT, _DSA1:[FT]FTEND.LOG;7 is open for write by another user
%BACKUP-S-COPIED, copied _DSA1:[FT]FTEND.LOG;7

REMOTE> CLOSE SPOOK

LOCAL> BACK/LOG/IGNORE=INTERLOCK FTEND.LOG;7 NL:A.B/SAVE
%BACKUP-S-COPIED, copied _DSA1:[FT]FTEND.LOG;7
LOCAL>

>
The /IMAGE backup reads directory files directly off the disk bypassing the file system. BACKUP does not lock down a directory file or synchronize this access with the file system. While BACKUP uses a copy of directory blocks the actual directory on disk can be modified by the file system. Hence BACKUP may copy a file that actually had just been deleted or, miss a file that has just been created. This is more troublesome when directories themselves are created/deleted while BACKUP walks down the directory tree.
<

Well, the doc says BACKUP does synchronize with the file system, but it does lock files, so I completely concur with your bottom line. And I know from experience (V5.5-2 long ago) that it does copy directory files as you say: it copies them block for block. So I'm not sure just exactly how "BACKUP opens the index file to synchronize with the file system (no update is made)" (see below) affects the directory-copy operation.

BACKUP doc:
"To use the /IMAGE qualifier, you need write access to the volume index file (INDEXF.SYS) and the bit map file (BITMAP.SYS), or the input medium must be write-locked. BACKUP opens the index file to synchronize with the file system (no update is made). Finally, you must have read access to all files on the input medium."

AEF
Hoff
Honored Contributor

Re: Has backup/image/ignore=interlock become useless?

>Despite the fact that it says this somewhere in the docs, it simply isn't true. When you open a file from another node, there is a FAL process on the local node that has the file open

The issue here is with remote file access within a cluster, not with DECnet FAL-level access, which is arguably local access in its own right, since the FAL server process on the local node is what has the file open.

And BTW, the fellow you're discussing this utility with here (GF) has worked on BACKUP itself for a while, adding various support and debugging various problems within that tool. While I do not know if that is still the case, GF is quite familiar with the tool.


Jon Pinkley
Honored Contributor

Re: Has backup/image/ignore=interlock become useless?

I had done some testing a while back, and tried different scenarios, like a file being opened for write and closed during the time that the file was being backed up. I had also tried the simple cases, like a file being open at the time the backup started backing up the file, but I must not have tried that simple case on a file opened on another node.

At any rate, I just verified that what GF said is true.

See the attachment for the log.

Guenther, thanks for sharing a condition that doesn't get a warning message. Ian has also stated this in other threads, but I was "sure" I had tested that case.

Jon
it depends
Jon Pinkley
Honored Contributor

Re: Has backup/image/ignore=interlock become useless?

Sorry, the last attachment has incorrect comments for the first backup command.

Correction attached.
it depends
Jon Pinkley
Honored Contributor

Re: Has backup/image/ignore=interlock become useless?

Sorry, the last attachment has incorrect comments for the first backup command.

Correction attached (hopefully it will make it this time).
it depends