Operating System - OpenVMS

Korendyk
Advisor

Has backup/image/ignore=interlock become useless?

I have recently become aware of a possible problem with long-standing procedures used to provide system and data backups. I am currently investigating and performing tests, but thought I might raise the issue here in case someone has heard of a similar problem or might offer some insight during my investigations.

A simple site, with a system disk and a data disk, each a 2-drive shadow set. The procedures simply perform an image backup to tape using, for example, the following:

$ backup/ignore=(label,interlock)/image/verify -
dsa1: mkb600:dsa1.sav /media=compac/norewind

The backup is performed at idle times (no one logged in) and there is the occasional report of files marked for backup (all expected) and accessed for write (also expected). There appear to be no other errors or warnings reported, expected or not.

The problem is that when the tape is examined, the saveset is valid, but there are MANY files missing. The first detected instance was that an early part of a directory is copied, but not the rest of the directory. Entire subdirectories are also missing. And there does not (yet) appear to be a pattern.

As I mentioned, I am continuing to investigate and will provide additional information as it becomes available. And of course I should mention... OpenVMS/Alpha V8.3 on a DS20.

I am looking to see if there are any patches that might apply. What prompted this message was something that I did discover. I came across the following line in the V8.3 documents describing the "/IGNORE" qualifier:

"Also, because of the way BACKUP scans directories, any activity in a directory (such as creating or deleting files) can cause files to be excluded from the backup."

Now, if this is what is happening here, then I am not impressed. For something like this to happen without any warnings, errors, or even informational messages is not what I've come to expect from OpenVMS!! I've been using OpenVMS for a lot of years, and I don't ever remember reading this before. A quick scan of previous (pre-8.x) documents suggests they do not include this statement, so I have to assume it is recent.

It leaves me wondering about the "way BACKUP scans directories", and if it is known to "cause files to be excluded from the backup" then why wasn't it addressed?!

Anyways. Any suggestions or insights are welcome.

thnx
\bill
Hoff
Honored Contributor

Re: Has backup/image/ignore=interlock become useless?

It is well known that /IGNORE=INTERLOCK does not produce a reliable BACKUP.

This detail has been in the OpenVMS FAQ for a very long time, and I've made myself somewhat of a nuisance on this topic (see the other thread going here in the forums, and see the comp.os.vms newsgroup) pointing out the risks of the qualifier.

Silent data corruptions.

There has been a request to get the hazards more clearly documented, and it looks like the risks have finally made it into the manuals. (The older documentation tends to presume you knew that the interlocks were present for a reason: to flag questionable data access.) This is the same basic reason why there has been a longstanding recommendation to use standalone BACKUP (OpenVMS VAX), or to boot the CD (OpenVMS Alpha) or DVD (OpenVMS I64) or another system disk, to get a backup of an OpenVMS system disk.
Korendyk
Advisor

Re: Has backup/image/ignore=interlock become useless?

Hi Hoff,

Thanks. Yes, I am well aware of the silent data corruptions possible with /ignore=interlock. I have dealt with it on many a system recovery. My concern is not with files being corrupted, since those are identified when the saveset is created, and can (should) be appropriately handled in the rare event of a recovery.

My concern is that files simply do not appear in the saveset. And when I said many, I mean hundreds of small data files. Directories in the saveset contain varying numbers of the files that they should: many are there, many are not. Some entire directories are missing. And none of the files are open during the backup.

It is a puzzler. Sadly, backups are slow, idle time is rare, and so testing is a tad tedious.

\bill
John Gillings
Honored Contributor

Re: Has backup/image/ignore=interlock become useless?

\bill,

BACKUP/IGNORE=INTERLOCK has always been useless. Indeed, any attempt to BACKUP an active disk is mostly useless. This is not a fault in BACKUP, it's a fault in expectations.

There are many ways that changes in a directory could prune off large branches in the directory tree, with no way to guarantee it will even be detected. There are many ways files can change between the start of a backup operation and the completion. Some are detectable as potentially affecting the state of the backup, some are not.

BACKUP/IMAGE is really only useful for saving and restoring a static system disk. Any potentially changing files need to be saved independently. Any application data needs to be handled by the application, NOT the operating system. Only the application can know when the data is in a quiescent state. Backup should be an architecturally integral part of any serious application.

This is not the fault of OpenVMS or any other operating system; it's a simple issue of time. Things change many orders of magnitude faster than state can be saved, so it's simply not possible, even in theory, to have a generic, covers-all-cases mechanism for creating a backup that can be restored with the system in a guaranteed known state.

There's an OpenVMS Technical Journal article (in V1?) covering some of the issues. The take home message is stop thinking in terms of getting the data off the system. Turn it around, think about how you will restore your system if something fails, work out what you'll need and work backwards to figure out how to save it.
A crucible of informative mistakes
Jan van den Ende
Honored Contributor

Re: Has backup/image/ignore=interlock become useless?

(sorry if double post; previous try seemingly failed)

bill,

of course, Hoff and John G. are very right!

Yet your situation need not be as bleak as it obviously is now. And implicitly John G. indicated such:

>>>
or any other operating system, it's a simple issue of time.
<<<

And the main reason for my much more optimistic view you gave yourself:

>>>
each a 2-drive shadow set.
<<<

So, if you dismount one member of the set, mount that (process-private to avoid label conflict), and backup THAT drive, you will have brought the time issue down to only those activities that modify different locations on disk, and have already started but not yet finished.

Orders of magnitude less likely than such changes between reading a directory and processing what has to be done according to that info. Or processing a (database, RMS, ...) index and processing the associated data. Or ... (any non-atomic activity or activity involving different disk locations.)

And HostBasedMiniMerge is fully integrated into VMS (patched 7.3-2 and) 8.x, so any pre-existing issues with merge performance have vanished.
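
A minimal sketch of that sequence, for illustration only. The member name DKA100: and the volume label DATA1 are hypothetical (neither appears in this thread), and the /POLICY=MINICOPY qualifiers assume that minicopy write bitmaps are available on your version and configuration; check the volume shadowing manual before relying on them. The tape qualifiers are carried over from bill's original command, minus INTERLOCK.

```
$! Hypothetical names: DSA1: shadow set, DKA100: one of its members,
$! DATA1 the volume label. Run this at the quietest moment available;
$! an update in flight at dismount time can still leave the removed
$! member inconsistent.
$ DISMOUNT/POLICY=MINICOPY DKA100:       ! remove one member from the set
$ MOUNT/NOWRITE DKA100: DATA1            ! process-private, read-only mount
$ BACKUP/IGNORE=LABEL/IMAGE/VERIFY -
  DKA100: MKB600:DSA1.SAV /MEDIA=COMPAC/NOREWIND
$ DISMOUNT DKA100:
$ MOUNT/SYSTEM DSA1: /SHADOW=(DKA100:) DATA1 /POLICY=MINICOPY  ! return the member
```

While the member is out of the set, the shadow set runs with reduced redundancy, so the backup window is also a window of exposure.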

Bottom line: modify your backup to profit from shadowing, and 99% +++ of your issues are past.

Success.

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Hoff
Honored Contributor

Re: Has backup/image/ignore=interlock become useless?

You're certainly welcome to log a direct problem report if there's a support contract around. I would personally doubt you're going to get traction with HP via ITRC, given the known restrictions around this particular command.

>Thanks. Yes, I am well aware of the silent data corruptions possible with /ignore=interlock. I have dealt with it on many a system recovery. My concern is not with files being corrupted, since those are identified when the saveset is created, and can (should) be appropriately handled in the rare event of a recovery.

I'd have to assume you're not familiar with /IGNORE=INTERLOCK because you're (still) using it. (I thought it was bad and was discussing getting the badness better documented, and while talking with the then-current maintainers of the BACKUP utility, I realized I hadn't understood half of the possible badness here.)

>My concern is that files simply do not appear in the saveset. And when I said many, I mean hundreds of small data files. Directories in the saveset contain varying numbers of the files that they should: many are there, many are not. Some entire directories are missing. And none of the files are open during the backup.

Those interlocks were designed and implemented for a reason. (The same sort of model holds with the cluster quorum scheme; it wasn't implemented to cause folks boot or run-time problems, that stuff was implemented to prevent data corruptions.)

I'm not sure which I'd consider better here: entirely missing, or silently corrupt.

>It is a puzzler. Sadly, backups are slow, idle time is rare, and so testing is a tad tedious.

How to split an OpenVMS software RAID-1 shadowset volume is in the host-based volume shadowing manual, IIRC. That (greatly) reduces the window, but you can still have the potential for inconsistency corruptions.

With OpenVMS, the only way this archival stuff can be done (reliably) is either with the applications quiescent, or with application-integrated archival support. BACKUP /IGNORE=INTERLOCK can't reliably copy a system disk (which is how I realized there were problems early on), and HBVS might (though this is usually rare, we are looking at enterprise applications) miss part of a multi-block or cached or in-flight change. (StorageWorks disks could drop multiblock writes; that was the reason that the shelves and the controllers could optionally have batteries.)

I get reasonably good and consistent backups off the local OpenVMS and Unix databases because I use the databases, and because the databases have archival support. The applications - the databases, in this case - have archival processing integrated.
Robert Gezelter
Honored Contributor

Re: Has backup/image/ignore=interlock become useless?

Bill,

The issues here are as Hoff, John, and Jan have mentioned.

An amplification on what Jan commented about the RAID set, however, is in order. Actually, it is a combination of something John and Hoff noted and the splitting of the RAID set.

Often, the best solution to backup issues is to add a scratch volume to the RAID set, temporarily increasing it (in this case, from two to three members). When the three members are fully up-to-date, disconnect the third member, remount it privately with writes disabled, and make the backup from the private copy (/IGNORE=INTERLOCK will not be necessary).

However, one must be careful that the volume is quiescent when disconnecting the temporary shadow set member. If a directory is being updated at the precise instant that the disconnect is happening, the disconnected shadow set member will also have the directory in an inconsistent state. There is no magic here.

That said, the pause in system activity is straightforward to architect, because the disconnect can be done very quickly.
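
As a rough sketch of those steps, assuming a hypothetical scratch disk DKA300: and volume label DATA1 (neither is from this thread). The polling loop assumes that the F$GETDVI item SHDW_CATCHUP_COPYING reports the copy state of the new member promptly after the MOUNT; verify that on your own system, and confirm with SHOW DEVICE DSA1: that the member is a full shadow set member before dismounting it. How to pause and resume the applications is site-specific and only marked by comments here.

```
$! Hypothetical names: DSA1: shadow set, DKA300: scratch disk, DATA1 label.
$ MOUNT/SYSTEM DSA1: /SHADOW=(DKA300:) DATA1   ! add a third member; a copy starts
$wait_for_copy:
$ IF F$GETDVI("DKA300:","SHDW_CATCHUP_COPYING")
$ THEN
$     WAIT 00:01:00                            ! poll once a minute
$     GOTO wait_for_copy
$ ENDIF
$! ... pause application activity here (site-specific) ...
$ DISMOUNT DKA300:                             ! the split itself is quick
$! ... resume application activity here ...
$ MOUNT/NOWRITE DKA300: DATA1                  ! process-private, read-only mount
$ BACKUP/IGNORE=LABEL/IMAGE/VERIFY -
  DKA300: MKB600:DSA1.SAV /MEDIA=COMPAC/NOREWIND
$ DISMOUNT DKA300:
```

Because the original two members stay in the set throughout, this variant does not reduce redundancy during the backup, at the cost of a full copy to the scratch disk each time.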

Often, what allows people to "get away" with backing up system volumes with /IGNORE=INTERLOCK is that they "know" that only a small set of files on THEIR system volume are actually ever modified (e.g., SYSUAF, error logs), and they take separate steps to preserve those files (e.g., using CONVERT/SHARE and other utilities).

- Bob Gezelter, http://www.rlgsc.com
Hoff
Honored Contributor

Re: Has backup/image/ignore=interlock become useless?

To further Bob G's approach....

As for backing up the system disk on a regular schedule, I usually don't bother with that.

No point, really.

I do back up the system disk once in a while (after ECO kits or upgrades, or significant configuration changes), but I do archive the core files (see the SYLOGICALS.TEMPLATE file) regularly.

But the system disk in most OpenVMS configurations doesn't change all that often.
comarow
Trusted Contributor

Re: Has backup/image/ignore=interlock become useless?

All this just demonstrates how versatile and ahead of its time Host Based Shadowing is.

Simply remove a disk and back it up; with host based mini merge, it should go back quickly.

With the flexibility of adding and removing members, these operational issues have a simple solution.

That said, I have restored 100s of systems backed up with /ignore=interlock and frankly, they've always worked, though I always point out it's unsupported.

EMC says host based shadowing is obsolete, but it has nothing that solves operational issues like host based shadowing. Methinks they just don't want to bother coding a long word.

If you want active backups, they generally are part of an application: Oracle has its own backup, as does RDB, with transaction journals and such.

Bob
Martin Hughes
Regular Advisor

Re: Has backup/image/ignore=interlock become useless?

FWIW, note that MINIMERGE (HBMM) is being referenced in some of the above responses where I believe MINICOPY is what is meant.
For the fashion of Minas Tirith was such that it was built on seven levels, each delved into a hill, and about each was set a wall, and in each wall was a gate. (J.R.R. Tolkien). Quote stolen from VAX/VMS IDSM 5.2