Re: Backup taking locks ?

Wim Van den Wyngaert · ‎05-12-2009

I was experimenting with backup/image of a disk with a lot of activity.

While the backup was running, I created lots of files with names in descending order.

The file creation aborted with "file currently locked by another user". I tried it with ascending names: same result. Without /image: same result.

So any backup can cause file creation failures?

Wim (hoping I'm missing something)

Wim Van den Wyngaert · ‎05-12-2009

VMS 7.3 in a cluster; the files were created with:
$ create xxx
a
$
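A minimal DCL sketch of the reproduction described above, written as a command procedure so it can run unattended while BACKUP is reading the same disk. The file count and the TESTnnnnnn.TMP naming scheme are hypothetical, and it assumes the default directory is on the disk being backed up.

```dcl
$! Hypothetical reproduction: create files in descending name order
$! while BACKUP is running against the same disk.
$ count = 5000
$ on error then goto failed
$ loop:
$   name = f$fao("TEST!6ZL.TMP", count)   ! zero-filled, 6 digits
$   create 'name'
a
$   count = count - 1
$   if count .gt. 0 then goto loop
$ write sys$output "All files created"
$ exit
$ failed:
$ write sys$output "CREATE failed on ''name'"
$ exit
```

For the ascending variant of the test, start the counter at 1 and count up instead.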

Wim

Wim Van den Wyngaert · ‎05-12-2009

I redid all the tests with /ignore=interlock. With it, the file creation never fails.
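For comparison, the two BACKUP invocations being contrasted might look like this. The device DKA100: (the active disk) and the saveset DKA200:[BACKUPS]FULL.BCK are hypothetical; the saveset is placed on a different disk so that BACKUP's own output file is not on the disk under test.

```dcl
$! Hypothetical device and saveset names.
$! Variant 1: default interlocking (file creation failed in Wim's tests).
$ BACKUP/IMAGE DKA100: DKA200:[BACKUPS]FULL.BCK/SAVE_SET
$!
$! Variant 2: /IGNORE=INTERLOCK (file creation never failed in Wim's tests).
$ BACKUP/IMAGE/IGNORE=INTERLOCK DKA100: DKA200:[BACKUPS]FULL.BCK/SAVE_SET
```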

So /ignore=interlock not only means "able to read all files" but also "don't lock"?

Wim

Ian Miller. · ‎05-13-2009

/ignore=interlock means don't lock and hope

____________________
Purely Personal Opinion

Wim Van den Wyngaert · ‎05-13-2009

That is not what I read in the help text.

Wim

Jon Pinkley · ‎05-13-2009

Just curious, did it report which file was locked?

It seems the only one that could cause problems for creating files would be the .DIR file.

And yes, backup without /ignore=interlock can lock files and cause other processes to get "file locked by another user" errors when they attempt to open the files for write access.

There have even been reports of login failures due to SYSUAF.DAT being locked.

it depends

Wim Van den Wyngaert · ‎05-13-2009

It didn't say that it was the directory file.

Can it also happen that something gets locked when using /ign=int?

Wim

Jon Pinkley · ‎05-13-2009

I am reasonably sure that if /ignore=interlock is used, only files that backup is creating, for example your journal file, log file, listing file, or output saveset will get locked. If they are on a different disk than you are backing up, and you use /ignore=interlock, backup should not interfere (locking wise) with other users of the source files being read by backup.

In other words, I am not aware of any locking that BACKUP/IMAGE would do that could cause file locked errors for other processes.

But if there is a lot of activity on the disk, the question you have to ask is how useful the resulting backup would be. I assume you have been following the other backup thread:

http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1338782

If you have a test system, try the SDA LCK extension and collect a trace of locks while you are doing your test.

I am not sure how a movefile operation (defragmentation assist) is synchronized with files that are opened with "explicit interlock ignore". I remember reading somewhere that you should avoid backups while defragmenting disks, but I don't remember where I read it, and whether it was due to performance effects, or consistency effects.

Jon

it depends

Hoff · ‎05-13-2009

AFAIK, online application backup is not available on OpenVMS with the integrated tools.

High- or continuous-uptime application data archiving is not an easy problem, and it usually involves the assistance of the application(s) and (usually) replication, and (increasingly) the direct assistance of the database and the file system.

You can (and usually should) (also) use database-integrated tools, such as mysqldump or RMU/BACKUP or such; these are synchronized with the data source, and can provide (more) consistent results.

It is possible to get rather close to this goal with OpenVMS with the use of the (optional) RMS Journaling pieces.

There are other issues with the current implementation of BACKUP (beyond the locking-related matters), not the least of which are the bandwidth limits inherent in the current design; the current tool is within a close percentage of the theoretical limit of the bandwidth of the underlying devices and I/O buses. Avoiding the increase in archival time then tends to point to data compression (which is latent but as yet unsupported) or reducing the quantity of data involved in getting the archival copy, or toward parallel archiving or faster hardware. Or a combination.

There's certainly fodder here for a Best Practices article or two, as the features and limitations of the OpenVMS tools are (clearly) not (widely) understood.

AEFAEF · ‎05-13-2009

Wim,

I had exactly this problem once. I was backing up our archive disk, and the backup ran long. The EOD job was running on another server. At the end of this job it copies trading-data files to the archive disk. Several files couldn't be created. I suspect that BACKUP had the directory locked (or one of its blocks?) at that time.

As recently mentioned here in the /ignore=interlock post, it is best to run BACKUP when things are quiet.

AEF

John Gillings · ‎05-13-2009

Wim,

Rerun your test with SET WATCH FILE/CLASS=MAJOR enabled on the file creation process; that may give you a clue to the exact sequence of events. If you have enough log file space, doing the same on the backup side may also be interesting (but without timestamps you may not be able to correlate the sequences).
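A minimal sketch of that suggestion, run as a command procedure and assuming the commonly used (officially undocumented) SET WATCH FILE syntax, which needs CMKRNL privilege and affects only the current process. XXX.TMP is a hypothetical file name.

```dcl
$! Enable file system tracing for this process only (needs CMKRNL).
$ SET PROCESS/PRIVILEGES=CMKRNL
$ SET WATCH FILE/CLASS=MAJOR
$! Repeat one file creation from the test; XXX.TMP is hypothetical.
$ CREATE XXX.TMP
a
$! Turn tracing off again.
$ SET WATCH FILE/CLASS=NONE
```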

However, your results will be only of academic interest. Maybe people will eventually realise that it is simply NOT POSSIBLE, even in theory, to take a reliable, useful backup of any storage which is undergoing active, uncooperative, unsynchronised changes.

It would almost be better if BACKUP/IMAGE held a doorbell lock on the volume and, at the first sign of any change, simply stopped with:

%BACKUP-F-USELESS, Volume has changed, no point in continuing

or maybe change /IGNORE=INTERLOCK to /WASTE_OF_TIME

Perhaps this would convince people who insist on taking these risks to develop a reliable backup strategy?

A crucible of informative mistakes

Jon Pinkley · ‎05-13-2009

Do you use a photo copier? No matter how good the copier is, the copies will never be perfect, but that doesn't necessarily mean the copies are not useful. The copies may not be admissible as evidence, but for many purposes, having an imperfect copy is better than no copy.

I am sure there are many people that have had disk failures that would be happy to have an imperfect copy of the drive instead of nothing.

I agree with you that any backup made of a disk that is mounted for shared write access that had active writers will have inconsistencies, as backup isn't instantaneous. Even splitting a shadow set member or using controller-based "point in time" copies doesn't solve the problem of synchronizing with applications and their in-memory buffers, although any point in time method is better than backup/ignore=interlock of an active disk.

I claim that a backup/image without /ignore=interlock of an active write shared disk is more than a waste of time; it can cause locking problems for active applications. So while /ignore=interlock may be a waste of time, if your goal is a "perfect copy", at least it is less likely to cause other problems, and you will get a best try copy of the blocks used by files that were present at the time of the initial index file scan. No, it isn't "best practice", but not all sites have the budget for 3 member shadow sets, or EVA controllers with business copy licenses.

I am sure John Gillings has seen many cases where customers had useless backups, but my guess is that many of these backups weren't even made until after some other problem had occurred. For example, if a disk gets mounted in a partitioned cluster, any backup of that disk is still going to be corrupted. Likewise, if a drive is already getting parity errors and going into mount verification, any backup made of that disk is not going to be error free.

My point is that I am not convinced that all of the problems John Gillings has seen are due to the use of /ignore=interlock. More likely it was the system manager or operator ignoring the need for backups and verifying that the backups can actually be used to restore what is needed.

Also note that a backup/image of a live system disk is almost guaranteed to have more problems if /ignore=interlock is not used than if it is used. For example, these files wouldn't get copied unless /ignore=interlock is used:
SYS$COMMON:[SYSEXE]QMAN$MASTER.DAT
SYS$COMMON:[SYSEXE]SYS$QUEUE_MANAGER.QMAN$JOURNAL
SYS$COMMON:[SYSEXE]SYS$QUEUE_MANAGER.QMAN$QUEUES

I do agree that people should develop a reliable backup strategy. But blindly removing "/ignore=interlock" from your backups is not a solution to that problem.

it depends

Wim Van den Wyngaert · ‎05-13-2009

We have an almost perfect backup. We monitor backup size, missing files, etc.

Just a story ...

A DSM application of ours didn't have a transaction log. So they kept files on a different disk to redo the transactions in case the disks failed and they had to start from yesterday's backup. But much later, the disks were merged and the redo files were placed on the same disk as the DSM db ...

Wim (out of work by the end of June)