
Re: Backup gets -RMS-E-FLK (file currently locked) on /LIST file

 
Ken McNulty
Advisor

Backup gets -RMS-E-FLK (file currently locked) on /LIST file

I submit a batch job to a queue which contains the following BACKUP command:-

$!
$ backup/record/ignore=interlock/block=16384/nocrc/group=0-
/list=sysdsk:[rolls.data]SUNDAY_AUTOMATIC_NEVR02_save.lis -
NEVR02:[ab_*...],[ashdale*...],[ftp_logs...],[hht_*...],[host...], -
[msg_*...],[pension*...],[psoft_*],[routemaster...], -
[scada_files]/since=backup -
$1$MKB500:SUN_NEVR02.bck/label=SUNSAV
$!
This job fails with the following error:-

%BACKUP-F-OPENOUT, error opening
SYS$SYSDEVICE:[ROLLS.DATA]SUNDAY_AUTOMATIC_NEVR02_SAVE.LIS; as output
-RMS-E-FLK, file currently locked by another user

This file is created by BACKUP, so how can anything else get hold of it? Even if there were a previous version (there isn't, because we clear the old logs, and I've checked), it should simply create a later version, shouldn't it? After the failure I checked the file location and the file isn't there.

These jobs have been running since 1993, with the latest incarnation being created in July 2004. This was to cater for the DSSI based disk subsystem being replaced with an RA230 (KZPAC?) RAID Array controller. Disks are now configured as 2-disk mirror sets. Could this be something to do with synchronising writes between the two disks? We have 11 sites with this configuration. So far I have only noticed this problem on one site, and even then not every day; e.g. last week the saves were successful 3 days out of 7. This has been happening for a number of months.

Configuration
Alpha 1000a
TZ88 DLT Tape drive.
OpenVMS 7.2-1
Uwe Zessin
Honored Contributor

Re: Backup gets -RMS-E-FLK (file currently locked) on /LIST file

The RAID controller should hide the existence of two physical disks and keep them block for block identical. If you suspect that they are not (there have been error handling suggestions circulating around, even from Compaq service, that resulted in inconsistencies), I suggest you:

- run an ANALYZE/DISK_STRUCTURE when the volume is quiet

- and run a 'parity check' on the RAID controller (I know that mirrored disks don't have parity, but if I recall correctly, it is always called that way) to have it check for differences between the disks
.
Hein van den Heuvel
Honored Contributor

Re: Backup gets -RMS-E-FLK (file currently locked) on /LIST file


Are these jobs perchance scheduled just around midnight? Do they start out by resubmitting themselves? For example /after=tomorrow
Could there be a time difference in the cluster environment that causes a resubmitted job to clash with its submitter? How about submitting for "tomorrow + 1:0:0", or even somewhat randomizing the start minute, perhaps based on the current second, or pagefault count or whatever.
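
A minimal sketch of that idea, assuming the procedure resubmits itself near its start and uses the default batch queue. The symbol name MM is made up for illustration, and the current second is simply reused as the start minute:

```
$! Sketch only: resubmit this procedure for tomorrow with a varying offset.
$ MM = F$CVTIME(,,"SECOND")         ! two-digit second, reused as the minute
$ SUBMIT/AFTER="TOMORROW+1:''MM':00" 'F$ENVIRONMENT("PROCEDURE")'
```

Two jobs started in the same second would still collide, so this only spreads the start times out; it does not guarantee they differ.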

Could yesterday's job still be running? Although I think that would just create a newer version...

How about the directory [rolls.data]? Could that be locked? The error from that is indistinguishable from the file itself being locked.
For example:
$ open/read/write x tmp.dir
$ create/log [.tmp]tmp.tmp
%CREATE-E-OPENOUT, error opening U$1:[HEIN.TMP]TMP.TMP; as output
-RMS-E-FLK, file currently locked by another user
$ close x
$ create/log [.tmp]tmp.tmp
Exit
%CREATE-I-CREATED, U$1:[HEIN.TMP]TMP.TMP;3 created
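
The same test could be scripted in the batch job just before the BACKUP step. This is only a sketch: LOCK_PROBE.TMP and the label names are hypothetical, and SHOW DEVICE/FILES may need extra privileges to list files held open by other processes.

```
$! Sketch only: LOCK_PROBE.TMP is a made-up scratch file name.
$ SET NOON                          ! keep the procedure alive on errors
$ COPY NL: SYSDSK:[ROLLS.DATA]LOCK_PROBE.TMP
$ IF $STATUS THEN GOTO PROBE_OK
$ WRITE SYS$OUTPUT F$TIME(), " probe create failed"
$ SHOW DEVICE/FILES SYSDSK:         ! open files and the processes holding them
$ GOTO PROBE_DONE
$PROBE_OK:
$ DELETE SYSDSK:[ROLLS.DATA]LOCK_PROBE.TMP;*
$PROBE_DONE:
$ SET ON
```

If the probe fails with the same -RMS-E-FLK while no file of that name exists, that points at the directory rather than at the list file.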

There is no file version mentioned in your error message, so it is not due to a poorly constructed logical name.

hth,
Hein.



Ken McNulty
Advisor

Re: Backup gets -RMS-E-FLK (file currently locked) on /LIST file

Hi guys, thanks for your input. The saves are submitted manually at various times depending on bakery production times. The "Automatic" bit refers to automatic selection of options. However, the saves that I raised the call on ran at 14:20. Each day's save log is prefixed with the day name, thus yesterday's save log would have been SATURDAY_AUTOMATIC_NEVR02_SAVE.LIS. Last week's file might have been hanging around, but, as you say, you would just expect a new version. I have placed various debug statements in the DCL and one of them is "DIR SYSDSK:[ROLLS.DATA]dayname*.*" and I can confirm that the file does not show up prior to the submission of the backup. It could be that the job is being submitted twice on some days. I have placed "SHOW QUEUES" in the job and written time stamps to a trace file. The results indicate that no other job is running. I mentioned the RAID sets because I was wondering if this was something to do with caching writes. Last week's files are deleted immediately prior to the job submission. Might it be that the deletes are still being processed in cache and propagated to both members of the mirror set, as the backup job is trying to create its list file? SHOW DEV/FU shows that write-through caching is enabled, so that doesn't seem very likely, but it's a possibility.

Last night's save worked OK (at 19:20), so I can't try anything. I did make a slight amendment: the original Input Qualifier had spaces between the commas and the line continuation character (-). I have removed these, but I admit I'm clutching at straws. I'll give the ANALYZE a go at the next failure.

Once again, thanks.

Ken
Uwe Zessin
Honored Contributor

Re: Backup gets -RMS-E-FLK (file currently locked) on /LIST file

The RAID controller does not interact with the OpenVMS lock manager, even if you were using writeback caching.
.
Willem Grooters
Honored Contributor

Re: Backup gets -RMS-E-FLK (file currently locked) on /LIST file

I think the situation is that the file is being accessed for backing it up (copying it to tape) and that BACKUP is trying to write to it at the same time. Could be a matter of disk activity (a performance problem?). You could try to explicitly exclude the file from the backup (/EXCLUDE=), which would prevent this from happening, or locate it on another disk that is not affected by this backup.

Is there other system management software running checking for this type of file, or accessing the disk at the moment of the error?
Willem Grooters
OpenVMS Developer & System Manager
Hein van den Heuvel
Honored Contributor

Re: Backup gets -RMS-E-FLK (file currently locked) on /LIST file

Willem, that was my initial thought also, but then I read the original post in more detail and it is clear that the error is reported for the BACKUP LIST FILE, not about a file to be backed up.

I was also thinking that /record might be acting on the directory, but surely the list file is opened first?

Now it may be a problem with error reporting, and the real problem might be like you suspect.

Ken wrote:
" Might it be that the deletes are still being processed in cache and propagated to both members of the mirror set"

There is no such thing as propagation. Backup will do a single IO and it will affect both members. Right there, right then. The members will never be out of sync as seen by the application.


Ken,
We assume this job dies right away as the backup starts? Or is backup going for a while? Check the accounting data at the end of the batch job, or get timestamps in the batch log with, for example:
$ SET PREFIX "(!5%T) "

Once (if!) you get a reproducer, you may want to try a run with SET WATCH FILE/CLA=MAJOR. Actually... you might just want to do a test run with that anyway, just to get a better impression of what backup is touching and in which order.
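
Put together, the tracing around the BACKUP step might look like the sketch below. It assumes SET PREFIX is available on the installed version and that the account has whatever privilege SET WATCH requires (it is an undocumented command); the /CLASS=NONE line is the usual way to switch the trace off again.

```
$! Sketch only: wrap the existing BACKUP step with tracing.
$ SET VERIFY                        ! the prefix is applied to verified lines
$ SET PREFIX "(!5%T) "              ! time-stamp each verified command line
$ SET WATCH FILE/CLASS=MAJOR        ! undocumented; needs elevated privilege
$! ... the BACKUP command from the original post goes here ...
$ SET WATCH FILE/CLASS=NONE         ! turn file tracing off again
```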

Like you said.. straws...

Hein.





Bojan Nemec
Honored Contributor

Re: Backup gets -RMS-E-FLK (file currently locked) on /LIST file

Ken,

Another idea.
Is the directory sysdsk:[rolls.data] version limited? If you set the version limit on the directory to 1 with $ SET DIRECTORY/VERSION_LIMIT=1 and then try to open a second file with the same name, no new version is created; instead you receive the RMS-E-FLK error.

You can do a dir/full sysdsk:[rolls]data.dir and check if there is a Default version limit in the file attributes.

Bojan
Hein van den Heuvel
Honored Contributor

Re: Backup gets -RMS-E-FLK (file currently locked) on /LIST file

Bojan,

Good suggestion! That may well be the problem.
One refinement though... just the file being there would not stop it. It needs to be open to cause the error:

$ show def
U$1:[HEIN]
$ set dir/vers=1 [.tmp]
$ cop nl: [.tmp]a.tmp/log
%COPY-S-COPIED, _NLA0: copied to U$1:[HEIN.TMP]A.TMP;1 (0 records)
$ cop nl: [.tmp]a.tmp/log
%COPY-S-COPIED, _NLA0: copied to U$1:[HEIN.TMP]A.TMP;2 (0 records)
$ open/read/write x U$1:[HEIN.TMP]A.TMP;2
$ cop nl: [.tmp]a.tmp/log
%COPY-E-OPENOUT, error opening U$1:[HEIN.TMP]A.TMP; as output
-RMS-E-FLK, file currently locked by another user
%COPY-W-NOTCOPIED, _NLA0:[].; not copied
$ backup/list=[.tmp]backup.tmp *.tmp tmp.bck/save
$ backup/list=[.tmp]backup.tmp *.tmp tmp.bck/save


Hein.
Bojan Nemec
Honored Contributor

Re: Backup gets -RMS-E-FLK (file currently locked) on /LIST file

Hein,

I intended just that, but when writing the post I forgot to write that the file must be open.

You can have more than 1 in version limit, but you receive the error if the version limit is reached and the file with the smallest version is open (for writing).

Curiously, when the file is open for reading, the file is deleted (marked for delete and removed from the directory), but you can still read it until you close it.

Bojan
Wim Van den Wyngaert
Honored Contributor

Re: Backup gets -RMS-E-FLK (file currently locked) on /LIST file

Note that he said that just before the backup the file didn't exist!

Could it be that not the file but the volume or directory is locked?

Wim
Wim
Willem Grooters
Honored Contributor

Re: Backup gets -RMS-E-FLK (file currently locked) on /LIST file

Just got the idea, as an extension of Hein's suggestion, that there may be no room available on the disk for extending the listfile.
Willem Grooters
OpenVMS Developer & System Manager
Wim Van den Wyngaert
Honored Contributor

Re: Backup gets -RMS-E-FLK (file currently locked) on /LIST file

Willem,

"currently locked" caused by "out of diskspace" ? That would be a surprise ...

Wim
Wim
Ken McNulty
Advisor

Re: Backup gets -RMS-E-FLK (file currently locked) on /LIST file

Hi Guys,
Sorry I haven't been back for a while. I've (at least) two hats and I've had to look at some Windows CommVault Backup problems. I've also had a different problem on another Alpha. A Techie's work is never done! The backup is behaving itself still. However, I discovered we had the same problem on another server on Saturday, which nobody noticed. On this occasion, the batch file contained 3 backup jobs, each of which tried to create a different .LIS file and each of which failed with the same error. This server had just been swapped with a spare because the original developed a strange fault whereby it just started rebooting itself for no apparent reason. In other words, it started behaving like XP. Nothing in OPERATOR.LOG, nothing in ERRLOG.SYS, nothing in ACCOUNTNG.DAT. Nothing but normal operations and Time Stamps, then suddenly, reboot messages. We put it on a UPS, in case it was being spiked, and it got worse. We had a spare so we connected the RAID disk enclosure to that and it all worked OK (except now I'm getting "SWXCR-DRA: Shelf error on channel 0" but that's another story). My reason for mentioning this is that the first set of saves we tried after that got the -RMS-E-FLK error. As far as I know, they weren't having a problem until we "disturbed" the disks. However, it's been OK since (3 nights). Coincidence?

As far as I am aware, these problems did not occur before we changed from DSSI JBOD disks to SCSI RAID.
John Gillings
Honored Contributor

Re: Backup gets -RMS-E-FLK (file currently locked) on /LIST file

Ken,

Like Hein suggested, I think it's the enclosing directory. Something is locking it, preventing the creation of the listing file. Try enabling some AUDITs:

$ SET AUDIT/AUDIT -
/ENABLE=(FILE=FAIL=ALL,CREATE)

and also place ALARM ACEs on the enclosing directories:

$ SET SECURITY DATA.DIR -
/ACL=(AUDIT=SECURITY,ACCESS=WRITE+SUCCESS+FAIL)

After the event, go back over the audit trail to see who/what was messing with the directories.

When I simulate your symptom by opening a .DIR file exclusively and asking BACKUP to put a listing file in it, I see an "Object creation" event which fails with SYSTEM-W-ACCCONFLICT, followed immediately by a (successful?) "Object access" against my locked directory, both from BACKUP.EXE.

Unfortunately this doesn't give you a "smoking gun", but hopefully it will reveal more of what's going on.

Once you've confirmed that pattern, backtrack to try and identify which other processes have touched the directory in question - you may need to increase the scope of the AUDIT ACE, maybe to include READ access?
A crucible of informative mistakes
Ken McNulty
Advisor

Re: Backup gets -RMS-E-FLK (file currently locked) on /LIST file

Hi. The original server has stopped having problems; they've moved to the other server I mentioned. Sat, Sun failed, Mon, Tue, Wed OK, Thur fail. However, I've put debug statements into the one that is no longer failing. Now, the programmer in me tells me that if a fault goes away when you put debug statements in your program, then you've probably got a timing problem. I like the idea of something locking the directory, and my suspicion is still focussed on the pre-backup deletes. Nothing else should be using that directory. I'm going to put 10 second waits into the failing backup and I'll see what happens over the weekend. Thanks for all help so far.
John Gillings
Honored Contributor

Re: Backup gets -RMS-E-FLK (file currently locked) on /LIST file

Ken,

It can't be anything "normal" opening the directory, as any conflicting accesses would be queued. It would have to be something that explicitly locks the directory. I'm not aware of any OpenVMS components that would do that.

It may be worth leaving audits on that directory for a while to see what touches it.
A crucible of informative mistakes
Ken McNulty
Advisor

Re: Backup gets -RMS-E-FLK (file currently locked) on /LIST file

Hi John. I'll put some audits on as you suggested. Current status is:- Save with Debug Statements - Fri fail; Sat, Sun success. Save with 10 second wait - Fri, Sat, Sun success.

Oh, by the way, there are no version limits on the .LIS file or its enclosing directories.
Thanks