Operating System - OpenVMS

Re: What could cause this rename error?

 
Jess Goodman
Esteemed Contributor

What could cause this rename error?

%RENAME-E-OPENIN, error opening !AS as input

-RMS-F-RMV, ACP remove function failed

-SYSTEM-W-ACCONFLICT, file access conflict

 

This error is actually  generated by a LIB$RENAME_FILE call.  The user-error-procedure gets called with:

RMS-STS = RMS$_RMV

RMS-STV = SS$_ACCONFLICT

error-source = 2 (Error renaming file)

 

A simple test shows, as is well known, that even if a file is open for exclusive access (no read sharing, no write sharing), a rename operation will succeed. So what could be causing a file access conflict on an ACP remove operation?

 

We get these errors occasionally from several unrelated applications. They all create a new file with a temporary file name, write to it, close the file, and then rename the file to a final name. Many of these applications use LIB$RENAME_FILE, but some use a C RTL rename() call (I cannot verify the RMS-STV value in the latter case).
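The create-write-close-rename pattern described above can be sketched in portable C. This is a minimal sketch, not the poster's code: publish_file and the file names are hypothetical, and it uses only standard C I/O plus the C RTL rename() the post mentions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: create `temp_name`, write `data`, close it, then
   rename it to `final_name`. Returns 0 on success, -1 on failure with
   errno as left by the failing call. */
static int publish_file(const char *temp_name, const char *final_name,
                        const char *data)
{
    assert(temp_name != NULL && final_name != NULL && data != NULL);

    FILE *fp = fopen(temp_name, "w");
    if (fp == NULL)
        return -1;

    size_t len = strlen(data);
    if (fwrite(data, 1, len, fp) != len) {
        fclose(fp);
        return -1;
    }

    /* fclose() flushes the buffered data; treat a failure as fatal. */
    if (fclose(fp) != 0)
        return -1;

    /* The final step: the one that occasionally fails in this thread. */
    return rename(temp_name, final_name) == 0 ? 0 : -1;
}
```

The file is closed before the rename, so the only open files involved in the rename itself are the directory file and whatever else holds it.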

 

Sometimes several different batch jobs running the same application will blow up with this same LIB$RENAME_FILE error at the same time. They use different temporary and final file names, but they do use the same directory. So the file access conflict might be on the directory file itself, but this directory is not accessed by Advanced Server, CIFS, VMS BACKUP, or a disk defragmenter; only by normal RMS operations.

 

VMS 8.4 Itanium with UPDATE 8.

I have one, but it's personal.
John Gillings
Honored Contributor

Re: What could cause this rename error?

Jess,

 

I think I've seen this before (but don't have access to deep history at the moment). I don't think the conflict is with the file; it's with the directory. The clue is the RMV secondary condition.

 

It might be interesting to build a test harness that runs parallel create-write-close-rename cycles to see if you can repeat a failure on demand.
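A minimal sketch of such a harness in C, assuming POSIX threads are available: threads stand in for the parallel batch jobs (an assumption; separate processes may behave differently on OpenVMS), and the names run_worker, run_harness and harness_N.tmp/.dat are hypothetical. Each worker uses its own file names in a shared directory, matching the setup described in the question.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

#define MAX_WORKERS 64

/* Per-worker state for a hypothetical stress harness. */
struct worker {
    int id;
    int cycles;
    int open_failures;
    int rename_failures;
    int last_errno;
};

/* Each worker repeats create-write-close-rename in the current directory,
   using file names unique to that worker. */
static void *run_worker(void *arg)
{
    struct worker *w = arg;
    char temp_name[64], final_name[64];
    snprintf(temp_name, sizeof temp_name, "harness_%d.tmp", w->id);
    snprintf(final_name, sizeof final_name, "harness_%d.dat", w->id);

    for (int i = 0; i < w->cycles; i++) {
        FILE *fp = fopen(temp_name, "w");
        if (fp == NULL) {
            w->open_failures++;
            continue;
        }
        fprintf(fp, "worker %d cycle %d\n", w->id, i);
        fclose(fp);
        if (rename(temp_name, final_name) != 0) {
            w->rename_failures++;
            w->last_errno = errno;   /* errno is per-thread under POSIX */
        }
    }
    return NULL;
}

/* Start up to MAX_WORKERS workers, wait for them, and return the total
   number of failed renames. */
static int run_harness(struct worker *workers, int n_workers, int cycles)
{
    pthread_t tids[MAX_WORKERS];
    int started = 0;

    assert(workers != NULL && cycles >= 0);
    if (n_workers > MAX_WORKERS)
        n_workers = MAX_WORKERS;
    for (int i = 0; i < n_workers; i++) {
        workers[i] = (struct worker){ .id = i, .cycles = cycles };
        if (pthread_create(&tids[i], NULL, run_worker, &workers[i]) != 0)
            break;
        started++;
    }

    int total = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += workers[i].rename_failures;
    }
    return total;
}
```

On an ordinary POSIX system this should report zero failures; the interesting run is on the OpenVMS volume where the errors occur, ideally with another process exercising the same directory at the same time.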

 

If you can, then maybe try a retry loop for this specific condition, with a sanity limit and possibly a randomised short delay between retries (like Ethernet ;-).

 

If that works, then work around the issue with a higher-level RENAME function that hides the retries.
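A sketch of that retry wrapper, written against the portable C rename(). The names retry_op, rename_with_retry and is_transient are hypothetical, and the limits (5 attempts, up to 200 ms delay) are arbitrary. Which errno value the OpenVMS C RTL reports for the RMS$_RMV/SS$_ACCONFLICT case is not established in the thread (the C RTL can report VMS-specific statuses via EVMSERR and vaxc$errno), so is_transient below uses EBUSY purely as a placeholder to be replaced after testing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Arguments for one rename attempt. */
struct rename_ctx {
    const char *from;
    const char *to;
};

/* One attempt: returns 0 on success, -1 with errno set on failure. */
static int rename_op(void *arg)
{
    const struct rename_ctx *c = arg;
    return rename(c->from, c->to) == 0 ? 0 : -1;
}

/* Placeholder predicate (an assumption): replace with the status that
   actually corresponds to the RMV/ACCONFLICT failure on the target system. */
static bool is_transient(int err)
{
    return err == EBUSY;
}

static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Try `op` up to max_attempts times, sleeping a random 1..max_delay_ms
   milliseconds between attempts. Errors the predicate rejects are returned
   at once; on failure errno holds the last attempt's value. */
static int retry_op(int (*op)(void *), void *arg, bool (*transient)(int),
                    int max_attempts, long max_delay_ms)
{
    assert(max_attempts > 0 && max_delay_ms > 0);
    for (int attempt = 1;; attempt++) {
        if (op(arg) == 0)
            return 0;
        int err = errno;
        if (!transient(err) || attempt >= max_attempts) {
            errno = err;
            return -1;
        }
        sleep_ms(1 + rand() % max_delay_ms);
    }
}

/* The higher-level RENAME that hides the retries from callers. */
static int rename_with_retry(const char *from, const char *to)
{
    struct rename_ctx c = { from, to };
    return retry_op(rename_op, &c, is_transient, 5, 200);
}
```

Keeping the predicate separate means non-transient failures such as a missing source file still fail immediately instead of being retried.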

 

It may also be interesting to look at the DCL RENAME sources to see if they do anything special with this condition.

A crucible of informative mistakes
Hein van den Heuvel
Honored Contributor

Re: What could cause this rename error?

Hello Jess,

 

I do not know what is behind your problem, but it is easy enough to force this error:

 

$ open/read/write/share=read x tmp.dir
$ rename/log [.tmp]x.tmp y
%RENAME-E-OPENIN, error opening SYS$SYSDEVICE:[HEIN.TMP]X.TMP;1 as input
-RMS-F-RMV, ACP remove function failed
-SYSTEM-W-ACCONFLICT, file access conflict

 

Maybe you have a tool doing directory maintenance, or a tool doing direct, private reads of the directory file while not allowing shared write?

 

 

Sometimes file access failure auditing can give you better insight into when and where.

If the error condition lasts for a short while, then maybe you can issue a SHOW DEV/FILE in time?

Maybe a quick tool can be written to monitor the directory file lock and its holder?

 

Hopefully this gives you some ideas,

Regards,

Hein.

 

John McL
Trusted Contributor

Re: What could cause this rename error?

Jess, how large is the .DIR file that is being accessed? How often are files created, renamed, and presumably deleted or moved out of the directory? Please provide some example file names used in the sequence.

 

I'm wondering if the problem lies in an excessive .DIR size (which causes extra disk I/O) and operations that might make things worse (e.g. renames to longer file names, which could push entries out into additional directory blocks and grow the .DIR file).

Jess Goodman
Esteemed Contributor

Re: What could cause this rename error?

Thanks for the replies.  I've been busy attempting to create test programs that would reproduce the problem, as John Gillings suggested.  So far not much luck,  although on one (and only one) attempt I did get an error from my test program on a call to LIB$RENAME_FILE.  It was the "flip-side" error for RMS$_RMV...

 

%RENAME-E-OPENIN, error opening !AS as input
-RMS-F-REENT, file could not be renamed and recovery failed; file has been lost
-SYSTEM-W-ACCONFLICT, file access conflict

 

which pretty much confirms that the access conflict is for the directory file itself.

 

Hein, no we are not using any tool that opens a directory as a file in any mode.  We only use them as directories using standard RMS:

 

  • High-level language (Fortran, C, C++) opens of data files in the directory
  • LIB$ calls (LIB$FIND_FILE, LIB$RENAME_FILE, LIB$DELETE_FILE)
  • C RTL calls (OPENDIR, READDIR, RENAME, REMOVE, TEMPNAME)
  • DCL commands (DIRECTORY, CREATE, DELETE,  PURGE, COPY, RENAME, OPEN, TYPE, F$SEARCH)

 

John McL, the errors have occurred from different applications in different directories on different volumes. Overall, our cluster has a very high rate of file creation. Picking one directory for which we seem to get an RMS$_RMV error every week or so (and sometimes two or three at the same time)...

 

  • The directory file is 200 blocks.
  • It contains around 2700 files (all versions).
  • It has over a 1000 unique file names (top version).
  • A new version of every file is created every 15 minutes.
  • Oldest version of each file gets deleted due to version limits.
  • Final file names are 6-19 characters with 3 character file types.
  • Temporary file names contain the creating process's PID (or the last six digits of the PID) and enough of the final file name to be guaranteed unique.

Example temporary/final names:  INMSIRSE.GIF0BC2F3/INMSIRSE.GIF
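The naming scheme above fits in a few lines of C. This sketch assumes, as the example INMSIRSE.GIF0BC2F3 suggests, that the suffix is the last six hexadecimal digits of the PID appended directly to the final name; make_temp_name is a hypothetical helper, and the PID 0x200BC2F3 used below is an illustrative value, not one from the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Build "<final name><last six hex digits of PID>" into buf.
   Returns 0 on success, -1 if buf is too small (output is truncated). */
static int make_temp_name(char *buf, size_t size, const char *final_name,
                          unsigned long pid)
{
    assert(buf != NULL && final_name != NULL);
    int n = snprintf(buf, size, "%s%06lX", final_name, pid & 0xFFFFFFUL);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

With final name INMSIRSE.GIF and PID 0x200BC2F3 this produces INMSIRSE.GIF0BC2F3. Because the PID suffix lengthens the file type, the temporary name is always longer than the final name, which is relevant to the directory-size questions raised in this thread.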

 

Thanks for the ideas.  Any more?

 

Jess

I have one, but it's personal.
John Gillings
Honored Contributor

Re: What could cause this rename error?

Jess,

   Hein's observations have given me an idea...

 

>Maybe you have a tool doing directory maintenance, or a tool doing direct, private, directory file reads while not allowing shared write ?/

 

   A trick for forcing a file to be write-shared is to open it SHARE=WRITE before whatever is trying SHARE=READ.

 

  So, for the duration of whatever processing is failing, have a process which holds the directory open:

 

$ OPEN/READ/WRITE/SHARE=WRITE FORCE_WRITEABLE dev:[dir]YOURDIR.DIR

$ !  wait for processing to complete

$ CLOSE FORCE_WRITEABLE

 

  This should keep the road clear for your RENAME operations. What it may also do is reveal who or what is opening the file for exclusive read:

 

$ open/read/write/share=write mydir temp.dir
$ rename  [.temp]afile.txt .new
$ open/read/write/share=read try1 temp.dir
%DCL-E-OPENIN, error opening DISK_GS:[GILLINGS_J]TEMP.DIR; as input
-RMS-E-FLK, file currently locked by another user
$ close mydir

Maybe enable file access failure auditing as well.

A crucible of informative mistakes
Jess Goodman
Esteemed Contributor

Re: What could cause this rename error?

John,

 

-RMS-E-FLK, file currently locked by another user

 

is a different error than

 

-SYSTEM-W-ACCONFLICT, file access conflict

 

The RMS$_FLK error is caught by RMS using its file serialization lock (RMS$s...) before RMS attempts to open the file. The SS$_ACCONFLICT is from the file system itself, which just checks bits in the File Control Block for files already open on the same node, and its arbitration lock (F11B$a...) for files that are open elsewhere in the cluster.
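The experiments in this thread can be read against a simplified model of share arbitration: a new accessor conflicts with an existing one if it wants an access the existing accessor denies, or denies an access the existing accessor holds. The sketch encodes only that rule. The struct and function names are hypothetical; it is not the XQP's code, it does not distinguish the RMS-level check (RMS$_FLK) from the file-system check (SS$_ACCONFLICT), and treating removal of a directory entry as write access to the directory file is an assumption consistent with Hein's reproduction.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of one accessor of a file (e.g. a .DIR file). */
struct access {
    bool read;        /* accessor reads the file */
    bool write;       /* accessor writes the file */
    bool deny_read;   /* others may not read while this accessor is open */
    bool deny_write;  /* others may not write while this accessor is open */
};

/* Returns true if the requested access `req` conflicts with `held`. */
static bool conflicts(struct access held, struct access req)
{
    return (req.read && held.deny_read) ||
           (req.write && held.deny_write) ||
           (req.deny_read && held.read) ||
           (req.deny_write && held.write);
}
```

In this model Hein's OPEN/READ/WRITE/SHARE=READ holder denies write, so a rename that must update the directory conflicts; John's SHARE=WRITE holder denies nothing, so the rename passes, while any later opener that tries to deny write collides with the holder's write access instead.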

 

I am testing file access failure auditing.

 

Jess

 

I have one, but it's personal.
abrsvc
Respected Contributor

Re: What could cause this rename error?

I would look at the file names where the error occurred, if you have them logged. Perhaps a pattern will emerge. My thought here is that the problem may be happening when a directory block is freed up and the rest of the directory data is being "moved up" to fill in the gap, similar to deleting an entire directory, where you see pauses. I would only expect to see this with high activity rates and large directories, though...

Dan
Hein van den Heuvel
Honored Contributor

Re: What could cause this rename error?

Jess, read John's suggestion again. It is a really good idea.

 

My earlier suggestion was that perhaps there is a process X locking the directory against write, causing an innocent party Y to run into the RMV (or REENT) error.

By 'pre-locking' the directory allowing write access, any opener which tries NOT to allow write (process X) will run into a file locked error, and now the guilty party (X) will be 'punished' instead of the innocent bystander (Y).

 

Dan... looks like the application Jess refers to pre-creates files with a long file extension including a PID and then makes them available with a simple three-character extension. It's hard to imagine those renames causing a down-shuffle, but I suppose it could happen. It would have to be a rename of the first=last=only entry in a block, and the new name would have to just fit in the prior block when the original name did not. I suppose that could and would happen over time.

Deletes are more likely to outright empty a (directory) block.

The file creations could and would certainly cause shuffles up.

 

fwiw,

Hein

 

 

John McL
Trusted Contributor

Re: What could cause this rename error?

Your file creation etc doesn't look too excessive but you mention the use of "C RTL calls (OPENDIR, READDIR, RENAME, REMOVE, TEMPNAME)"

 

Surely you're also using CLOSEDIR?

 

Some thoughts ...

 

Have you checked for conflicts between OPENDIR/READDIR/CLOSEDIR and normal RMS operations?

 

Is there any chance that

(a) the CLOSEDIR is not being executed every time, or

(b) that you are opening the same directory stream multiple times but not closing the correct number of times, or

(c) that you are opening a second directory stream and overwriting the directory stream data before you've done a CLOSEDIR?
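One way to rule out (a) to (c) is to make the pairing structural, routing every exit through a single closedir(). A minimal POSIX sketch follows; dir_contains is a hypothetical helper, and whether an unclosed stream on OpenVMS actually holds the directory open in a way that blocks renames is the hypothesis under test, not an established fact.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/* Returns 1 if `name` is an entry of directory `path`, 0 if not,
   -1 if the directory cannot be opened. */
static int dir_contains(const char *path, const char *name)
{
    assert(path != NULL && name != NULL);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int found = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, name) == 0) {
            found = 1;
            break;          /* leave the loop, not the function */
        }
    }

    /* Exactly one closedir() per successful opendir(), on every path,
       and the stream pointer is never reused for a second opendir(). */
    closedir(dir);
    return found;
}
```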

 

Also, is the problem specific to a language or runtime library, not just at the point of failure but in what occurred in the recent past regarding I/O and file operations?

GuentherF
Trusted Contributor

Re: What could cause this rename error?

There are some backup/archiving tools which open directory files exclusively to scan for alias name entries.

 

/Guenther

Jess Goodman
Esteemed Contributor

Re: What could cause this rename error?

I think I tracked down what is causing the errors for one of my problem directories.  The graphic files in this directory are served by the OSU webserver.  In both web access logs at the times that the last two RENAME errors occurred for directory [SIR] I found this request:

 

HEAD /rooted_logical/sir

 

Apparently this request causes the webserver to open the directory file for read access with only read sharing allowed.

 

Jess

 

I have one, but it's personal.