HPE Community read-only access December 15, 2018
This is a maintenance upgrade. You will be able to read articles and posts, but not post or reply.
Hours:
Dec 15, 4:00 am to 10:00 am UTC
Dec 14, 10:00 pm CST to Dec 15, 4:00 am CST
Dec 14, 8:00 pm PST to Dec 15, 2:00 am PST
Operating System - OpenVMS
cancel
Showing results for 
Search instead for 
Did you mean: 

Incorrect file size returned in JRuby (Java in general?) for file open for write

 
BenAArmstrong
Frequent Advisor

Incorrect file size returned in JRuby (Java in general?) for file open for write

In JRuby when appending to a file where sync=true, I'm getting strange behaviour that makes me believe the file size isn't being read correctly on the currently open file. And indeed, in the Ant release notes at http://h30266.www3.hp.com/odl/axpos/progtool/hpant171/ANT_IGUIDE_RN.PDF, I find this:


Known Issues


• Failures with several org.apache.tools.ant.types.selectors that rely on obtaining the current file size from “C” functions fstat() or stat(). These functions fail to return the correct file size under the following condition:

 

    o Files still open for write and have been modified

 

This makes it sound like it's not a problem peculiar to JRuby, but that applies to Java on OpenVMS in general. So I re-read the JDK User Guide to see if anything is mentioned there about the problem. It isn't, but I (re-)discovered that if you define java$file_open_mode 2, every write is synced to disk. Well, that's what "sync=true" is supposed to be doing anyway, so maybe this points towards a solution ...

 

Any other workaround I may be missing?

 

Ben

 

 

15 REPLIES
John Gillings
Honored Contributor

Re: Incorrect file size returned in JRuby (Java in general?) for file open for write

Ben,

 

   An example would help understand your issue. What is being used to determine the size? What size is being returned? Are you reading it from the process that has the file open?

 

Remember that RMS is supremely lazy. It doesn't update anything unless it's absolutely necessary.

 

Please give more detail about what you're doing, what results you're getting and what results you want. Cut and paste exact text of commands issued and messages returned would be preferred.

A crucible of informative mistakes
Ph Vouters
Valued Contributor

Re: Incorrect file size returned in JRuby (Java in general?) for file open for write

Ben,

 

C runtime based languages such as Ruby or JRuby via Java use the lstat, stat or fstat C functions. The struct stat field being used in this case is size. Even Unix compatible ls bases its code upon stat and lstat. Have a look on your OpenVMS system at $ help  crtl {stat | lstat | fstat}. Query Google about these C functions examples. Querying Google, this may lead you to some of my articles on my Web site.

 

Philippe

Ph Vouters
Valued Contributor

Re: Incorrect file size returned in JRuby (Java in general?) for file open for write

Ben,

 

As a complement to your question.

 

In Ruby or JRuby you might have to indirectly use the C runtime fflush + fsync functions to force the underlying file system to a disk physical write. fflush + fsync are Ruby's IO clacc methods. Refer to http://www.ruby-doc.org/core/classes/IO.html Using fflush + fsync on an opened file may then return you a correct file size using {stat | lstat | fstat ] stat struct size field.

 

Just a guess about what you're doing and observing.

 

Philippe

BenAArmstrong
Frequent Advisor

Re: Incorrect file size returned in JRuby (Java in general?) for file open for write


John Gillings wrote:

Ben,

 

   An example would help understand your issue. What is being used to determine the size? What size is being returned? Are you reading it from the process that has the file open?

 

Remember that RMS is supremely lazy. It doesn't update anything unless it's absolutely necessary.

 

Please give more detail about what you're doing, what results you're getting and what results you want. Cut and paste exact text of commands issued and messages returned would be preferred.


I have been hard at work in parallel with this query trying to distil from the JRuby code a pure Java example that would demonstrate the issue. I have also enlisted the help of the JRuby developers to try to refine my Java example (which currently fails to reproduce the problem) into one that actually reproduces the problem. I could provide you with that code, but since it doesn't reproduce the issue, I'm sure it won't be much help. I will therefore provide you with the Ruby code, but I have my doubts about whether that will make things any clearer, as I don't expect many reading this to actually run JRuby on OpenVMS (well, with the exception of Philippe):

 

File.open('output.tmp','w'){|_|_.puts 'Line one'}
File.open('output.tmp','a'){|_|_.sync=true ; _.puts 'two'; _.puts '3'}
File.read('output.tmp')
=>  "Line one\n\n"
# expected output: => "Line one\ntwo\n3\n"

It appears that after every write, the file position is reset to the old end of file, not the new end of file, and that file writes are happening before and after each newline is emitted.

 

When I find the actual Java construct that does this I will assemble it into a working example which I will post into this thread. But since it has proven difficult to find so far, I had hoped to gain a bit more insight by posting here, drawing on the collective experiences of those who know OpenVMS Java better than I do.

 

Ben

p.s. One theory I have explored with the JRuby developers is that since JRuby does different things if the file is opened read/write/append vs. write/append, changing the mode from 'a' to 'a+' might change the behaviour. Unfortunately, this theory didn't pan out. The behaviour is the same regardless of which mode is used.

BenAArmstrong
Frequent Advisor

Re: Incorrect file size returned in JRuby (Java in general?) for file open for write


Ph Vouters wrote:

Ben,

 

As a complement to your question.

 

In Ruby or JRuby you might have to indirectly use the C runtime fflush + fsync functions to force the underlying file system to a disk physical write. fflush + fsync are Ruby's IO clacc methods. Refer to http://www.ruby-doc.org/core/classes/IO.html Using fflush + fsync on an opened file may then return you a correct file size using {stat | lstat | fstat ] stat struct size field.

 

Just a guess about what you're doing and observing.

 

Philippe


Yes, I did read this already:

 

         Also be aware that for st_size to report a correct value, you
         need to flush both the C RTL and RMS buffers.

 

But I did not know what to do with this information, as so far as I know, Java does not provide me with the means to directly flush RMS buffers. Also, since I have not yet found the exact line in the JRuby source responsible for misreading the file size, but have only inferred this is what is happening from the test results, I don't even know if the code may already be doing an fflush + fsync.

 

By the way, I did not decide to explicitly set sync=true in any of our application code. That is done for me by the standard Ruby library Logger class. Here is a simple failing test case using Logger:

 

require 'logger'

log=Logger.new 'output.tmp'
log.info 'Test'
log.info '2'
log.close
puts File.read('output.tmp')

failing output:

I, [2011-09-28T07:58:11.724000 #227180285]  INFO -- : 2
st
er.rb/99999

expected output:

# Logfile created on Wed Sep 28 07:55:37 ADT 2011 by /
I, [2011-09-28T07:55:41.610708 #539703601]  INFO -- : Test
I, [2011-09-28T07:55:44.973787 #539703601]  INFO -- : 2

 

When I examined the logger.rb code, I found that the line of code that makes the difference is "logdev.sync = true" in the following excerpt:

 

    def create_logfile(filename)
      logdev = open(filename, (File::WRONLY | File::APPEND | File::CREAT))
      logdev.sync = true
      add_log_header(logdev)
      logdev
    end

 

Since Rails uses Logger, the result was that every line of output Rails wrote to the log "piled up" at the beginning of the file, each line overwriting the previous one, as they all were apparently written with the file position reset to 0 each time.

 

Ben

 

BenAArmstrong
Frequent Advisor

Re: Incorrect file size returned in JRuby (Java in general?) for file open for write

By the way, I do have a workaround for this issue now, but one I'm not happy with. I have noticed that if I force synchronous writes with define java$file_open_mode 2, the bad file positioning bug vanishes. However, since this is a global control that not only applies to the log file but to all files I might write from Rails, and there's a performance hit to making all writes synchronous, it is a really ugly workaround. It will do for now, but I am seeking a better solution that I can target at just the failing case without such unpleasant side-effects.

 

Also, as I have mentioned in another thread, I have some code which needs to do file sharing. As a quick hack, I would have used java$file_open_mode = 3, except now that I need to put this java$file_open_mode = 2 workaround in place for the sake of correctly writing the log file, I need to use java$filename_match_list instead (though as discussed in that thread, it is a better solution anyway, since it is focused on just the files that need to be shared, and therefore has less of a chance of causing bad side-effects). OK, so now maybe I'm splitting hairs because I already have a workaround for this, but isn't it odd that the two settings have been made mutually exclusive, since it might have been useful to use both together?

 

Ben

 

H.Becker
Honored Contributor

Re: Incorrect file size returned in JRuby (Java in general?) for file open for write

In Ruby I would try an fsync. In your initial example:

 

File.open('output.tmp','w'){|_|_.puts 'Line one'}
File.open('output.tmp','a'){|_|_.puts 'two'; _.puts '3'; _.fsync}
File.read('output.tmp')
In the logger example, because the Logger object has no fsync method you may be able to create a logger from a file, like
file = File.open('output.tmp','a')
log = Logger.new(file)
...
file.fsync
log.close
BenAArmstrong
Frequent Advisor

Re: Incorrect file size returned in JRuby (Java in general?) for file open for write


H.Becker wrote:

In Ruby I would try an fsync. In your initial example:

 

File.open('output.tmp','w'){|_|_.puts 'Line one'}
File.open('output.tmp','a'){|_|_.puts 'two'; _.puts '3'; _.fsync}
File.read('output.tmp')
In the logger example, because the Logger object has no fsync method you may be able to create a logger from a file, like
file = File.open('output.tmp','a')
log = Logger.new(file)
...
file.fsync
log.close


Yes, I realize that by not making all writes to the file synchronous with sync=true and then explicitly fsyncing where needed I can get equivalent behaviour without the bug, but this is far too intrusive a change. I need something that makes 'sync=true' work properly on VMS, so that I don't need to hack the core ruby library & any third-party code that uses the construct with VMS-specific exceptions, but something that is not as drastic as forcing all Java writes to be synchronous.

 

Ben

 

H.Becker
Honored Contributor

Re: Incorrect file size returned in JRuby (Java in general?) for file open for write

As far as I understand the documentation, sync=true flushes the Ruby buffers, not the CRTL and/or RMS buffers. fsync calls the CRTL (or OS) supplied fsync, which ensures the data is written to the disk. 

 

Is there a VMS system on the net with JRuby to do some tests?

John Gillings
Honored Contributor

Re: Incorrect file size returned in JRuby (Java in general?) for file open for write

Ben,

 

   I don't know anything about JRuby, but in general, I like to open log files as SHARED WRITE. This means you can write to the file without worrying about flushes. When you want to check the file from another process, open the file for APPEND SHARED WRITE access and let RMS worry about synchronising with the writer, then close the file immediately. From DCL it looks like this:

 

$ OPEN/APPEND/SHARE=WRITE synch OUTPUT.TMP

$ CLOSE synch

 

The contents and EOF are now up to date, and can be accessed by readers up to the point of the most recent close. Repeat as necessary.

 

There should be similar constructs in most languages. 

 

There's a small cost associated with a shared write (but remember, most of the time, there are no sharing processes, so not much ongoing overhead), but you avoid the cost of synchronous writes for every output.

 

As I said, I don't know anything about JRuby, so I'm not sure of the semantics of your code. Are you reopening the file for every line of output? If you keep the file open and just write sequentuially to the channel (or whatever the file object is called), does it still overwrite the first record?

A crucible of informative mistakes
BenAArmstrong
Frequent Advisor

Re: Incorrect file size returned in JRuby (Java in general?) for file open for write

The point is not to find the Ruby construct that would accomplish synchronized writing, but rather to improve the JRuby port for OpenVMS so that when sync is turned on, it "just works". A correct solution will not change what the application programmer has to write, but should be transparent, have the same semantics as on other platforms, and have no undesirable side effects. Here is the documented behaviour of sync=true:

 --------------------------------------------------------------- IO#sync=
     ios.sync = boolean   => boolean
------------------------------------------------------------------------
     Sets the ``sync mode'' to true or false. When sync mode is true,
     all output is immediately flushed to the underlying operating
     system and is not buffered internally. Returns the new state. See
     also IO#fsync.

        f = File.new("testfile")
        f.sync = true

     (produces no output)

 

So from what I can gather, I need to understand the Java source code handling synchronized writes, which I have still not fully grokked, as it is split across multiple layers in different Java classes and then modify it it to flush so that the position is correct before writing the appended material. It is doubtful if we will get much further in this thread until I can at least provide a working Java reproducer, distilled from the JRuby source, which shows the problem, but thanks for the responses so far.

 

Ben

 

Ph Vouters
Valued Contributor

Re: Incorrect file size returned in JRuby (Java in general?) for file open for write

Ben,

 

As a real C based Ruby has not been ported to OpenVMS and the only thing available is JRuby which is a Java based Ruby implementation, I do suggest you stay at Java level to not add more complexity to the problem. Build up the as simple as possible Java source from this code extract at http://www.exampledepot.com/egs/java.io/Sync.html. This pure Java code in this Web document clearly makes two distincts calls : os.flush() and fd.sync(). You should note it never mentions any specific operating system.

 

Also, and I have been confrontated to, on Unix/Linux you must distinguish between buffered on non buffered writes. On Unix/Linux all the difference lays between each time calling the C write function versus calling fwrite. The C write on Unix/Linux is a system call whereas fwrite is a C library function. One main difference though with your working environment, OpenVMS is not a Unix like operating system. So on OpenVMS do $ help CRTL [f]write to read what the CRTL actually claims.

 

As I noticed it buffered versus non buffered file writes makes all the performance difference on Unix/Linux. In consequence, my guess is that Java only calls the underneath C fwrite (buffered) and this would compell Java code to os.flush() followed by fd.sync(). You may test the Java code in the URL I provide in various conditions.

 

In this hope this will help clarify the debate.

Philippe

BenAArmstrong
Frequent Advisor

Re: Incorrect file size returned in JRuby (Java in general?) for file open for write

Thanks, Philippe. Clearly I need to do more study on this, and I will certainly try out the test code.

Ph Vouters
Valued Contributor

Re: Incorrect file size returned in JRuby (Java in general?) for file open for write

Ben,

 

The way I understand these CRTL functions, fflush flushes out the CRTL file buffers to the underneath RMS buffers meanwhile fsync causes a flush of  the RMS buffers to the physical medium. Because { stat | lstat | fstat } as said to return an updated st_size field when flushing all buffers to disk, this would mean you need in any Java code (even more using Java/VMS) to use Java's os.flush() and fd.sync()

 

This is also likely the reason why Ruby and thus Jruby implement the fflush and fsync methods as I pointed out in a Ruby's URL I previously mentionned.

 

In the hope this clarifies your technical knowledge regarding your JRuby concern.

Hoff
Honored Contributor

Re: Incorrect file size returned in JRuby (Java in general?) for file open for write

This environment is using Unix tools and expecting the Unix file system model here, and not VMS.

 

Were it feasible, the easiest would be to entirely disable locking and the file system blade guards.

 

Given that is generally not feasible, try configuring DECC$FILE_SHARING for full (sharing) access.

 

See if this sharing allows your file processing to continue.

 

Given VMS made different choices deep within its I/O stack, and given VMS lacks foreign file system interfaces, it's difficult to completely and correctly emulate the Unix file system model on VMS; various compromises do exist here.