Operating System - OpenVMS

Tom Musson
Advisor

Extra bytes at EOL interpreted as linefeeds from Java println() method?

I have a simple stuttering version of "hello world" written in Java. It is:

$ type HelloWorld.java
public class HelloWorld {

public static void main(String args[]) {

System.out.println("Hello World!");
System.out.println("Hello World!");

}
}

Compiling/executing it provides the expected output to the terminal screen:

$ java -cp . "HelloWorld"
Hello World!
Hello World!
$

However, if the output is redirected to a file and viewed in an editor, an extra blank line appears after each line (due to what look like extra null characters?).

$ pipe java -cp . "HelloWorld" >test_output.txt
$ typ test_output.txt
Hello World!
Hello World!
$ edit test_output.txt
Hello World!

Hello World!

[End of file]
o
o
o
$ dump test_output.txt
Dump of file TEST_OUTPUT.TXT;1 on 23-JUN-2008 14:48:52.91
File ID (23650,4604,0) End of file block 1 / Allocated 5

Virtual block number 1 (00000001), 512 (0200) bytes

6F57206F 6C6C6548 0001000E 00000002 21646C72 6F57206F 6C6C6548 0001000E ....Hello World!........Hello Wo 000000
00000000 00000000 00000000 00000000 00000000 0000FFFF 00000002 21646C72 rld!............................ 000020
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000040
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000060
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000080
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000A0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000C0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000E0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000100
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000120
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000140
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000160
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000180
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001A0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001C0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001E0

Does anyone know how to get rid of the extra null bytes?

Thanks,

Tom
Hoff
Honored Contributor

Re: Extra bytes at EOL interpreted as linefeeds from Java println() method?

Java version?

OpenVMS version and platform?

Current on ECOs?

Which editor? EVE? Something else?

The nulls are a canard; they're after the EOF (FFFF).

DCL creates VFC files, and the dump here sure looks like a VFC file.

I'd guess either the file record structure is mis-marked, or the editor might not be contending with the VFC quite right. TYPE is clearly dealing with this correctly, which tends to point to the editor...

OpenVMS I/O redirection is not quite like Unix I/O redirection; it's rather more arcane.

CONVERT can likely get you where you want, assuming the file here is not mis-marked.

There have also been some ECOs and some logical names to clean up and to specifically control some of the low-level file processing.

Tom Musson
Advisor

Re: Extra bytes at EOL interpreted as linefeeds from Java println() method?

Hi Hoff,

Sorry, I did neglect to provide version information.

Java version:
AXP: 1.4.2-4.p2 Classic and FastVM, 1.5.0-4 Classic and FastVM
IA64: 1.4.2-2 HotSpot, 1.5.0-2 HotSpot

OpenVMS version and platform:
AXPVMS 7.3-2, 8.3
IA64VMS 8.3-1H1

Current on all ECOs

I used LSE, EVE and EDT. All are the same.

>The nulls are a canard; they're after the EOF (FFFF).

The NULLs to which I was referring are the 3 bytes after the ASCII 10 and before the second "Hello World!" line. After the first "Hello World!" are the 4 bytes "00000002". The first byte ("02") is the carriage return and the next byte ("00") is the EOL, but it is the next two bytes that are extraneous. I am certain that they are being interpreted as a blank line as that is exactly what those two characters should be interpreted as.

>DCL creates VFC files, and the dump here sure looks like a VFC file.

It is a VFC file:

Record format: VFC, 2 byte header, maximum 0 bytes, longest 25 bytes
Record attributes: Print file carriage control

>OpenVMS I/O redirection is not quite like Unix I/O redirection; it's rather more arcane.

The same effect is found in a batch log file if the program is executed in batch, so it is not the redirect that is the problem. [Found this out after I submitted the original question.]

This only happens from Java, so I have to assume the problem is in Java.

I tried to create a small reproducer so that people could see the issue, but the actual application is several hundred thousand lines of mostly C that calls Java through JNI. The C printf statements do not get an extra carriage return, but all of the Java println methods do.

Thanks,

Tom
John Gillings
Honored Contributor

Re: Extra bytes at EOL interpreted as linefeeds from Java println() method?

Tom,

As Hoff has suggested, this looks like a VFC file (perhaps with some extra nulls, but they shouldn't matter).

If you want to control the exact format of your output file, use CONVERT on the pipe output like this (assumes V8.3 or higher):

$ PIPE java -cp . "HelloWorld" | CONVERT/FDL="RECORD; FORMAT STREAM_LF" SYS$PIPE test_output.txt

It might be interesting to see a dump of the stream_lf version of your file. If you don't have V8.3 it looks like this:

$ CREATE STM.FDL
RECORD; FORMAT STREAM_LF
$ PIPE java -cp . "HelloWorld" | CONVERT/FDL=STM.FDL SYS$PIPE test_output.txt

[aside... it's often annoying that DCL uses such an obscure, and largely useless record format by default, especially as there is no way to actually use the features of VFC! I frequently find it necessary to replace:

$ OPEN/WRITE out myfile

with:

$ CREATE/FDL=whatIwant myfile
$ OPEN/APPEND out myfile

Maybe DCL should have:

$ SET RMS_DEFAULT/FDL=whatIwant ?

A crucible of informative mistakes
RBrown_1
Trusted Contributor

Re: Extra bytes at EOL interpreted as linefeeds from Java println() method?

> The NULLs to which I was referring are
> the 3 bytes after the ASCII 10 and before
> the second "Hello World!" line. After the
> first "Hello World!" are the 4
> bytes "00000002". The first byte ("02")
> is the carriage return and the next byte
> ("00") is the EOL, ....

Not in ASCII.

It looks to me like you have extra records in your file.

Here is the first record:

21646C72 6F57206F 6C6C6548 0001000E

It is 14 (0xE) bytes long, including the VFC but not the record length. The VFC is 0x0001. The remaining 12 bytes are "Hello World!".

Here is the second record:

00000002

It is 2 bytes long, including the VFC but not the record length. The VFC is 0x0000. There are no other bytes in the record.

Here is the third record:

21646C72 6F57206F 6C6C6548 0001000E

It is the same as the first record.

Here is the fourth record:

00000002

It is the same as the second record.

Hard to say what this is:

FFFF

This would say that there are 65535 bytes in the next record. $ ANALYZE/RMS would tell you where the EOF is. Perhaps it is before the FFFF. Or maybe RMS does not complain if EOF occurs before the end of the last record.

Try dumping the file with /RECORD.
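To make the record walk above concrete, here is a small Java sketch that walks the first block of Tom's dump the same way. It assumes the layout described above: a little-endian 2-byte count that includes the 2-byte VFC header (but not the count itself), records starting on even offsets, and FFFF meaning "no more records in this block" (Hein explains that marker later in the thread). The class and method names are made up for illustration; this is not an RMS API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class VfcBlockWalker {

    /** One record: its offset in the block, the two VFC bytes, and the data. */
    public static final class VfcRecord {
        public final int offset;
        public final int prefix;
        public final int postfix;
        public final String data;

        VfcRecord(int offset, int prefix, int postfix, String data) {
            this.offset = offset;
            this.prefix = prefix;
            this.postfix = postfix;
            this.data = data;
        }
    }

    /** Rebuilds block 1 of TEST_OUTPUT.TXT from the DUMP longwords, lowest address first. */
    public static byte[] sampleBlock() {
        int[] longwords = {
            0x0001000E, 0x6C6C6548, 0x6F57206F, 0x21646C72, // count 14, VFC 01 00, "Hello World!"
            0x00000002,                                     // count 2, VFC 00 00, no data
            0x0001000E, 0x6C6C6548, 0x6F57206F, 0x21646C72,
            0x00000002,
            0x0000FFFF                                      // FFFF: no more records in this block
        };
        ByteBuffer buf = ByteBuffer.allocate(512).order(ByteOrder.LITTLE_ENDIAN);
        for (int lw : longwords) {
            buf.putInt(lw);
        }
        return buf.array(); // the rest of the block stays zero, as in the dump
    }

    /** Walks one block of a variable-length file with a 2-byte VFC header. */
    public static List<VfcRecord> parse(byte[] block) {
        ByteBuffer buf = ByteBuffer.wrap(block).order(ByteOrder.LITTLE_ENDIAN);
        List<VfcRecord> records = new ArrayList<>();
        int pos = 0;
        while (pos + 2 <= block.length) {
            int count = buf.getShort(pos) & 0xFFFF; // unsigned record length
            if (count == 0xFFFF) {
                break;                              // end-of-data marker
            }
            int start = pos + 2;
            if (count < 2 || start + count > block.length) {
                break;                              // not a plausible VFC record; stop rather than guess
            }
            String data = new String(block, start + 2, count - 2, StandardCharsets.US_ASCII);
            records.add(new VfcRecord(pos, block[start] & 0xFF, block[start + 1] & 0xFF, data));
            pos = start + count + (count & 1);      // next record starts on an even offset
        }
        return records;
    }

    public static void main(String[] args) {
        for (VfcRecord r : parse(sampleBlock())) {
            System.out.printf("offset 0x%02X  VFC %02X %02X  %2d bytes  \"%s\"%n",
                    r.offset, r.prefix, r.postfix, r.data.length(), r.data);
        }
    }
}
```

Run against the bytes from the dump, it finds four records at byte offsets 0x00, 0x10, 0x14 and 0x24, the same offsets that appear in the RFAs of the DUMP/RECORD output later in the thread; the two short ones carry VFC bytes 00 00 and no data.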

> Record format: VFC, 2 byte header,
> maximum 0 bytes, longest 25 bytes
> Record attributes: Print file carriage
> control

Interesting that DIR claims that the longest record is 25 bytes. DUMP/RECORD will be interesting.

> The same effect is found in a batch log
> file if the program is executed in batch,

That makes sense to me, since it appears that there are two records for each "Hello World!".

> so it is not the rediret that is the
> problem.

Maybe the redirect is doing something to you. What do you find in the file if you:

$ DEFINE/USER SYS$OUTPUT TEST_OUTPUT.TXT
$ java -cp . "HelloWorld"
?

Tom Musson
Advisor

Re: Extra bytes at EOL interpreted as linefeeds from Java println() method?

Hi John,

This is a Java issue, not an RMS issue.

I tried your suggestion using CONVERT (though it really does not help me with the actual problem). I get a stream_LF file, but it does not change the output in the editor. It does, however, now show the extra lines when you type out the file:

$typ test_output.txt
Hello World!

Hello World!

$
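A simplified way to see why the blank lines appear after the CONVERT: if each record's data (VFC bytes dropped) simply gets one LF appended, the two empty records RBrown identified become two empty lines. This is a model assumed for illustration, not CONVERT's actual logic, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class StreamLfModel {

    /**
     * Simplified model (an assumption, not CONVERT's real logic): drop the VFC
     * bytes and end every record's data with one LF.
     */
    static String toStreamLf(List<String> records) {
        StringBuilder out = new StringBuilder();
        for (String data : records) {
            out.append(data).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // RBrown's four records: two with text, two empty.
        List<String> records = Arrays.asList("Hello World!", "", "Hello World!", "");
        // Prints "Hello World!", a blank line, "Hello World!", a blank line.
        System.out.print(toStreamLf(records));
    }
}
```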

As I said in my previous reply "I tried to create a small reproducer so that people could see the issue, but the actual application is several hundred thousand lines of mostly C that calls Java through JNI. The C printf statements do not get an extra carriage return, but all of the Java println methods do."

Thanks,

Tom
Tom Musson
Advisor

Re: Extra bytes at EOL interpreted as linefeeds from Java println() method?

Hi RBrown,

This is a Java problem, not an RMS problem.

The LRL of 25 is a red herring as it was from a version of the file that had extra debugging information within.

It does not matter how the output file gets created, the extra bytes still show up and they still get interpreted as separate records. Using your suggestion of redirecting sys$output, the file created is:

File ID (24536,140,0) End of file block 1 / Allocated 5

Record number 1 (00000001), 12 (000C) bytes, RFA(0001,0000,0000)

21646C72 6F57206F 6C6C6548 Hello World!.................... 000000

Record number 2 (00000002), 0 (0000) bytes, RFA(0001,0000,0010)


Record number 3 (00000003), 12 (000C) bytes, RFA(0001,0000,0014)

21646C72 6F57206F 6C6C6548 Hello World!.................... 000000

Record number 4 (00000004), 0 (0000) bytes, RFA(0001,0000,0024)

or

File ID (24536,140,0) End of file block 1 / Allocated 5

Virtual block number 1 (00000001), 512 (0200) bytes

6F57206F 6C6C6548 0001000E 00000002 21646C72 6F57206F 6C6C6548 0001000E ....Hello World!........Hello Wo 000000
00000000 00000000 00000000 00000000 00000000 0000FFFF 00000002 21646C72 rld!............................ 000020
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000040
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000060
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000080
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000A0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000C0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000E0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000100
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000120
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000140
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000160
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000180
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001A0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001C0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001E0

Thanks,

Tom
Craig A Berry
Honored Contributor

Re: Extra bytes at EOL interpreted as linefeeds from Java println() method?

Writing to VFC files is not likely to be something Java does well. If you can make sure Java rather than DCL opens the file and then do:

$ DEFINE JAVA$CREATE_STMLF_FORMAT TRUE

that will likely take care of your problem.
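A minimal sketch of the "let Java open the file" idea: the program opens its output file itself and points System.out at it, so the JVM rather than DCL creates the file. The class name and the file name are hypothetical, and whether JAVA$CREATE_STMLF_FORMAT then gives you a stream-LF file is Craig's expectation above, not something tested here.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;

public class OwnLogFile {

    /** Opens fileName from Java and makes System.out write to it. */
    static PrintStream redirectStdout(String fileName) throws IOException {
        // autoflush = true, like the default System.out, so output shows up promptly
        PrintStream log = new PrintStream(new FileOutputStream(fileName), true);
        System.setOut(log);
        return log;
    }

    public static void main(String[] args) throws IOException {
        PrintStream original = System.out;
        // Hypothetical file name; on VMS, DEFINE JAVA$CREATE_STMLF_FORMAT TRUE beforehand
        PrintStream log = redirectStdout("test_output.txt");
        try {
            System.out.println("Hello World!");
            System.out.println("Hello World!");
        } finally {
            log.close();             // flushes and closes the file
            System.setOut(original); // restore the inherited stdout
        }
    }
}
```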

However, if you are writing to a log file created by the job controller, you may not have any choice in the matter (though some variation of John's suggestion might help).

One likely cause of the symptom you are seeing is calling fseek() on non-stream files and/or beyond EOF. Java might well be doing this under the hood without your having any control over it. Well, you can attempt to control it by enabling the feature setting DECC$POSIX_SEEK_STREAM_FILE. It's only supposed to work on stream files, but since Java is insisting on stream behavior, there is some slim chance it could defer writing those null bytes until after something else has repositioned things to the correct location.

There is more info in the fseek() section of the CRTL manual, and of course this is only a theory.
Hein van den Heuvel
Honored Contributor

Re: Extra bytes at EOL interpreted as linefeeds from Java println() method?


Rbrown,

Good work. 10 points! It brings much-needed clarity, and fixes the problem in my (HP testdrive) environment.

$ delete tmp.tmp.*
$ define/user sys$output tmp.tmp
$ java -cp . "HelloWorld"
$ dump tmp.tmp

6C65480A 21646C72 6F57206F 6C6C6548 Hello World!.Hel 000000
00000000 00000A21 646C726F 57206F6C lo World!....... 000010

Of course, you still have to be aware that the CRTL tries to 'help' by inheriting attributes from previously existing files.
For example

$ cre tmp.tmp ! Default variable length
$ define/user sys$output tmp.tmp
$ java -cp . "HelloWorld"
$ dump tmp.tmp

00002164 6C726F57 206F6C6C 6548000C ..Hello World!.. 000000
00002164 6C726F57 206F6C6C 6548000C ..Hello World!.. 000010

And

$ pipe show time > tmp.tmp
$ dump tmp.tmp.
3030322D 4E554A2D 34322020 8D010018 .... 24-JUN-200 000000
00000000 FFFF3331 3A31353A 35302038 8 05:51:13...... 000010

$ define/user sys$output tmp.tmp
$ java -cp . "HelloWorld"
$ dump tmp.tmp
21646C72 6F57206F 6C6C6548 0001000E ....Hello World! 000000
6F57206F 6C6C6548 0001000E 00000002 ........Hello Wo 000010
00000000 0000FFFF 00000002 21646C72 rld!............ 000020

So there it is back!
Now please notice the 'normal 8D01' in the 'show time' example. This is the standard pre- and post-record formatting.

The Java output suggests to me that it has tried to be 'clever'. This is why TYPE 'works', displaying only 2 lines where there are in fact 4 records.

For the gory details about the formatting bits please check FAB$V_PRN in:

http://h71000.www7.hp.com/doc/731FINAL/4523/4523pro_006.html#bottom_006
[Moderator note: Removed the broken link. ]
Tom,

Please try the $DEFINE suggestion once again, after deleting all prior output files.

You may want to add John Gillings' suggestion of pre-creating a variable-length sequential file to inherit from.

Or you can play with the CRTL feature logicals to influence that ($ HELP CRTL FEATURE_LOGICAL).

Craig,
That logical sounds promising, but when using pipes/DCL log file output it is DCL, not the program, which actually creates the file. And DCL will do the VFC/PRN thingy.

Tom,

For yucks I also tried a simple C program, and cannot make it generate the extra lines.
So it _looks_ like a JAVA problem to me.

But this is really something to do with line-end flushes.
Think about where that newline is coming from in the first place!?
It is not provided in the string, so Java makes it up.
I suspect it 'prints' the raw user-provided output first, with a flush. Then it tries to help by printing a newline. And then RMS tries to help in the VFC case to do what you mean.

Here is a C reproducer:
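#include <stdio.h>

/* Write each line in two pieces, flushing after each piece. */
int main(void) {
    FILE *test = fopen("tmp.tmp", "w");
    if (test == NULL) {
        perror("tmp.tmp");
        return 1;
    }
    fprintf(test, "Hello World!");
    fflush(test);
    fprintf(test, "\n");
    fflush(test);
    fprintf(test, "Hello World!");
    fflush(test);
    fprintf(test, "\n");
    fflush(test);
    fclose(test);
    return 0;
}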
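The same theory can be modeled without VMS at all. The Java sketch below is a made-up "record device" (RecordSink is hypothetical, not an RMS or JDK class): every flush with pending data closes one record, and a single trailing LF is treated as the record terminator rather than as data. Both rules are assumptions chosen to mimic what the dumps show. Feeding it the same write/flush sequence as the C reproducer gives four records, two of them empty; writing the text and its newline before one flush gives two.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a record-oriented output: each non-empty flush closes a record. */
public class RecordSink extends OutputStream {
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private final List<String> records = new ArrayList<>();

    @Override
    public void write(int b) {
        pending.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) {
        pending.write(b, off, len);
    }

    @Override
    public void flush() {
        if (pending.size() == 0) {
            return; // assumption: a flush with nothing pending writes no record
        }
        String data = new String(pending.toByteArray(), StandardCharsets.US_ASCII);
        if (data.endsWith("\n")) {
            data = data.substring(0, data.length() - 1); // assumption: LF ends the record
        }
        records.add(data);
        pending.reset();
    }

    public List<String> records() {
        return records;
    }

    /** Two lines of "Hello World!", optionally flushing between text and newline. */
    static RecordSink simulate(boolean flushBeforeNewline) {
        byte[] text = "Hello World!".getBytes(StandardCharsets.US_ASCII);
        byte[] newline = {'\n'};
        RecordSink sink = new RecordSink();
        for (int i = 0; i < 2; i++) {
            sink.write(text, 0, text.length);
            if (flushBeforeNewline) {
                sink.flush();
            }
            sink.write(newline, 0, newline.length);
            sink.flush();
        }
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(simulate(true).records());  // [Hello World!, , Hello World!, ]
        System.out.println(simulate(false).records()); // [Hello World!, Hello World!]
    }
}
```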


Rbrown>> "Hard to say what this is: FFFF"

You could say it is an end-of-data marker.
For RMS variable-length record files, a stream of binary zeroes would be a series of zero-length records. The FFFF is an invalid record size and instructs RMS to jump to the next block, which will be End-Of-File.
It has (a little) to do with the "no-span" record attribute.


Hope this helps,
Hein.

Craig A Berry
Honored Contributor

Re: Extra bytes at EOL interpreted as linefeeds from Java println() method?

Hein, yes, I'm aware DCL has already opened the file and it is a VFC file; I was just taking a swing at something that might trick Java into taking a different code path and potentially avoid the extra records. I think your C reproducer is likely the key, though. I did a bit of Googling and couldn't find a definitive or authoritative statement, but there were hints that Java's println does autoflushing by default, i.e. flushes on every write.

If Tom can create a PrintStream object attached to stdout and turn off autoflushing, that might do the trick (dunno, I'm no Java expert). Or perhaps there's a way to twiddle the autoFlush property on the System.out object.
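Along the lines suggested above, System.out can be replaced with a buffered PrintStream whose autoflush is off, so that a line's text and its newline can reach the file in a single write. PrintStream(OutputStream, boolean), BufferedOutputStream, FileDescriptor.out and System.setOut are standard JDK APIs; the class name and the 8192-byte buffer size are arbitrary choices, and whether this removes the extra records with the VMS JVMs Tom lists is untested.

```java
import java.io.BufferedOutputStream;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;

public class QuietStdout {

    /** A buffered PrintStream with autoflush turned off. */
    static PrintStream noAutoFlush(OutputStream sink) {
        // Output reaches sink only on flush() or when the 8192-byte buffer
        // fills, and a buffer-full write can still land mid-line.
        return new PrintStream(new BufferedOutputStream(sink, 8192), false);
    }

    public static void main(String[] args) {
        // Wrap the process's standard output file descriptor.
        System.setOut(noAutoFlush(new FileOutputStream(FileDescriptor.out)));
        System.out.println("Hello World!");
        System.out.println("Hello World!");
        // Flush explicitly; don't count on a replaced System.out being flushed at exit.
        System.out.flush();
    }
}
```

If you need the log to stay current, calling System.out.flush() right after a println keeps each flush on a line boundary, at the cost of one write per line.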

This is often *not* what you want to have happen on a log file, where infrequent flushing can leave you in the dark about what is happening with your program, but flushing output to a record-oriented device is going to introduce record boundaries. Similar things happen with mailboxes, which makes them less than entirely satisfactory as the foundation for pipes in the CRTL, but I digress.