Operating System - OpenVMS

JDoe_1
New Member

Fortran Unformatted Real

I am writing an unformatted binary file that must be constructed in a standard format in order for its parent program to read it. [ArcGIS Shapefile for those interested] To make a long story short, I have a method for accomplishing what I need but it is SLOW, and seemingly stupid. Here is what is going on at a binary level:

Byte 0 --> 4 byte integer
Bytes 4-23 --> Zeros (as 4 byte integers)
Byte 24 --> 4 byte integer
Byte 28 --> 4 byte integer
Byte 32 --> 4 byte integer
Byte 36 --> 8 byte real
...And so on

Byte 36 becomes the problem. Up to that point, I am using a record length of 4 (/assume:byterecl @ compile time for other reasons) to accommodate the 4 byte integers. It works out that the 8 byte real (with an 8 record length) falls halfway between records 5 and 6. So far, I am taking the 8 byte real and writing it to a file on the hard drive, and reading back the binary as two 4 byte integers and just continuing on my merry way. Unfortunately, this isn't quick (I currently have it going at approx 4000 times/sec, but it is still my main hang-up).

For those of you that stayed with me through that babble, do you have any ideas? I'm fresh out. I've played with MVBITS with no success. I am pretty new to FORTRAN, so there might be something I'm missing.

Thanks in advance.
20 REPLIES
Steven Schweda
Honored Contributor

Re: Fortran Unformatted Real

> [...] I am taking the 8 byte real and
> writing it to a file on the hard drive,
> and reading back the binary as two 4 byte
> integers [...]

Yikes. It's been too long for me to write
the code quickly and reliably, but I think
that I'd be doing something like:

INTEGER* 4 XINT4X2( 2)
REAL* 8 XREAL
EQUIVALENCE XINT4X2, XREAL

Assign a value to XREAL, and use what's in
XINT4X2.

If you already have either of these things
somewhere, then you may be able to avoid even
the assignment by equivalencing the thing you
already have to the thing you add.
Steven Schweda
Honored Contributor

Re: Fortran Unformatted Real

> EQUIVALENCE XINT4X2, XREAL

Oops. I found some old code. Make that:

EQUIVALENCE (XINT4X2, XREAL)


[...]
C 23 June 1994. SMS.
[...]
integer* 2 W( 2)
c
integer* 4 LW
c
equivalence (LW, W)
[...]

I haven't really used Fortran since it was
FORTRAN.
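
A minimal sketch of the EQUIVALENCE approach described above, written as a complete free-form program. The variable names follow the post; the program name, the test value, and the hex output format are assumptions added for illustration. Which of the two words holds the high-order half of the real depends on the platform and floating-point format, so no particular output is claimed here.

```fortran
program eqv_demo
  implicit none
  integer*4 :: xint4x2(2)   ! two 4-byte words sharing storage with xreal
  real*8    :: xreal        ! the 8-byte real to be written

  ! Both names refer to the same 8 bytes of storage.
  equivalence (xint4x2, xreal)

  xreal = 1.0d0             ! illustrative value only

  ! The two words now hold the raw bit pattern of xreal and can be
  ! written as two 4-byte records. Word order is platform dependent.
  write (*, '(2(1X,Z8.8))') xint4x2(1), xint4x2(2)
end program eqv_demo
```

No assignment or copy through a scratch file is needed: assigning to xreal is enough, because xint4x2 is just another view of the same storage.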
Robert Gezelter
Honored Contributor

Re: Fortran Unformatted Real

JDoe,

I would not hesitate to make a comment without seeing precisely what the code that is reading this stream is.

Writing four bytes/record will be very inefficient at quite a few levels. Without a review of the sources (for both the producer and the consumer), it would be reckless to make suggestions.

The bottleneck could be in the production of the records, but other possibilities exist. For example, if the file is constantly being extended, solving the bottleneck can be as simple as adjusting RMS parameters.

More details would be appreciated.

- Bob Gezelter, http://www.rlgsc.com
Steven Schweda
Honored Contributor

Re: Fortran Unformatted Real

> I would not hesitate to make a comment
> [...]

Nor I (even if the opposite was intended).

> Writing four bytes/record will be very
> inefficient at quite a few levels.

_That_ statement is right-side-up. It's hard
to believe that any non-garbage program would
want to deal with a file structured that way.
I'd expect the segmented-record headers to
occupy more space than the actual data, if
you're using the default RECORDTYPE. I can
imagine that some missing details might make
this look less goofy, but they're missing.

Among other things, it might be interesting
to look at a DUMP (/LONGWORD?) of the file
you're creating to see if it contains what
you expect.

> [...] but it is SLOW, and seemingly stupid.
> [...]

I wouldn't say "seemingly". The EQUIVALENCE
scheme should alleviate much of that pain.

> [...] I am pretty new to FORTRAN, [...]

If C's more familiar, then think of it as a
union.

http://en.wikipedia.org/wiki/Solidarity_Forever

You do need to worry about which order to use
the pieces in (assuming that the bytes don't
need to be scrambled, too, and that the
floating-point format which you're using is
the one expected by the consumer). Not
knowing the hardware type involved (another
of those pesky missing details) makes it hard
to guess how deep in the weeds you may be.
JDoe_1
New Member

Re: Fortran Unformatted Real

Thanks guys. The EQUIVALENCE statement did speed things up a bit. It was exactly what I was looking for, but would have never found via Google.

That said, I'm interested in hearing what you had to say about the files growing and the efficiency associated with that. Is there some sort of pre-allocation? Basically, what I am doing right now is writing records one by one and each extends the file. Maybe you could give me a few more details and I could give you a better idea of what I'm doing.

Thanks both of you.

JD
JDoe_1
New Member

Re: Fortran Unformatted Real

The standard for the file can be found here:

http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf

I've set the little endian and big endian where need be by writing a switch_endian subroutine using MVBITS and converting where I need to instead of using multiple opens/closes of the file and the CONVERT=BIG_ENDIAN/LITTLE_ENDIAN options. (Certain parts of the file require one endian, whereas other parts require the other)

I'm going to go out on a limb and assume that I'm the one making this difficult and not ESRI, because they have designed all the standards and programs, and I don't really think their software is shoddy; it's more likely my knowledge. [I only started using FORTRAN a few weeks ago, and I've never used C]

Thanks again.
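
For readers following along, here is one way a byte-swapping routine built on MVBITS might look for a 4-byte integer. This is only a sketch: the poster's actual switch_endian subroutine is not shown in the thread, so the name switch_endian4, its argument list, and the test value are all hypothetical. MVBITS(FROM, FROMPOS, LEN, TO, TOPOS) is the standard intrinsic.

```fortran
program swap_demo
  implicit none
  integer*4 :: v, s

  v = int(z'01020304', 4)        ! illustrative test pattern
  call switch_endian4(v, s)
  write (*, '(Z8.8)') s          ! prints 04030201

contains

  ! Hypothetical helper: reverse the four bytes of a 4-byte integer.
  subroutine switch_endian4(src, dst)
    integer*4, intent(in)  :: src
    integer*4, intent(out) :: dst
    dst = 0
    call mvbits(src, 24, 8, dst,  0)   ! highest byte -> lowest byte
    call mvbits(src, 16, 8, dst,  8)
    call mvbits(src,  8, 8, dst, 16)
    call mvbits(src,  0, 8, dst, 24)   ! lowest byte -> highest byte
  end subroutine switch_endian4

end program swap_demo
```

An 8-byte real could be handled the same way by viewing it as two 4-byte words, swapping the bytes in each word, and then exchanging the two words.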

JDoe_1
New Member

Re: Fortran Unformatted Real

Sorry, 1 more.

Also, I should note that the program does work currently, and I'm just looking for greater speed as I'm writing about 20 million records, headed for 50 million. Currently I'm doing about 4500 records/second on a 2 GHz machine on 1 core. I am not using RECORDTYPE, and I'm not even sure what that is. Eventually, this code will be ported over to run on about 50 or so cores, but I want to maximize speed across one core before taking it to parallel.
Joseph Huber_1
Honored Contributor

Re: Fortran Unformatted Real

It is still not clear from the description what 'record' means. Generally, in Fortran unformatted I/O,
1 WRITE statement == 1 record.

To speed up a file that gets extended many times:
OPEN with INITIALSIZE=m,EXTENDSIZE=n
where m is the initial size of the file in blocks,
n the extend size.
See HELP FORTRAN STATEMENT OPEN .

And RMS buffering may also play a role:
see HELP SET RMS_DEFAULT for /BUFFER_COUNT and /BLOCK_COUNT.
http://www.mpp.mpg.de/~huber
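
A sketch of the pre-allocation advice above, assuming a DEC/HP Fortran compiler on OpenVMS. The OPEN keywords are spelled INITIALSIZE and EXTENDSIZE there; they are vendor extensions and other compilers may reject or ignore them. The file name, unit number, and block counts are made-up values for illustration, not tuned recommendations.

```fortran
program prealloc_demo
  implicit none
  integer*4 :: value

  ! INITIALSIZE: blocks allocated when the file is created.
  ! EXTENDSIZE:  blocks added each time the file has to grow.
  ! Both are DEC/HP Fortran extensions (assumption: OpenVMS target).
  open (unit=10, file='SHAPE.DAT', status='NEW', form='UNFORMATTED', &
        initialsize=20000, extendsize=2000)

  value = 1                 ! illustrative payload
  write (10) value          ! one WRITE statement == one record

  close (10)
end program prealloc_demo
```

The idea is that the file system extends the file in a few large steps instead of once per small record.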
Joseph Huber_1
Honored Contributor

Re: Fortran Unformatted Real

Looking into the 'shapefile' spec, it appears that it defines its own record structure. It is not Fortran unformatted sequential, which would result in segmented records.

I assume OPEN specifies UNFORMATTED, and RECORDTYPE='STREAM'.
In this case there is no record overhead in the file, and speed of writing is only influenced by the allocation and RMS block buffer parameters.
http://www.mpp.mpg.de/~huber
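
As an illustration of writing without any record overhead, the byte layout from the original question can be sketched with Fortran 2003 stream access (ACCESS='STREAM'). Note that this is a different mechanism from the OpenVMS RECORDTYPE='STREAM' mentioned above, and it assumes a compiler with Fortran 2003 stream I/O. The file name and all values are placeholders, and byte order is left in whatever the host uses.

```fortran
program stream_layout
  implicit none
  integer*4 :: first, zeros(5), a, b, c
  real*8    :: x
  integer   :: next_pos

  ! Placeholder values, not taken from any real file.
  first = 1
  zeros = 0
  a = 2
  b = 3
  c = 4
  x = 1.5d0

  open (unit=10, file='layout.bin', access='stream', &
        form='unformatted', status='replace')

  write (10) first      ! byte 0:      4-byte integer
  write (10) zeros      ! bytes 4-23:  five 4-byte zeros
  write (10) a, b, c    ! bytes 24-35: three 4-byte integers
  write (10) x          ! byte 36:     8-byte real, no alignment issue

  ! POS= returns the 1-based position of the next byte to be written.
  inquire (unit=10, pos=next_pos)
  close (10)

  print *, 'bytes written:', next_pos - 1   ! 4 + 20 + 12 + 8 = 44
end program stream_layout
```

With stream access each WRITE simply appends its bytes, so the 8-byte real can start at byte 36 without splitting it into two 4-byte pieces.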