Operating System - OpenVMS

Fortran Unformatted Real

 
JDoe_1
Occasional Visitor

Fortran Unformatted Real

I am writing an unformatted binary file that must be constructed in a standard format in order for its parent program to read it. [ArcGIS Shapefile for those interested] To make a long story short, I have a method for accomplishing what I need but it is SLOW, and seemingly stupid. Here is what is going on at a binary level:

Byte 0 --> 4 byte integer
Bytes 4-23 --> Zeros (as 4 byte integers)
Byte 24 --> 4 byte integer
Byte 28 --> 4 byte integer
Byte 32 --> 4 byte integer
Byte 36 --> 8 byte real
...And so on

Byte 36 becomes the problem. To that point, I am using a record length of 4 (/assume:byterecl @ compile time for other reasons) to accommodate the 4 byte integers. It works out that the 8 byte real (with an 8 record length) comes halfway between records 5 and 6. To this point, I am taking the 8 byte real and writing it to a file on the hard drive, and reading back the binary as two 4 byte integers and just continuing on my merry way. Unfortunately, this isn't quick (I currently have it going at approx 4000 times/sec, but it is still my main hang-up).

For those of you who stayed with me through that babble, do you have any ideas? I'm fresh out. I've played with MVBITS with no success. I am pretty new to FORTRAN, so there might be something I'm missing.

Thanks in advance.
Steven Schweda
Honored Contributor

Re: Fortran Unformatted Real

> [...] I am taking the 8 byte real and
> writing it to a file on the hard drive,
> and reading back the binary as two 4 byte
> integers [...]

Yikes. It's been too long for me to write
the code quickly and reliably, but I think
that I'd be doing something like:

INTEGER* 4 XINT4X2( 2)
REAL* 8 XREAL
EQUIVALENCE XINT4X2, XREAL

Assign a value to XREAL, and use what's in
XINT4X2.

If you already have either of these things
somewhere, then you may be able to avoid even
the assignment by equivalencing the thing you
already have to the thing you add.
Steven Schweda
Honored Contributor

Re: Fortran Unformatted Real

> EQUIVALENCE XINT4X2, XREAL

Oops. I found some old code. Make that:

EQUIVALENCE (XINT4X2, XREAL)


[...]
C 23 June 1994. SMS.
[...]
integer* 2 W( 2)
c
integer* 4 LW
c
equivalence (LW, W)
[...]

I haven't really used Fortran since it was
FORTRAN.
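
For reference, the EQUIVALENCE idea above might be sketched as follows. The module and routine names (real8_split, split_real8) are made up for illustration, and free-form source is assumed. Which of the two halves has to be written first depends on the machine's byte order and on what the consumer expects, so that choice is left to the caller.

```fortran
module real8_split
  implicit none
contains
  ! Copy an 8-byte real into two 4-byte integers, in memory order.
  ! (Hypothetical helper; not code from the thread.)
  subroutine split_real8(x, halves)
    real*8, intent(in) :: x
    integer*4, intent(out) :: halves(2)
    real*8 :: xreal
    integer*4 :: xint4x2(2)
    ! Both names refer to the same 8 bytes of storage.
    equivalence (xint4x2, xreal)

    xreal = x            ! store the real ...
    halves = xint4x2     ! ... and pick up the same bits as two integers
  end subroutine split_real8
end module real8_split
```

A caller would do something like CALL SPLIT_REAL8(X, H) and then write H(1) and H(2) as two ordinary 4-byte records. No bits are changed on the way, so the floating-point format is whatever the compiler uses for REAL*8.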
Robert Gezelter
Honored Contributor

Re: Fortran Unformatted Real

JDoe,

I would not hesitate to make a comment without seeing precisely what the code that is reading this stream is.

Writing four bytes/record will be very inefficient at quite a few levels. Without a review of the sources (for both the producer and the consumer), it would be reckless to make suggestions.

The bottleneck could be in the production of the records, but other possibilities exist. For example, if the file is constantly being extended, solving the bottleneck can be as simple as adjusting RMS parameters.

More details would be appreciated.

- Bob Gezelter, http://www.rlgsc.com
Steven Schweda
Honored Contributor

Re: Fortran Unformatted Real

> I would not hesitate to make a comment
> [...]

Nor I (even if the opposite was intended).

> Writing four bytes/record will be very
> inefficient at quite a few levels.

_That_ statement is right-side-up. It's hard
to believe that any non-garbage program would
want to deal with a file structured that way.
I'd expect the segmented-record headers to
occupy more space than the actual data, if
you're using the default RECORDTYPE. I can
imagine that some missing details might make
this look less goofy, but they're missing.

Among other things, it might be interesting
to look at a DUMP (/LONGWORD?) of the file
you're creating to see if it contains what
you expect.

> [...] but it is SLOW, and seemingly stupid.
> [...]

I wouldn't say "seemingly". The EQUIVALENCE
scheme should alleviate much of that pain.

> [...] I am pretty new to FORTRAN, [...]

If C's more familiar, then think of it as a
union.

http://en.wikipedia.org/wiki/Solidarity_Forever

You do need to worry about which order to use
the pieces in (assuming that the bytes don't
need to be scrambled, too, and that the
floating-point format which you're using is
the one expected by the consumer). Not
knowing the hardware type involved (another
of those pesky missing details) makes it hard
to guess how deep in the weeds you may be.
JDoe_1
Occasional Visitor

Re: Fortran Unformatted Real

Thanks guys. The EQUIVALENCE statement did speed things up a bit. It was exactly what I was looking for, but would have never found via Google.

That said, I'm interested in hearing what you had to say about the files growing and the efficiency associated with that. Is there some sort of pre-allocation? Basically, what I am doing right now is writing records one by one, and each one extends the file. Maybe you could give me a few more details and I could give you a better idea of what I'm doing.

Thanks both of you.

JD
JDoe_1
Occasional Visitor

Re: Fortran Unformatted Real

The standard for the file can be found here:

http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf

I've set the little endian and big endian where need be by writing a switch_endian subroutine using MVBITS and converting where I need to instead of using multiple opens/closes of the file and the CONVERT=BIG_ENDIAN/LITTLE_ENDIAN options. (Certain parts of the file require one endian, whereas other parts require the other)
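
The switch_endian routine itself is not shown in this thread. As a rough sketch of the kind of thing described, a 4-byte swap built on MVBITS could look like this (the names endian_swap and swap_int4 are invented for the example, and free-form source is assumed):

```fortran
module endian_swap
  implicit none
contains
  ! Reverse the byte order of a 4-byte integer using MVBITS.
  ! (Hypothetical helper; not the poster's switch_endian.)
  function swap_int4(v) result(s)
    integer*4, intent(in) :: v
    integer*4 :: s

    s = 0
    ! MVBITS(from, frompos, len, to, topos): bit 0 is the least significant.
    call mvbits(v,  0, 8, s, 24)   ! lowest byte  -> highest byte
    call mvbits(v,  8, 8, s, 16)
    call mvbits(v, 16, 8, s,  8)
    call mvbits(v, 24, 8, s,  0)   ! highest byte -> lowest byte
  end function swap_int4
end module endian_swap
```

Swapping a value twice gives back the original, which is a cheap sanity check. An 8-byte real can be handled the same way by swapping each 4-byte half and then exchanging the two halves.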

I'm going to go out on a limb and assume that I'm the one making this difficult and not ESRI, because they have designed all the standards and programs, and I don't really think their software is shoddy; it's more likely my knowledge. [I only started using FORTRAN a few weeks ago, and I've never used C]

Thanks again.

JDoe_1
Occasional Visitor

Re: Fortran Unformatted Real

Sorry, 1 more.

Also, I should note that the program does work currently, and I'm just looking for greater speed as I'm writing about 20 million records, headed for 50 million. Currently I'm doing about 4500 records/second on a 2 ghz machine on 1 core. I am not using RECORDTYPE, and I'm not even sure what that is. Eventually, this code will be ported over to run on about 50 or so cores, but I want to maximize speed across one core before taking it to parallel.
Joseph Huber_1
Honored Contributor

Re: Fortran Unformatted Real

It is still not clear from the description what 'record' means. Generally, in Fortran unformatted I/O,
1 WRITE statement == 1 record.

To speed up for many extensions:
OPEN with INITIALSIZE=m,EXTENDSIZE=n
where m is the initial size of the file in blocks,
n the extend size.
SEE HELP FORTRAN STATEMENT OPEN .

And RMS buffering may also play a role:
see HELP SET RMS for /BUFFER_COUNT and /BLOCK_COUNT.
http://www.mpp.mpg.de/~huber
Joseph Huber_1
Honored Contributor

Re: Fortran Unformatted Real

Looking into the 'shapefile' spec, it appears it defines its own record structure, it is not Fortran unformatted sequential, which would result in segmented records.

I assume OPEN specifies UNFORMATTED, and RECORDTYPE='STREAM'.
In this case there is no record overhead in the file, and speed of writing is only influenced by the allocation and RMS block buffer parameters.
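
To illustrate why a stream file removes the alignment problem at byte 36, here is a minimal sketch. It swaps in the Fortran 2003 spelling ACCESS='STREAM' rather than RECORDTYPE='STREAM', which is an assumption about the compiler; an older compiler may not accept it. The names stream_demo and write_mixed and the argument layout are invented for the example.

```fortran
module stream_demo
  implicit none
contains
  ! Write nine 4-byte integers (bytes 0..35) followed directly by an
  ! 8-byte real (bytes 36..43), with no record boundaries in between.
  ! Assumes the compiler supports ACCESS='STREAM' (Fortran 2003).
  subroutine write_mixed(u, path, head, x)
    integer, intent(in) :: u               ! unit number chosen by the caller
    character(len=*), intent(in) :: path
    integer*4, intent(in) :: head(9)
    real*8, intent(in) :: x

    open(unit=u, file=path, access='stream', form='unformatted', &
         status='replace', action='write')
    write(u) head      ! 36 bytes
    write(u) x         ! the next 8 bytes; no splitting into integers needed
    close(u)
  end subroutine write_mixed
end module stream_demo
```

With a stream file each WRITE simply appends its bytes, so items of different sizes can follow one another freely. Any byte swapping the format requires still has to be done before the WRITE.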
http://www.mpp.mpg.de/~huber
Joseph Huber_1
Honored Contributor

Re: Fortran Unformatted Real

Sorry I overlooked
>>I am not using RECORDTYPE, and I'm not even sure what that is.

But how then can the result be in the 'shapefile' format as in the paper cited ?
http://www.mpp.mpg.de/~huber
Robert Gezelter
Honored Contributor

Re: Fortran Unformatted Real

JD,

Ok, the specification that was cited in this thread is 34 pages long, and certainly does mix Big-endian and Little-endian numbers (for unspecified reasons). Admittedly, that is unusual. Most specifications pick one or the other and stick to that choice. However, since it is a specification for an outside product, it is what it is.

A casual reading of the specification would seem to indicate that its use of the term "record" has little to do with the meaning of the term "record" as used in FORTRAN. At first glance, it appears to be a binary byte stream.

Writing to this specification is not the most complex programming problem, but it is certainly not a beginner project. One post mentioned that your experience in both C and FORTRAN is limited. What languages are you familiar with? What is your general programming experience?

Since generating this data is inherently a serial process, increasing the number of processor cores is not likely to appreciably increase the speed of the code.

That said, if the format is what it appears to be, several orders of magnitude of performance (100 to 100,000 times) beyond 4K items per second would seem to be achievable.

Your organization may want to consider finding a supplemental resource with more experience in C/FORTRAN, binary files, and related areas [Disclosure: our firm provides services in this area, as do other active ITRC contributors]. ITRC participants contribute their time and insight for no compensation.

- Bob Gezelter, http://www.rlgsc.com
JDoe_1
Occasional Visitor

Re: Fortran Unformatted Real

My programming experience would be likened to a "Jack of all trades, master of none" sort of situation. Fortran is just the preferred method for the lab I'm working in currently. I've written in Java, PHP, Visual Basic, but mostly MatLab's specialized code.

Also, I should have just kept my mouth shut, but the part I'll be making parallel doesn't have to do with the file generation.

In any event, I will give the initialize and extend options a go and see what happens.

Thanks.
Robert Gezelter
Honored Contributor

Re: Fortran Unformatted Real

JDoe,

Be careful about the use of the term record. As I noted, the term "record" in the shapefile description appears to be emphatically NOT the same as the meaning of the word "record" in the RMS and FORTRAN context.

- Bob Gezelter, http://www.rlgsc.com
Steven Schweda
Honored Contributor

Re: Fortran Unformatted Real

> [...] I am not using RECORDTYPE, [...]

What _are_ you using?

> I assume OPEN specifies [...]

If a responder needs to assume anything, then
it's likely that the questioner is not doing
his job properly.

> Also, I should note that the program does
> work currently, [...]

Which suggests that something is being done
right, even if sub-optimally.

> [...] I will give the initialize and extend
> options a go [...]

Plausible, but it might be wise to see where
the time is being spent now, rather than poke
at the problem in a hopeful but disorganized
way. There are actual documents and software
which can help with performance analysis on
VMS systems. You may have some truly
horrible code, which can be greatly improved,
but optimizing horrible code which takes only
1% of the time won't buy you much.
Joseph Huber_1
Honored Contributor

Re: Fortran Unformatted Real

JD,
>>
I am taking the 8 byte real and writing it to a file on the hard drive, and reading back the binary as two 4 byte integers and just continuing on my merry way.
>>

Do I understand it right: just for the purpose of converting a real number in memory, you write it to a (temporary) file and read it back?
If yes, no wonder it is slowing things down a lot.
For floating-point in-memory conversions, the VMS run-time library offers conversion routines.
Read
HELP RTL CVT$
to see if it offers the correct thing.

http://www.mpp.mpg.de/~huber
Robert Gezelter
Honored Contributor

Re: Fortran Unformatted Real

JD,

Joe has a valid point. Creating a file on disk for each number is a very expensive operation. FORTRAN does have in-memory I/O, but the best solution is to use the conversion library (mentioned in Joe's recent post).

- Bob Gezelter, http://www.rlgsc.com
Hoff
Honored Contributor

Re: Fortran Unformatted Real

What sort of VAX, Alpha or Integrity Itanium box are we talking about here?

What are the OpenVMS and Fortran versions?

What sorts of disk(s) and I/O hardware here?

On no source code and no evidence, I'd tend to guess this program was I/O bound.

Better yet, run DECset PCA to see where you're really spending your run-time and your wall-clock time here.

Why not build and buffer and write a hundred or a thousand or a million (of what are here confusingly called) records at a time, at memory speeds? Writing small wads of data into big buffers stored in virtual memory and then knocking out whole buffers en masse is going to be massively more efficient and expedient than legions of smaller I/Os. Double-buffer and overlap your writes for better speed.

If working within the constraints of OpenVMS and of RMS here, I'd (still) be tempted to build a big array of entries and write it to disk with one big I/O. Record writes are slow, and (small) file extensions are to be avoided.
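
A bare-bones version of that buffering idea might look like the sketch below: 4-byte words are collected in memory and written out one whole buffer at a time. Everything here is an assumption made for illustration, including the module name, the cw_ routine names, the 65536-word buffer size, and the use of ACCESS='STREAM' (Fortran 2003). It does not attempt double buffering or overlapped I/O.

```fortran
module chunked_writer
  implicit none
  integer, parameter :: buf_words = 65536   ! hypothetical buffer size, in 4-byte words
  integer*4 :: buf(buf_words)               ! in-memory staging area
  integer :: nused = 0                      ! words currently held in buf
  integer :: unit_no = -1
contains
  subroutine cw_open(u, path)
    integer, intent(in) :: u
    character(len=*), intent(in) :: path
    unit_no = u
    nused = 0
    open(unit=unit_no, file=path, access='stream', form='unformatted', &
         status='replace', action='write')
  end subroutine cw_open

  ! Append one word; only touch the file when the buffer is full.
  subroutine cw_put(word)
    integer*4, intent(in) :: word
    if (nused == buf_words) call cw_flush()
    nused = nused + 1
    buf(nused) = word
  end subroutine cw_put

  ! Write everything collected so far with a single WRITE.
  subroutine cw_flush()
    if (nused > 0) write(unit_no) buf(1:nused)
    nused = 0
  end subroutine cw_flush

  subroutine cw_close()
    call cw_flush()
    close(unit_no)
  end subroutine cw_close
end module chunked_writer
```

The caller uses cw_open once, cw_put for every 4-byte item (including the two halves of each 8-byte real), and cw_close at the end. The number of actual write operations then drops from one per item to one per buffer.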

If you know how much data you're starting with, then you could conceivably map and write to and use a process section with a backing storage file and largely remove RMS (and the RMS concept of records) from the design. That's going to be about as fast as you can reasonably go here within the constraints of your hardware, after all...

Stephen Hoffman
HoffmanLabs LLC
Steven Schweda
Honored Contributor

Re: Fortran Unformatted Real

> Do I understand it right: [...]

And did you notice that we resolved this
problem back around "May 27, 2009 05:08:10
GMT"?
Hein van den Heuvel
Honored Contributor

Re: Fortran Unformatted Real

Steven, Steven... don't confuse us with the facts. We are just about to switch over to trolling for work. :-).



Speaking of facts

" Currently I'm doing about 4500 records/second on a 2 ghz machine on 1 core."

Jdoe,

What hardware platform and operating system is this exercise running on?

There are no 2 GHz OpenVMS systems, other than emulated Alphas on 2 GHz x86 machines.

Even with the relatively vague specification in place, I think it is safe to say the 4500 records/second is LOUSY performance no matter what. Maybe consider using an OpenVMS solution? Maybe some performance consulting?

( Is the 2 GHz machine the desktop you use to connect to an OpenVMS system? :-)


hth,
Hein van den Heuvel ( at gmail dot com )
HvdH Performance Consulting.




Joseph Huber_1
Honored Contributor

Re: Fortran Unformatted Real

Steven
>>
And did you notice that we resolved this
problem back around "May 27, 2009 05:08:10
GMT"?
>>

Which kind of magic was that ?
How does equivalencing integers and doubles convert the doubles to IEEE ?

http://www.mpp.mpg.de/~huber