Operating System - OpenVMS

WW304289
Frequent Advisor

Number of records in a file

Is it possible to determine the number of records in a file without reading the file and counting records? I looked at RMS headers and did not find anything.

I'm aware of rmsstats and DATA_RECORD_COUNT but as far as I understand it works only for indexed files.

Thanks,
-Boris
12 REPLIES
John Gillings
Honored Contributor

Re: Number of records in a file

Boris,

For some file types, for example those with fixed-length records, you can determine the record count indirectly, but for everything else, if you need an accurate value, you need to count them. Remember that reading takes finite time and, unless you lock the file for exclusive access for the duration, the count may change between starting and finishing.

RMS keeps the LRL (Longest Record Length), from which you can establish approximate upper limits. Does that help? (Bear in mind that the LRL is not necessarily accurate. Just one example: the CRTL will sometimes blindly set LRL to 32767.)

Note that DATA_RECORD_COUNT applies only to the analysis of one key, so it's not necessarily a count of records in the whole file. Maybe the value for the primary key would do, but even then I think there may be cases where it doesn't include all records. Simple example: here's an extract from a random SYSUAF with 46 records:

KEY 0
        NULL_KEY                no
KEY 1
        NULL_KEY                no
KEY 2
        NULL_KEY                no
KEY 3
        NULL_KEY                no
ANALYSIS_OF_KEY 0
        DATA_RECORD_COUNT       46
ANALYSIS_OF_KEY 1
        DATA_RECORD_COUNT       44
ANALYSIS_OF_KEY 2
        DATA_RECORD_COUNT       44
ANALYSIS_OF_KEY 3
        DATA_RECORD_COUNT       1


What problem are you trying to solve?
A crucible of informative mistakes
Hein van den Heuvel
Honored Contributor

Re: Number of records in a file

>> Is it possible to determine the number of records in a file without reading the file and counting records?

In general, NO.

You can calculate the number of records from EOF + FFB for a fixed length record file.
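
A minimal sketch of that calculation in C, under assumptions that are mine and not from this thread: records are allowed to span blocks, and (when pad_odd is set) an odd-length record occupies one extra pad byte on disk. The function name and parameters are hypothetical; ebk and ffb stand for the values you would read from xab$l_ebk and xab$w_ffb.

```c
#include <stddef.h>

/* Bytes per disk block on OpenVMS. */
#define BLOCK_SIZE 512UL

/*
 * Estimate the record count of a fixed-length sequential file from the
 * end-of-file block (EBK) and first free byte (FFB).
 * Assumptions: records may span blocks; if pad_odd is non-zero, each
 * odd-length record takes one extra pad byte on disk.
 */
unsigned long fixed_record_count(unsigned long ebk, unsigned long ffb,
                                 unsigned long record_size, int pad_odd)
{
    unsigned long eof_bytes, stored_size;

    if (ebk == 0 || record_size == 0)
        return 0;                       /* empty file or no usable size */

    /* EBK is 1-based, so the blocks before it are completely used. */
    eof_bytes = (ebk - 1) * BLOCK_SIZE + ffb;

    stored_size = record_size;
    if (pad_odd && (record_size & 1))
        stored_size += 1;               /* assumed alignment byte */

    return eof_bytes / stored_size;
}
```

For example, with EBK = 11, FFB = 0 and 80-byte records, fixed_record_count(11, 0, 80, 1) gives 5120 / 80 = 64. Files whose records are not allowed to cross block boundaries would need a per-block calculation instead.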

And RMS _does_ store record and byte count for simple, unshared, stream_lf files.
Accessible programmatically, and in DCL via:
F$FILE(file,"FILE_LENGTH_HINT") => Record count and data byte count in the form (n,m), where n is the record count and m is the data byte count. An invalidated count is specified by a -1 for n or m.
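
If a program ends up with that "(n,m)" text (for example, passed in from a DCL procedure), a small parser might look like the following. This is only a sketch: parse_length_hint is a hypothetical helper name, and it assumes the string has exactly the form described above.

```c
#include <stdio.h>

/*
 * Parse a FILE_LENGTH_HINT string of the form "(n,m)".
 * Returns 1 and stores both counts when the hint is usable.
 * Returns 0 when the text is malformed or either value is negative
 * (-1 marks an invalidated count).
 */
int parse_length_hint(const char *text, long long *records, long long *bytes)
{
    long long n, m;
    char close;

    /* %c captures the closing character so we can insist on ')'. */
    if (sscanf(text, " (%lld ,%lld %c", &n, &m, &close) != 3 || close != ')')
        return 0;
    if (n < 0 || m < 0)
        return 0;

    *records = n;
    *bytes = m;
    return 1;
}
```

A caller would fall back to reading and counting whenever this returns 0.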

As you point out, you can of course just read and count. That reading can be simple, using RMS $GETs (as SEARCH does), or smarter, using homegrown algorithms such as those in some of my tools and in tools like DIX and VSelect (EGH company).

As per John... what problem are you really trying to solve? Maybe other, better solutions are available/possible?

Hope this helps some,
Hein




Ian Miller.
Honored Contributor

Re: Number of records in a file

Note also the
$ ANALYZE/RMS/UPDATE_HEADER

command, which updates the file length hint.
____________________
Purely Personal Opinion
Jan van den Ende
Honored Contributor

Re: Number of records in a file

Boris,

>>>
Is it possible to determine the number of records in a file without reading the file and counting records?
<<<
Strictly speaking, this is not what you asked, but I have always been sufficiently satisfied with
$ sort nl: /stat/out=
In more recent versions, the record count (and other stats) are stored in SORT$* symbols, which lets you drop the /OUT=..

Mind, this DOES read the whole file, so it may or may not be what you need.

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Jan van den Ende
Honored Contributor

Re: Number of records in a file

Just saw Ian's answer.
Same as with mine: it also performs a complete file read, but it then updates the header as well, which changes the file's modification date.

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Joseph Huber_1
Honored Contributor

Re: Number of records in a file

>>
$ sort nl: /stat/out=
In more recent versions, the record count (and other stats) get set to symbols SORT$*; which allows to drop the /OUT=..
<<

SEARCH/STAT could also be the right tool. In addition to the records, it counts the number of bytes.
On older systems, where SEARCH/STAT does not store the statistics in DCL symbols, you can use e.g. my command file at
http://www.mpp.mpg.de/~huber/util/com/search_stat.com
http://www.mpp.mpg.de/~huber
WW304289
Frequent Advisor

Re: Number of records in a file

Thanks for all the replies.

Hein van den Heuvel wrote:
>In general, NO.

This is what I suspected.

John Gillings wrote:
>Remembering that reading takes finite time, and, unless you lock the file for exclusive access for the duration, the count may change between starting and finishing.

Yes, this is why I'd prefer a "snapshot" value at sys$open/sys$display time like xabfhc.xab$l_ebk or xab$w_ffb.

>RMS keeps the LRL (Longest Record Length) from which you can establish approximate upper limits. Does that help? (but bear in mind that the LRL is not necessarily accurate - just one example, the CRTL will sometimes blindly set LRL to 32767)

I'm not sure what you mean by "CRTL ... set LRL to 32767". Can you elaborate on this, please?

Anyway, 32767 is what xab$w_lrl reports for a Stream_LF file.

As for the problem I'm trying to solve, there is not much of a problem :-) It would just be more convenient to allocate some structure if I knew the number of records in a file. I guess I'll be counting them :-)
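
One way to avoid a separate counting pass is to grow the structure while reading. The sketch below is standard C rather than RMS calls, and it is an illustration, not anything proposed in this thread: index_records is a hypothetical name, and it assumes newline-terminated records read through stdio.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Record where each newline-terminated record starts, counted in bytes
 * read from fp. The array doubles as needed, so the record count does
 * not have to be known in advance.
 * Returns a malloc'd array (caller frees) and sets *count. Returns NULL
 * with *count == 0 for an empty stream or on allocation failure.
 */
long *index_records(FILE *fp, size_t *count)
{
    size_t used = 0, cap = 0;
    long *offsets = NULL;
    long pos = 0;
    int c, at_start = 1;

    *count = 0;
    while ((c = getc(fp)) != EOF) {
        if (at_start) {
            if (used == cap) {
                size_t new_cap = cap ? cap * 2 : 64;
                long *grown = realloc(offsets, new_cap * sizeof *grown);
                if (grown == NULL) {
                    free(offsets);
                    return NULL;
                }
                offsets = grown;
                cap = new_cap;
            }
            offsets[used++] = pos;      /* first byte of a new record */
            at_start = 0;
        }
        pos++;
        if (c == '\n')
            at_start = 1;
    }
    *count = used;
    return offsets;
}
```

Doubling keeps the number of reallocations logarithmic in the record count, so the file is still read only once.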

Thanks again,
-Boris
Jan van den Ende
Honored Contributor

Re: Number of records in a file

Boris,

from your profile
>>>
I have assigned points to 0 of 35 responses to my questions.
<<<

Maybe time to say "thank you" in Forum style, ie, assign some points?

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
John Gillings
Honored Contributor

Re: Number of records in a file

Boris,

>I'm not sure what you mean by "CRTL ... set
>LRL to 32767". Can you elaborate on this,
>please?

The CRTL writes "native" C files directly, without going through RMS. That means you don't get things like LRL accumulation for free. Long ago that meant LRL for a stream_lf file generated by a C program was 0. This upset numerous applications which used LRL to size their input buffer. They would dutifully allocate a 0 length buffer then get an RMS RTB (Record Too Big) error on the first read.
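
A defensive version of that buffer sizing might look like this. It is a sketch only: record_buffer_size is a hypothetical helper, and the fallback to 32767 simply reuses the maximum RMS record size mentioned in this thread.

```c
#include <stddef.h>

/* Largest RMS record size, as quoted in this thread. */
#define RMS_MAX_RECORD 32767u

/*
 * Choose an input buffer size from a reported LRL.
 * An LRL of 0 is treated as "unknown" rather than "zero-length records",
 * so the caller never ends up with a 0-byte buffer.
 */
size_t record_buffer_size(unsigned lrl)
{
    if (lrl == 0 || lrl > RMS_MAX_RECORD)
        return RMS_MAX_RECORD;
    return lrl;
}
```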

The "fix" for this was for the CRTL to hard set the LRL to 32767 - the largest possible value for an RMS record, so "can't be wrong". However, this caused various troubles of its own, for example TYPE/TAIL doesn't work (max LRL 511), and, notoriously, SORT performance issues. SORT would allocate working memory by multiplying the LRL by the number of records. With an artificially inflated LRL, this resulted in a very sparse work array, very large work files, and lots of paging. SORTing even modest sized stream_lf files could be orders of magnitude slower than the same file with an accurate LRL.
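
To put rough numbers on that, here is the arithmetic as a tiny C sketch. It models only the "LRL times number of records" description above, not SORT's actual code, and the function name and the 100,000-record, 80-byte figures are made up for illustration.

```c
/*
 * Work-area estimate in the style described above:
 * one LRL-sized slot per record.
 */
unsigned long long sort_work_bytes(unsigned long long records, unsigned lrl)
{
    return records * lrl;
}
```

For a hypothetical file of 100,000 records, an LRL of 32767 gives 3,276,700,000 bytes, while an accurate LRL of 80 gives 8,000,000 bytes, roughly a 400-fold difference.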

As a further ugly hack fix for the LRL ugly hack fix, there is now a logical name DECC$DEFAULT_LRL to control the value. It's still a stab in the dark. For "normal" files, defining this value to 511 is probably the best choice. It's small enough to make TYPE/TAIL work, and large enough to cover most text files.

Of course, the correct fix is for the CRTL to keep track of the real LRL and set an accurate value. I'm not sure why the CRTL engineers never thought of that ;-)
A crucible of informative mistakes