
Re: Obtaining the key of a record about half-way thru a file

 
HDS
Frequent Advisor

Good morning.

Hoping that all had a pleasant weekend.
I wish to thank all of those who replied. I am very appreciative of the wonderful suggestions and rather detailed responses. I am currently reviewing them all and, after gathering my thoughts, will reply accordingly.

Much obliged.
-Howard-
Dean McGorrill
Valued Contributor

hi Howard,
curious as to what you come up with, maybe give a few points to these fine gentlemen. interesting problem anyway

>Cheers, (almost Miller time here)
maybe it should be Hein-ekin time Hein :)
HDS
Frequent Advisor

Good morning.

I wish to graciously thank all of those who responded. I received some rather informative responses. I ended up using parts of some...and will likely use parts of others for other situations. This was a learning experience, to say the least.

In any case, here is what was decided for this specific case.

- Using either a $CONVERT or a $SORT/SPEC, create a file of 20-byte records, where each record is the key of a record in the original large file. (I say "either" because I find that a $SORT can sometimes run twice as fast as a $CONVERT. I am not sure why, and I am not sure whether this is one of those cases.) For the most part, I should be able to get through 14M records in about 10 minutes, creating that flat file of 20-byte records.
- Using either the /STAT output from the $SORT/$CONVERT or rough math (taking the EOF block of the resulting file, multiplying by 512, and dividing by 20), I can get a record count.
- Opening the flat file of keys as sequential with direct access, I grab the 20-byte record which identifies the key of the record half way through the original file.
- Using that as the cut-off in the original file, one processing thread will read from the start of the file to the record that has that identified key. The other processing thread will use that 'half-way-key' to do a greater-than keyed read and then read the original file to the EOF.
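
The count and midpoint steps above can be sketched in Python. This is only an illustration of the arithmetic, not the DCL/RMS procedure actually used: it assumes the flat key file is a plain byte stream of fixed 20-byte records with no per-record overhead, and the names `record_count` and `middle_key` are hypothetical.

```python
import io

KEY_LEN = 20      # fixed key/record length from the post
BLOCK_SIZE = 512  # bytes per disk block on OpenVMS


def record_count(eof_block: int, first_free_byte: int) -> int:
    """Number of fixed-length records in a file ending at (eof_block, first_free_byte).

    eof_block is 1-based; first_free_byte is the offset of the first
    unused byte within that block.
    """
    total_bytes = (eof_block - 1) * BLOCK_SIZE + first_free_byte
    return total_bytes // KEY_LEN


def middle_key(stream, count: int) -> bytes:
    """Seek directly to the record half way through and return its key."""
    stream.seek((count // 2) * KEY_LEN)
    return stream.read(KEY_LEN)


# Tiny stand-in for the flat key file: 1000 sorted 20-byte keys.
keys = [f"{i:020d}".encode() for i in range(1000)]
flat = io.BytesIO(b"".join(keys))

total = len(keys) * KEY_LEN
eof_block, ffb = divmod(total, BLOCK_SIZE)
eof_block += 1  # blocks are numbered from 1

n = record_count(eof_block, ffb)
mid = middle_key(flat, n)
print(n, mid)
```

Using only the EOF block times 512 gives an upper bound that is off by at most one block's worth of records, which is close enough for picking a midpoint.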

So far so good with this approach. The additional 10 minutes to get the midway point is easily made up by the concurrent processing threads, so we are benefiting. I might end up tweaking this as I go along...time will tell.
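
As a rough sketch of the split itself, the Python below uses a sorted in-memory list as a stand-in for the indexed file and two worker threads. Everything here is an assumption for illustration: the real processing reads an RMS indexed file, `bisect_right` merely mimics a greater-than keyed read, and `process_lower` and `process_upper` are hypothetical names.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the indexed file: records sorted by primary key.
records = [(f"{i:020d}", f"payload-{i}") for i in range(10)]
sorted_keys = [key for key, _payload in records]


def process_lower(cutoff: str) -> list:
    """Read sequentially from the start, up to and including the cut-off key."""
    done = []
    for key, _payload in records:
        if key > cutoff:  # also stops correctly if the cut-off record no longer exists
            break
        done.append(key)
    return done


def process_upper(cutoff: str) -> list:
    """Position just past the cut-off key (greater-than), then read to end of file."""
    start = bisect_right(sorted_keys, cutoff)
    return [key for key, _payload in records[start:]]


cutoff = sorted_keys[len(sorted_keys) // 2]
with ThreadPoolExecutor(max_workers=2) as pool:
    lower = pool.submit(process_lower, cutoff)
    upper = pool.submit(process_upper, cutoff)
    lower_keys, upper_keys = lower.result(), upper.result()

print(len(lower_keys), len(upper_keys))
```

Stopping the lower reader on "greater than" rather than "equal to" means the two halves stay disjoint and complete even if the cut-off record is deleted between building the key file and processing.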

Again, I wish to thank all those who responded. Much obliged.

-H-
Hein van den Heuvel
Honored Contributor


Hmmm, a straight convert to sequential file, fixed length with /truncate should be faster than a sort.

Anyway...

Over the weekend I managed to combine some half programs I had, to do what I described in an earlier reply: Use the RMS Index tree as a way to approximate chunks of a file.

I slightly over-engineered it. :-) :-)

You can tell it at which level to do the cut.
Level 1 being the most precise and the slowest, but still an order of magnitude faster than reading the whole file.
And you can tell it how many cuts to take. And whether to return the key values in DCL symbols:

Usage example:

$ rms_key_samples -l=2 -s -n=4 x.x
0/4 vbn:202 key:%x00000104
1/4 vbn:27511 key:%x000264E2
2/4 vbn:54997 key:%x00028BCD
3/4 vbn:79687 key:%x0002AED6
$ show symb rms*
RMS_KEY_SAMPLE_0 = "%x00000104"
RMS_KEY_SAMPLE_1 = "%x000264E2"
RMS_KEY_SAMPLE_2 = "%x00028BCD"
RMS_KEY_SAMPLE_3 = "%x0002AED6"

Verbose, level-1, with debug print out:

$ rms_key_samples -d -n=4 x.x
* 25-JUN-2007 00:21:03.95 ALQ=102945 BKS=3 LVL=3 x.x
Level 3, First VBN Pointer = 784
Level 2, First VBN Pointer = 202
re-pack 2. record_count=2000 vbn=6076
re-pack 4. record_count=3999 vbn=12133
re-pack 8. record_count=7997 vbn=24235
re-pack 16. record_count=15993 vbn=48460
re-pack 32. record_count=31985 vbn=96937
Level 1 buckets = 286, Records = 33962
0/4 vbn:4 key:%x00000000
1/4 vbn:25711 key:%x000262E6
2/4 vbn:51484 key:%x0002876C
3/4 vbn:77179 key:%x0002ABD9

Tested with binary keys (above) and string keys, compressed and uncompressed.
Not all combos tested though....

Still thinking about whether to drop the '0' key output as it might be confusing, notably if the level is not 1.
A straight sequential read gets you to the first chunk.
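
The idea of taking N cuts from one index level can be sketched as follows. This is not the rms_key_samples program: it assumes the (vbn, key) entries of the chosen index level have already been collected into an ordered list, whereas the real tool walks the RMS index buckets on disk. The function name `sample_cuts` and the sample data are hypothetical.

```python
def sample_cuts(index_entries, n):
    """Pick n roughly evenly spaced (vbn, key) entries from an ordered index level."""
    if n < 1 or not index_entries:
        return []
    step = len(index_entries) / n
    return [index_entries[int(i * step)] for i in range(n)]


# Made-up index level: 100 entries with increasing VBNs and binary keys.
entries = [(4 + 3 * i, 0x100 + 16 * i) for i in range(100)]

cuts = sample_cuts(entries, 4)
for i, (vbn, key) in enumerate(cuts):
    print(f"{i}/{len(cuts)} vbn:{vbn} key:%x{key:08X}")
```

Spacing the cuts by entry count only approximates equal record counts, since each index entry can cover a different number of data records. That is the trade-off between cutting at a higher level (fewer entries, coarser) and at level 1 (more entries, more precise).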

Give it a whirl!

Send me an Email if you like it.

Hope this helps someone someday...
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting
HDS
Frequent Advisor

Good morning.

Hein...Thank you so very much :)

I will give this a try. I might not be able to get to it until later in the week (I got hit with some priorities this weekend).

I will most certainly get back to you as soon as possible.

Again...many thanks.

-Howard-