Operating System - OpenVMS

Re: Determine the actual size of data in an indexed file

 
yaron1
Advisor

Determine the actual size of data in an indexed file

Hi,

I have an indexed file that is converted weekly to reclaim space. The convert command file runs ANALYZE/RMS/FDL to generate an FDL file and then optimizes that FDL file with EDIT/FDL/ANALYSIS=<>/NONINTERACTIVE <>. The FDL generated by this process does not seem to be the best; in particular, I have good reason to suspect that the initial allocation for the areas is far too big.

My question: how can I best determine the actual space needed for the initial allocation for this file, i.e. the actual space the data records and the keys occupy inside the file? If I do ANALYZE/RMS/STATISTICS on the file, what should I notice in particular? Not much data is added to the file, and no extension seems to have happened recently.
Thanks for the answers.
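
The command file boils down to something like this (a sketch with illustrative FDL file names; the real procedure differs in detail):

$ ANALYZE/RMS_FILE/FDL/OUTPUT=CRDAT_ANA.FDL NIGHT$WEEK:[CONVERT]CRDAT.TMP
$ EDIT/FDL/ANALYSIS=CRDAT_ANA.FDL/NONINTERACTIVE CRDAT.FDL
$ CONVERT/FDL=CRDAT.FDL NIGHT$WEEK:[CONVERT]CRDAT.TMP NIGHT$WEEK:[CONVERT]CRDAT.TMP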
The FDL produced in the above way looks like:


SYSTEM
SOURCE "OpenVMS"

FILE
CONTIGUOUS no
FILE_MONITORING no
GLOBAL_BUFFER_COUNT 0
NAME "NIGHT$WEEK:[CONVERT]CRDAT.TMP;1"
ORGANIZATION indexed
OWNER
PROTECTION (system:RWED, owner:RWED, group:RE, world:)

RECORD
BLOCK_SPAN yes
CARRIAGE_CONTROL carriage_return
FORMAT fixed
SIZE 224

AREA 0
ALLOCATION 2176168
BEST_TRY_CONTIGUOUS yes
BUCKET_SIZE 12
EXTENSION 65535

AREA 1
ALLOCATION 34008
BEST_TRY_CONTIGUOUS yes
BUCKET_SIZE 12
EXTENSION 8508

KEY 0
CHANGES no
DATA_AREA 0
DATA_FILL 75
DATA_KEY_COMPRESSION no
DATA_RECORD_COMPRESSION no
DUPLICATES no
INDEX_AREA 1
INDEX_COMPRESSION no
INDEX_FILL 75
LEVEL1_INDEX_AREA 1
NAME "key1"
NULL_KEY no
PROLOG 1
SEG0_LENGTH 66
SEG0_POSITION 0
TYPE string

Note how the *_COMPRESSION attributes for the key are all set to "no".
Dean McGorrill
Valued Contributor

Re: Determine the actual size of data in an indexed file

hi yaron,
If you're using this example of an FDL, you can take out the allocations. What happens if you just do a simple CONVERT file1 file2? It will take out empty pointers, bucket splits, etc. I have an optimizer script that I can't lay my hands on at the moment. You'll get plenty of replies; there are some pretty handy RMS folk in here! Hope this helps -Dean
John Gillings
Honored Contributor
Solution

Re: Determine the actual size of data in an indexed file

Yaron,

> EDIT/FDL/ANALYSIS=<>/NONINTERACTIVE <>.

"Optimize" in this context is a misnomer. In order to optimize anything you need to know what aspect you wish to optimize. There's no such thing as "optimal for everything". Generally speaking, improving one thing requires something else be sacrificed. The automated FDL tool tries to balance everything out - in some senses the exact opposite of "optimize".

If you know what you want to optimize, you should create an FDL once and use it each time you CONVERT.

From your description it sounds like you want to optimize for space? That is, you want the file to be as small as possible, even if that means sacrificing performance?

Remember that your FDL need only mention the things you're interested in. To minimize space, turn on compression and choose higher bucket sizes with larger fill factors - this will reduce the amount of empty space left over in buckets. However, it will mean that inserting records will tend to be more expensive, as there's a higher chance of a bucket split (100% if the fill factor is 100!). Lookups may also cost more, as there will be more sequential searching within buckets. Compression also takes more CPU to compress and decompress data (but in some cases that may be more than compensated for by reduced I/O). The other catch with compression is that some data may not compress well.

To test the extreme case, try this FDL:

AREA 0
BUCKET_SIZE 63

AREA 1
BUCKET_SIZE 63

KEY 0
DATA_FILL 100
INDEX_FILL 100
DATA_KEY_COMPRESSION yes
DATA_RECORD_COMPRESSION yes
INDEX_COMPRESSION yes


Now CONVERT your file and look at the size of the resulting file. If you have some kind of benchmark for file performance, compare this file with your existing file.
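
A minimal test sequence, assuming the FDL above is saved as MINSPACE.FDL (file names illustrative):

$ CONVERT/FDL=MINSPACE.FDL/STATISTICS CRDAT.TMP CRDAT_MIN.TMP
$ DIRECTORY/SIZE=ALL CRDAT_MIN.TMP ! compare used/allocated blocks against the original file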
A crucible of informative mistakes
Hein van den Heuvel
Honored Contributor

Re: Determine the actual size of data in an indexed file

>> I have an indexed file that is converted weekly to reclaim space.

Hmmm... then why ask the system to leave 1/4 of every single data bucket empty (75% fill) and not use data key compression? So do you still 'gain' space with the convert? That would suggest the application is causing excessive bucket splits, probably due to localized non-EOF inserts.

>> The convert command file runs ANALYZE/RMS/FDL to generate an FDL file and

What a HORRIBLE waste of production hours!
Surely you can predict next week's record counts from last week's convert stats?
Just tweak the 'data record count' number in the analysis section to accommodate next week's needs. Underestimate if you want to make sure not to overallocate for some bizarre reason.
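
The number lives in the analysis section of the FDL that ANALYZE/RMS/FDL writes; a trimmed fragment (the count shown is a made-up estimate for next week):

ANALYSIS_OF_KEY 0
DATA_RECORD_COUNT 3500000 ! raise to next week's expected count
MEAN_DATA_LENGTH 224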

>> then optimizes that FDL file with EDIT/FDL/ANALYSIS=<>/NONINTERACTIVE <>. The FDL generated by this process does not seem to be the best; in particular, I have good reason to suspect that the initial allocation for the areas is far too big.

Share those reasons!

>> My question: how can I best determine the actual space needed for the initial allocation for this file, i.e. the actual space the data records and the keys occupy inside the file?

Just carefully interpret the record counts, mean lengths, and compression stats in the analysis section of the ANALYZE/RMS/FDL output file.

>> If I do ANALYZE/RMS/STATISTICS on the file, what should I notice in particular?

As per above... too bad you posted the post-EDIT/FDL output, not the FDL with the analysis data. Please attach it as a text file if further help is needed.

>> Not much data is added to the file, and no extension seems to have happened recently.

That's goodness. Be happy! Don't bother generating new FDLs until CONVERT/STAT tells you the number of records has increased a bunch.

>> The FDL produced in the above way looks like:

Wrong FDL file!

Please check my RMS_TOOLS freeware contribution at:
http://h71000.www7.hp.com/freeware/freeware60/rms_tools/
It has an XLS file to help you count, and a 'tune_check' program
(see its -s=-1 -a option) as well as a silly but effective .BAS count tool.

So why no data key compression?
Just do a sample convert with that enabled and see whether it is effective for the particular data distribution in use.


Hope this helps,
Hein van den Heuvel ( @ gmail . com )
yaron1
Advisor

Re: Determine the actual size of data in an indexed file

Indeed space is the concern here.
Thank you very much.
Hein van den Heuvel
Honored Contributor

Re: Determine the actual size of data in an indexed file

Hmmm, allocation being a concern and only a few records being added does not add up to a weekly convert in my book. Oh well.

Anyway, you are leaving us in suspense here!
Show us that analyze data! (Or email it if you think it inappropriate for the forum.)

Also, I would recommend you create an FDL along the lines John suggested, or just use your old FDL but remove the allocation and set data key compression to YES.

Then do a CONVERT/NOFAST_LOAD/FDL[/SHARE] file TMP.TMP.
After a minute, abort it (Ctrl/Y) and do an ANALYZE/RMS/FDL on the TMP.TMP output file.
Now look at that FDL to see the [in]effectiveness of data key compression.

A negative number is bad, single-digit positive numbers would make it unimportant, and a double-digit number should strongly suggest enabling data key compression.
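
In DCL the whole check might look like this (file names illustrative):

$ CONVERT/NOFAST_LOAD/FDL=TEST.FDL/SHARE CRDAT.TMP TMP.TMP
$ ! let it load for a minute, then abort with Ctrl/Y
$ ANALYZE/RMS_FILE/FDL/OUTPUT=TMP_ANA.FDL TMP.TMP
$ ! check DATA_KEY_COMPRESSION under ANALYSIS_OF_KEY 0 in TMP_ANA.FDL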

Cheers,
Hein.
yaron1
Advisor

Re: Determine the actual size of data in an indexed file

Hi,
Sorry I left it in suspense. I did the ANALYZE/RMS/STATISTICS on the file and am pasting the important part here:

STATISTICS FOR KEY #0

Number of Index Levels: 3
Count of Level 1 Records: 172366
Mean Length of Index Entry: 70
Count of Index Blocks: 31812
Mean Index Bucket Fill: 75%

Count of Data Records: 3458883
Mean Length of Data Record: 224
Count of Data Blocks: 2068392
Mean Data Bucket Fill: 75%

Overall Space Efficiency: 68%
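
A rough cross-check of those numbers (my arithmetic, assuming 512-byte blocks and ignoring bucket overhead):

3,458,883 records x 224 bytes = 774,789,792 bytes, or about 1,513,262 blocks
1,513,262 / 2,068,392 data blocks = roughly 73% occupancy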

This is 1 month after the last CONVERT. We can see that the file is actually 25% empty; the buckets remained only 75% full after 1 month, so records are almost never added here.

What will happen if I CONVERT without specifying the ALLOCATION attribute for the AREAs? That would create an output file the size of the actual data plus the *_FILL percentage (and plus the index structure, control bytes, etc.), I guess, but without an initial allocation the file will be highly fragmented. How can I avoid this fragmentation? By COPY/CONTIGUOUS after the CONVERT?

Thanks,
Hein van den Heuvel
Honored Contributor

Re: Determine the actual size of data in an indexed file

Just fix the lines DATA_FILL 75 and INDEX_FILL 75 in the KEY 0 section.

Make them 100% for data and 90% for index for the next convert, and see how you like it.
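
In FDL terms, the KEY 0 fragment would then read (sketch):

KEY 0
DATA_FILL 100
INDEX_FILL 90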

Gotta run (battery low)
Maybe more later,

Hein.


Dean McGorrill
Valued Contributor

Re: Determine the actual size of data in an indexed file

hi yaron,
You can take out the allocation. As John said, leave in only what you're interested in. Put in Hein's suggestions too, try it and see if you like it.
John Gillings
Honored Contributor

Re: Determine the actual size of data in an indexed file

yaron,

>Mean Index Bucket Fill: 75%
...
>Mean Data Bucket Fill: 75%

Excellent! This is exactly what you asked for. If you want 100% fill, just say so. As Hein suggests, leaving index fill at 90% shouldn't cost much, and it will protect you from too many index bucket splits if you add new records.

>if I CONVERT without specifying the
>ALLOCATION attribute for the AREAs? That
>will create an output file in the size of...

Yes, it will give minimal size, potentially fragmented into EXTENSION-sized chunks (or whatever the default chain determines).

Instead of leaving ALLOCATION and EXTENSION blank, use your knowledge of the file to give an estimate of what you expect the final size to be. You know the current size, and you know "Overall Space Efficiency: 68%". Thus, if you want absolute minimal space while also minimising fragmentation, make the allocation 60% of the current size, and set the extension to (say) 10% of the allocation.
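
As a worked example, taking AREA 0's posted ALLOCATION of 2176168 blocks as the current size (numbers illustrative):

AREA 0
ALLOCATION 1305700 ! roughly 60% of 2176168
EXTENSION 65535 ! 10% would be about 130570, but RMS caps an area's extension quantity at 65535 blocks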

Check the RMS stats after your next CONVERT and adjust the numbers accordingly.
A crucible of informative mistakes