Operating System - OpenVMS
RMS Indexed Files - Data compression

 
SOLVED
Ramondo
Advisor

RMS Indexed Files - Data compression

(as distinct from Index and Key compression).

Hello,

I'd be grateful for a definitive answer on this one.

Does RMS Data Compression compress repeating same characters (up to 255?) anywhere in a record in an indexed file or only at the end?

If it does it anywhere, does this mean all occurrences of repeating characters in the same record, e.g. 20 spaces in the middle of the record and 20 zeroes at the end?

I want to add a 30 character filler to some RMS indexed files to avoid having to convert the file when new fields are added.

For documentation it's better if the filler is in the middle of the record (similar fields grouped together).

I would hope Data Compression on the filler would not increase the overall size of these files significantly (140, 220 & 500 byte records).

Thanks,

Ramondo.
9 REPLIES
Duncan Morris
Honored Contributor
Solution

Re: RMS Indexed Files - Data compression

Ramondo, welcome to the VMS forum!

Data compression takes place throughout the record. I am sure that Hein will chip in with full details.

I have attached a quick sample to demonstrate the compression. The FDL defined a single 5 char string key at the start of the record.

Volker Halle
Honored Contributor

Re: RMS Indexed Files - Data compression

Ramondo,

you can also look at the internal data within an indexed-sequential file with ANAL/RMS/INTERACTIVE. This allows you to navigate through the different internal structures of the file and to view the data records etc.

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: RMS Indexed Files - Data compression

Did the same test.

Compression happens throughout the record, but not in its key area. The overhead is 4 bytes in the record to indicate the compression, so I guess compression only happens when the same char repeats at least 5 times. 4 bytes should be able to indicate more than 255 chars.

Wim
Ramondo
Advisor

Re: RMS Indexed Files - Data compression

Thanks for the prompt replies guys.

For some reason I thought analyze/interactive expanded the compressed items so I wouldn't be able to tell.

Cheers,

Ramondo.
faris_3
Valued Contributor

Re: RMS Indexed Files - Data compression

Hi,

The different compression types are detailed here:
http://h71000.www7.hp.com/doc/731FINAL/4506/4506pro_008.html#11_filestructure

Prolog 3 Files


Prolog 3 files can accept multiple (or alternate) keys and all data types (including the nonstring 8-byte BIN8 and INT8 types). They also give you the option of saving space by compressing your data, indexes, and keys.

Key compression compresses the key values in the data buckets. Likewise, index compression compresses the key values in index buckets, and data compression compresses the data portion of the records in the data buckets.

Key or index compression is restricted to the string key data type and the string must be at least 6 bytes in length.

With key or index compression, repeating leading and trailing characters are compressed. With front key compression, any characters that are identical to the characters at the front of the previous key are compressed. For example, the keys JOHN, JOHNS, JOHNSON, and JONES appear as JOHN, S, ON, and NES.
With rear key compression, any repeating characters at the end of the key are compressed to a single character. For instance, the key JOHNSON00000 appears as JOHNSON0.
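
The two examples in the passage above can be reproduced with a minimal Python sketch. This only illustrates the idea; it is not the on-disk RMS format. The function names and the (shared count, suffix) representation are assumptions made for the illustration.

```python
def front_compress(keys):
    """Illustrative only: replace the leading characters a key shares with
    the previous key by a count, keeping just the differing suffix."""
    compressed = []
    previous = ""
    for key in keys:
        shared = 0
        limit = min(len(key), len(previous))
        while shared < limit and key[shared] == previous[shared]:
            shared += 1
        compressed.append((shared, key[shared:]))
        previous = key
    return compressed


def rear_compress(key):
    """Illustrative only: collapse a trailing run of one repeated
    character down to a single occurrence."""
    end = len(key)
    while end > 1 and key[end - 1] == key[end - 2]:
        end -= 1
    return key[:end]


print(front_compress(["JOHN", "JOHNS", "JOHNSON", "JONES"]))
# [(0, 'JOHN'), (4, 'S'), (5, 'ON'), (2, 'NES')]
print(rear_compress("JOHNSON00000"))
# JOHNSON0
```

The suffixes JOHN, S, ON and NES match the documentation's example; the counts show how many leading characters each key takes from its predecessor.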

Enabling index compression results in RMS doing a sequential search in index buckets rather than its default binary search, since each index key value must be expanded until a match is found.
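
Why compression forces a sequential search can be seen in a small Python sketch. It assumes a bucket is held as (shared count, suffix) pairs in ascending key order; that layout and the name `find_key` are hypothetical, not the real RMS bucket structure. A key can only be rebuilt from the key before it, so there is no way to jump to the middle of the bucket as a binary search would.

```python
def find_key(bucket, target):
    """Illustrative only: expand front-compressed keys one by one until
    the target is found or passed. Returns the position, or None."""
    previous = ""
    for position, (shared, suffix) in enumerate(bucket):
        # Each key is rebuilt from the one before it, so keys must be
        # visited in order from the start of the bucket.
        key = previous[:shared] + suffix
        if key == target:
            return position
        if key > target:
            return None  # keys are sorted, so the target is not present
        previous = key
    return None


bucket = [(0, "JOHN"), (4, "S"), (5, "ON"), (2, "NES")]
print(find_key(bucket, "JOHNSON"))   # 2
print(find_key(bucket, "JOHNSTON"))  # None
```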

With data compression, RMS can compress sequences of up to 255 repeating characters in the data portion of the user data records. For optimal performance, RMS does not compress sequences having less than five repeating characters.
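
As a rough Python sketch of the rule just quoted, the function below lists the runs in a record that would qualify: at least five identical bytes, at most 255 per run, anywhere in the record. The function name and the sample record layout are made up for the illustration. The sketch scans byte by byte, so it does not model how RMS actually locates runs or how it stores them, and it does not exclude the key area.

```python
def find_runs(record, min_run=5, max_run=255):
    """Illustrative only: return (offset, length) for every run of
    identical bytes long enough to be worth compressing."""
    runs = []
    i = 0
    while i < len(record):
        j = i
        # Extend the run while the byte repeats, capping it at max_run.
        while j < len(record) and record[j] == record[i] and j - i < max_run:
            j += 1
        if j - i >= min_run:
            runs.append((i, j - i))
        i = j
    return runs


# Hypothetical record: 20 spaces in the middle, 20 zeroes at the end.
record = b"ACCT1" + b"SMITH" + b" " * 20 + b"LONDON" + b"0" * 20
print(find_runs(record))       # [(10, 20), (36, 20)]
print(find_runs(b"x" * 300))   # [(0, 255), (255, 45)]
```

Both the run in the middle and the run at the end are picked up, and a run longer than 255 bytes is split into more than one chunk.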

Compression has a direct effect on CPU time and disk space. Compression increases CPU time, but the keys are smaller, so your application can scan more quickly through the data and index buckets.

and also in this older article :

http://h18000.www1.hp.com/support/asktima/operating_systems/0093A18B-489951C0-1C009F.html




Hein van den Heuvel
Honored Contributor

Re: RMS Indexed Files - Data compression

The question has largely been answered.
For data compression, RMS compresses repeating chars anywhere in the record, in chunks of up to 255 repeating bytes.

As it finds the start with word-aligned word compares, it might not find the shortest possible string all the time.

For key compression there is front compression (those chars the same as the prior key... often very effective), and tail compression (repeat last char until full key size is reached), but no repeating chars in the middle.

Adding a 'just in case' field does not cost much at all, but watch out for future 'back-fill'. If you were to go back and fill those empty fields with real data in a populated file, then this is likely to cause massive bucket splitting, and a convert will be needed right after such a job.

Cheers,
Hein.
Ramondo
Advisor

Re: RMS Indexed Files - Data compression

Hein,

Thanks for your input. So I was mis-informed on the RMS Structures and Utilities course (about 10 years ago).

The reason for my question was that I was told only tail compression (on the record rather than index/key) was performed with Data Compression, but this went against documentation I've seen since that suggested otherwise.

So, briefly:

Index Compression - Front compression (compare with previous index record), Tail Compression (repeating trailing characters)

Key Compression - Front compression, Tail Compression

Data Compression - Repeating same characters (up to 255) anywhere in the data portion of the record.

So that's 3 different types of compression (Front, Tail, Repeating?) on 3 different
'structures' (Index, Key Part Of Record, Data Part Of Record).

I'm adding a new field to some indexed files and want to make future changes easier by adding a filler. The idea is the file size doesn't increase much in size until the filler is needed.

I can't avoid changing the recordsize this time so I'll use a program (COBOL, Basic, or DTR) to create a sequential file and populate the new field with a default value and then convert back to indexed with the new recordsize in the FDL.

I'll need to do the same process for future changes but without the need to worry about changing the recordsize, so I should avoid potential re-compilation and 'invalid recordsize' errors.

I wanted to make sure where I placed the filler in the record wouldn't make a difference.

Thanks again,

Ramondo.
Hein van den Heuvel
Honored Contributor

Re: RMS Indexed Files - Data compression

>> So I was mis-informed on the RMS Structures and Utilities course

Or a minor miscommunication / misremembering.

Anyway, your summary seems correct.


>> So that's 3 different types of compression (Front, Tail, Repeating?) on 3 different
'structures' (Index, Key Part Of Record, Data Part Of Record).

Right. With index and key really being the same. Key and data compression are always a winner, best I know. It tends to save CPU time. Basically, RMS spends more time dealing with uncompressed records (skipping them, moving them) than (un/re)compressing them. Index compression has the size vs binary search trade-off, which probably makes it desirable for large (> 20 bytes) keys.

>> The idea is the file size doesn't increase much in size until the filler is needed.

Correct.

>> to create a sequential file and populate the new field with a default value and then convert back to indexed with the new recordsize in the FDL.

Right. But had those fillers been added at the end, a simple convert/pad could do the transformation. And the programs do not really care where the extra fields are in the record, unless they are part of a sub-structure to be moved/copied in a big chunk.
But then you'd be moving/copying nonsense for a while to come, until the fields are used.

Oh, and while you are going over your files, potentially rearranging fields, be sure to check the tuning in general, perhaps with my 'tune_check' tool from the OpenVMS freeware website [rms_tools].

Enjoy!
Hein.
Ramondo
Advisor

Re: RMS Indexed Files - Data compression

Ta very much.

Closing this thread.