
Willem Grooters
Honored Contributor

File sizing question

Suppose I have a file with a record size of 200 bytes. I need to expand the record size to 256 bytes, and a secondary key of 18 bytes is added.
A simple CONVERT is not sufficient because of the changes in the record layout, so I will use a (delivered) conversion program. This program uses an (also delivered) FDL file to create the new file after it has renamed the old one. It then populates the new file with the converted contents of the old one.

Conversion of a file takes a long time due to the number of records.

In order to speed up this process, I think that the new file will need to be created using sufficient allocation for each area, so I'll need to edit the allocation statements in the FDL-file.

Is there some 'rule of thumb' for the required increase? For example: an expansion of 56 bytes on a 200-byte record means an increase of just over 25% per record. Would the same increase in allocation be sufficient for the data areas?
What about added secondary keys, and keys that change in size?
If a key is segmented, does that constitute a penalty?

Just a 'rule of thumb' is sufficient, since this conversion will be done on different files in different environments.
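
For reference, the allocation statements I would be editing in the delivered FDL look something like this (the numbers here are placeholders, not the real values), with similar statements for the index areas:

AREA 0
        ALLOCATION              120000
        BEST_TRY_CONTIGUOUS     yes
        BUCKET_SIZE             35
        EXTENSION               30000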

Willem
Willem Grooters
OpenVMS Developer & System Manager
12 REPLIES
Robert Gezelter
Honored Contributor

Re: File sizing question

Willem,

Presuming that you are not using compression, and that the records are fixed length (not up to 200, growing to a fixed length of 256), the growth factor of approximately 25% should be good for a straight conversion. Without going into depth, I would be more concerned about the bucket size and whether you will start creating bucket splits.

Really, a careful check of the FDL is probably called for.

Indices are a different matter. Perhaps Hein can weigh in, but my offhand recollection is that there are enough variables to make any simple calculation a question mark.
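
For example, rather than working the numbers out by hand, you can largely let the tools do the arithmetic; something along these lines (file names are placeholders, and you would still hand-edit the new record size and the added key into the resulting FDL):

$ ANALYZE/RMS_FILE/FDL OLD_FILE.DAT        ! writes OLD_FILE.FDL, including the analysis sections
$ EDIT/FDL/ANALYSIS=OLD_FILE.FDL/NOINTERACTIVE/SCRIPT=OPTIMIZE OLD_FILE.FDL

The second command produces a new, optimized version of the FDL, with bucket sizes and allocations derived from the measured record population rather than from guesswork.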

- Bob Gezelter, http://www.rlgsc.com
Jan van den Ende
Honored Contributor

Re: File sizing question

Willem,

I would think that the actual fill percentage of the old file, and the desired one of the new, are of much more importance.

Once operational, will the file (according to the main key) be added to at the end, or inserted into more or less at random?

In the first case, a fill factor of 100% is appropriate; in the second case:
make a plan for periodic re-conversion
(for example: once a year). Then estimate the expected growth in that period, and calculate the size you would need for that. Take a few % extra. Adjust your fill factor so the records are spread evenly over the resulting size.

For current expansion guesstimate:
Take the record length and add to that the lengths of all keys. Do this for both the old and the new record descriptions. The factor between those is the best guess I can give.
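
As a quick worked example (assuming, purely for illustration, a 10-byte primary key): old = 200 + 10 = 210, new = 256 + 10 + 18 = 284, and 284 / 210 is roughly 1.35 - so plan for about 35% growth rather than 25%.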

If Hein has different views: believe him rather than me!

See you Thursday.

Cheers.

Have one on me.

Seasonal greetings to all!

jpe

Don't rust yours pelled jacker to fine doll missed aches.
Hein van den Heuvel
Honored Contributor
Solution

Re: File sizing question

It sounds like the conversion uses straight $PUTs to the target indexed file. That will cost many IOs per record. Any file growth overhead will be minimal with a reasonable extend size (10,000? 50,000?). I would recommend focusing on optimizing the process itself, not the extend. Later, with the conversion done, re-convert with appropriate initial area sizes and good extends.
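
If you just want generous extends for the duration of the conversion job without touching the FDL, a process-wide default along these lines should do (the value is only a suggestion; a non-zero EXTENSION set in the file itself takes precedence):

$ SET RMS_DEFAULT/EXTEND_QUANTITY=65535    ! blocks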

- A 25% increase in data record size will match closely to a 25% increase in buckets used, except for the silly (!) 1- and 2-block bucket sizes, where it can mean the difference between, say, only 3 records fitting where 4 used to fit.
- Data record compression, which you really should have, will further reduce the relative growth, as the overhead does not change but the compressible data increases.
- Secondary keys: let EDIT/FDL calculate. The simple approach is to take the key size + 7 bytes, times the number of records, for the space used (see the worked example just after this list).
- Segmentation has no on-disk cost at all.
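
As a quick worked example of the secondary-key rule of thumb above, purely for illustration with the 18-byte key and, say, 1,000,000 records: (18 + 7) * 1,000,000 = 25,000,000 bytes, so on the order of 25 MB, or roughly 49,000 blocks, for that one alternate index.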

conversion advice

- do a test convert on 10,000 records or so. Then you can see how big the various areas become, and how effective the compression is.
perl -pe "last if ($.>10000)" < old.dat > test.dat

- see if you can use a SEQUENTIAL output file for the transformation, then CONVERT/FAST/NOSORT/STAT/FDL=.. for the new files (a DCL sketch follows after this list).

- if you must use an indexed file, see if you can use just the primary key during the transformation, then convert to multi-key afterwards.

- if you must use the target file with multiple keys, be sure to add lots of buffers for the transformation: SET RMS/IND/BUF=255 (or a few thousand global buffers).

- if you can change the source, be sure to use DEFERRED WRITE with many buffers during the transformation. For a typical batch-job transfer this can be a 10x difference, as buckets are only written when a new buffer is needed. So with a typical 10 to 20 data records per data bucket, that avoids many IOs.
If the (new) alternate key is somewhat in order with the primary key, then the savings there are significant as well; otherwise you will at least save on the index maintenance.
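
To make the above a little more concrete, a rough DCL sketch - the program name, file names and values are only placeholders, not a recipe:

$ ! (a) sequential route: transform to a flat file in the new layout, then one fast load
$ RUN TRANSFORM_RECORDS                        ! hypothetical program: reads OLD_FILE.DAT, writes NEW_FORMAT.SEQ
$ ! /NOSORT is safe if the records come out of the old file in primary key order
$ CONVERT/FAST_LOAD/NOSORT/STATISTICS/FDL=NEW_FILE.FDL NEW_FORMAT.SEQ NEW_FILE.DAT
$ !
$ ! (b) if the program must $PUT straight into the multi-key target instead
$ SET RMS_DEFAULT/INDEXED/BUFFER_COUNT=255     ! local buffers, per process
$ SET FILE/GLOBAL_BUFFERS=2000 NEW_FILE.DAT    ! or a few thousand global buffers

And on the deferred-write point: if (and only if) the conversion program builds its RMS context from the FDL via FDL$PARSE/FDL$CREATE, a CONNECT section like the one below may be all it takes; otherwise the program source has to set the deferred-write option (FAB$V_DFW) itself. That is an assumption about how the delivered program works, so check it first:

CONNECT
        DEFERRED_WRITE          yes
        MULTIBUFFER_COUNT       255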


For further help, be sure to provide further details: 1,000,000 records or 100,000,000? Attach an old ANA/RMS/FDL with stats and a new FDL? Stats from a 10,000-record convert?

With kind regards, and best wishes for the new year...

Hein.
Willem Grooters
Honored Contributor

Re: File sizing question

Thanks so far.
Hein, as usual, you gave enough hints for further investigation, but I cannot use your suggestions of creating a sequential file, or of using just the primary key.

The problem is that the most recent data is to be converted first, in order to interrupt normal processing for as short a time as possible. The rest is done - part by part - concurrently with normal processing. If that takes hours, I wouldn't mind.

A full, one-run conversion of one file, containing just over 3 million records, took 6.5 hours, and the total time needed to convert the same amount part-by-part in parallel was 9 hours.

I found one reason: bucket size. The original file had a bucket size of 35 in each area, and the new file a bucket size of 3....
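
So presumably the immediate fix is simply to bring the bucket size in the delivered FDL back in line with the old file, something along these lines for each area (35 being what the old file used; an EDIT/FDL run against an ANALYZE/RMS_FILE/FDL of the old file would confirm the right value):

AREA 0
        BUCKET_SIZE             35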

Willem
Willem Grooters
OpenVMS Developer & System Manager
Robert Gezelter
Honored Contributor

Re: File sizing question

Willem,

A thought. You may be able to take Hein's suggestions about a sequential file by reversing the sequence of conversion steps, to wit:
- unload the historical data to a sequential file
- convert the historical sequential file to the new format, as a sequential file
- perform a mass convert of the resulting file to the new index file
- do the indexed update of the recent information (this is the step that is time critical)

Of course, for maximum efficiency, make sure that:
- your RMS buffering parameters are set high when doing the file conversions (the performance effect can be very significant)
- the work files used by CONVERT are on a separate, fairly idle disk
- the source and destination files are on different disks
- the mass conversion of the historical data is run at a time of low system load.

In short, you may be able to do this very efficiently and quickly, and not forgo any of the advantages of a fairly well organized file (I presume that the historical data is far larger than the "recent" data). The mass convert will produce better indices than adding one record at a time.
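
In DCL terms the skeleton might look something like this - program and file names are placeholders, SEQUENTIAL.FDL would be a trivial FDL describing a plain sequential file, and /MERGE is one way to fold the reformatted recent records into the already-built file:

$ CONVERT/FDL=SEQUENTIAL.FDL OLD_HISTORY.DAT HISTORY.SEQ     ! unload the history to a sequential file
$ RUN REFORMAT_RECORDS                                       ! hypothetical: HISTORY.SEQ -> HISTORY_NEW.SEQ in the new layout
$ ! /NOSORT assumes the primary key order is preserved by the reformat
$ CONVERT/FAST_LOAD/NOSORT/STATISTICS/FDL=NEW_FILE.FDL HISTORY_NEW.SEQ NEW_FILE.DAT
$ CONVERT/MERGE RECENT_NEW.SEQ NEW_FILE.DAT                  ! the time-critical step: insert the recent records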

I hope that the above is helpful.

- Bob Gezelter, http://www.rlgsc.com
Willem Grooters
Honored Contributor

Re: File sizing question

The "solution" is procedural.
The conversion is done over the weekend, directly after the backup. Data not older than two weeks is converted first, different files in parallel - it's quite unlikely that older data will be required at that time. After this first set, users are allowed to access the data.
This first conversion adds up to one hour during which the application is not accessible - which is considered acceptable.

This has been used for years and the responsible people (application and system managers) are used to it. Now we face a loss of experience and knowledge, especially in the system management area, so we prefer to stick to this procedure instead of adding extra steps - as long as that is possible.
We may face the situation that even the first conversion takes too much time, so we have to think about solutions.

Also, conversions are a normal, required procedure, so the more we can speed them up, the better.

Willem
Willem Grooters
OpenVMS Developer & System Manager
Robert Gezelter
Honored Contributor

Re: File sizing question

Willem,

My experiences are that for mass conversions and reformattings, it is often far faster to:
- unload all of the historical data to a sequential file (if one is talking about a 5-year history, and current records are considered to be the last month, then assuming a uniform distribution, 1/60, or approximately 1.6%, of the records are current).
- reformat the sequential file (this is very IO-bound; I often use VERY LARGE RMS blocking, buffering, and extend sizes during this stage - rough numbers follow after this list - it improves system efficiency and wall time tremendously; read that as: I have seen factors of 10!).
- use CONVERT to build the full indexed file from the reformatted sequential file, leaving the buffer factors high.
- add the "recent" data directly to the resulting indexed file (quite possibly taking less than the hour you mentioned in your last posting).

This should be far faster than running multiple RMS indexed streams processing the file and reformatting records individually. It also produces far more efficient and well-organized indices and buckets. Since 98.4% of the file was loaded en masse, only a small percentage of records have the potential of splitting buckets or producing other performance losses.

- Bob Gezelter, http://www.rlgsc.com
Willem Grooters
Honored Contributor

Re: File sizing question

Fully agreed, Bob. If I get the chance to revise the conversion methodology, I will certainly take that into account. But that's long-term - and the procedures need to be completely fool-proof.

Willem
Willem Grooters
OpenVMS Developer & System Manager
Robert Gezelter
Honored Contributor

Re: File sizing question

Willem,

Agreed. That is the beauty of the "convert the history first" approach. The bulk of the conversion can be done in advance. If problems are encountered, the bulk conversion can be re-run.

My best wishes for a happy and healthy new year!

- Bob Gezelter, http://www.rlgsc.com