Convert bucket fill problem

Duncan Morris · ‎12-09-2005

I have run into a very strange problem with convert. Whilst attempting to tune some large files on our local system I found that I could not get convert to fully pack the buckets. Increasing the size of the buckets led to worse packing!

The records are simple 85 byte fixed records.

Run 1: 2,000,000 records loaded - bucket fills fine

Run 2: 3,000,000 records loaded - bucket fill slumps to 50%.

After run 1, all the data buckets show 190 records per bucket, but after run 2 the buckets contain 190, then 2, then 190, then 2 and so on.

The attachment shows the standard FDL used for both runs, and the results of anal/rms/fdl from the two runs.

The system is running Alpha V7.3-2 with all current patches.

Has anybody else seen this behaviour?

Thanks

Duncan Morris · ‎12-09-2005

and this is the command used to do the converts

conv/fast/nosort/stat/fdl=test/fill test3.seq disk$data1:[000000]test3.ifl

Hein van den Heuvel · ‎12-09-2005

Easy, but little known fact.

Convert/fast will start each series of records with the same primary key value in a fresh bucket.

Your file allows duplicates on the primary key.
Nothing wrong with that.
But convert was taught to assume you did this for a reason: that there is a bunch of dups now, and maybe many more to come later.
It tries to avoid excessive bucket splits, should the application indeed add many more dups, by starting each series of duplicate primary key records in its own bucket.

You did a great analysis so far.
Now one more step... go DOWN into those data buckets with ANAL/RMS/INT and look at the key values for the 'empty' bucket, and for a packed bucket.

If this is a serious problem for you, then you may want to use CONVERT/NOFAST which will use plain RMS $PUTs to an empty file.
Obviously this is slower, but for single-key files it may be acceptable. Maybe even with 1 or 2 alternate keys the standard $PUT speed is acceptable (after SET RMS/IND/BUF=100), but beyond that it will surely start to hurt too much when loading millions of records.

As you observed, you can mitigate this problem (feature!) with a smaller bucket size.

It raises the question whether this is really a desirable feature, or at least a feature which would warrant an optional switch. Please consider a formal report to OpenVMS support, articulating your inputs on this.

Hope this helps,
Hein.

(if you'd like extensive help with this, beyond the scope of this forum, then check my profile on how to try contacting me).

Jan van den Ende · ‎12-09-2005

... and then maybe HP will need to hire the now-independent consultant Hein van den Heuvel (you recognise the name) to make it so. :-)

Proost.

Have one on me.

jpe

Don't rust yours pelled jacker to fine doll missed aches.

Hein van den Heuvel · ‎12-09-2005

Minor comments on the attachment contents...

It shows no data_key nor data record compression.

You probably should enable that. It tends to be a winner in both space AND CPU SPEED.
Yes, you read that right. Often enabling compression will SAVE CPU time. This is because RMS spends more time dealing with records you do not want (the smaller the better) than with the records you do want.

Take your example. Almost 200 records in a data bucket. So for a single keyed read RMS will on average examine the keys for, and skip over the data in, some 100 records that it never needs to uncompress. Then eventually it finds the target (a sequential, ordered compare) and of course will burn a little CPU undoing the compression, but just for the data in that one record.
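The "some 100 records" figure can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming the target is equally likely to sit at any position in the bucket and that the bucket is scanned front to back; the function name is made up for illustration and says nothing about how RMS is implemented.

```python
def mean_records_skipped(records_per_bucket: int) -> float:
    """Average number of records passed over before reaching the target.

    Assumes every position in the bucket is equally likely to hold the
    target and that the scan starts at the first record.
    """
    # A target at position i (0-based) has i records in front of it.
    records_in_front = range(records_per_bucket)
    return sum(records_in_front) / records_per_bucket


# 190 is the records-per-bucket figure reported earlier in the thread;
# 30 is the smaller-bucket figure mentioned in this post.
for n in (190, 30):
    print(n, mean_records_skipped(n))
```

For 190 records per bucket this gives 94.5 records skipped per keyed read, and for 30 records per bucket 14.5, which is why a smaller bucket reduces the work spent on records you do not want.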

Increasing compression may allow you to sufficiently shrink the bucket size without increasing the index depth.
But that's of no concern to you, as from my 'back of the envelope' calculations you can drop down to a bucket size of 8 while still staying at depth = 2 for 2M records.

200 records in a bucket is a good few, sometimes too many (bucket lock contention).
Whether you do 1 IO per 200 records (35-block buckets) or 1 IO per 30 records when sequentially reading the file might be less important than wasting time looking for records you do not want during keyed or alternate-key access.

Where does the 35 block bucket size come from? It's an odd number (sic). Something to do with the old VIOC cached IO size cut-off?

Smaller buckets may also make the global buffer cache more effective: either you can just have more buffers whilst using the same memory, or, if you just need 400 irrespective of the size (dominant random access), then those same 400 will need less memory.
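To put rough numbers on that trade-off, here is a small Python sketch. It assumes one global buffer holds exactly one bucket and ignores any per-buffer overhead; the 400 buffers and the bucket sizes of 35 and 8 blocks are the figures used in this thread, and the function names are illustrative only.

```python
BLOCK_BYTES = 512  # size of one OpenVMS disk block


def cache_bytes(buffers: int, bucket_blocks: int) -> int:
    """Memory needed for `buffers` buffers of `bucket_blocks` blocks each.

    Assumption: one buffer holds one bucket, no per-buffer overhead.
    """
    return buffers * bucket_blocks * BLOCK_BYTES


def buffers_for_same_memory(buffers: int, old_blocks: int, new_blocks: int) -> int:
    """How many smaller buffers fit in the memory used by the larger ones."""
    return cache_bytes(buffers, old_blocks) // (new_blocks * BLOCK_BYTES)


print(cache_bytes(400, 35))                 # 400 buffers of 35 blocks
print(cache_bytes(400, 8))                  # the same 400 buffers at 8 blocks
print(buffers_for_same_memory(400, 35, 8))  # 8-block buffers in the same memory
```

Under these assumptions 400 buffers cost 7,168,000 bytes at 35 blocks per bucket but 1,638,400 bytes at 8 blocks, or alternatively the same memory holds 1750 of the smaller buffers.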

Very minor comment... the log shows you using POS/BUCK=nnn where nnn is the NEXT pointer in the current bucket. This is the default action, so a straight return or 'NEXT' will suffice.

Cheers,
Hein.

Duncan Morris · ‎12-09-2005

Many thanks for that Hein.

a) the source data was generated initially as a 1 million record file with unique keys.
The 2 million rec file was a sorted amalgamation of 2 copies of the 1 million rec file, and the 3 million version was a combination of 3 copies of the original.
Thus in the case of the 2 million rec sample there were 2 values per key, and in the 3 million rec sample there were 3 values per key.

b) data/key/index compression was deliberately turned off whilst trying to analyse the problem - the production file uses compression, has secondary keys with null values. I am a keen advocate of compression.

c) the bucket size came about as I tried a binary search on bucket sizes to pinpoint my problem! The production systems are using 8-16 as bucket size, and run to about 6.5 million recs.

d) I could certainly do with switching the intelligence off - imagine if the primary key was sequential (such as date)? Convert would waste 50% of file space when trying to tidy up a file! Disk space may be getting cheaper, but there are still finite limits in commercial environments.

Hein van den Heuvel · ‎12-10-2005

a) the source data was generated ...

Ah! I should have realized from the nice round number of records that this was an artificial test file. In that case the explanation may be good enough as is. The production file may not have a serious issue with this, or may actually benefit from the feature. Who knows? It does suggest an analyze step shortly after the next convert (or a test convert/share on the side).

b) ... the production file uses compression, has secondary keys with null values. ..

Excellent.

c) the bucket size came about as I tried a binary search on bucket sizes to pinpoint my problem!

And another fine explanation.

d) I could certainly do with switching the intelligence off

Submit a low-level improvement request!
Low level because at this point we do not know whether this is ever actually a real problem in production.

> imagine if the primary key was sequential (such as date)? Convert would waste 50% of file space when trying to tidy up a file!

No, no, it would take actual duplication to trigger this. Sequential, non-repeating dates don't do this. That's why you had some buckets nicely filled, and others with just a few records, all with a duplicate key value.
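The pattern reported in this thread can be reproduced with a toy model in Python. The packing rule used here is an inference from the numbers posted above, not a description of CONVERT's actual code: records are packed in key order, and when a series of duplicates spills over a bucket boundary, the bucket holding the spill-over is closed as soon as that series ends, so the next key value starts in a fresh bucket. The capacity of 190 records per bucket comes from the thread; the function names and the 1000-key test set are made up for illustration.

```python
from itertools import groupby


def repeated_keys(n_keys: int, dups: int) -> list:
    """Sorted key list in which every key value occurs `dups` times."""
    return [key for key in range(n_keys) for _ in range(dups)]


def pack(keys, capacity: int) -> list:
    """Return the record count of each data bucket under the toy rule."""
    buckets = []        # finished buckets, as record counts
    current = 0         # records in the bucket being filled
    spilled = False     # True if the current bucket began with spill-over
    for _, series in groupby(keys):
        remaining = len(list(series))
        # Toy rule (assumption): a new key value does not share a bucket
        # that was opened by the overflow of the previous duplicate series.
        if spilled and current > 0:
            buckets.append(current)
            current = 0
        spilled = False
        while remaining > 0:
            take = min(remaining, capacity - current)
            current += take
            remaining -= take
            if current == capacity:
                buckets.append(current)
                current = 0
                spilled = remaining > 0  # rest of the series spills over
    if current > 0:
        buckets.append(current)
    return buckets


for dups in (1, 2, 3):
    print(dups, pack(repeated_keys(1000, dups), 190)[:4])
```

With one or two records per key value, 190 is an exact multiple of the series length, no series ever straddles a bucket boundary, and every bucket fills. With three records per key value, 63 complete series plus one record fill a bucket and the other two records of that series land alone in the next one, giving 190, 2, 190, 2 and an average fill of (190 + 2) / (2 × 190), roughly 50%.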

Here is how it might help... Imagine a file is designed to hold parts and their modification requests, the part number being the primary key. Some parts get many changes and thus many dups. By making the part number the primary key the application gets all relevant records in one, or just a few, IOs. They live together. By starting out a fresh bucket for those popular parts, convert avoids a bucket split as soon as yet another modification record for a popular part is added. It also avoids contention on data records which happen to live together in the same bucket, and may make the cache more efficient.
So it might just help some applications.

Anyway, good to read a message from someone who actually knows rms!

Regards,
Hein.
