Convert bucket fill problem
12-09-2005 04:13 AM
The records are simple 85-byte fixed-length records.
Run 1: 2,000,000 records loaded - bucket fill is fine.
Run 2: 3,000,000 records loaded - bucket fill slumps to 50%.
After run 1 all the data buckets show 190 records per bucket, but after run 2 the buckets contain 190, then 2, then 190, then 2, and so on.
The attachment shows the standard FDL used for both runs and the results of ANAL/RMS/FDL from the two runs.
The system is running OpenVMS Alpha V7.3-2 with all current patches.
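(For reference, the figures in the attachment came from ANALYZE/RMS_FILE output along these lines; the output file name is only an example.)
analyze /rms_file /fdl /output=test3_run2.fdl disk$data1:[000000]test3.ifl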
Has anybody else seen this behaviour?
Thanks
12-09-2005 04:18 AM
Re: Convert bucket fill problem
conv/fast/nosort/stat/fdl=test/fill test3.seq disk$data1:[000000]test3.ifl
12-09-2005 05:38 AM
Re: Convert bucket fill problem (Solution)
Convert/fast will store each series of records with the same primary key value starting in a fresh bucket.
Your file allows duplicates on the primary key.
Nothing wrong with that.
But Convert was taught to speculate that you did this for a reason: it assumes a bunch of duplicates now, and maybe many more to come later.
To avoid excessive bucket splits if the application does indeed add many more duplicates, it tries to help by starting each series of duplicate primary key records in its own bucket.
You did a great analysis so far.
Now one more step... go DOWN into those data buckets with ANAL/RMS/INT and look at the key values for the 'empty' bucket, and for a packed bucket.
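A rough sketch of that inspection (the exact number of DOWNs depends on your index depth):
analyze /rms_file /interactive disk$data1:[000000]test3.ifl
! then, at the ANALYZE> prompt:
!   DOWN   - from the file header, down through the index levels until you reach a data bucket
!   NEXT   - step from bucket to bucket and compare the key values reported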
If this is a serious problem for you, then you may want to use CONVERT/NOFAST which will use plain RMS $PUTs to an empty file.
Obviously this is slower, but for a single-keyed file it may be acceptable. Maybe even with 1 or 2 alternate keys the standard $PUT speed is acceptable (after SET RMS/IND/BUF=100), but beyond that it will surely start to hurt too much when loading millions of records.
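A sketch of that approach, reusing the file names from your earlier command (the buffer count of 100 is just the value mentioned above):
set rms_default /indexed /buffer_count=100
convert /nofast_load /statistics /fdl=test test3.seq disk$data1:[000000]test3.ifl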
As you observed, you can mitigate this problem (feature!) with a smaller bucket size.
This raises the question of whether it is really a desirable feature, or at least whether it warrants an optional switch. Please consider filing a formal report with OpenVMS support, articulating your input on this.
Hope this helps,
Hein.
(If you'd like extensive help with this, beyond the scope of this forum, check my profile for how to contact me.)
12-09-2005 05:54 AM
Re: Convert bucket fill problem
Proost! (Cheers!)
Have one on me.
jpe
12-09-2005 06:14 AM
Re: Convert bucket fill problem
The FDL shows neither data_key nor data_record compression.
You probably should enable that. It tends to be a winner in both space AND CPU time.
Yes, you read that right: often enabling compression will SAVE CPU time. This is because RMS spends more time dealing with records you do not want (so the smaller they are, the better) than with the records you do want.
Take your example: almost 200 records in a data bucket. For a single keyed read, RMS will on average examine the keys for, and skip over the data in, some 100 records it does not need to uncompress. Eventually it finds the target (a sequential, ordered compare) and of course burns a little CPU undoing the compression - but only for the data in that one record.
Increasing compression may allow you to sufficiently shrink the bucket size without increasing the index depth.
But that's of no concern to you: from my back-of-the-envelope calculations you can drop down to a bucket size of 8 while still staying at depth = 2 for 2M records.
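If you try that, the relevant FDL lines would look something like the fragment below (the key segment position/length are placeholders - keep your real key definition):
AREA 0
        BUCKET_SIZE             8
KEY 0
        DUPLICATES              yes
        SEG0_POSITION           0       ! placeholder
        SEG0_LENGTH             8       ! placeholder
        DATA_KEY_COMPRESSION    yes
        DATA_RECORD_COMPRESSION yes
        INDEX_COMPRESSION       yes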
200 records in a bucket is a good few, sometimes too many (bucket lock contention).
Whether you do 1 IO per 200 records (at bucket size 35) or 1 IO per 30 records when sequentially reading the file might be less important than the time wasted looking at records you do not want during keyed or alternate-key access.
Where does the 35 block bucket size come from? It's an odd number (sic). Something to do with the old VIOC cached IO size cut-off?
Smaller buckets may also make the global buffer cache more effective: either you can just have more of them whilst using the same memory, or, if you just need 400 irrespective of the size (dominantly random access), those same 400 will need less memory.
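(For reference, the per-file global buffer count is set with something like the line below; 400 is just the number used above.)
set file /global_buffers=400 disk$data1:[000000]test3.ifl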
Very minor comment... the log shows you using POS/BUCK=nnn where nnn is the NEXT pointer in the current bucket. This is the default action, so a straight return or 'NEXT' will suffice.
Cheers,
Hein.
12-09-2005 09:32 PM
Re: Convert bucket fill problem
a) The source data was initially generated as a 1-million-record file with unique keys.
The 2-million-record file was a sorted amalgamation of 2 copies of the 1-million-record file, and the 3-million-record version was a combination of 3 copies of the original.
Thus in the 2-million-record sample there were 2 records per key value, and in the 3-million-record sample there were 3 records per key value.
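(For what it's worth, the merged inputs were built roughly like this; the key position and size shown are placeholders for the real key layout.)
copy test1.seq,test1.seq test2_unsorted.seq
sort /key=(position:1,size:8) test2_unsorted.seq test2.seq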
b) Data/key/index compression was deliberately turned off whilst trying to analyse the problem - the production file uses compression and has secondary keys with null values. I am a keen advocate of compression.
c) The bucket size came about because I tried a binary search on bucket sizes to pinpoint the problem! The production systems use bucket sizes of 8-16 and run to about 6.5 million records.
d) I could certainly do with switching the intelligence off - imagine if the primary key were sequential (such as a date)? Convert would waste 50% of the file space when trying to tidy up a file! Disk space may be getting cheaper, but there are still finite limits in commercial environments.
12-10-2005 01:17 AM
Re: Convert bucket fill problem
Ah! I should have realized from the nice round number of records that this was an artificial test file. In that case the explanation may be good enough as is. The production file may not have a serious issue with this, or may actually benefit from the feature. Who knows? It does suggest an analyze step shortly after the next convert (or a test convert/share on the side).
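Something along these lines against the live file would tell you (the file names below are only placeholders):
convert /share /statistics /fdl=prod.fdl prod.idx scratch_disk:[tmp]prod_test.idx
analyze /rms_file /fdl /output=prod_test_check.fdl scratch_disk:[tmp]prod_test.idx   ! then check the bucket fill figures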
> b) ... the production file uses compression, has secondary keys with null values ...
Excellent.
> c) the bucket size came about as I tried a binary search on bucket sizes to pinpoint my problem!
And another fine explanation.
> d) I could certainly do with switching the intelligence off
Submit a low-level improvement request!
Low level, because at this point we do not know whether this is ever actually a real production-time problem.
> imagine if the primary key were sequential (such as a date)? Convert would waste 50% of the file space when trying to tidy up a file!
No, no - it takes actual duplication to trigger this; sequential, non-repeating dates don't. That is why you had some buckets nicely filled and others with just a few records sharing a duplicated key value.
Here is how it might help... Imagine a file designed to hold parts and their modification requests, with the part number as the primary key. Some parts get many changes and thus many duplicates. By making the part number the primary key, the application gets all relevant records in one, or just a few, IOs: they live together. By having Convert start a fresh bucket for those popular parts, the system avoids a bucket split as soon as yet another modification record for a popular part is added. It also avoids contention on data records that happen to live together in the same bucket, and may make the cache more effective.
So it might just help some applications.
Anyway, good to read a message from someone who actually knows RMS!
Regards,
Hein.