06-15-2007 04:22 AM
Obtaining the key of a record about half-way thru a file
This may seem a bit unusual, but I would guess that there have been many such items posted. I have an indexed file, two keys, each segmented, with anywhere from [say] 10K records to 30M records. RMS data record compression is enabled.
Is there a way to identify the key of the record that sits at or about the halfway point of the file?
I can guesstimate the approx record length with the compression and the overhead for the keys and can, by the EOF block, come within 10% of the number of records. I am looking for the key of the record that sits at 50% of that record number. For example, if I could open an indexed file as a direct-access sequential file, I could access by record number. I just can't seem to be able to do that (using Fortran).
Any ideas?
Much obliged, in advance.
-Howard-
06-15-2007 04:41 AM
Re: Obtaining the key of a record about half-way thru a file
IF (big IF) the number of records is spread over the primary key range, and somewhat predictably so, then your best bet is a binary search or an outright GET KGT "max-min/2".
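The binary-search/keyed-GET idea above can be sketched in Python. This is a hypothetical illustration: a sorted list of integer keys stands in for the RMS index, and `bisect_left` plays the role of a keyed GET with "key greater than or equal to" semantics.

```python
import bisect

def midpoint_key(sorted_keys):
    """Return the first existing key at or above the midpoint of the key *range*.

    This mirrors a keyed GET on (min + max) / 2: cheap, but it only lands near
    the true median record if keys are spread evenly over the range.
    """
    lo, hi = sorted_keys[0], sorted_keys[-1]
    target = lo + (hi - lo) // 2                  # midpoint of the key range
    i = bisect.bisect_left(sorted_keys, target)   # "GET key >= target"
    return sorted_keys[i]
```

As noted, this is only as good as the assumption that keys are distributed predictably across their range.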
Do you need this mid-way point once or repeatedly?
For just once I would use ANAL/RMS/INT
DOWN
DOWN
DOWN KEY
DOWN INDEX
DOWN (root bucket header)
By carefully reading the bucket header data you can find how many entries there are in the root bucket (bucket size minus VBN Free Space Offset, divided by Bucket Pointer Size).
DOWN (first lower-level index pointer)
NEXT (entries/2)
Now... with index compression it'll be tricky to decode the key value.
So just do a
DOWN (middle index bucket header)
DOWN (first key value... not compressed!)
In a program you could use:
Create a file KEY_0.FDL like
FILE; ORG SEQ
RECORD; FORMAT FIXED; SIZE xxx
That xxx would be the TOTAL KEY SIZE for the primary key.
Now
$CONV/TRUN/PAD/STAT/FDL=KEY_0 input-file KEY_0.SEQ  ! supply your indexed file as the source
$
$OPEN/READ keys KEY_0.SEQ
$middle="1234"
$middle[0,32] =
$READ/KEY=&middle keys middle_key
$SHOW SYMB middle_key
$CLOSE keys
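The extract-then-index idea behind that DCL can also be sketched in Python, under the assumption that the keys have already been dumped (e.g. by CONVERT) to a binary file of fixed-length records; the function name and key size here are made up for illustration:

```python
import os

def middle_key(path, key_size):
    """Return the key of the record halfway through a fixed-length key file."""
    n_records = os.path.getsize(path) // key_size
    with open(path, "rb") as f:
        f.seek((n_records // 2) * key_size)   # direct access by record number
        return f.read(key_size)
```

Once the keys live in a fixed-length sequential file, "access by record number" becomes a simple seek, which is exactly what the indexed organization would not allow.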
The exact needs and feeds would of course define the optimal solution.
It may involve walking the primary key index at level-1, through a special program bypassing RMS.
Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting
06-15-2007 04:57 AM
Re: Obtaining the key of a record about half-way thru a file
The biggest problem with the extract-the-keys-and-divide-that-file-by-half approach is the suggestion of using the TKS (total key size), assuming the primary key starts at byte 0, which it does for 9 out of 10 indexed files, but it does not have to!
It was indicated the key is segmented.
So the size needed is really the last byte of the segment with the highest start position.
This may be a good chunk of the record.
In that case just use the whole record ?!
Please tell us which problem you are trying to solve, and why?
How often?
What are the data volumes involved?
Partitioning an overly large file?
Purging of old data?
Would it not be better to have a predictable, repeatable algorithm?
Maybe something like all records before 1-1-2007, or customers 1,000,000 thru 2,000,000,
or East and West.
Hein.
06-15-2007 05:03 AM
Re: Obtaining the key of a record about half-way thru a file
I suppose you could convert the indexed file to what you want and let your Fortran at it. Hein's is an interesting approach.
Curious as to what you are trying to do overall? -Dean
06-15-2007 05:12 AM
Re: Obtaining the key of a record about half-way thru a file
In brief, I have an indexed ledger with the primary key consisting of the first 20 bytes of the 940 byte record, segmented into a 14 byte account and 6 byte fund.
The quantity of data ranges from 10,000 records up to 30 million or more. We wish to perform some processing driven by this file. In doing so, the larger versions of this file are taking a substantial amount of time to process. (It is the compute-time processing that is the issue here, not the IO or RMS performance... I am sure.) One way to divide the processing amongst multiple 'threads' (not using DECthreads... long story) is to have one process handle the first half of the file, and another process handle the second half of the file... we have 4 processors on the box.
This is not the only way to do this, but it seemed simple enough as long as we were able to open an indexed file sequentially with a direct access (by record number). However, as we found that this cannot be done (or at least we couldn't figure it out), we figured that we'd try alternatives.
Hope that this sheds some light on the task.
Many thanks,
-H-
06-15-2007 05:41 AM
Re: Obtaining the key of a record about half-way thru a file
Well, if the key is alphabetical, you could do something like this (e.g. in DCL):
$ open/read xx copysysuaf.dat
$ read/key="M" xx x
$ sho sym x
X = "....MCGORRILL
$ read xx x
$ sho sym x
X = "....MCPHERSON
That would put you in the middle of the alphabet; keep reading until EOF. That would presume an even distribution across the alphabet. An idea - Dean
06-15-2007 06:00 AM
Re: Obtaining the key of a record about half-way thru a file
Anyway... I would strongly suggest you consider letting history teach you the right break down.
You personally probably already know 'the bad boys' from the easy ones.
Now make sure the application knows!
It is not likely to totally change overnight, is it? (And even if it did, no harm done.)
What you can do TODAY, without changing the application's main algorithm, is create a lookaside list containing some main volume indicators while processing.
By company, by customer, by fund; I would not know without knowing more about the file itself.
Let's say you can create a list of customer numbers and records processed. Now just sort those by number of records. For the processing, let the streams pick work elements from the sorted list from big to small. So the first one kicked off will be the biggest one. The next stream picks the next. As a stream is done, it picks more, and smaller, items until all done.
For the sake of locality of reference you may want to tweak this by grouping into large, medium, and small units, and keep customers sorted within those ranges.
Any run will produce the sort order for the next run. SMOP! Easy!
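The pick-biggest-first scheme described here is essentially longest-processing-time-first scheduling. A minimal Python sketch (the chunk sizes and stream count below are invented for illustration):

```python
import heapq

def schedule(chunk_sizes, n_streams):
    """Each idle (least-loaded) stream takes the largest remaining chunk."""
    loads = [(0, s) for s in range(n_streams)]    # (total work, stream id)
    heapq.heapify(loads)
    plan = {s: [] for s in range(n_streams)}
    for size in sorted(chunk_sizes, reverse=True):
        load, s = heapq.heappop(loads)            # least-loaded stream grabs work
        plan[s].append(size)
        heapq.heappush(loads, (load + size, s))
    return plan
```

Starting with the big chunks keeps the streams balanced at the end of the run, since only small items remain to fill the gaps.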
Detailed help, or suggestions on how to truly break up the file into similar-sized ranges, would seem to go beyond the scope of a quick hint in a public forum. Email me if need be.
Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting
06-15-2007 07:22 AM
Re: Obtaining the key of a record about half-way thru a file
Run a coordinating process that creates "n" mailboxes and "n" processes, and each of these created processes connects back into its mailbox and says "hi!" to the server. The server then tosses a starting record and a range of records, or a "done!" message that tells the created process to clean up and exit.
For the first process that queues its "hi", the coordinating process gives it record 1 and the first, say, 1000 records. The second gets 1001 and 1000, etc. Each process does a keyed get on the starting record (key equal or greater, assuming the key is not issued sequentially), and stops when it gets to a record above its specified upper limit.
Use a termination mailbox to catch run-time errors, should a created process tip over and exit unexpectedly.
Why get fancier than you need to be here splitting up the work, when you can brute-force the parallelism and do nearly as well? And when the code itself can adapt to the file and its contents? Tailor your 1000-record hunk to the run-time for the clients, so that the coordination overhead (which will be minimal with the mailboxes) doesn't win out over the run-time.
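The coordinator/worker pattern described above can be sketched in Python, with a thread-safe queue standing in for the mailboxes. This is a simplified single-process illustration; `work` is a placeholder for the per-record processing, and the chunk size is arbitrary:

```python
import queue
import threading

def process_in_chunks(records, chunk, n_workers, work):
    """Coordinator puts (start, count) hunks on a queue; workers drain it."""
    tasks = queue.Queue()
    for start in range(0, len(records), chunk):
        tasks.put((start, min(chunk, len(records) - start)))

    results, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                start, count = tasks.get_nowait()   # grab the next hunk
            except queue.Empty:
                return                              # nothing left: clean up and exit
            out = [work(r) for r in records[start:start + count]]
            with lock:
                results.extend(out)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because each worker pulls the next hunk only when idle, the load balances itself even when hunks take unequal time, which is the point being made here.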
Stephen Hoffman
HoffmanLabs LLC
06-15-2007 08:52 AM
Re: Obtaining the key of a record about half-way thru a file
Most likely. But a single process/thread does not need locking. For some applications, which do little per-record processing themselves, the cost of locking may be prohibitive. For others the locking overhead is minor. The suggestion earlier was that the per-record processing is significant, so parallelizing will likely help.
> Run a coordinating process that creates "n" mailboxes and "n" processes, and each of these created processes connects back into its mailbox and says "hi!" to the server.
Could be even more simple. Literally $TYPE or CONVERT a task-list file into a mailbox. Each idle server grabs a message, processes the chunk of business data associated with it, and looks for the next task. Exit on EOF.
>> For the first process that queues its "hi", the coordinating process gives it record 1 and the first, say, 1000 records.
Ya, but the suggestion there is that it is hard to recognize the size of ranges from the primary key.
Let's say the primary key is STATE + social security number within state. The first stream could start with AK, the next AL, AR, AZ... Fortunately CA comes early in the alphabet, but Texas probably hits hard downstream, having about as many folks (23M) as all the states that follow it together. So you probably want to split Texas, but where is the right place for that?!
To divvy it up nicely, you would have to count them, which may be only 1% of the processing cost, but could be 50% of the total cost... on a poorly organized file.
>> Why get fancier than you need to be here splitting up the work, when you can brute-force the parallelism and do nearly as well.
Because of contention and caching-effectiveness reasons? You want the data being concurrently processed close, but not too close.
>> And when the code itself can adapt to the file and its contents.
And that was the core of my suggestion.
To establish a pattern while processing is probably zero overhead. Use that for a good guess on the next run.
Cheers, (almost Miller time here)
Hein.
http://factfinder.census.gov/servlet/GCTTable?_bm=y&-geo_id=01000US&-_box_head_nbr=GCT-T1&-ds_name=PEP_2006_EST&-_lang=en&-format=US-9&-_sse=on
06-15-2007 02:30 PM
Re: Obtaining the key of a record about half-way thru a file
Yes. Though isn't the goal here to keep the processors in the box busy? You might not end up with all the client processes exiting at nearly the same time, but again, is that so critical, as long as the resulting run-time is less than the monolithic run-time?
Who cares if you split (following your example) Texas into one hunk, two hunks, or a hundred smaller hunks?
And over time (days and weeks), you can have the server coordinating the operation develop a better idea of the patterns (either heuristically or based on explicit input from the user), to build up run profiles, if you have particularly disparate processing runs. During one aggregate run (over minutes or hours), you can tell when and how fast the servers are finishing and can infer how dense the fill might be. From that, you can guess at the spans for successive sections as the client processes run.