Operating System - OpenVMS

Re: About performance with RMS relative files...

 
SOLVED
Hoff
Honored Contributor

Re: About performance with RMS relative files...

Will it hurt or help? I don't know.

That determination would require some tests.

My guess: "not much".

Sitting here, I don't know how your box is tuned and I don't know how your disks are initialized (in some cases, having clusters aligned on 16x block boundaries can have a performance benefit within the processing of some of the storage controllers; there's a knee in the block alignment), and I don't know what sort of application I/O pattern is in use here. I run these tests regularly during I/O and performance investigations, looking for the "knee" in the throughput curve. In this range of blocks, I'd not expect a big effect either way. Try it. Check the data. Tweak the input. Try it. Ugly. Slow. Expensive.

Finding the knee is easiest with the I/O monitoring tools loaded. Otherwise you end up instrumenting the code itself -- which certainly isn't a bad thing -- to collect performance data. But that adds schedule time and adds cost.

Unless you are right on a performance boundary (one of those knees I mentioned earlier) or unless you're tossing massive amounts of data or unless your I/O load has changed substantially since the last tuning pass, small and incremental adjustments in RMS usually don't have a big payback once the initial system and file tuning passes are done. Application changes and hardware changes tend to have a bigger payback, and on-going tuning (save in these exceptional cases) gets you marginal results. (And an AlphaServer GS80 says "you've been here a while"...)

It is often difficult to work around administrative and managerial matters and roadblocks using technical means; technology isn't good at that class of problems. Put another way, you need either a prototype platform or a testing window on the production gear, and you need to have the tools available or you need to spend the time (and the budget) replicating what the tools can provide.

If you know the number of entries you're going to get each day, then going completely to virtual I/O with one (or two) blocks per entry (or whatever) would be the fastest I/O; you'll have most of RMS out of the way. It may or may not be the most efficient path in terms of caching and related, however.

I'm still not entirely certain what you're optimizing for here, too. I see both EVA load, and application speed. Small random I/Os are about the worst. Sequential I/Os are somewhat better when there's no contention, and caching or in-memory storage is the best. Assuming large quantities of data and using P2 space, for instance, and running it all out of application memory tends to reduce the I/O activity. This is where I toss the data journal remotely.

If there were a single "go fast" button for application I/O, it'd very probably already have been pushed for you...

Bob CI
Advisor

Re: About performance with RMS relative files...

>>>> I'm still not entirely certain what you're optimizing for here, too. I see both EVA load, and application speed. <<<<<<

Really, we want to adjust to our system manager's recommendations about smaller I/Os on the EVA, but without hurting the write rates of our hot relative files....

We want to decrease the file bucket size, but the bosses want to see how the numbers in the monitoring tools confirm that there isn't a risk of performance loss with this change.

What parameters must we check in MONITOR (RMS or whatever) to confirm this, apart from our own report programs????

For the bosses it is very important to confirm this, not only with our reports, but with the monitoring tools too.

Thanks!
Hein van den Heuvel
Honored Contributor

Re: About performance with RMS relative files...

Let me summarize this.
- folks do not know how to use existing tools
- folks refuse to add other tools
- tools must be used.
- real information, the measurements from the application, may not be used.

Something is gonna have to give. Probably you.
Unfair, but that's how the cookie tends to crumble.

Lower is probably better.
You will have to try it in a test.

Need a counter argument?
RMS will pre-read before the write.
So with a 12 block bucket and a 500 byte record size, you will do 1 read IO for each 12 writes. Now if you go all the way down to a 1 block bucket, you'll need 12 reads and 12 writes for the same amount of change.
While the XFC will mitigate this, and can possibly be tricked to completely hide the read IOs, it will still cost.
EVA Write IOs are fast... say 1ms.
EVA reads CAN be fast from cache, but may well be slower than writes.
For sake of the argument, let's say a read IO is as fast as a write (it is not).
The benefit of shorter writes is NOT proportional to the write size. Let's just say that a 12 block write might just be 2 times slower than a 1 block write, and a 4 block write 1.5 times slower. And for fun let's call a 12 block write 1ms.
Now do the math in an excel spreadsheet:
12: 1R + 12W = 13ms
4: 3(R/1.5) + 12(W/1.5) = 10ms
1: 12(R/2) + 12(W/2) = 12ms
... maybe.
Toss in a few slow reads and the 12 block size wins.
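
Purely to make that arithmetic easy to replay, here is the same back-of-envelope model in a few lines of Python. The 1ms write cost, the 1.5x/2x speed-up factors, and "reads as fast as writes" are the assumptions above, not measurements.

# Back-of-envelope model of the numbers above: 12 records (~500 bytes each,
# roughly one per block) inserted with bucket sizes of 12, 4, and 1 blocks.
# Assumptions: a 12-block write costs 1 ms, a 4-block IO is 1.5x faster,
# a 1-block IO is 2x faster, and a read costs the same as a write.

RECORDS = 12          # records written per cycle, about one per block
WRITE_12_MS = 1.0     # assumed cost of a 12-block write, in milliseconds

def cycle_cost_ms(bucket_blocks, speedup):
    """Total IO time to insert the 12 records with this bucket size."""
    io_ms = WRITE_12_MS / speedup        # cost of one read or write IO
    buckets_touched = RECORDS // bucket_blocks
    reads = buckets_touched              # RMS pre-reads each bucket once
    writes = RECORDS                     # one bucket write per $PUT
    return (reads + writes) * io_ms

for blocks, speedup in ((12, 1.0), (4, 1.5), (1, 2.0)):
    print(f"bucket size {blocks:2d}: {cycle_cost_ms(blocks, speedup):.1f} ms")
# bucket size 12: 13.0 ms, 4: 10.0 ms, 1: 12.0 ms ... maybe.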

You may just need a good performance consultant to break through all this.
I happen to know a few suitable ones, if you're interested (your system manager may disagree :-).

Cheers,
Hein van den Heuvel
HvdH Performance Consulting.

Hoff
Honored Contributor

Re: About performance with RMS relative files...

Hein missed one: quit. Seriously. If the place is as screwed up as it appears from your postings here, it's time to tune up your resume and look for different (and better) employment. That, or check your cares at the door and take pleasure in simply watching the organizational dysfunction and self-destructive behaviors. You're splitting the distance between these two career options here (I can tell from the way you're assigning the points) and that's usually on the impending cardiac path.

Bob CI
Advisor

Re: About performance with RMS relative files...

Hein, thanks for your explanations!!!!

It's the kind of thing that nobody teaches us!!! And I don't know of any documentation that explains it.

But I have a doubt about read and write cycles that you comment.

Must RMS read the bucket even though the record being inserted is new???

In my relative file there are never update operations, only insert operations, from sequence 1 up to sequence 2000000 (for instance), and in this ascending order.

I'm sorry if I don't know enough.....

Thanks, thanks...and thanks....
Hoff
Honored Contributor
Solution

Re: About performance with RMS relative files...

>> It's the kind of thing that nobody teaches us!!! And I don't know of any documentation that explains it.

Application and system performance and tuning is still an experimental regimen. The books (the good ones) will teach you how to run tests, and the books tell you to run the tests. The performance management manual in the OpenVMS documentation set tries to explain the general principles here, for instance. (As the OS crowd finds generic go-fast fixes and tweaks, they tend to implement those, too.)

Performance varies from one device to the next. From one hardware generation to the next. From one firmware revision to the next. From one application to the next. Sometimes massively. And you find surprises, such as the benefits of cluster alignments on some of the controllers.

Sharing and clustering can both slow the I/O rates, too.

>> But I have a doubt about read and write cycles that you comment.

Go try it. It's distinctly possible that Hein might be wrong here, but I'd not bet against him.

>> Must RMS read the bucket even though the record being inserted is new???

RMS has to figure out where to put the new data, and particularly whether it needs to extend the file. And (if you're sharing the file) coordinate access.

If you roll your own record management using direct virtual I/O, the XQP will still have to perform a few reads on your behalf to figure out where to put stuff. These reads usually happen out of cache. And you'll have to track file positions.

>> In my relative file there are never update operations, only insert operations, from sequence 1 up to sequence 2000000 (for instance), and in this ascending order.

Reduce the number of I/Os by going larger (caching and coalescing), or try tweaking and testing (requires embedded diagnostics and/or monitoring tools). Or go to faster hardware.

Two million records (and two million blocks) isn't all that much. Run the spiral transfer rate of your disks (or use the sustained rate on the particular EVA in this case) as limited by the FC SAN speed, and that'll tell you what the upper boundary here is.
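
As a sketch of that back-of-envelope bound: the 30 MB/s sustained rate below is only an assumed placeholder; substitute whatever your EVA and FC SAN actually deliver.

# Rough upper bound: time to move two million 512-byte blocks at a given
# sustained transfer rate. 30 MB/s is a placeholder, not a measurement.
BLOCKS = 2_000_000
BLOCK_BYTES = 512
SUSTAINED_MB_PER_S = 30.0   # assumed; plug in the measured EVA / SAN rate

total_mb = BLOCKS * BLOCK_BYTES / (1024 * 1024)
seconds = total_mb / SUSTAINED_MB_PER_S
print(f"{total_mb:.0f} MB at {SUSTAINED_MB_PER_S} MB/s is about {seconds:.0f} s, best case")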

Sharding is another option for I/O-limited cases; having even record I/O go to one spindle and odd to another, for instance.
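
A minimal sketch of that sharding idea follows; the device and file names are hypothetical, and in practice the two halves would sit on volumes served by different spindles or controllers.

# Toy illustration of even/odd record sharding: route each record number
# to one of two files. Device and file names are made up for the example.
def shard_path(record_number):
    if record_number % 2 == 0:
        return "DKA100:[DATA]EVEN.DAT"
    return "DKB200:[DATA]ODD.DAT"

streams = {}
for recno in range(1, 13):                 # records 1..12
    streams.setdefault(shard_path(recno), []).append(recno)

for path, recnos in sorted(streams.items()):
    print(path, "gets records", recnos)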

The basics of any performance effort involve first establishing a baseline and then running a series of tests. Tweak one thing within the application, run a test, and compare the results against the baseline. This is a very raw example of the empirical method. (With this sequence, the occasional read I/O while writing would be visible, for instance.)

For general application tuning (and there can be all manner of performance-limiting weirdnesses within the average application), something like the DECset PCA Performance and Coverage Analyzer is the right sort of tool. The system-wide tools are good, but not as good (or as easy) at spotting hot routines. And I've seen cases where the tool itself tosses reads -- such as getting the next index entry, for instance.

Hein van den Heuvel
Honored Contributor

Re: About performance with RMS relative files...

>> Hein, thanks for your explanations!!!!
It's the kind of thing that nobody teaches us!!!

There are/were OpenVMS RMS training courses.
Bruden is possibly the best offering today. But I'm sure many others, like Parsec and independent consultants, will do a fine job.
I'm sure even Patrick O'Malley is willing to dust off his offerings.
Far too few folks take or have taken those courses.

>> And I don't know of any documentation that explains it.

The Guide to OpenVMS File Applications makes an attempt.

>> But I have a doubt about read and write cycles that you comment.

Me too! :-).
It's just a guess to start working.
Write times may well be faster (write-back caching).
Read times... from the spindles... may well be slower.

>> Must RMS read the bucket even though the record being inserted is new???

YES. For pre-allocated relative files.
That's why my very first suggestion was

"You may want to review whether the application actual uses any of the relative file semantics"

But don't ask me... check for yourself.
$SET FILE/STAT
$MONI RMS/ITEM=CACH/FILE=...
(or better still, use my RMS_STATS when the system manager is not looking, or ANAL/SYSTEM... SET PROC ... SHOW PROC/RMS=FSB).

When watching rms stats, be sure to check the locks.
For the larger bucket sizes you'll see few ENQ/DEQ (1 per 12) and 2 bucket lock converts per record put. With a 1 block bucket size you'll see 12 times more ENQ+DEQ.

RMS Stats will NOT provide timing data, although the rates may give you a clue.
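
To make the lock arithmetic concrete, here is a small sketch using the counts quoted above -- one bucket-lock ENQ/DEQ pair per bucket touched plus two bucket lock converts per put -- and assuming roughly one 500-byte record per block with sequential inserts.

# Lock traffic for 12 sequential record inserts, per the counts above:
# one ENQ/DEQ pair per bucket touched, two bucket lock converts per $PUT.
RECORDS = 12

def lock_ops(bucket_blocks):
    buckets_touched = RECORDS // bucket_blocks
    enq_deq_pairs = buckets_touched      # one ENQ/DEQ pair per bucket
    converts = 2 * RECORDS               # two lock converts per record put
    return enq_deq_pairs, converts

for blocks in (12, 4, 1):
    pairs, converts = lock_ops(blocks)
    print(f"bucket size {blocks:2d}: {pairs:2d} ENQ/DEQ pairs, {converts} converts")
# 12 -> 1 pair, 4 -> 3 pairs, 1 -> 12 pairs: twelve times more ENQ+DEQ.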

>> In my relative file, there is never update operations, always are insert operations,

You may think that. And that is what it looks like. But in reality it is the other way around. There are no puts to relative files. Only updates. ($PUT+UIF)

So why on earth would the application use a relative file with its overhead?
1) possibly someone liked choosing the 'go slow' button...
2) probably to make programming seemingly easier.

>> from sequence 1 up to sequence 2000000 (for instance), and in this ascending order.

Relative files are initialized on creation.
Relative file records are placed in buckets.
They do not live in isolation.
So the system must calculate a target bucket, read that bucket, merge in the record, and write the bucket back, for any bucket that already exists.
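
A rough sketch of that bucket targeting is below. The one byte of per-record-cell overhead is an assumption for illustration only; the Guide to OpenVMS File Applications has the exact cell layout.

# Toy model of mapping a relative-file record number to its target bucket.
# Each record occupies a fixed-size cell; the +1 byte of overhead per cell
# is assumed here purely for illustration.
BLOCK_BYTES = 512

def target_bucket(record_number, record_size=500, bucket_blocks=12):
    cell_bytes = record_size + 1                      # assumed cell overhead
    cells_per_bucket = (bucket_blocks * BLOCK_BYTES) // cell_bytes
    return (record_number - 1) // cells_per_bucket    # 0-based bucket index

# Records 1..12 land in bucket 0; record 2,000,000 is far into the file.
print(target_bucket(1), target_bucket(12), target_bucket(2_000_000))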

>> I'm sorry if I don't know enough.....

So spend some time (and money?) and fix that!

Thanks, thanks...and thanks....

Cheers,
Hein.
Robert Gezelter
Honored Contributor

Re: About performance with RMS relative files...

Bob,

I was a bit busy for the past few days, but noticed that one aspect of this conversation appears to have been skipped: is it possible to switch the output-only file to a straight sequential file, with the conversion for later processing taking place offline?

Conversion of a sequential file to a relative file on a background basis may be a better option. The combination of multi-buffering, multi-blocking, and deferred write (with a close/re-open) can deal with high write rates.

I would also consider including Hein's comment about closing and re-opening the file, although I would go further and recommend saving the File ID and doing the re-open by File ID; it saves the directory lookup.

I would also suggest checking where on the EVA the "volume" is provisioned. In a traditional disk, I would be questioning whether arm movement is an issue. On an EVA, the question is more complex.

And with the obvious disclaimer that our firm offers services in this area, I echo the thought that calling in outside assistance may be a good way to eliminate a degree of uncertainty about how the various aspects interact.

- Bob Gezelter, http://www.rlgsc.com
Wim Van den Wyngaert
Honored Contributor

Re: About performance with RMS relative files...

I created a batch job that runs at high priority on a 4100 with 7.3 and dual HSG80s (so the IO completes when the controllers have received it).

It creates a relative file with 10,000 blocks and then writes 9999 records to it in DCL.

I started it with all kinds of settings for set rms/block=1|64/buf=1|16/rel
set rms/noglo
set file/cache=no
bucket size=1

and found no big difference (<2%) with any of the settings.

Opening the file /shared=read instead of /shared=write cuts CPU consumption by 10%, but wall time is 2% higher. No global buffers also makes it run 2% longer.

So, Bob's idea seems a good one.

Fwiw

Wim
Wim Van den Wyngaert
Honored Contributor

Re: About performance with RMS relative files...

Hmmm, did the same operation on a flat file.
Took almost the same time (CPU -15% and wall -2%) but 50% of the IOs were gone.

Wim