
Re: heavy paging

 
Hein van den Heuvel
Honored Contributor

Re: heavy paging

>> o would putting up decram (license cost?) and put a pagefile
on the ram disk work? (we have 2gig of memory.)

That remark might just qualify for a WTF entry.
Yeah, WTF suggests 'What The F*&^', but it is spelled out as:
http://worsethanfailure.com/Default.aspx

If you have the memory, just allow the process to use it directly! (WS quotas)
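A rough sketch of what that could look like (the username and values here are placeholders, not from this thread; working-set and pagefile quotas are in 512-byte pagelets, and UAF changes only take effect for new logins or newly submitted batch jobs):

$ RUN SYS$SYSTEM:AUTHORIZE
UAF> MODIFY BATCHUSER /WSQUOTA=131072 /WSEXTENT=1048576 /PGFLQUOTA=2000000
UAF> EXIT
$ ! an existing process can only raise itself up to its authorized limits:
$ SET WORKING_SET /QUOTA=131072 /EXTENT=1048576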

Anyway, I understand that through running out of pgflquota you learned that paging might be an issue, but it does NOT appear to be the defining number for the performance here.

The accounting data, which unfortunately did not also contain the elapsed/CPU time, strongly suggests that DIRECT IO defines the performance. 16 million IOs in 33 hours is about 136 IO/sec... when evenly spread.
If those are truly random, with a single stream (1 batch job) as the driver, then that's about all you will get, no matter how many disks there are behind it. This is only the case if they are truly random, new READ IOs, meaning no IO cache has the data and none of the many disks is any closer to the target than any other.
Like you, I expect your storage system to perform better, but this may be all there is in the worst case.

So now back to the test system.
What do the accounting numbers look like there?

Does it have EXACTLY the same database?
If it is 'very close', does it have exactly the same indexes defined?
Do both systems have much the same buffer pool defined for RDB?

If you want to make a real impact on the performance of this job then you probably need to look at RDB settings and query tuning, not at OpenVMS tweaks.

Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting
Robert Gezelter
Honored Contributor

Re: heavy paging

Dean,

I concur with Hoff. My standard recommendation to clients is that if paging is perceived to be the problem, something else is actually the problem.

Certainly, increasing the working set limits (and the corresponding page file quotas) to values more in line with physical memory is a good idea.
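Before changing anything, it is worth a quick look at what the account is actually authorized for, and at how the physical memory is currently being used (the username here is a placeholder):

$ RUN SYS$SYSTEM:AUTHORIZE
UAF> SHOW BATCHUSER          ! note WSQUOTA, WSEXTENT and PGFLQUOTA
UAF> EXIT
$ SHOW MEMORY                ! overall use of the 2GB of physical memory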

Running data collection using T4 is always a good idea. Detailed review of what this data shows is also a sound idea. (Disclosure: We do perform this service for clients).

- Bob Gezelter, http://www.rlgsc.com
John Gillings
Honored Contributor

Re: heavy paging

Dean,

On a 2GB system WSEXTENT=64000 is *tiny*: 64000 pagelets at 512 bytes each is only about 31MB. Less than most people's wristwatches. Fortunately AUTOGEN has overridden WSEXTENT to something more reasonable.

Check the VIRTPEAK for the process to get an idea of how large it wants to be.
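One quick way to read that back while the job runs, sketched here with a made-up PID (looking at another process needs GROUP or WORLD privilege):

$ pid = "2040010C"           ! placeholder PID of the batch job
$ WRITE SYS$OUTPUT "VIRTPEAK = ", F$GETJPI(pid,"VIRTPEAK"), " pagelets"
$ WRITE SYS$OUTPUT "WSPEAK   = ", F$GETJPI(pid,"WSPEAK"), " pagelets"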

However, even with low WSEXTENTs, OpenVMS is quite good at handling large virtual address spaces efficiently. In a recent capacity test on a system with 4GB and WSMAX at 1.5GB, a process with 2.5GB of virtual address space, reading through it all several times, sustained a (soft) fault rate of >100,000 per SECOND for more than 5 minutes. Amazing! They were all modified-list faults; in effect OpenVMS was using the modified page list as an extension of the process's working set. In comparison with your job: every second we were experiencing double the TOTAL number of pagefaults of your entire 33-hour job!

Soft faults are cheap. Eliminating them is easy (just increase process working sets), but is unlikely to return a significant performance improvement.

>put a pagefile on the ram disk work?
>(we have 2gig of memory.)

Direct answer - NO! This doesn't make sense! The idea of a page file is a place to put stuff that doesn't fit in physical memory. Putting the pagefile itself in physical memory is like having too much stuff in your garage, so you build a shed INSIDE the garage to put the excess in. Can you see why that won't help?

What MIGHT work would be to build a RAM disk and put the DATA FILES on it, to reduce the cost of all those direct I/Os.

The other consideration: you haven't said how much CPU time the job took. I can't see how those paging and I/O stats can account for 33 hours. The CPU usage will give you a lower limit on the run time. If it's high, you should be looking at the code to see if there are faster algorithms to achieve what you're trying to do.
A crucible of informative mistakes
Dean McGorrill
Valued Contributor

Re: heavy paging

Hi John,
it's been a dozen years since I worked with this stuff. tuned up our build system
and then it was coding thereafter..
the CPU was 5 hours and some change for
both the production and test system. From
accounting..

Peak working set: 535072

that looks like it blew past its wsextent
limit anyway. also from uaf, my account wsextent..

WSextent: 16384
$ sho work
Working Set (pagelets) /Limit=5936 /Quota=16384 /Extent=786432
Adjustment enabled Authorized Quota=16384 Authorized Extent=786432

it says I have what wsmax is set to. (?)

>What MIGHT work would be to build a RAM disk and put the DATA FILES on it

that's an idea! anyway I've upped the quotas
and raised wsinc for a test run.
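(For reference, WSINC is a dynamic SYSGEN parameter, so something along these lines takes effect immediately; the value is only an example, and it belongs in MODPARAMS.DAT plus an AUTOGEN run to make it permanent:)

$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SET WSINC 2400       ! example value, in pagelets
SYSGEN> WRITE ACTIVE         ! dynamic parameter - no reboot needed
SYSGEN> EXIT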

Colin Butcher
Esteemed Contributor

Re: heavy paging

Hello,

It would be well worth your while getting some data using T4 - then you can see what's really going on in terms of IO, paging, file opens, locking, lock migration between nodes, network traffic etc.

Can you provide some configuration data too - machine type, VMS version, disc subsystem etc. as well please?

It's probably worth looking at XFC usage as well - 2Gbytes isn't that much memory, so with some RDB tuning and some VMS tuning, combined with extra memory, you may see a decent difference. It all depends on the workload and how the RDB job functions.

You may find that some small changes to the way the RDB job is written can provide some big changes. Sometimes relatively small changes to code can provide big wins. Data from something like T4 will give you a starting point for a thorough investigation.

Cheers, Colin (http://www.xdelta.co.uk).
Entia non sunt multiplicanda praeter necessitatem (Occam's razor).
Volker Halle
Honored Contributor

Re: heavy paging

Dean,

please consider using T4 to collect performance data on both systems and start collecting the data NOW, before you change lots of parameters.

TLviz (the T4 data visualizer) contains marvellous features for comparing performance data in a before-after analysis.

If the accounting data shown is from the ONLY process involved in this 'large RDB job', then the direct IOs seem to be the major factor. T4 will also give you system-wide performance data and you may be able to easily 'see' other factors influencing performance.
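If T4 is not on the system yet, the plain MONITOR recording it builds on can be started right away; a sketch, with placeholder file names and interval:

$ MONITOR ALL_CLASSES /RECORD=SYS$SCRATCH:PERF.DAT /INTERVAL=60 /NODISPLAY
$ ! let it run across the batch job, stop it (Ctrl/C), then summarize:
$ MONITOR ALL_CLASSES /INPUT=SYS$SCRATCH:PERF.DAT /SUMMARY=PERF.SUM /NODISPLAY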

Volker.
Dean McGorrill
Valued Contributor

Re: heavy paging

hi Colin, the cpu & disks are described a few posts ago.

Volker,
today is the first time I'm watching this running live, and it's not what I thought it was, paging. though it's using up pgflquota. using the normal tools, monitor, sda etc. I've yet to see a pfw. it's all direct i/o as most have said. we slowly
increase ws, even though I set wsinc to 48k (?)

they have about 20 files open, all on one
disk. I've caught a few that are busy
with sda sho proc/chan. is there a tool
to show what files are getting hit the
most?

t4 sounds like a very useful tool, I found the kit but on this old 7.2-1 system I don't think PRODUCT can read a compressed pcsi file. is there an uncompressed one out there?
If I find out the hot files, then splitting
them out to other spindles should help.
Jim_McKinney
Honored Contributor

Re: heavy paging

> is there a tool to show what files are
> getting hit the most?

You've got at least one - SDA.

$ analyze/system
SDA> read sysdef                        ! load the structure definitions FORMAT needs
SDA> show proc/chan/id=xxpidxxx         ! list the process' open channels
SDA> ! use the addresses in the Window column as WCB addresses,
SDA> ! then format each one and note wcb$l_reads and wcb$l_writes
SDA> format/type=wcb xxwcbxxx
Hein van den Heuvel
Honored Contributor

Re: heavy paging


Dean,

You indicated it is an RDB application.

So, for now, forget about tuning OpenVMS! just ask RDB where it hurts!
It can tell you which files are busy,
which files are waited for most...

Poke around with RMU
Look at RMU> SHOW STAT/STALL

Check the "Rdb7 Guide to Database Performance and Tuning"

No point in speeding up an IO which should not be done in the first place!

Give some more memory to the RDB caches!?

Use SHOW MEM/CACH=(TOPQIO=20,VOLUME=...) for a simple, cheap, OS hotfile list.
... if you have XFC caching going.

Or like you did, do a MONI CLUS or MONI DISK/TOPQIO during the run and see the top busy disk(s).
Now SHOW DEV/FILE for a first impression... a file must be open to be busy!
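Roughly, in command form (the database root file and disk name are placeholders):

$ RMU/SHOW STATISTICS MYDB_ROOT.RDB              ! live Rdb screens, including stall messages
$ SHOW MEMORY/CACHE=(TOPQIO=20,VOLUME=DKA100)    ! XFC hot-file list, if XFC is caching
$ MONITOR DISK/TOPQIO /INTERVAL=5                ! busiest disks while the job runs
$ SHOW DEVICE/FILES DKA100:                      ! which files are open on the busy disk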

Having said that, if, as you seem to indicate, all the open files for the job are on a single disk, then you may want to address that first, even before learning more about the load.

Personally, I like the SAME approach. Stripe And Mirror Everything.
'Don't worry your pretty head' about which file is busy or how to exactly balance the IO. Just brute-force spread it out if you can! This is trivial (the default) on the EVA specifically, but straightforward on the HSZ70 as well, although you cannot go as wide.

Another thing to look at is the HSZ itself.
Run the display there and watch which disks are being hit. This will also nicely give you read/write ratios and HSZ cache information.

hmmm.... are the HSZ batteries on the production system working properly? If not, the HSZ will disable write-back caching and WRITE performance will suffer tremendously. That could easily explain several hours.
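A quick check on the controller itself might look something like this (HSZ CLI from the console or a DUP connection; exact keywords and output vary with firmware):

HSZ70> SHOW THIS_CONTROLLER FULL     ! cache state and battery condition
HSZ70> SHOW UNITS FULL               ! per-unit write-back cache settings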

Cheers,

Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting
Dean McGorrill
Valued Contributor

Re: heavy paging

tx Jim,
hot files identified.
tx Hein,
yes the cache batts are good and
I will chat with our dba about rdb tuning.

dean