Re: SYSGEN parameter PE1, LOCKIDTBL, NPAGEDYN

Clark Powell · ‎02-09-2010

We're running GE Flowcast on InterSystems Caché 5.0.21 on GS1280s with 44 GB of memory.
The two GS1280s were more a historical accident than a requirement; we had been running on 16 GB ES45s. So we are in a processor- and memory-rich environment. We have 2 Gb Fibre Channel HBAs and EVA 5000s, and it may be that our real deficit is disk IO bandwidth. Recently, as a general performance improvement, it was recommended to us to change PE1 from 0 to 50000 and to raise LOCKIDTBL and NPAGEDYN to some unspecified higher value.

LOCKIDTBL parameter information:
Feedback information.
Old value was 1607680, New value is 1669632
Current number of locks: 1699831
Peak number of locks: 1732608

So we could change LOCKIDTBL from 1600000 to 2000000 and NPAGEDYN from 66 MB to 100 MB.
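
To put the proposed LOCKIDTBL value in context, here is a minimal Python sketch of the headroom arithmetic. The three figures are copied from the feedback report and the proposal above; the helper name `headroom_pct` is only illustrative, and the calculation says nothing about how the system extends the table at run time.

```python
# Figures copied from the AUTOGEN feedback report and the proposal above.
LOCKIDTBL_CURRENT = 1_607_680   # "Old value"
LOCKIDTBL_PROPOSED = 2_000_000  # value suggested in this post
PEAK_LOCKS = 1_732_608          # "Peak number of locks"


def headroom_pct(table_size: int, peak: int) -> float:
    """Percentage by which table_size exceeds peak (negative if it falls short)."""
    return (table_size - peak) / peak * 100.0


current_headroom = headroom_pct(LOCKIDTBL_CURRENT, PEAK_LOCKS)
proposed_headroom = headroom_pct(LOCKIDTBL_PROPOSED, PEAK_LOCKS)

print(f"current:  {current_headroom:+.1f}%")   # about -7.2%
print(f"proposed: {proposed_headroom:+.1f}%")  # about +15.4%
```

On these numbers the current setting sits roughly 7% below the observed peak, while 2000000 would sit roughly 15% above it.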

To me, NPAGEDYN seems to be sufficient as is:
Nonpaged Dynamic Memory (Lists + Variable)
  Current Size (MB)            66.75    Initial Size (MB)             66.75
  Maximum Size (MB)           288.25    Free Space (MB)               25.92
  Largest Var Block (MB)        4.65    Smallest Var Block (bytes)       64
  Number of Free Blocks       195987    Free Blocks LEQU 64 bytes        23
  Free Blocks on Lookasides   195946    Lookaside Space (MB)          21.01
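
The pool figures above can be reduced to a few ratios. This is a rough Python sketch using only the numbers quoted in the display; the variable names are illustrative, and reading "current size equals initial size" as "the pool has not had to expand" is an assumption about how the display should be interpreted.

```python
# Figures copied from the nonpaged pool display above (sizes in MB).
CURRENT_MB = 66.75
INITIAL_MB = 66.75
MAXIMUM_MB = 288.25
FREE_MB = 25.92
FREE_BLOCKS = 195_987
FREE_BLOCKS_ON_LOOKASIDES = 195_946

# Share of the current pool that is free.
free_pct = FREE_MB / CURRENT_MB * 100.0

# Assumption: a current size above the initial size would indicate expansion.
has_expanded = CURRENT_MB > INITIAL_MB

# Free blocks that are not sitting on lookaside lists.
free_blocks_off_lookasides = FREE_BLOCKS - FREE_BLOCKS_ON_LOOKASIDES

print(f"free: {free_pct:.1f}% of current pool")
print(f"expanded beyond initial size: {has_expanded}")
print(f"free blocks off lookasides: {free_blocks_off_lookasides}")
```

Nearly 39% of the pool is free and the current size still equals the initial size, which is consistent with the view that NPAGEDYN is not undersized.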

Obviously the memory involved will have no cost since we have more memory than we will ever use. So the question is: will these increases have any positive or negative impact? Any opinions out there?

Hein van den Heuvel · ‎02-09-2010

I'm with you.
You are not going to see any improvement.
If you do, you could probably already see hints with Kernel profiling, or SPIN LOCK TRACING.
Any significant MPsync burn? Try @SYS$EXAMPLE:SPL !

If you want to burn a CPU, start the dedicated lock manager. That may well provide an overall speedup, and a nicely increased visibility of lock activity in T4... assuming you have any significant locking rates at all!

"it was recommended to us to change PE1 from 0 to 50000"

Surely that recommendation came with an explanation as to WHY and what (statistics) to watch to check its effect? Please share.

"it may be that our real deficit is disk IO bandwidth."
Such hunches are often extremely valuable, and founded in real world knowledge/observations. But the challenge is often to exactly explain and quantify.

What do you have so far?
What made you think in that direction?
What do the EVA's report?
What does OpenVMS Report? (T4, XFC, ...)

What kind of IO is taking place? Database (Rdb, Oracle, ...)? Flat files? RMS indexed files?

Ever found/used a hot-file report to help focus?

"we have more memory than we will ever use."
Did you give generously to the XFC?

Hope this helps some,

Hein van den Heuvel
HvdH Performance Consulting

Clark Powell · ‎02-09-2010

Sorry, forgot to mention the version: OpenVMS 8.3 with update v12.

Hoff · ‎02-09-2010

Quantify your load and get a baseline, and then start going after the various parts. Know which files and which locks and which disks are getting pounded on by the applications.

If you think that locking or I/O is a limit here (and those can be very reasonable guesses), then look to speed up your I/O path through a shadowed RAM disk or other such, or with up-rating your SAN storage to a faster HBA or to faster DAS storage. (If you can find this hardware; the Alpha boxes tend to lag on available I/O options.) And find which locks are hot. And which files are hot. Where the applications are spending all the wall-clock.

And FWIW, forty-four gigabytes of physical memory isn't all that much memory capacity these days. That's less than a half-populated 1U server can provide.

John Gillings · ‎02-09-2010

Clark,

LOCKIDTBL and NPAGEDYN are allocation parameters. They change the sizes of data structures. This will only help if those data structures are currently undersized. On the other hand, it will only hurt if allocating resources there negatively impacts something else.

From what you say the structures are not undersized, and there's no resource shortage, so any (reasonable) allocation changes are likely to be benign. Worth doing? Your call.

PE1 may be a different story. It's a behavioural parameter. It controls the migration of lock trees between systems. It's also a "reserved to HP" parameter, so the general rule is "don't touch without very good reason".

In V8.3, lock tree migration isn't anywhere near as much of an issue as it was in earlier versions. Historically, it was possible for large lock trees to "flap" between systems, and migration was rather inefficient, involving the transfer of individual locks. The result was large clusters could experience "pauses" of up to several seconds while a large lock tree moved back and forth between nodes in response to changes in demand. PE1 was the hack used to prevent migration in environments where it was a problem (which were relatively rare).

Lock tree migrations are now much better optimized (large batches of locks) and flapping has been damped down by better heuristics. In general, it's a GOOD idea to migrate lock trees between nodes in response to demand, as local lock traffic is orders of magnitude faster than inter-node traffic. As long as you don't notice the migrations, they're not a problem.

So, in your environment, before you mess with PE1, I'd want to see compelling evidence that lock remastering was a performance issue. See MONITOR RLOCK. See also MONITOR DLOCK to see if there's excessive cross node traffic (which is what remastering is trying to reduce).
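
One way to turn "excessive cross node traffic" into a number is to compare local lock rates with incoming and outgoing ones. The following is only a sketch: it assumes you have read local, incoming, and outgoing rates off the MONITOR DLOCK display by hand, the function name `remote_share` is made up for illustration, and the sample rates are hypothetical rather than measurements from this thread.

```python
def remote_share(local: float, incoming: float, outgoing: float) -> float:
    """Fraction of lock operations that involved another node."""
    total = local + incoming + outgoing
    if total == 0:
        return 0.0  # no lock activity at all
    return (incoming + outgoing) / total


# Hypothetical rates in operations per second, not data from this thread.
sample = remote_share(local=9000.0, incoming=600.0, outgoing=400.0)
print(f"remote share: {sample:.1%}")
```

Tracking this ratio before and after any PE1 change gives a baseline to compare against, rather than relying on impressions.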

Even if you find remastering events, they're not necessarily bad. If they're a worry, you may be able to reduce them by moving your workloads around, concentrating processes working with a particular data base on one node to keep the lock traffic in one place.

Remember, if there are "golden" parameter settings which magically and blindly improve performance, they will be default (well, eventually anyway...)

A crucible of informative mistakes

Thomas Ritter · ‎02-09-2010

My 2 cents on PE1: leave it alone. Let the cluster manage its own lock remastering. Our site at peak times had some 4 million locks and an enq/deq rate of about 65,000/s. Lock remastering under those conditions caused database performance problems. In our 4-node cluster we even suffered from "lock tree bounce". We ended up managing PE1 by disabling it during business hours and then enabling it outside of business hours.

Changes to PE1 should only be based on what you see and measure at your site.
