Operating System - OpenVMS

SMP Load Balancing

Jack Trachtman
Super Advisor

GS1280 15 CPU 64GB
900 processes (VMS + application + Oracle 9i)

Attached is an ECP report showing CPU usage
per CPU averaged over a 3 hour period (8-11am).

Question: can someone explain why there is a
35% spread between the lowest CPU utilization
and the highest? We are having response time
problems during spikes in application usage.
I'm guessing that if the CPU usage were more uniform, the spikes would be handled better
(though I may be wrong about this).

We would rather not buy more (very expensive)
CPU modules if there is some way to tune the
system. TIA
Hein van den Heuvel
Honored Contributor

Re: SMP Load Balancing

Simple. OpenVMS schedules a runnable user task with no established affinity on the first idle CPU, starting from the high numbers.

The thought/plan is that the lower CPUs are (semi) reserved for interrupt handling (Istk) and the lock manager as needed.
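To see how that placement rule alone produces a spread in per-CPU utilization, here is a toy simulation. It is only a sketch of the "highest-numbered idle CPU first" idea, not the real OpenVMS scheduler: the arrival probability, burst lengths, and the `simulate` function are all made up for illustration, and affinity, priorities, and interrupt load are ignored.

```python
import random

def simulate(num_cpus=15, ticks=10_000, arrival_prob=0.6, max_burst=20, seed=1):
    """Toy model (hypothetical parameters): return the busy fraction per CPU."""
    rng = random.Random(seed)
    remaining = [0] * num_cpus  # ticks of work left on each CPU
    busy = [0] * num_cpus       # ticks each CPU spent busy
    for _ in range(ticks):
        # With some probability a new task becomes runnable this tick.
        if rng.random() < arrival_prob:
            # Place it on the first idle CPU, scanning from the highest number.
            for cpu in range(num_cpus - 1, -1, -1):
                if remaining[cpu] == 0:
                    remaining[cpu] = rng.randint(1, max_burst)
                    break
            # If no CPU is idle, this toy model simply drops the task.
        # Every busy CPU does one tick of work.
        for cpu in range(num_cpus):
            if remaining[cpu] > 0:
                busy[cpu] += 1
                remaining[cpu] -= 1
    return [b / ticks for b in busy]

if __name__ == "__main__":
    for cpu, util in enumerate(simulate()):
        print(f"CPU {cpu:2d}: {util:6.1%}")
```

In this model the high-numbered CPUs stay close to saturated while the low-numbered ones only pick up work when everything above them is busy, so an uneven average does not by itself indicate a scheduling problem.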

Looks like a nicely balanced, busy system to me. With this as the average, it does not surprise me to hear that some response time problems happen during peaks, but I suspect that during those peaks all CPUs were gainfully employed. Otherwise you'd never see user time on the low CPUs, and you do.

A more fine-grained picture (time-wise) might help here. T4 can collect that.

The bulk of the CPU is spent in user mode, so any tuning would have to happen there to have an effect. Even if you could magically tune all kernel mode away, you'd still only have made a minor impact. Still, what is believed to be responsible for the kernel time? QIO, scheduler, locks, logical names?

How is the gut feel on the Oracle tuning?
How much (percentage) of the time goes there?
Have folks been looking at statspack, high-get queries and such? Excessive spinning? That could cause an I/O or lock bottleneck to look like a CPU shortage.

You might want to set affinity for the heavier-hitting Oracle processes (LGWR, DBWR, MON,...) to the lower CPUs, to keep those processes a little out of the scheduling picture.


Hope this helps a little,
Hein van den Heuvel
HvdH Performance Consulting.
John Gillings
Honored Contributor

Re: SMP Load Balancing


> I'm guessing that if the CPU usage were more uniform, the spikes would be handled better

Unlikely. CPU isn't really "load balanced" in the same way as (say) network traffic. If a process is computable and there's a CPU available, it will execute. In VERY broad terms, the more CPUs you have, the more likely one will be available when a process becomes computable.

If we assume that all compute processing is independent, you can scale linearly with CPUs. However, that's a bad assumption. Most processing activity will involve access to shared resources, which requires interlocking, and therefore synchronization between CPUs. This will reduce performance scaling.
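As a rough illustration of how quickly serialization eats into scaling, here is a sketch using Amdahl's law. That is a standard textbook model I am swapping in, not something measured on this system, and the 10% serialized fraction below is purely an assumed number.

```python
def amdahl_speedup(n_cpus: int, serial_fraction: float) -> float:
    """Amdahl's law: speedup over one CPU when a fraction of the work
    cannot run in parallel (for example, time spent holding shared locks)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cpus)

if __name__ == "__main__":
    # serial_fraction = 0.10 is an assumption for illustration only.
    for n in (1, 4, 8, 15):
        print(f"{n:2d} CPUs: ideal {n:5.2f}x, "
              f"with 10% serialized {amdahl_speedup(n, 0.10):5.2f}x")
```

With those assumed numbers, 15 CPUs deliver about 6.25 times the throughput of one CPU rather than 15 times.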

The other effect is to do with the distribution of demand for CPU. If a CPU is idle, but there are no computable processes, it will remain idle. Thus, if you have numerous processes all waiting for the same event, they may all become computable at the same time. If there are more processes than CPUs, then some will have to wait. So, even if the total demand for CPU over some period of time is less than the total available compute resource, it's still possible that spikes in demand will mean less than perfect utilization.
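A small sketch of that effect, under invented numbers: 30 processes all become computable at the same instant every 10 ticks, each needing 2 ticks of CPU, on a 15-CPU machine. The `run_bursts` function and its parameters are hypothetical and only meant to show the shape of the problem.

```python
from collections import deque

def run_bursts(num_cpus=15, burst_size=30, work_ticks=2, period=10, periods=100):
    """Return (average utilization, fraction of processes that waited, worst wait)."""
    queue = deque()             # arrival tick of each waiting process (FIFO)
    running = [0] * num_cpus    # ticks of work left on each CPU
    busy_ticks = 0
    waits = []
    for tick in range(period * periods):
        # All processes waiting on the same event wake up together.
        if tick % period == 0:
            queue.extend([tick] * burst_size)
        for cpu in range(num_cpus):
            # An idle CPU takes the next computable process, if there is one.
            if running[cpu] == 0 and queue:
                waits.append(tick - queue.popleft())
                running[cpu] = work_ticks
            if running[cpu] > 0:
                running[cpu] -= 1
                busy_ticks += 1
    utilization = busy_ticks / (num_cpus * period * periods)
    delayed = sum(1 for w in waits if w > 0) / len(waits)
    return utilization, delayed, max(waits)

if __name__ == "__main__":
    utilization, delayed, worst = run_bursts()
    print(f"average utilization: {utilization:.0%}")
    print(f"processes that had to wait: {delayed:.0%} (worst wait {worst} ticks)")
```

In this model the machine averages 40% busy, yet half of the processes in every burst wait for a CPU, which is the pattern that a 3-hour average cannot show.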

As Hein says, you can force some processes to use specific CPUs using affinity, but really all that can do is improve performance for the specific process by (possibly) guaranteeing access to a CPU when required. The effect on overall system performance is only likely to be negative, because it reduces the system's choices. There may be special cases where giving a "key" process preferred access to a CPU can help smooth out CPU demand, but there aren't any generic methods for identifying such a circumstance.

Learn your workload. Decrease the granularity of your samples to see if you can identify patterns.
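To show why the sample interval matters, here is a minimal sketch with made-up data: 50 samples at 20% CPU followed by a 10-sample spike at 100%. The `window_averages` helper is hypothetical, not part of any monitoring tool.

```python
def window_averages(samples, window):
    """Average consecutive groups of `window` samples (the last group may be shorter)."""
    averages = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        averages.append(sum(chunk) / len(chunk))
    return averages

if __name__ == "__main__":
    # Invented data: mostly quiet, with one short saturated period at the end.
    samples = [20] * 50 + [100] * 10
    print("one coarse window:", window_averages(samples, 60))
    print("six finer windows:", window_averages(samples, 10))
```

The single coarse window reports roughly 33% busy, while the finer windows show one interval pinned at 100%.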
A crucible of informative mistakes