Operating System - HP-UX

Maximum Limits for CPU / Memory Load

 
Nagarajan Balakrishnan_1
Frequent Advisor

Hello,

I have been advising that average enterprise loads should not exceed 85% CPU usage, with only limited paging.

Currently, one of our sites has a cluster that has been running at close to 97% CPU load consistently, with continuous paging, for the last 2 weeks. I have been predicting that this condition would burn out the system and is not advisable. Since we have a very good disk subsystem (EVA) and the system is tuned very well, it is still able to complete its tasks in time.

Can anyone help me find substantiating evidence that doom is around the corner, if it is not already staring at the system? This would help me persuade them one last time to upgrade.

Thanks in advance.

Regards
Baalki
harry d brown jr
Honored Contributor

Re: Maximum Limits for CPU / Memory Load


Unfortunately it's typically illegal to beat common sense into people.

If your advice is falling on deaf ears, then your only recourse is to do a CYA (Cover Your Arse) by sending everyone and their mother an email stating the obvious and how to correct the situation.

live free or die
harry
Live Free or Die
john kingsley
Honored Contributor

Re: Maximum Limits for CPU / Memory Load

I would say that as long as your processes don't go into swap, you will be okay. However, I would ask them whether they anticipate their load continuing to go up. If so, now is the time to begin planning for an upgrade.
Patrick Wallek
Honored Contributor

Re: Maximum Limits for CPU / Memory Load

I have a feeling that your 97% CPU usage is directly related to the fact that you are paging out.

You will probably see the vhand process very active on this system.

If you are paging out I would HIGHLY advise that you add more RAM as soon as possible!!!! You will probably see a fairly dramatic performance increase if you do.

High CPU usage is not a bad thing in and of itself. I mean, you DO want to get your money's worth out of that hardware, right? I don't think high CPU usage will necessarily shorten the life of any components as long as it is cyclical. I'm not sure I would want a machine to run at 100% all the time, though. That would be an indication of a potential bottleneck.
curt larson_1
Honored Contributor

Re: Maximum Limits for CPU / Memory Load

Well, my question would be: what happened two weeks ago to start this condition, and what was the performance like before that?

If you can identify that, then it can be fixed if it was caused by bad programming. Or, if it was added load placed on the system, then you can roughly predict what is going to happen the next time that much load is added.

As mentioned previously, your system memory isn't configured properly or isn't sized correctly for the system as it currently is. Continuous paging shouldn't be occurring if it were. Paging requires CPU cycles, which increases the CPU load. Also, processes get blocked waiting for memory, causing them to take longer to complete.

Your system was probably sized to accomplish certain tasks within a specific time period. You should be able to check whether the system is still performing within those design parameters. If it is, then it shouldn't be a concern.

But, as pointed out, the system doesn't look like it can take much more. So maybe you can do a bit of testing to see how much more load it can take. If you can, run MeasureWare to gather some stats; it will put a bit of a load on your CPU as well. Run an ad hoc backup, or ftp a couple of large files across your network, and see how these things affect your system (of course, only do this when it is appropriate).

From this you should be able to estimate when the system is going to reach its design limit.
Bill Hassell
Honored Contributor

Re: Maximum Limits for CPU / Memory Load

Something that is important to note: the computer won't burn up in a puff of smoke if it sits at 100% and pages at double-digit rates all day. When you reach 100%, things will seem to slow down for compute-bound processes as they share time on the processors.

But paging is a VERY different metric. Ignore the page-in rate completely, as it mixes program starts with return-from-swap, and that's not useful. But if the page-out rate is 2 digits or more for most of the day, then *everything* is going to slow down because there just isn't enough memory. HP-UX is a virtual memory system, so when more memory is needed, other programs are deactivated and their pages are written out to the swap area. This is a kernel task, and during the paging tasks neither the deactivated program nor the new program can run. A good rule of thumb is that excess paging (50 or more) will degrade performance by 100:1 or more.

So before you upgrade, eliminate the paging (get it to single digits) and see how it behaves. If it is still at 100% usage *and* the users are happy with the performance, then all is well, but there is no room for growth. If the users aren't happy, send their complaints along with the upgrade request for RAM and possibly additional CPUs (or faster CPUs).
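A minimal Python sketch of these paging rules of thumb, assuming the input is a list of page-out rates (pages per second) sampled over a day. The function name and the 50% cut-off standing in for "most of the day" are hypothetical, not anything HP-UX provides:

```python
from typing import Sequence


def classify_pageouts(samples: Sequence[float], most_of_day: float = 0.5) -> str:
    """Classify per-second page-out samples using the rules of thumb above.

    - single digits most of the time: fine
    - double digits (>= 10) for most samples: short of memory
    - 50 or more for most samples: excessive paging
    `most_of_day` is a hypothetical cut-off for "most of the day".
    """
    if not samples:
        raise ValueError("need at least one page-out sample")
    n = len(samples)
    share_double_digit = sum(1 for s in samples if s >= 10) / n
    share_excessive = sum(1 for s in samples if s >= 50) / n
    if share_excessive >= most_of_day:
        return "excessive"      # expect severe degradation
    if share_double_digit >= most_of_day:
        return "memory-short"   # everything slows down
    return "ok"                 # single digits most of the time


# Hypothetical page-out rates sampled across a working day.
print(classify_pageouts([0, 2, 3, 1]))      # ok
print(classify_pageouts([12, 15, 3, 20]))   # memory-short
print(classify_pageouts([60, 80, 55, 5]))   # excessive
```

Only page-outs go in; page-ins are left out on purpose, as advised above.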


Bill Hassell, sysadmin
curt larson_1
Honored Contributor

Re: Maximum Limits for CPU / Memory Load

Can any one help me find any substantiating evidence that doom is round the corner if not it is already staring at the system?

Well, which metrics are meaningful depends a lot on what kind of system you have and what it is used for.

But symptoms of a CPU bottleneck would be saturation of the CPU, large queues, resource starvation, and user dissatisfaction with the system.

CPU saturation may be indicated by zero idle CPU, a high percentage of user CPU usage, or a high percentage of system CPU usage.

As an example, an ideal ratio is 70% user to 30% system usage for a typical OLTP environment (your ideal/typical ratio will depend on the application that you're running).

Compute-bound environments may see ratios of 90% user to 10% system usage. And 50% user to 50% system usage could indicate such problems as memory thrashing or SMP contention.
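The user-to-system split can be worked out from the us and sy columns that vmstat reports. A rough Python sketch, assuming busy time is simply us + sy and treating the three ratios quoted above as reference points; the profile names and the nearest-match approach are illustrative, not a standard method:

```python
def user_share(user_pct: float, sys_pct: float) -> float:
    """Percentage of busy (non-idle) CPU time spent in user mode."""
    busy = user_pct + sys_pct
    if busy <= 0:
        raise ValueError("no busy CPU time to split")
    return 100.0 * user_pct / busy


# Reference user shares taken from the ratios quoted above.
PROFILES = {
    "typical OLTP (70:30)": 70.0,
    "compute-bound (90:10)": 90.0,
    "possible thrashing or SMP contention (50:50)": 50.0,
}


def nearest_profile(user_pct: float, sys_pct: float) -> str:
    """Return the reference profile whose user share is closest."""
    share = user_share(user_pct, sys_pct)
    return min(PROFILES, key=lambda name: abs(PROFILES[name] - share))


# us=59, sy=34 from the vmstat line posted later in this thread.
print(round(user_share(59, 34), 1))  # 63.4
print(nearest_profile(59, 34))       # typical OLTP (70:30)
```

As noted later in the thread, those posted vmstat figures look cumulative since boot, so a since-boot split may hide what happens during the batch window.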

Other useful metrics are the run queue, the load average, and, from MeasureWare, the global priority queue. Of course, having a large number of processes waiting for a CPU to run on isn't good. Also useful are the system call rate and the context switch rate. The system call rate is an important metric because it reflects the rate of invocation of system services; it is related to system CPU utilization. The context switch rate is an indicator of application contention and of kernel contention on SMP systems.


curt larson_1
Honored Contributor

Re: Maximum Limits for CPU / Memory Load

the symptoms of a memory bottleneck are the saturation of memory, a large vm queue, resource starvation, and user dissatisfaction with response time.

Saturation is indicated by low free memory and by process deactivation. The vmstat command shows information about memory. avm is the number of virtual memory pages owned by processes that have run within the last 20 seconds. If this number is roughly the size of physical memory minus your kernel, then you are near paging. The free column indicates the number of pages on the system's free list. It doesn't mean that the process has finished running and these pages won't be accessed again; it just means that they have not been accessed recently.
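A small Python sketch of the avm check described above, assuming you already know the page size and your kernel's memory footprint. The 0.9 margin standing in for "roughly", the function name, and the sample figures are all hypothetical:

```python
GIB = 1024 ** 3


def near_paging(avm_pages: int, page_size: int,
                physical_bytes: int, kernel_bytes: int,
                margin: float = 0.9) -> bool:
    """True when active virtual memory (avm) nears physical memory minus the kernel.

    `margin` is a hypothetical reading of "roughly the size of".
    """
    usable = physical_bytes - kernel_bytes
    if usable <= 0:
        raise ValueError("kernel size must be smaller than physical memory")
    return avm_pages * page_size >= margin * usable


# Hypothetical figures: 12 GiB of RAM, a 1 GiB kernel, 4 KiB pages.
print(near_paging(2_800_000, 4096, 12 * GIB, 1 * GIB))  # True
print(near_paging(1_000_000, 4096, 12 * GIB, 1 * GIB))  # False
```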

next is the paging activity. the re field shows the pages that were reclaimed. these pages made it to the free list but were later referenced and had to be salvaged. Check to see that re is a low number. if you are reclaiming pages that were thought to be free by the system, then you are wasting valuable time salvaging these. reclaiming pages is also a symptom that you are short of memory.

From MeasureWare, the only queue relating to memory is the number of processes blocked on VM. A large VM queue sustained over time is also indicated by a high percentage of processes blocked on VM, as well as large disk queues on swap devices.

other metrics include, page in/out rates, deactivation/reactivation rates, and the number of page faults.

Page-ins are normal, even when there is no memory pressure. Page-outs occur only when memory pressure exists. And deactivations occur only as a last resort, when there is severe memory pressure and the paging system cannot keep up with demand.
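The ordering described above (page-ins normal, page-outs mean pressure, deactivations mean severe pressure) fits in a tiny Python function. This is a sketch assuming the rates are per-second averages; page-ins are deliberately not an input, and any non-zero page-out rate counts as pressure here:

```python
def memory_pressure(pageout_rate: float, deactivation_rate: float) -> str:
    """Grade memory pressure from page-out and deactivation rates."""
    if deactivation_rate > 0:
        return "severe"    # last resort: paging cannot keep up
    if pageout_rate > 0:
        return "pressure"  # page-outs only happen under memory pressure
    return "none"          # page-ins alone are normal and not counted


print(memory_pressure(0, 0))   # none
print(memory_pressure(12, 0))  # pressure
print(memory_pressure(12, 3))  # severe
```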

Resource starvation occurs when a high percentage of CPU utilization is used for VM activity, or when the disk subsystem is consumed by VM activity.

And, user dissatisfaction with the system results from poor transaction response time.
Nagarajan Balakrishnan_1
Frequent Advisor

Re: Maximum Limits for CPU / Memory Load

Dear All,

Thanks a lot for your responses. Let me explain the scenario more clearly.

The system load has been growing gradually over the months. I had scheduled an upgrade in the Q1-2004, which was turned down.

To put it in perspective, this member of the cluster is a batch system and runs Oracle. Currently the CPU load is at 100% for most of the day (hence the average close to 97%). Free memory is close to zero for most of the day during typical batch runs, and heavy swapping happens during those periods. The system has 12GB of physical memory and filled up 45GB of swap space (out of 48GB configured!!!) during one of those batch runs.

As I mentioned earlier, thanks to the EVA, in spite of all the swapping, the batch processes are able to complete within acceptable time ranges. (Of course, they have been getting gradually slower, but they have yet to touch the business-agreed limits!!)

One more point to add: we keep telling everyone that enterprise systems should be tuned for about 75% load to get maximum efficiency, etc. Can I get some supporting documentation on this?

Warm regards
Baalki
Nagarajan Balakrishnan_1
Frequent Advisor

Re: Maximum Limits for CPU / Memory Load

Hello,

Some Additional information that would help...

It is an 8-CPU AlphaServer system with 12GB RAM. The average memory usage metrics look like this:

Virtual Memory Statistics: (pagesize = 8192)
procs        memory            pages                               intr           cpu
  r    w    u   act  free  wire fault   cow  zero react   pin  pout   in   sy   cs  us  sy  id
 19  753  173  720K  643K  177K    5G  151M    1G  169M  283M   16M   3K  41K  11K  59  34   7

As a growing enterprise, our load will only increase, and my bosses do not seem to understand the urgency of planning an upgrade now, which would take 2-3 months to reach the implementation stage.

Thanks.
Regards
Baalki
Bill Hassell
Honored Contributor

Re: Maximum Limits for CPU / Memory Load

The memory and swap info as well as the application details are very important. It sounds like the DBAs are configuring Oracle to use RAM quite heavily. Now, the 45GB of swap space in use: is that all reserved, half reserved and half in use, or something else? As mentioned, once page-outs start hitting 40 to 80 or more, performance will take a big hit. You'll need to double your RAM to reduce the page-out rates.

Start by checking dbc_max_pct in the kernel to see that it is about 5% and that dbc_min_pct is 1-2%. If it is 50% (the unusable default), the system may take a while to push that down (depending on your version and patch level of HP-UX). The DBC should be about 400-600 MB.
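The percentages translate into buffer cache sizes with simple arithmetic. A Python sketch, assuming the 12GB of RAM mentioned in this thread; dbc_min_pct and dbc_max_pct are the HP-UX tunables named above, while the helper itself is hypothetical:

```python
def dbc_range_mb(ram_mb: float, dbc_min_pct: float, dbc_max_pct: float) -> tuple[float, float]:
    """Buffer cache bounds (MB) implied by the dbc_min_pct / dbc_max_pct percentages."""
    if not 0 <= dbc_min_pct <= dbc_max_pct <= 100:
        raise ValueError("expected 0 <= dbc_min_pct <= dbc_max_pct <= 100")
    return ram_mb * dbc_min_pct / 100, ram_mb * dbc_max_pct / 100


RAM_MB = 12 * 1024  # the 12GB system discussed in this thread
print(dbc_range_mb(RAM_MB, 1, 5))   # (122.88, 614.4)
print(dbc_range_mb(RAM_MB, 2, 50))  # (245.76, 6144.0)
```

At 5% of 12GB the cap is about 614MB, the top of the 400-600MB range suggested above, while the 50% default would allow roughly 6GB of buffer cache. The system later turns out to be an AlphaServer, so this only illustrates the HP-UX arithmetic.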

As far as 75% (or 80%, etc) CPU usage as a limit, that number is meaningless without more details. 100% is absolutely desirable in a compute-bound process such as math problems like simulations and design. In a database, low CPU usage accompanied with high disk usage is not good either. It means that the database is not taking advantage of RAM or needs more indexes to avoid serial or partial searches and sorts.

A 5-line script can consume 100% CPU...run 50 copies at the same time and your system will look quite busy. Yet, a database will likely run with little degradation due to the way HP-UX schedules I/O versus CPU intensive processes. So using a single number (CPU usage) is not a meaningful metric by itself.

The real measurement is the acceptable response times for the customer and (if required) service level agreements. A lot of improvement can be made with database tuning and minor improvements with kernel changes. Once you've run out of improvements (and the application continues to grow), it's time to change hardware or service level agreements.


Bill Hassell, sysadmin
Bill Hassell
Honored Contributor

Re: Maximum Limits for CPU / Memory Load

Whoops...I think everyone was giving you advice for HP-UX, not an Alpha box. I'm sure all those references to process deactivations, Glance and Measureware were confusing.

The posted vmstat values seem to indicate a very high page-out rate, but these vmstat numbers appear to be cumulative. Use vmstat -z (or whatever the option is for your flavor of Unix) and then monitor. Page-outs per second (average) is the important metric. Large numbers mean severe memory pressure, fixable only with (a lot) more memory.
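Since the posted counters appear to be cumulative, a per-second rate needs two readings of the same counter taken a known interval apart. A minimal Python sketch, assuming you capture the raw page-out counter yourself; the function name and the numbers below are made up for illustration:

```python
def per_second_rate(count_before: int, count_after: int, interval_s: float) -> float:
    """Average per-second rate between two readings of a cumulative counter."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    if count_after < count_before:
        raise ValueError("counter went backwards (reset or wrap?)")
    return (count_after - count_before) / interval_s


# Hypothetical page-out counter read twice, 60 seconds apart.
print(per_second_rate(16_000_000, 16_003_000, 60))  # 50.0
```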


Bill Hassell, sysadmin
Nagarajan Balakrishnan_1
Frequent Advisor

Re: Maximum Limits for CPU / Memory Load

Bill,

Thanks for the analysis.

I am looking for some supporting documentation that could help me show that overloading the system is not a good thing, and that we need to act now to avoid a disaster.

BTW, I am providing them with CPU usage / Memory / Paging activities on a daily basis.

Regards
Baalki
curt larson_1
Honored Contributor

Re: Maximum Limits for CPU / Memory Load

As a rough rule of thumb,
a CPU bottleneck would be characterized by:
consistently high CPU utilization (>90%), a run queue or load average > 3, and processes blocked on priority.
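A Python sketch of the CPU rule above, assuming utilization samples in percent and a raw run-queue (or load average) value compared directly against 3. Whether that threshold should be scaled per CPU is not stated here, so treat it as an assumption, along with the function name:

```python
from typing import Sequence


def cpu_bottleneck(util_samples: Sequence[float], run_queue: float,
                   blocked_on_priority: bool) -> bool:
    """Rule of thumb: consistently >90% busy, run queue/load average > 3,
    and processes blocked on priority."""
    if not util_samples:
        raise ValueError("need at least one utilization sample")
    consistently_high = all(u > 90 for u in util_samples)
    return consistently_high and run_queue > 3 and blocked_on_priority


print(cpu_bottleneck([95, 97, 99], 4.5, True))  # True
print(cpu_bottleneck([95, 60, 99], 4.5, True))  # False: not consistently high
```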

A memory bottleneck would be characterized by:
high physical memory utilization (>95%)
significant page-outs (page-out rate > 5) or any deactivations (deactivation rate > 0)
vhand process consistently active (vhand's cpu utilization > 5%)
processes or threads blocked on virtual memory
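The memory rule lists several symptoms without saying how many must be present, so this Python sketch simply reports which ones are seen. It assumes each field is an average over a representative period (a single snapshot cannot show that vhand is "consistently" active), and the class and field names are hypothetical stand-ins for whatever your monitoring tool reports:

```python
from dataclasses import dataclass


@dataclass
class MemorySample:
    phys_util_pct: float      # physical memory in use, percent
    pageout_rate: float       # page-outs per second
    deactivation_rate: float  # deactivations per second
    vhand_cpu_pct: float      # CPU used by the vhand page daemon, percent
    procs_blocked_on_vm: int  # processes/threads waiting on virtual memory


def memory_symptoms(s: MemorySample) -> list[str]:
    """Return the memory-bottleneck symptoms from the rule of thumb that are present."""
    found = []
    if s.phys_util_pct > 95:
        found.append("high physical memory utilization")
    if s.pageout_rate > 5 or s.deactivation_rate > 0:
        found.append("significant page-outs or deactivations")
    if s.vhand_cpu_pct > 5:
        found.append("vhand consistently active")
    if s.procs_blocked_on_vm > 0:
        found.append("processes blocked on virtual memory")
    return found


print(memory_symptoms(MemorySample(99, 40, 0, 8, 12)))
```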
Nagarajan Balakrishnan_1
Frequent Advisor

Re: Maximum Limits for CPU / Memory Load

Curt,

Perfect. We all use these "thumb rules". I am looking for some supporting documentation for the same to present to my management.

Can you help?

Thanks anyway.
Regards
Baalki
curt larson_1
Honored Contributor