CPU 100%

 
Bill McNAMARA_1
Honored Contributor

if cpu is at 100% what would i see in the syslog.

Also if memory was at 100% what would I see.

in terms of applications trying to spawn processes.

I know this has been mentioned before, but is high CPU utilisation a good measure for application benchmarking, or should I look elsewhere also?

Thanks,
Bill
It works for me (tm)
12 REPLIES
James R. Ferguson
Acclaimed Contributor

Re: CPU 100%

Hi Bill:

I wouldn't necessarily expect to see or not see something informative in /var/adm/syslog/syslog.log.

I'd be more interested in things like load average ('uptime') and swap utilization ('swapinfo -tam') along with key process table states ('sar -v 5 10') for first views.
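
A minimal sketch, in Python, of pulling the load averages out of a line of 'uptime' output so they can be logged or compared over time. The sample line is illustrative only (not captured from a real system), and the exact 'uptime' format varies between platforms, so the pattern is an assumption to adjust.

```python
import re

def parse_load_averages(uptime_line):
    """Return the (1, 5, 15 minute) load averages found in one line of uptime output."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())

# Illustrative sample line, an assumption about the format rather than real output.
sample = " 10:15am  up 12 days,  3:02,  4 users,  load average: 3.42, 2.10, 1.05"
print(parse_load_averages(sample))
```

In practice the line would come from running 'uptime' itself, for example through subprocess, instead of a hard-coded string.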

I'd also look at 'glance'. Do you see lots and lots of process births and deaths?

Regards!

...JRF...
Roger Baptiste
Honored Contributor

Re: CPU 100%

<< if cpu is at 100% what would i see in the syslog. >>

Unless there are errors or alerts, it shouldn't show anything related to CPU usage.
I have systems which consistently use 100% CPU because the applications that run on the box hammer the CPUs (thanks to the code), and the syslog does not show anything related to the 100% usage.


<< Also if memory was at 100% what would I see. >>

The same holds for memory. Unless there is an alert, a panic, or a failed allocation, nothing shows up in the syslog related to memory usage.

<< in terms of applications trying to spawn processes. >>

That can be seen through trace, in glance, or in sar. Only when the number of processes hits the limit will it show up, as an nproc error.

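<< I know this has been mentioned before, but is high CPU utilisation a good measure of application benchmarking or should I look elsewhere also. >>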

Are you referring specifically to individual process CPU utilization or to overall CPU utilization? If it is overall CPU utilization, make sure to see how much is being used by "user", how much by "system", and how much by "I/O".

Regarding process CPU utilization, to get a larger picture, drill down and see what sort of system calls the process is making. How much of the CPU is going into I/O? How long are its WAIT stages, and what is it waiting on? Is it waiting on other processes (PIPE)? The point is to find the basis for the heavy CPU usage.
For instance, in some of my CPU hogging applications, they use it mostly for heavy sorting.

HTH
raj
Take it easy.
G. Vrijhoeven
Honored Contributor

Re: CPU 100%

Hi Bill,

1. what do you see in the syslog?
This depends on the configuration; you can use the logger command in combination with monitoring tools to write messages to the syslog.


2. other things?
It could be nice to know how semaphores, kernel parameters, etc. are used (glance), and whether they reach their limits before your system is at 100%.

Hope this will help,

Gideon
Solution

Re: CPU 100%

100% CPU utilisation isn't necessarily a bad thing (although more often than not it is!). The thing to look at when CPU utilisation is 100% is the run queue length, which you can get from uptime or sar -q. If there are lots of processes that could be doing work if the CPU weren't maxed out, then that is an issue...

Some applications run in tight loops doing non-blocking read operations (badly written tty-based apps can do this). This can generate a lot of CPU utilisation which isn't necessarily a problem.

I am an HPE Employee
Bill McNAMARA_1
Honored Contributor

Re: CPU 100%

I don't see anything yet.
I'm reporting on someone's doc describing an application benchmark, and they've just used CPU usage as the system benchmark, i.e. it'll work fine up to the point where CPU usage is at 85%.

Although CPU usage on its own doesn't mean anything, I'm trying to figure out why they're coming out with stats saying 98% of file record processes work fine at 85% CPU. If CPU is stable at 85%, then the fact that the files didn't record is not a problem of CPU, but of disk or process failure, etc.,
and I'm wondering if there would be a signature such as "can't fork process" due to a memory bottleneck.

Thanks,
Bill
It works for me (tm)
James Beamish-White
Trusted Contributor

Re: CPU 100%

I agree with the above, in that I would not expect a problem to appear in the syslog (unless of course you are using some cron'd scripts that call logger when a sar-based check trips).

Something you might see could be:
proc: table is full (need to increase nproc in kernel params)

If you are asking these questions for figuring performance of apps on a system, maybe you should write some scripts to test and log CPU util, memory usage and I/O, run them in the background then perform your tests. Otherwise get something like Rational's Purify.
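
A minimal sketch, in Python, of the kind of background logging script suggested here. The reader function is a hypothetical stand-in that returns fixed numbers; on a real system it would wrap whatever you trust for CPU, memory and I/O figures (sar, glance exports, and so on). All names below are assumptions made for illustration.

```python
import csv
import io
import time

def log_samples(read_metrics, out, count, interval_s=0.0, clock=time.time):
    """Call read_metrics() `count` times and write one CSV row per sample to `out`.

    read_metrics must return a dict of metric name -> value with the same
    keys on every call; the header is taken from the first sample.
    """
    writer = None
    for i in range(count):
        sample = read_metrics()
        if writer is None:
            writer = csv.DictWriter(out, fieldnames=["timestamp"] + sorted(sample))
            writer.writeheader()
        writer.writerow(dict(sample, timestamp=clock()))
        if i + 1 < count:
            time.sleep(interval_s)  # wait between samples, not after the last one

def fake_reader():
    # Hypothetical stand-in values; replace with a real metrics source.
    return {"cpu_util": 85.0, "mem_util": 70.0}

buffer = io.StringIO()
log_samples(fake_reader, buffer, count=3)
print(buffer.getvalue())
```

Pointing `out` at an open file and raising `interval_s` would let this run in the background while the benchmark executes, giving a time series to line up against the test results.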

Cheers,
James


GARDENOFEDEN> create light
Darrell Allen
Honored Contributor

Re: CPU 100%

Hi Bill,

Duncan beat me to the punch about the run queue length (also known as load average). You can find it in glance also. Generally, you want the average run queue length less than 3.

CPU utilization by itself is not very accurate for indicating where the bottleneck is. Memory paging, disk-bound I/O, and runaway processes can all result in 100% CPU usage.

The Performance and Tuning class is a good one if you get the chance to take it. A very short summary from the class is:

1. If memory utilization > 95% with much paging then there could be a memory bottleneck
2. If disk utilization is > 50% and there are disk i/o requests in the queue then there could be a disk bottleneck
3. If cpu utilization > 90% and there are processes in the cpu run queue then there could be a cpu bottleneck

All systems behave differently and as usual, you should have a baseline to compare to.
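
The three rules of thumb above can be sketched as a small Python check. This is only an illustration of the summary as written: the thresholds are the ones quoted in the list, the function and parameter names are made up for the example, and "much paging" is reduced to a yes/no flag that you would have to judge against your own baseline.

```python
def suspected_bottlenecks(mem_util, heavy_paging, disk_util, disk_queue_len,
                          cpu_util, run_queue_len):
    """Return the resources that the three rules of thumb flag as possible bottlenecks.

    Utilizations are percentages; queue lengths are counts of waiting
    requests or processes. A result only means "could be", never "is".
    """
    suspects = []
    if mem_util > 95 and heavy_paging:
        suspects.append("memory")
    if disk_util > 50 and disk_queue_len > 0:
        suspects.append("disk")
    if cpu_util > 90 and run_queue_len > 0:
        suspects.append("cpu")
    return suspects

# CPU pegged with processes queued, memory and disk quiet.
print(suspected_bottlenecks(60, False, 20, 0, 100, 5))
```

Note that a CPU at 100% with an empty run queue is not flagged, which matches the point made earlier in the thread that high utilization alone is not the problem.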

Darrell
"What, Me Worry?" - Alfred E. Neuman (Mad Magazine)
Frank Slootweg
Honored Contributor

Re: CPU 100%

As James mentioned, I would not expect to see or not to see messages in the syslog.

As to the 100% memory utilization: Above 95 or 96%, I would expect to see paging and/or swapping. I understand you have Glance, so you could look at its "Memory Report".

As to a CPU utilization of 100%: That can be quite normal.

Generalization: A 'perfect' system would be at 100% CPU/memory/disk utilization all the time, i.e. the system is doing what you paid for it to do. Whether the *users* would be impressed with the *response time* is another matter, but the *system* is doing what it should. (For HP-UX: 95 or 96% memory utilization, not 100.)

"in terms of applications trying to spawn processes."

Do they *want* to spawn processes, but can not or it is not happening, or *are* they spawning processes and that is the problem?

In any case, like James implies, if there is still *swap* space and process table entries, then processes can be spawned. It may require some paging/swapping, i.e. time, but eventually it will happen.

John Bolene
Honored Contributor

Re: CPU 100%

The CPU at 100% can be a good thing or a bad thing.

What is normally used is transaction response time. The TRT will normally start growing when the CPU reaches somewhere around 85% and will grow exponentially starting around 90% CPU.
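
One way to picture why response time takes off at high utilization is the textbook single-server M/M/1 queueing model, where average response time is service time divided by (1 - utilization). This model is an assumption brought in for illustration, not something stated in this post, and real systems (multiple CPUs, mixed workloads) will deviate from it. In this model the growth is hyperbolic rather than strictly exponential, but the practical picture is the same: small increases in utilization near the top cost a lot of response time.

```python
def mm1_response_time(service_time, utilization):
    """Average response time in an M/M/1 queue.

    service_time: time one transaction needs on an idle server.
    utilization:  fraction of time the server is busy, 0 <= utilization < 1.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)

# Response time as a multiple of the idle-system service time.
for u in (0.50, 0.85, 0.90, 0.95):
    print("%.0f%% busy -> %.1fx service time" % (u * 100, mm1_response_time(1.0, u)))
```

Under this model a transaction takes twice its service time at 50% busy and ten times its service time at 90% busy.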

Notice I said normally. It all depends on the code. If a lot of IO's are being done, you will have an IO bottleneck unless you have the "HOT" files in memory or SSD. If you have an IO bottleneck, there is normally not any way to get to 100% CPU.

TRT is a much better metric to use because that is what the user sees.

The "BEST" metric is one that runs on the user's machine. The total TRT can be measured that way. It may depend on the resources available at the user end and on network resources, not just the resources on the end server.
It is always a good day when you are launching rockets! http://tripolioklahoma.org, Mostly Missiles http://mostlymissiles.com
Bill McNAMARA_1
Honored Contributor

Re: CPU 100%

how do I measure the TRT?
It works for me (tm)
John Bolene
Honored Contributor

Re: CPU 100%

Another tough question. In an HPUX environment, I do not know as I have not had to answer that question yet. I say yet, as the processing on my UNISYS mainframe is being gradually moved to TANDEM(UNIX sorta machine) and HPUX.

I did write an application that interfaced with the PC UNISYS emulator that sent a specific transaction to the UNISYS and the response time was measured on the UNISYS and sent back to the PC. The PC then analyzed how much of the time was not related to the mainframe time and sent that data back to the mainframe to be included in a response time database. A TRT report was run once a week to show what the response time was at the user level in several around-the-world locations.
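
The idea of splitting total response time into a server part and everything else can be sketched in Python as below. This is not the original UNISYS application; `send_transaction` is a hypothetical callable that performs one round trip and returns the time the server says it spent, and the fake transaction only simulates that with a sleep.

```python
import time

def measure_transaction(send_transaction):
    """Time one transaction from the client side and split out the non-server part.

    send_transaction() must perform the round trip and return the
    server-reported processing time in seconds.
    """
    start = time.perf_counter()
    server_seconds = send_transaction()
    total_seconds = time.perf_counter() - start
    return {
        "total": total_seconds,
        "server": server_seconds,
        # Whatever the server did not account for: network, client, emulator.
        "non_server": total_seconds - server_seconds,
    }

def fake_transaction():
    # Hypothetical stand-in: 50 ms round trip of which the server claims 20 ms.
    time.sleep(0.05)
    return 0.02

print(measure_transaction(fake_transaction))
```

Collecting these three numbers per location over time is what would let you say whether a reported slowdown sits with the server or elsewhere.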

This report was done to eliminate false reporting of user slow downs that were reported to be because of slow response time from the mainframe. 99% of the response slowdowns turned out to be network related and not mainframe related.

You can get a general knowledge of TRT in HPUX from looking at perfview reports of applications that have included transaction tracker metrics. But this again, will only be from looking at the server, not from the user's point of view.
It is always a good day when you are launching rockets! http://tripolioklahoma.org, Mostly Missiles http://mostlymissiles.com
Roger Baptiste
Honored Contributor

Re: CPU 100%

Bill,

OK, if you want clear indicators of CPU bottlenecks (rather than just process CPU utilization, as I earlier interpreted it):
then yes, 100% CPU utilization by itself does not confirm a bottleneck. It can be a bottleneck only if
both GBL_CPU_TOTAL_UTIL is 100% and GBL_PRI_QUEUE is greater than three, consistently.

PRI_QUEUE is a better indicator of a CPU bottleneck than RUN_QUEUE, since the run queue also counts I/O-related processes, or processes blocked on I/O, as runnable, whereas PRI_QUEUE lists the processes which are purely blocked on CPU priority and would have been running otherwise.
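
A minimal Python sketch of the rule above, applied to a series of samples. It assumes the (GBL_CPU_TOTAL_UTIL, GBL_PRI_QUEUE) pairs have already been pulled out of the performance data by some other means; no particular export format is assumed. Reading "consistently" as "in at least 90% of samples" is an interpretation chosen for the example, as are the function and parameter names.

```python
def cpu_bottleneck(samples, cpu_threshold=100.0, queue_threshold=3, min_fraction=0.9):
    """Apply the rule: CPU at 100% and priority queue above three, consistently.

    samples: iterable of (GBL_CPU_TOTAL_UTIL, GBL_PRI_QUEUE) pairs.
    min_fraction: share of samples that must satisfy both conditions
                  (an assumed reading of "consistently").
    """
    samples = list(samples)
    if not samples:
        return False
    hits = sum(
        1 for cpu_util, pri_queue in samples
        if cpu_util >= cpu_threshold and pri_queue > queue_threshold
    )
    return hits / len(samples) >= min_fraction

busy_and_queued = [(100.0, 5.0)] * 10
busy_but_idle_queue = [(100.0, 1.0)] * 10
print(cpu_bottleneck(busy_and_queued), cpu_bottleneck(busy_but_idle_queue))
```

Since measured utilization often hovers just under 100, lowering `cpu_threshold` slightly may be more practical than the literal 100% in the rule.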

You can get these values using the extract command of MeasureWare. The template file is /var/opt/perf/reptall.

HTH
raj
Take it easy.