Operating System - HP-UX

CPU "0" getting overloaded ..

 
Varghese Mathew
Trusted Contributor

CPU "0" getting overloaded ..

Hi Everyone,

On my production V2500 server, top shows that CPU 0 is much more heavily loaded than the other 14 CPUs. I am attaching the top output (it was not taken at peak time; at peak, CPU 0 shows a load of 3.70).

The system runs HP-UX 11.00 with an Informix database, and the application is SAP R/3 version 4.6. We are facing performance issues on this system, where normally 800-1000 users log in every day.

Can anyone tell me whether this is normal, and if so, why?

Thanks in advance,
Cheers !!!
Mathew
Cheers !!!
12 REPLIES
Bill McNAMARA_1
Honored Contributor

Re: CPU "0" getting overloaded ..

It's still 99% idle..
There is a thread here to set processor affinity: attached file.

Run top -h
to find out which processes have high loads on CPU 0.
It works for me (tm)
Tim D Fulford
Honored Contributor

Re: CPU "0" getting overloaded ..

Do you have processor affinity set within informix?

In Informix, run
onstat -g glo to get the PIDs of the oninits, then trace which PID is on which CPU. What other processes are running?

If you have 15 CPUs you should not have more than 15 oninits running; if you do, you may find that there is a lot of context switching.
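
The rule of thumb above can be checked roughly from a saved process listing. This is only a minimal Python sketch: it assumes `ps -ef`-style text, counts every process whose command is `oninit` (a simplification, since not every oninit is necessarily a CPU VP), and the sample listing is made up, not taken from the poster's system.

```python
import os


def count_oninits(ps_output: str) -> int:
    """Count lines in `ps -ef`-style text whose command is `oninit`."""
    count = 0
    for line in ps_output.splitlines():
        fields = line.split()
        # Accept a bare `oninit` as well as a full path ending in /oninit.
        if any(os.path.basename(field) == "oninit" for field in fields):
            count += 1
    return count


def oversubscribed(n_oninits: int, n_cpus: int) -> bool:
    """Rule of thumb from the post: no more oninits than CPUs."""
    return n_oninits > n_cpus


# Hypothetical sample listing, for illustration only.
SAMPLE_PS = """\
     UID   PID  PPID  C    STIME TTY       TIME COMMAND
informix  1201     1  0 08:00:01 ?         5:12 oninit
informix  1202  1201  0 08:00:02 ?         1:03 oninit
    root  1300     1  0 08:00:05 ?         0:00 /usr/sbin/inetd
"""

if __name__ == "__main__":
    n = count_oninits(SAMPLE_PS)
    print(n, "oninits; more than 15 CPUs:", oversubscribed(n, 15))
```

On a real system the listing would come from saving the `ps -ef` output to a file and reading it in; `onstat -g glo` remains the better source for which oninits are CPU VPs.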

Also, what is running on CPU 0? If you have glance, you can run glance -a.

The more info you give on this subject the better

Cheers

Tim
-
Paula J Frazer-Campbell
Honored Contributor

Re: CPU "0" getting overloaded ..

Hi
As Bill points out it is 99.0% idle and 0% user 1.0% system.

At what point was the top snapshot taken? If it was during a normal working day, then your system's processors are not to blame.

If you have glance installed then run it and watch disk activity and any other areas where bottlenecks may occur.

Or use sar:-

sar -A -o /tmp/sardata 60 480

This collects all data once a minute, 480 times (8 hours), and writes it to a file in /tmp.
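
For a different collection window, the interval and count arguments are simple arithmetic. A small Python sketch (the function names are made up for illustration):

```python
def sar_duration_hours(interval_s: int, count: int) -> float:
    """Total collection time for `sar <interval> <count>`, in hours."""
    return interval_s * count / 3600


def sar_count_for(hours: float, interval_s: int) -> int:
    """Number of samples needed to cover `hours` at a given interval."""
    return int(hours * 3600 // interval_s)


if __name__ == "__main__":
    print(sar_duration_hours(60, 480))  # the command above: 60 s x 480 samples
    print(sar_count_for(8, 60))
```

The saved file can be read back afterwards with sar -f /tmp/sardata, adding whichever report option (for example -d or -b) is of interest.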

See man sar

HTH

Paula


If you can spell SysAdmin then you is one - anon
Tim D Fulford
Honored Contributor

Re: CPU "0" getting overloaded ..

You could invert this problem, CPU 13 is using 18.8% CPU & doing no user work!

Generally this is not good (I reckon on a 2:1 ratio in favour of user CPU), but do you know what is causing it? It could be an intended large sequential scan, which would make a lot of system calls.

The system seems generally quiet, so the above ratio (2:1) may not be applicable. However, there is another possibility: there could be a bottleneck, such that CPU 0 has lots of processes on it waiting on something else (disk reads?).

I'm going to re-evaluate what I said above (though it may still be useful). You will need to check whether there are any bottlenecks:
* Are the disks busy (glance -u)?
* Are there any memory/semaphore queues?
* Are any processes being blocked? If so, on what?
For all the above I would use MeasureWare or Glance. I'm not familiar enough with sar, iostat, netstat etc. to give flags. If you do want to know what the MW metrics are, I'll post later.

Tim

-
Bill McNAMARA_1
Honored Contributor

Re: CPU "0" getting overloaded ..

Do you have any CPUs deallocated?
If so: top patch PHCO_22686,
uptime patch PHCO_21928.

Check whether these multiprocessor patches are installed: PHNE_24100, PHNE_23456 and PHKL_24943.

Predictive, if you're running it, could also be a cpu hog culprit, see PHSS_21219.

Later,
Bill

It works for me (tm)
Tim D Fulford
Honored Contributor

Re: CPU "0" getting overloaded ..

If you do not have MeasureWare installed, this lot might as well be in ancient Greek for all the good it will do! Maybe some of the others can give generic commands to help bridge the gap?

So here goes

GBL_ALIVE_PROC
GBL_ACTIVE_PROC
GBL_STARTED_PROC
GBL_COMPLETED_PROC
GBL_PRI_QUEUE
GBL_RUN_QUEUE
GBL_MEM_QUEUE
GBL_IPC_SUBSYSTEM_QUEUE
GBL_NETWORK_SUBSYSTEM_QUEUE
GBL_SLEEP_QUEUE

Also if you have found that processes have stopped look at PROC_STOP_REASON

Take a look at Doug Grumann's paper, I was impressed
http://devresource.hp.com/devresource/Docs/TechPapers/UXPerfCookBook.pdf

Phew!

Tim


-
Varghese Mathew
Trusted Contributor

Re: CPU "0" getting overloaded ..

Hi,

Bill, I am attaching another top -h output, which was taken at peak time.

Tim, we have 18 oninits running, though we have only 15 CPUs.

Paula, the snapshot I sent earlier was not taken at peak time. I will run sar as you suggested.

Tim, yes, there seems to be a bottleneck in disk I/O, especially on the root hard disks. We have one EMC Symmetrix array of 1.5 TB as the large storage device. The root disks are installed in a HASS disk array and software-mirrored (Mirror/UX).

Bill, apart from PHNE_23456, none of those patches is installed on the system. I have just started talking to HP about their recommendations.

Thanks in advance for your responses.

Cheers !!!
Mathew


Cheers !!!
Sridhar Bhaskarla
Honored Contributor

Re: CPU "0" getting overloaded ..

Hi Varghese,

It is very common on multi-processor systems not to see the load shared equally. I would only start to suspect a problem if one processor continuously showed %IDLE as 0. As long as there is room, I wouldn't worry about the load.

Interestingly, your %sys is around 25. The hard disks cannot be the bottleneck unless you have configured a file system on them that does a lot of disk I/O, or your system is doing a lot of swapping. You can run sar -d 2 20 and look for disks with utilization (%busy) at 100 AND a queue length (avque) above 0.5. Also, check your buffer cache utilization with sar -b 2 20.
Your %rcache should be more than 90% with %wcache above 70%.
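
The two checks above can be applied to saved sar output with a short script. This is a minimal Python sketch, not a tested tool: it assumes the usual `sar -d` column order (device, %busy, avque, r+w/s, blks/s, avwait, avserv), the function names are invented for illustration, and the sample rows are made up.

```python
import re

# Device names such as c0t6d0; the pattern is an assumption about naming.
DEVICE = re.compile(r"^c\d+t\d+d\d+$")


def parse_sar_d(text):
    """Return one dict per device row in `sar -d`-style text."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        for i, token in enumerate(fields):
            # Only the first row of each interval carries a timestamp, so
            # locate the device name instead of relying on a fixed column.
            if DEVICE.match(token) and len(fields) >= i + 7:
                busy, avque, rw, blks, avwait, avserv = map(float, fields[i + 1:i + 7])
                rows.append({"device": token, "busy": busy, "avque": avque,
                             "rw": rw, "blks": blks,
                             "avwait": avwait, "avserv": avserv})
                break
    return rows


def saturated(rows, busy_min=100.0, avque_min=0.5):
    """Devices with %busy at 100 AND avque above 0.5 in any sample."""
    return sorted({r["device"] for r in rows
                   if r["busy"] >= busy_min and r["avque"] > avque_min})


def cache_ok(rcache_pct, wcache_pct):
    """Buffer cache rule of thumb: %rcache above 90 and %wcache above 70."""
    return rcache_pct > 90 and wcache_pct > 70


# Hypothetical sample, for illustration only.
SAMPLE_SAR_D = """\
12:00:00   device   %busy   avque   r+w/s  blks/s  avwait  avserv
12:00:02   c0t6d0  100.00    0.92      34    1200    4.10   35.20
           c2t6d0   98.50    0.75      33    1180    3.90   34.80
           c4t0d0   12.00    0.50      80    9000    0.50    4.00
"""

if __name__ == "__main__":
    print(saturated(parse_sar_d(SAMPLE_SAR_D)))
    print(cache_ok(95, 80))
```

Lowering `busy_min` slightly (say to 95) would also catch disks that hover just under full utilization; the strict default follows the wording of the post.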

Also do you have the latest ONLINEDIAGNOSTICS installed?

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Varghese Mathew
Trusted Contributor

Re: CPU "0" getting overloaded ..

Hi,

Sridhar, I have run what you suggested and am attaching the output as a zip file with this reply. There seems to be a bottleneck on the root disk; you can verify that.

Thanks for the help
Cheers !!!
Mathew
Cheers !!!
Juan Manuel López
Valued Contributor

Re: CPU "0" getting overloaded ..

Hello.
Check whether the cclogd daemon is taking 100% of one CPU.
It is a diagnostics software bug.
You have to install patch PHSS_24044.
Please confirm.
Juanma
I would like to lie on a beautiful beach spending my life doing nothing, so somebody has to do this job.
Sridhar Bhaskarla
Honored Contributor

Re: CPU "0" getting overloaded ..

1) Your root disk utilization is 100% with some queue on the disks. I wonder what exactly is going on there. I don't think your system is doing a lot of swapping. Check swapinfo -t to see if there is any swap utilization (other than the figure under reserve). Also, if you have plenty of memory and haven't considered it already, you may want to turn swapmem_on on. You may need to investigate which processes are causing the I/O. It looks like a lot of reading is going on, as avserv > avwait. You can use glance or gpm to check the details under IO By Disk, and you may also need to check I/O by file system to understand where your problem is.
2) You may want to check the mib2agt and diagnostics versions. If they are not at the proper version, they seem to cause a high %SYS. A high %SYS doesn't necessarily mean that it is due to system daemons, though.
3)Your other disks seem to be doing well.
4)Your sar -b output is impressive.

So, I feel you should consider upgrading your diagnostics and installing the latest patch for mib2agt, and then see how your system behaves.

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Tim D Fulford
Honored Contributor

Re: CPU "0" getting overloaded ..

Hi, sorry for not getting back earlier but ITRC was not letting me in!

Please forgive me if I'm teaching you to suck eggs; my verbose flag seems to be permanently on!

As you know, and as a few people have said, check your root disks (I assume c0t6d0 & c2t6d0). They are having problems:
o Excessive service times of 35ms; 8ms or less is normal
o Low throughput of 600kB/s; 1MB/s+ is normal
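
These two rules of thumb are easy to apply to sar -d figures. A minimal Python sketch follows; it assumes blks/s is reported in 512-byte units and treats 1MB/s as 1024 kB/s, and the function names are hypothetical.

```python
BLOCK_BYTES = 512  # assumption: sar -d reports blks/s in 512-byte units


def throughput_kb_s(blks_per_s: float) -> float:
    """Convert a blks/s figure to kB/s."""
    return blks_per_s * BLOCK_BYTES / 1024


def disk_findings(avserv_ms: float, blks_per_s: float,
                  max_service_ms: float = 8.0, min_kb_s: float = 1024.0):
    """Apply the thresholds from the post: <= 8 ms service, >= 1 MB/s."""
    findings = []
    if avserv_ms > max_service_ms:
        findings.append("slow service time")
    if throughput_kb_s(blks_per_s) < min_kb_s:
        findings.append("low throughput")
    return findings


if __name__ == "__main__":
    # 1200 blks/s is 600 kB/s, matching the figures quoted for the root disks.
    print(disk_findings(avserv_ms=35.0, blks_per_s=1200))
    print(disk_findings(avserv_ms=4.0, blks_per_s=9000))
```

A lightly used disk will also show low throughput, so the throughput check only means something when the disk is busy at the same time.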

You need to sort out the root disks (as suggested above). To check your root disk I would look at it with glance, check the filesystems and see if anything leaps out, e.g.
# glance -i

Also check whether the root disks have any stale extents
# pvdisplay /dev/dsk/c0t6d0 | grep -i stale
# pvdisplay /dev/dsk/c2t6d0 | grep -i stale

If either of these is not zero then you probably have a duff disk. Check it with STM (xstm or mstm) or, even better, log a H/W call with HP.
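
If the pvdisplay output is saved to a file, the stale count can be pulled out programmatically. A minimal Python sketch, assuming the summary contains a line of the form "Stale PE   <number>"; the sample text is invented for illustration.

```python
import re


def stale_pe_count(pvdisplay_text: str):
    """Return the Stale PE count from pvdisplay-style text, or None if absent."""
    match = re.search(r"^\s*Stale PE\s+(\d+)\s*$", pvdisplay_text,
                      re.MULTILINE | re.IGNORECASE)
    return int(match.group(1)) if match else None


# Hypothetical sample, not output from the poster's system.
SAMPLE_PVDISPLAY = """\
--- Physical volumes ---
PV Name                     /dev/dsk/c0t6d0
VG Name                     /dev/vg00
PV Status                   available
Total PE                    1023
Free PE                     0
Allocated PE                1023
Stale PE                    12
"""

if __name__ == "__main__":
    count = stale_pe_count(SAMPLE_PVDISPLAY)
    print("stale extents:", count)
```

Any non-zero result is the case described above; None means the line was not found and the output format should be checked by eye.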

A few other things, the important stuff is above.
* Are you backing up your logical logs to disk or do they go straight to tape? If to disk, is it in vg00? If so, this could be a problem. I was also thinking that this type of problem might occur if you are erroneously backing up, or writing logical logs, to a non-existent tape device, say /dev/rmt/lm (that is "L") or /dev/rmt/Om (that is alphabetic "O", not zero). If this happened you would fill up / very quickly though!

* I would seriously think about reducing the number of oninits (VPs) running to, say, 14 or 15. Here, it is better to have fewer busy VPs that are not context switching than more quiet VPs that are. Remember that the primary oninit (VP) will take up most CPU (as user informix, run onstat -g glo); if you force it to share a CPU, the context switching will cause performance problems. This is not your main problem, just some general advice.
* Your root disk is thrashing. Why? This is usually because the head is travelling from the inside to the outside of the disk a lot (hence the 35ms service time). The time left for actually extracting data is also substantially reduced, hence the throughput of about 600kB/s. I would normally expect to see 1MB/s+ on a root disk (compare your EMCs: they are lightning fast!). You could have a flaky root disk or controller, but I doubt it, as both disks are having problems. The usual cause is swapping. Do you have multiple swap areas in vg00? If so, check their priorities and make sure they do not have the same priority.
# swapinfo -mt
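
The priority check in the last bullet can be done on saved swapinfo output. This is a minimal Python sketch under stated assumptions: device swap lines start with "dev" and carry the priority and name in the eighth and ninth columns, the lvol names in the sample are hypothetical, and the function name is made up.

```python
from collections import defaultdict


def shared_swap_priorities(swapinfo_text: str, vg_prefix: str = "/dev/vg00/"):
    """Map each priority shared by more than one device swap area in the
    given volume group to the list of areas that share it."""
    by_priority = defaultdict(list)
    for line in swapinfo_text.splitlines():
        fields = line.split()
        # Assumed layout: TYPE AVAIL USED FREE PCT START RESERVE PRI NAME
        if len(fields) >= 9 and fields[0] == "dev" and fields[8].startswith(vg_prefix):
            by_priority[int(fields[7])].append(fields[8])
    return {pri: names for pri, names in by_priority.items() if len(names) > 1}


# Hypothetical sample, for illustration only.
SAMPLE_SWAPINFO = """\
             Mb      Mb      Mb   PCT  START/      Mb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev        4096     120    3976    3%       0       -    1  /dev/vg00/lvol2
dev        4096     110    3986    3%       0       -    1  /dev/vg00/lvol9
dev        2048       0    2048    0%       0       -    1  /dev/vg01/lvol1
reserve       -    2500   -2500
memory     6000    1200    4800   20%
total     16240    3930   12310   24%       -       0    -
"""

if __name__ == "__main__":
    print(shared_swap_priorities(SAMPLE_SWAPINFO))
```

An empty result means no two swap areas in vg00 share a priority, which is the state the bullet above recommends.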

Phew, War and Peace!

Tim

-