
%sys in sar is very high

 
Evelyn Daroga
Regular Advisor

Ever since we rebooted the system a couple of weeks ago, %sys in sar has been running very high during business hours (it appears normal after hours). Is there any explanation other than that we're simply overloading the system? I've noticed some system processes (psmctd, pwgrd, midaemon) showing up among the higher CPU users -- is that normal? I've stopped and restarted the psmctd and pwgrd processes, but that doesn't seem to help. Disk I/O (in glance) has at times been 100%, but not during the time I took these stats. People are complaining -- any suggestions would be appreciated!

OUTPUT OF SAR IS:
sar 5 20
HP-UX visib B.11.00 U 9000/800 01/10/08
09:23:02    %usr    %sys    %wio   %idle
09:23:07      27      72       0       0
09:23:12      39      60       1       0
09:23:17      35      65       0       0
09:23:22      32      67       0       0
09:23:27      32      67       1       0
09:23:32      43      55       1       0
09:23:37      43      56       0       0
09:23:42      26      74       0       0
09:23:47      41      59       0       0
09:23:52      35      65       0       0
09:23:57      34      66       0       0
09:24:02      35      65       0       0
09:24:07      36      64       0       0
09:24:12      40      60       0       0
09:24:17      51      48       0       0
09:24:22      48      52       0       0
09:24:27      38      61       0       0
09:24:32      31      69       0       0
09:24:37      52      48       0       0
09:24:42      42      58       0       0

Average       38      62       0       0

TOP CPU USERS:
UNIX95= ps -ef -o "pcpu pid user ruser stime time args" | sort -rn | head -10
22.11 15590 lp lp 08:51:44 08:05 quiz auto=/fh_home/jervis/v63yoln/quiz/pa110rrjw.qzs NOLIST
16.99 5887 dmsarka dmsarka 07:47:46 01:16 quick subdict=search auto=/fh_home/jervis/v63yoln/MENUGO.qkg
9.53 18764 clstale clstale 07:54:45 01:38 quick subdict=search auto=/fh_home/jervis/v63yoln/MENUGO.qkg
9.36 17340 oracle oracle 09:21:11 00:15 oracleWEBB (LOCAL=NO)
7.82 19647 root root Jan 4 05:10:10 psmctd
7.60 19150 hikostr hikostr 09:23:30 00:01 quiz auto=/canada_home/jervis/v63yoln/quiz/pa130bld5.qzs NOL
6.37 26983 ccparke ccparke 07:18:29 03:14 quick subdict=search auto=/fh_home/jervis/v63yoln/MENUGO.qkg
6.36 14094 dmmcdon dmmcdon 08:26:35 00:19 quick subdict=search auto=/fh_home/jervis/v63yoln/MENUGO.qkg
6.04 17307 lp lp 09:21:08 00:07 qtp cc=(JERVIS,UNIX,US,ORACLE,LSTRANS) subdict=search auto=/
5.51 29190 root root Jan 4 01:58:06 /usr/sbin/pwgrd

And again:
UNIX95= ps -ef -o "pcpu pid user ruser stime time args" | sort -rn | head -10
15.68 18764 clstale clstale 07:54:45 01:43 quick subdict=search auto=/fh_home/jervis/v63yoln/MENUGO.qkg
14.91 5887 dmsarka dmsarka 07:47:46 01:20 quick subdict=search auto=/fh_home/jervis/v63yoln/MENUGO.qkg
12.94 14094 dmmcdon dmmcdon 08:26:35 00:24 quick subdict=search auto=/fh_home/jervis/v63yoln/MENUGO.qkg
12.84 15590 lp lp 08:51:44 08:07 quiz auto=/fh_home/jervis/v63yoln/quiz/pa110rrjw.qzs NOLIST
11.92 18592 oracle oracle 09:23:04 00:05 oracleWEBB (LOCAL=NO)
11.50 19150 hikostr hikostr 09:23:30 00:06 quiz auto=/canada_home/jervis/v63yoln/quiz/pa130bld5.qzs NOL
10.47 19265 oracle hikostr 09:23:32 00:04 oracleCAN (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
9.45 17340 oracle oracle 09:21:11 00:18 oracleWEBB (LOCAL=NO)
7.42 19647 root root Jan 4 05:10:13 psmctd
6.25 26983 ccparke ccparke 07:18:29 03:17 quick subdict=search auto=/fh_home/jervis/v63yoln/MENUGO.qkg
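
A rough sketch for reading listings like the two above: totalling %CPU per user makes it easier to see whether the load comes from a few accounts or is spread across many sessions. The function name cpu_by_user is made up for illustration, and the sample rows are copied from the first listing with the args column shortened. Only fields 1 (pcpu) and 3 (user) are read, so the two-word "Jan 4" start time, which shifts the later fields, does not matter.

```shell
# Sum the pcpu column (field 1) per user (field 3) from the ps output.
# The header line is skipped because its first field is not numeric.
cpu_by_user() {
    awk '$1 ~ /^[0-9.]+$/ { cpu[$3] += $1 }
         END { for (u in cpu) printf "%.2f %s\n", cpu[u], u }' | sort -rn
}

# On the live system (ps command taken from the post above):
#   UNIX95= ps -ef -o "pcpu pid user ruser stime time args" | cpu_by_user

# Three rows from the first listing, args shortened for the example:
cpu_by_user <<'EOF'
22.11 15590 lp lp 08:51:44 08:05 quiz
6.04 17307 lp lp 09:21:08 00:07 qtp
7.82 19647 root root Jan 4 05:10:10 psmctd
EOF
# prints:
# 28.15 lp
# 7.82 root
```

Keep in mind this only shows who is burning CPU, not whether the time is user or system time; for that split you still need glance or a similar tool.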
13 REPLIES
Steven E. Protter
Exalted Contributor

Re: %sys in sar is very high

Shalom,

Performance monitor
http://www.hpux.ws/?p=6

Memory leak detector:
http://www.hpux.ws/?p=8

Looks like Oracle and/or autofs is using a lot of resources.

There may be a lot of writes going on, creating I/O that the system has to handle.

You need more data to determine the source of the problem.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Rita C Workman
Honored Contributor

Re: %sys in sar is very high

You probably shouldn't shut down psmctd, because that daemon passes information to another monitoring daemon (psmmon), and you'll start getting other errors. It could even affect MC/SG.

If your midaemon is running high and staying there, then you have something running that is choking the box.
The fact that your disk is hitting 100% (and, I'm guessing from your concern, more often than it rightly should) also points to some process that has run amok.

It's too hard to answer based on just this info. It could be that tuning of parms is in order; it could be poorly written syntax in some job or query; and so on.

Start digging around. See if your DBA can do some checks on the high-hitter processes to narrow something down from the Oracle side. If you have a utility that can trace the system calls of your high users (like "tusc"), you might be able to find something in its output.
Depending on your O/S version, you might have other utilities you can use, like pstack.

Just a couple thoughts,
Rgrds,
Rita
Laurent Menase
Honored Contributor

Re: %sys in sar is very high

Hi,
First remark: sar CPU stats aren't precise, because they are based on tick-time sampling.
Prefer "glance" for accurate data.
Evelyn Daroga
Regular Advisor

Re: %sys in sar is very high

Thanks for the feedback Steven, Rita and Laurent.
I have run the scripts provided by Steven -- they didn't reveal anything real obvious. No apparent memory leaks, although mem usage is very high. Thanks, Steve, I'll keep them around for future use.

As for disk I/O, it is not unusual for that to be very high -- even 100%. It has actually been lower throughout the morning than it generally is. What IS unusual is for the SYSTEM CPU usage to be so high. I have been watching closely via sar, top, and glance -- all indicate the CPU max'd out (which, also, has not been that unusual on this system), and all indicate the SYSTEM using lots of CPU -- this is the part that is unusual. If the CPU is max'd out, then so be it. My question is not "why is cpu usage so high", but rather "why is the SYSTEM cpu usage so high?"

If the users are running more reports than usual, then that would increase the I/O. I understand that I/O requests are "system" requests, and thus add to the system's cpu requirements.

Memory is also an issue on this system -- it has peaked at 100% more than I like. This has caused some paging -- again, increasing the system's cpu requirements.

If these are viable explanations, then ok. I can understand that the system's cpu usage might go up a bit, but it seems excessive to me. I guess that's where my dilemma is.

Thanks for all the input -- I appreciate it!
Laurent Menase
Honored Contributor

Re: %sys in sar is very high

Use glance and look at the time spent in the different syscalls.

Forget sar if you want to do an accurate analysis of CPU usage.
Rita C Workman
Honored Contributor

Re: %sys in sar is very high

Evelyn,

What are your dbc_max_pct and dbc_min_pct?
Can you give us a copy of your kernel parms?
Can you give us a copy of your swapinfo -tam?
Can you run sar -v 1 20 for me (my favorite sar command)?
What is the total physical memory on this box?
How many CPUs?
I see you're at 11.0 -- so we know the version level.

Let's just look down a couple basic parms and see if there is anything that will help.

Also - Oracle - what disks exactly hit 100% utilization? What is hitting those disks? Your disk contention may be resolved by moving things around. You may have too much hitting one disk...like all your oracle logging going to the same disk for example.

Hopefully someone might see something that might help.
Rgrds,
Rita
Evelyn Daroga
Regular Advisor

Re: %sys in sar is very high

Thanks, again, for the reply Rita.
I have attached a txt document with the information you requested. I appreciate your help!
Evelyn Daroga
Regular Advisor

Re: %sys in sar is very high

Thanks also for your reply, Laurent. I have been looking at glance along with sar. The items that jump out with the highest cpu usage are: OPEN, STAT, and LSTAT64.
Evelyn Daroga
Regular Advisor

Re: %sys in sar is very high

Thanks also for your reply, Laurent. I have been looking at glance along with sar. The system calls that jump out with the highest cpu usage are: OPEN, STAT, and LSTAT64.
Rita C Workman
Honored Contributor

Re: %sys in sar is very high

Hi Evelyn,

Your box is light on physical memory, so to start I'd try to add some if you can get it.

A couple of things in your parms that I found interesting. Your shmmax is the same size as your maxdsiz, if I read it right:
0x40000000 = 1073741824. I'd tend to set maxdsiz smaller (even though Oracle wants it higher). Maybe 0x10000000 would help. Others might disagree, but it has worked for me here.
Your inode output from sar -v has me blown away. I tend to reduce that parm to far less than what the formula sets it at. And you are running full out. So basically, something keeps spawning new processes that require an inode, and the inode table has NO available inodes left to recycle.
You need to find which process is doing that.
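
For anyone following along, the hex values quoted above for shmmax and maxdsiz can be converted with plain shell arithmetic. This is only a convenience sketch; the helper name hex_to_mb is made up for illustration and is not an HP-UX tool.

```shell
# Convert a hexadecimal kernel parameter value to megabytes.
hex_to_mb() {
    # $1 is a hex constant such as 0x40000000; 1048576 bytes = 1 MB
    echo $(( $1 / 1048576 ))
}

printf '%d\n' 0x40000000   # 1073741824 (bytes)
hex_to_mb 0x40000000       # 1024 (MB), the value currently set
hex_to_mb 0x10000000       # 256 (MB), the smaller maxdsiz floated above
```

So the suggestion amounts to dropping the per-process data segment limit from 1 GB to 256 MB.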

Another thing I noticed is that I don't see any swap set up in your swapinfo -tam output. I don't even see lvol2! Please double-check this.
I was going to recommend adding some disk swapspace, but we need to know that lvol2 is there.

Hope this gives you something to look at,
Rita
Evelyn Daroga
Regular Advisor

Re: %sys in sar is very high

Thanks again Rita.
Yes, the box is light on memory -- we are working on increasing to 5G.

I may look into the maxdsiz parameter -- I generally try to stay with what is recommended, especially since we have 7 DBs running here. But I'll give it some thought -- thanks for the suggestion.

As for the inodes -- our application and Oracle each spawn several processes for each user. I don't see anything that appears to be spawning processes more than usual. Maybe I need to increase the ninode parameter to allow for more?

The swap output was my error -- I typed -taM rather than -tam. Here is an updated output:

swapinfo -tam
             Mb      Mb      Mb   PCT  START/      Mb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev         500       0     500    0%       0       -    1  /dev/vgroot/swap
dev        3000     204    2796    7%       0       -    0  /dev/vg01/lvol1
dev        3000     207    2793    7%       0       -    0  /dev/vg01/lvol2
dev        3000     200    2800    7%       0       -    0  /dev/vg01/lvol3
reserve       -    4120   -4120
memory     2719     646    2073   24%
total     12219    5377    6842   44%       -       0    -
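
A small sketch for summarizing the listing above: adding up the "dev" rows shows how much is actually paged out to disk, as opposed to merely reserved. The function name dev_swap_used is made up for illustration; the input rows are the ones from the swapinfo output above.

```shell
# Total the AVAIL (field 2) and USED (field 3) columns of the "dev" rows.
dev_swap_used() {
    awk '$1 == "dev" { avail += $2; used += $3 }
         END { printf "device swap: %d of %d MB in use\n", used, avail }'
}

# On the live system:  swapinfo -tam | dev_swap_used
dev_swap_used <<'EOF'
dev 500 0 500 0% 0 - 1 /dev/vgroot/swap
dev 3000 204 2796 7% 0 - 0 /dev/vg01/lvol1
dev 3000 207 2793 7% 0 - 0 /dev/vg01/lvol2
dev 3000 200 2800 7% 0 - 0 /dev/vg01/lvol3
reserve - 4120 -4120
memory 2719 646 2073 24%
total 12219 5377 6842 44% - 0 -
EOF
# prints: device swap: 611 of 9500 MB in use
```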


I'm out of the office until Tuesday 1/15 and won't be able to get back on this until then. But thanks again for your help -- I'm always looking to learn more. I'll check back next week.
Bill Hassell
Honored Contributor
Solution

Re: %sys in sar is very high

ninode is way, way too high. You don't need more than 2000 to 4000 as these inode entries are only for HFS files (/stand). The formula is very bad and hasn't been changed since the days of 64MB systems. All values, whether 4,000, or 100,000 will eventually fill the cache. Unlike nproc or nfile, it does not really fill up, it is simply a cache to improve HFS disk activity. The value of 27,808 is just wasting a lot of RAM in the kernel.

The system overhead is being generated by the applications, which are making a massive number of calls for system services. You'll need to find the processes that are using the most CPU and use Glance to look at the system calls. All this tells you is that the application may not be well written and needs patching.

maxdsiz (and maxdsiz_64) are fences or cutoff limits. If a process needs more memory than allowed by the fence, it usually crashes or occasionally reports that more RAM is needed. The max*siz values are for developer environments to prevent runaway programs from locking up the system. In a production system, setting the value too low means processes will fail to run.

The swapinfo command should be run with -tam, not -taM, so you can see all the values. But even so, RAM is a major problem. Using swap space is nothing more than a waste of time, user time in this case. You want only occasional use of swap space. To see how bad the effect is, run vmstat and look at the po column. Better yet, use awk to isolate the value:

vmstat 1 20 | awk '{print $9}'

Single digits are OK (0-9), double digits indicate low RAM and anything over 50 means your system is being crippled by lack of memory. The kernel is deactivating processes then paging them out to the swap area to make room for other processes to run and this is thrashing back and forth. And yes, swapping creates a *LOT* of system overhead. You do NOT want swapping if performance is a concern.
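
As a sketch of how the rule of thumb above could be applied to a whole vmstat run, the function below counts how many samples fall into each band: 0-9 OK, 10-50 low RAM, over 50 crippled. The name classify_po is made up for illustration, the po column is assumed to be field 9 as in the awk one-liner above, and the sample rows are invented for the example rather than taken from this system.

```shell
# Bucket the po column (field 9) of vmstat output into the three bands.
# Header lines are skipped because their ninth field is not a number.
classify_po() {
    awk '$9 ~ /^[0-9]+$/ {
             n++
             if ($9 > 50)       crippled++
             else if ($9 >= 10) low++
             else               ok++
         }
         END { printf "samples=%d ok=%d low_ram=%d crippled=%d\n", n, ok, low, crippled }'
}

# On the live system:  vmstat 1 20 | classify_po
# Invented sample rows (po = 3, 25 and 120):
classify_po <<'EOF'
         procs           memory                   page                              faults       cpu
    r     b     w      avm    free   re   at    pi   po    fr   de    sr     in     sy    cs  us sy id
    2     0     0    90000    1200    5    2     0    3     0    0     0    400   9000   700  38 62  0
    3     1     0    91000     800    6    3     4   25     0    0    12    410   9100   720  40 60  0
    4     2     0    93000     300    9    5    30  120     0    0   300    450   9500   800  35 65  0
EOF
# prints: samples=3 ok=1 low_ram=1 crippled=1
```

Running it over a longer interval during business hours gives a better picture than a handful of samples, since the spikes may be intermittent.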

I wouldn't worry at all about the disk layout. Rearranging the disks will provide minimal performance benefits because the main bottleneck is RAM. You may need to go to 8 or even 16GB to get the best performance. The Oracle DBAs would love to use a larger SGA for buffering and other performance enhancements. A multi-gigabyte SGA is not uncommon for high performance database designs.


Bill Hassell, sysadmin
Evelyn Daroga
Regular Advisor

Re: %sys in sar is very high

Thanks for the information, Bill. It sounds like memory is the biggest issue, and is contributing to the high %sys cpu usage. I will change the ninode value from the formula to 4000. Hopefully, that will free up some memory for processing.

vmstat has historically reported the PO column as zero almost consistently. We recently added another Oracle db, and there is additional processing occurring there which has pushed our memory usage over the top. I am now seeing PO spiking to 3-digit numbers, although not consistently. As I indicated in an earlier post, we plan to increase the memory to 5G (from the current 3.5G), by replacing some of the 256M mem cards with some 512M mem cards from our development server (same model).

Thanks for the help!
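
For the memory upgrade described above, the card count works out with simple arithmetic. This sketch assumes every swap replaces one 256M card with one 512M card in the same slot, and that 3.5G and 5G mean 3584 MB and 5120 MB; the function name cards_to_swap is made up for illustration.

```shell
# Number of card swaps needed to go from current to target memory (all in MB).
# $1 = current total, $2 = target total, $3 = gain per swapped card
cards_to_swap() {
    echo $(( ($2 - $1) / $3 ))
}

gain_per_swap=$(( 512 - 256 ))            # each swap adds 256 MB
cards_to_swap 3584 5120 "$gain_per_swap"  # prints 6
```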