
Possible CPU Bottleneck

 
Charles McCary
Valued Contributor

Group,

Hi - here's the scenario:

We have a vendor product running on an HP N-class with 8 processors. When their process runs, it spawns 8 parallel processes and basically takes up 100% of all processors for as long as it is running.

The runq is occupied about 85% of the time and is always around 1.5-2.5 in length.

Of course, the h/w they recommended was much LESS than the N-class and was supposed to handle the processing it needs to do in less time.

It is now running way too long and I want to make sure that I haven't missed something critical that could speed this up from an admin perspective. Can you guys think of anything I should check? I'll supply more info if you need it to offer an opinion.

tx,
c
26 REPLIES
Victor BERRIDGE
Honored Contributor

Re: Possible CPU Bottleneck

Hi,
Difficult to give advice with so little to go on...
If you have glance, use it to see what's happening. If you don't have it, get a trial version (time limited, I think 60 days).
What is the product supposed to do?

All the best
Victor
Helen French
Honored Contributor

Re: Possible CPU Bottleneck

Hi Charles:

Some points to check:

1) Kernel parameters. Check all parameters and increase/reduce them if needed. Reduce the dbc_max_pct value if it is set to 50 (the default). You can try 15 or 20.
2) System patch level.
3) Check system usage with GlancePlus and find out if anything is wrong.

HTH,
Shiju
Life is a promise, fulfill it!
Paula J Frazer-Campbell
Honored Contributor

Re: Possible CPU Bottleneck

Hi
Do you have glance installed?

If not there is a 60 day trial version on the apps cd.

Glance will allow you to track what is going on.
If it is slow running then check memory/swap usage and also disk activity by disk.


sar is also a useful tool; see man sar.


HTH

Paula

If you can spell SysAdmin then you is one - anon
Trond Haugen
Honored Contributor

Re: Possible CPU Bottleneck

What is the HP-UX version? I know there is a bug in 11i that shows a large load average even on an idle system. It's fixed with PHKL_24551.
Are the users reporting bad performance? Is most CPU time spent in user or system mode?

Regards,
Trond
S.K. Chan
Honored Contributor

Re: Possible CPU Bottleneck

You need glance to do this. At the surface you can quickly check whether your CPU is used more by "system" or by "user". If more of the CPU time is used by user (i.e. outside the kernel), that is generally a good thing, though you still have to determine that it is running the correct user program.
Run glance and, at the main process list window, enter SHIFT-? to get a list of the things you can look at. Enter "c" for the CPU report.
Now in that window, look at "User", "System" and "Idle", or better still post a snapshot of this glance screen so we can look at it. What you want to see is "user" utilization higher than "system" utilization, which generally means the CPU is being well utilized. If more of the CPU is used by "system", we'll have to drill further.
Charles McCary
Valued Contributor

Re: Possible CPU Bottleneck

Group,

Hi - I do have glance and the system utilization is way under user utilization:

Not sure how this is going to post, but here's a snapshot:
State          Current  Average   High   Time  Cum Time
--------------------------------------------------------------------------------
User              86.8     82.3   88.9   6.96   6011.97
Nice               0.0      0.0    1.6   0.00      2.98
Negative Nice      0.1      0.1    1.3   0.01      5.44
RealTime           1.0      1.0    1.6   0.08     69.86
System             9.7      9.2   16.2   0.78    672.97
Interrupt          0.9      0.6    3.1   0.07     40.21
ContextSwitch      1.0      0.9    1.3   0.08     68.13
Traps              0.1      0.2    0.4   0.01     11.95
Vfaults            0.0      0.1    4.0   0.00      5.44
Idle               0.4      5.7   98.7   0.03    417.24


The main problem is that their program is running much longer than they promised. There's not really a performance lag while it's running, or at least the users don't seem to complain.

tx,

Charlie
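
For anyone who wants to check the user vs. system split from a pasted snapshot rather than by eye, here is a minimal Python sketch. It only parses the rows posted above; the function name and the dictionary layout are my own assumptions, not anything glance provides.

```python
# Rows copied from the glance CPU report posted above.
# Columns: State, Current %, Average %, High %, Time (s), Cum Time (s).
CPU_REPORT = """\
User 86.8 82.3 88.9 6.96 6011.97
Nice 0.0 0.0 1.6 0.00 2.98
Negative Nice 0.1 0.1 1.3 0.01 5.44
RealTime 1.0 1.0 1.6 0.08 69.86
System 9.7 9.2 16.2 0.78 672.97
Interrupt 0.9 0.6 3.1 0.07 40.21
ContextSwitch 1.0 0.9 1.3 0.08 68.13
Traps 0.1 0.2 0.4 0.01 11.95
Vfaults 0.0 0.1 4.0 0.00 5.44
Idle 0.4 5.7 98.7 0.03 417.24
"""


def parse_cpu_report(text):
    """Hypothetical helper: map each state name to its five numeric columns."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 6:
            continue  # skip blank or malformed lines
        # The last five fields are numbers; whatever precedes them is the
        # state name (it may contain a space, e.g. "Negative Nice").
        name = " ".join(parts[:-5])
        current, average, high, time_s, cum_time = map(float, parts[-5:])
        rows[name] = {
            "current": current,
            "average": average,
            "high": high,
            "time": time_s,
            "cum_time": cum_time,
        }
    return rows


rows = parse_cpu_report(CPU_REPORT)
user_avg = rows["User"]["average"]
system_avg = rows["System"]["average"]
idle_avg = rows["Idle"]["average"]

print(f"user/system ratio (average): {user_avg / system_avg:.1f}")
print(f"average idle: {idle_avg:.1f}%")
```

A ratio well above 1 is the "user higher than system" pattern described earlier in the thread.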
Charles McCary
Valued Contributor

Re: Possible CPU Bottleneck

Also, running HPUX 11.0
Bill McNAMARA_1
Honored Contributor

Re: Possible CPU Bottleneck

I would say yes, but you are buying the CPU to use 100% of it, aren't you?

Although it's from the hp java site, this analysis flow is probably what'll get to the nitty gritty of it:

http://h21007.www2.hp.com/dspp/tech/tech_TechDocumentDetailPage_IDX/1,1701,1618,00.html

Later,
Bill
It works for me (tm)
Jeff Schussele
Honored Contributor

Re: Possible CPU Bottleneck

Hi Charles,

Please note that patch PHKL_24551 (11i only) has been superseded by PHKL_25389(11i only) - see:

http://us-support.external.hp.com/wpsl/bin/doc.pl/screen=wpslDisplayPatch/sid=46a2f2481b40fd883f?PACH_NAM=PHKL_25389&HW=s800&OS=11.11

Although looking at your glance output I would:
1) Question whether the code may be running away
2) Question what kernel parameter settings and/or patches the developer needs or recommends.

because it's almost all wrapped up in user time - <10% in system. You might want to start looking deeper at what's happening by using the ipcs command or the lsof utility. And don't discount the possibility of corrupted or even "crap" code or libraries.

Rgds,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Sandip Ghosh
Honored Contributor

Re: Possible CPU Bottleneck

Can you provide some more data about the system?
Total amount of memory
swapinfo -tm output
Kernel parameters
sar -b 5 10 output
The top 5 processes from top

Sandip
Good Luck!!!
Charles McCary
Valued Contributor

Re: Possible CPU Bottleneck

 
Charles McCary
Valued Contributor

Re: Possible CPU Bottleneck

Also, here's the glance syscall info for one of the processes:

System Calls   PID: 24642, accumat2   PPID: 24628   euid: 501   User: mibas
                                      Elapsed                      Elapsed
System Call Name   ID  Count    Rate     Time    Cum Ct CumRate    CumTime
--------------------------------------------------------------------------------
read                3   2064   430.0  0.22196   1150407   696.8  208.19893
write               4    370    77.0  0.00641    251975   152.6    5.04829
open                5   1693   352.7  0.07188    907305   549.6   28.53180
close               6   1693   352.7  0.02098    907304   549.6   12.71636
time               13   1693   352.7  0.00572    907305   549.6    3.51548
lseek              19   1693   352.7  0.01586    908630   550.4    6.13265
stat               38   1694   352.9  0.03282    907305   549.6   18.53448
ioctl              54   1694   352.9  0.02062    907305   549.6    7.81397
ulimit             63  28506  5938.7  0.13937  14394366  8719.6   55.12364
gettimeofday      116   1120   233.3  0.00691    548530   332.2    5.48679
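
One way to read a table like this is to rank the calls by cumulative time and look at how often each one is issued per open. The Python sketch below does that using only the figures posted above; treating the CumTime column as seconds spent in the call is my assumption, and the variable names are purely illustrative.

```python
# (name, interval count, cumulative count, cumulative time) copied from the
# glance system call listing for PID 24642 posted above.
SYSCALLS = [
    ("read", 2064, 1150407, 208.19893),
    ("write", 370, 251975, 5.04829),
    ("open", 1693, 907305, 28.53180),
    ("close", 1693, 907304, 12.71636),
    ("time", 1693, 907305, 3.51548),
    ("lseek", 1693, 908630, 6.13265),
    ("stat", 1694, 907305, 18.53448),
    ("ioctl", 1694, 907305, 7.81397),
    ("ulimit", 28506, 14394366, 55.12364),
    ("gettimeofday", 1120, 548530, 5.48679),
]

# Total cumulative time across the listed calls only (not the whole process).
total_time = sum(cum_time for _, _, _, cum_time in SYSCALLS)

# Rank calls by cumulative time, largest first.
by_time = sorted(SYSCALLS, key=lambda row: row[3], reverse=True)

# Cumulative call counts, keyed by name, to compare call frequency.
cum_counts = {name: cum_count for name, _, cum_count, _ in SYSCALLS}
ulimit_per_open = cum_counts["ulimit"] / cum_counts["open"]

for name, _, cum_count, cum_time in by_time:
    share = 100.0 * cum_time / total_time
    print(f"{name:<14}{cum_count:>10}{cum_time:>12.2f}{share:>7.1f}%")

print(f"ulimit calls per open: {ulimit_per_open:.1f}")
```

Two things stand out when the numbers are laid out this way: open, close, time, stat and ioctl have nearly identical counts, which looks like one open/stat/read/close cycle per unit of work, and ulimit is called many times for each of those cycles. Both are questions for the vendor rather than something tunable from the admin side.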

Hai Nguyen_1
Honored Contributor

Re: Possible CPU Bottleneck

Charles,

I think it is now time to turn to the vendor's application. Does it have any tunable configuration files which, for example, specify how often a program runs, how many processes should be spawned...

Also, a good application should generate some meaningful log(s) which you can look into to find out what it is doing...

That's my 2 cents.

Hai
Charles McCary
Valued Contributor

Re: Possible CPU Bottleneck

Hai,

thanks. Yes, as an aside, this vendor's product can run in several different configurations, anywhere from 1 process all the way up to 64 processes. We have run with many different configurations and finally decided that 16 was giving us the best performance. I'm beginning to suspect you are correct though. I don't see any obvious system problems (and no one else does either), so it is time for the vendor to speed up their program.

thanks,

Charlie
Charles McCary
Valued Contributor

Re: Possible CPU Bottleneck

Someone above requested the following info:

8 x 550 MHz processors
4 GB memory
Jeff Schussele
Honored Contributor

Re: Possible CPU Bottleneck

Hi Charles,

Well, nothing stands out to me in your tunables & overall setup.
You're using some swap space, but I suspect that it's being reserved by all the instances of accumat2. And my, does that app know how to really "take over" a system!
I would hesitate to run anything else on this system w/o throttling it back some - either fewer processes or higher nice values.

So I guess my final response would be - unless its response times start tanking OR you run out of memory or swap - let it rip. Unused CPU cycles are just that - unused. You paid for them - might as well use them.

Rgds,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Charles McCary
Valued Contributor

Re: Possible CPU Bottleneck

Jeff,

yeah - the only problem with letting them rip is that their product is taking too long to complete. When you factor in the runtime of their product, then another application that runs after it, then backups, we have basically started running out of hours in the day. I was hoping someone would look at this stuff and say "whoa, you really missed the boat on that one", tell me something obvious to change and make the runtime of the app go from 9 hours to 6 hours. Oh well, wishful thinking I guess.

thanks,

Charlie
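
To put rough numbers on why no admin-side tweak is likely to get from 9 hours to 6, here is a back-of-the-envelope Python sketch. It assumes the job is purely CPU-bound and that every bit of the 5.7% average idle time from the glance snapshot earlier in the thread could be reclaimed; both are simplifying assumptions, and the variable names are illustrative.

```python
# Figures taken from the thread: current and desired runtime, and the
# average idle percentage from the glance CPU report.
CURRENT_HOURS = 9.0
TARGET_HOURS = 6.0
AVG_IDLE_PCT = 5.7

# Speedup needed to hit the target runtime.
required_speedup = CURRENT_HOURS / TARGET_HOURS

# Best case if the same amount of CPU work ran with zero idle time
# (assumes a purely CPU-bound job, which is a simplification).
busy_fraction = 1.0 - AVG_IDLE_PCT / 100.0
best_case_hours = CURRENT_HOURS * busy_fraction

print(f"required speedup: {required_speedup:.2f}x")
print(f"best case from reclaiming idle CPU: {best_case_hours:.2f} hours")
```

Under those assumptions the reclaimable idle time is worth roughly half an hour, so the remaining gap would have to come from the application doing less work per run.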
Sandip Ghosh
Honored Contributor

Re: Possible CPU Bottleneck

Hi ,

Everything looks pretty normal, as far as the configuration goes. It is not swapping, the %rcache for the system is 100% (too good), and there is no runaway process. It's now really difficult to comment.

Can you change bufpages and nbuf to 0 just once and see how it performs? It seems the nbuf parameter is set too high. Normally it should be bufpages/2, but in your case it is more than bufpages.
Good Luck!!!
Charles McCary
Valued Contributor

Re: Possible CPU Bottleneck

Sandip,

Hmmm - didn't notice that.

sysdef reports nbuf as 90443, and SAM (kernel parms) shows it set to 0.
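
As I understand it, with nbuf and bufpages both set to 0 on 11.0 the buffer cache is sized dynamically between dbc_min_pct and dbc_max_pct of physical memory, so the nbuf value sysdef reports would be a runtime figure rather than a configured one. A minimal Python sketch of the resulting bounds, assuming 4 GB of memory and the dbc values of 5 and 7 shown in SAM (the function name is hypothetical):

```python
# Values from the thread: 4 GB of physical memory, dbc_min_pct = 5 and
# dbc_max_pct = 7 as shown in SAM.
PHYS_MEM_MB = 4 * 1024
DBC_MIN_PCT = 5
DBC_MAX_PCT = 7


def dbc_bounds_mb(phys_mem_mb, min_pct, max_pct):
    """Hypothetical helper: dynamic buffer cache bounds in MB, assuming the
    percentages apply directly to total physical memory."""
    return phys_mem_mb * min_pct / 100.0, phys_mem_mb * max_pct / 100.0


low_mb, high_mb = dbc_bounds_mb(PHYS_MEM_MB, DBC_MIN_PCT, DBC_MAX_PCT)
print(f"dynamic buffer cache: {low_mb:.1f} MB to {high_mb:.1f} MB")
```

If that reading is right, the cache is already capped at a small fraction of memory, so there is little to gain from shrinking it further.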

Sandip Ghosh
Honored Contributor

Re: Possible CPU Bottleneck

What about bufpages? Is that also showing 0, like nbuf?
Good Luck!!!
Charles McCary
Valued Contributor

Re: Possible CPU Bottleneck

yep
Charles McCary
Valued Contributor

Re: Possible CPU Bottleneck

Here are the values in SAM:

Name  Current Value  Pending Value
NSTREVENT 50 50
NSTRPUSH 16 16
NSTRSCHED 0 0
STRCTLSZ 1024 1024
STRMSGSZ 65535 65535
acctresume 4 4
acctsuspend 2 2
aio_listio_max 256 256
aio_max_ops 2048 2048
aio_physmem_pct 10 10
aio_prio_delta_max 20 20
allocate_fs_swapmap 0 0
alwaysdump 0 0
bufcache_hash_locks 128 128
bufpages 0 0
chanq_hash_locks 256 256
create_fastlinks 0 0
dbc_max_pct 7 7
dbc_min_pct 5 5
default_disk_ir 0 0
disksort_seconds 0 0
dnlc_hash_locks 64 64
dontdump 0 0
dskless_node 0 0
dst 1 1
eqmemsize 15 15
fcp_large_config 0 0
fs_async 0 0
ftable_hash_locks 64 64
hdlpreg_hash_locks 128 128
hfs_max_ra_blocks 8 8
hfs_ra_per_disk 64 64
initmodmax 50 50
io_ports_hash_locks 64 64
ksi_alloc_max 65696 65696
ksi_send_max 32 32
max_async_ports 50 50
max_fcp_reqs 512 512
max_mem_window 0 0
max_thread_proc 256 256
maxdsiz 301989888 301989888
maxdsiz_64bit 1073741824 1073741824
maxfiles 60 60
maxfiles_lim 1024 1024
maxssiz 8388608 8388608
maxssiz_64bit 8388608 8388608
maxswapchunks 4098 4098
maxtsiz 67108864 67108864
maxtsiz_64bit 1073741824 1073741824
maxuprc 1000 1000
maxusers 1024 1024
maxvgs 10 10
mesg 1 1
modstrmax 500 500
msgmap 42 42
msgmax 8192 8192
msgmnb 16384 16384
msgmni 50 50
msgseg 2048 2048
msgssz 8 8
msgtql 40 40
nbuf 0 0
ncallout 8228 8228
ncdnode 150 150
nclist 16484 16484
ncsize 10564 10564
ndilbuffers 30 30
nfile 15331 15331
nflocks 200 200
ninode 9540 9540
nkthread 14387 14387
no_lvm_disks 0 0
nproc 8212 8212
npty 128 128
nstrpty 60 60
nstrtel 60 60
nswapdev 10 10
nswapfs 10 10
num_tachyon_adapters 0 0
o_sync_is_o_dsync 0 0
page_text_to_local 0 0
pfdat_hash_locks 128 128
public_shlibs 1 1
region_hash_locks 128 128
remote_nfs_swap 0 0
rtsched_numpri 32 32
scroll_lines 100 100
scsi_max_qdepth 8 8
scsi_maxphys 1048576 1048576
sema 1 1
semaem 16384 16384
semmap 514 514
semmni 512 512
semmns 16384 16384
semmnu 512 512
semmsl_override 2048 2048
semume 256 256
semvmx 32767 32767
sendfile_max 0 0
shmem 1 1
shmmax 1073741824 1073741824
shmmni 768 768
shmseg 256 256
st_ats_enabled 1 1
st_fail_overruns 0 0
st_large_recs 0 0
streampipes 0 0
swapmem_on 1 1
swchunk 2048 2048
sysv_hash_locks 128 128
tcphashsz 0 0
timeslice 10 10
timezone 420 420
unlockable_mem 0 0
vnode_cd_hash_locks 128 128
vnode_hash_locks 128 128
vps_ceiling 16 16
vps_chatr_ceiling 65536 65536
vps_pagesize 4 4
vx_maxlink 32767 32767
vx_ncsize 1024 1024
vx_ninode 0 0
vx_noifree 0 0
vxfs_max_ra_kbytes 1024 1024
vxfs_ra_per_disk 1024 1024
Sandip Ghosh
Honored Contributor

Re: Possible CPU Bottleneck

Sorry Charles,

I think we now have to turn to the application side. There is nothing else left to try on the system side.

Sandip
Good Luck!!!
Hai Nguyen_1
Honored Contributor

Re: Possible CPU Bottleneck

Charles,

Is it true that all processes perform the same functionality? If so, can you attach one of their log files? If the log is too big, you may truncate it.

Hai