Operating System - HP-UX: Performance question
09-06-2001 07:13 AM
Here's the situation:
Hardware:
N-Class with 4 440 processors (originally)
FC-60 disk array
Scenario:
We have a vendor who is writing code for us that basically reads in some data and writes it to an Oracle database (it's more complex than that, but that's it in a nutshell).
While their program is running, the system shows normal performance data except for the CPU, which is maxed out (disk, memory, and swap are all OK). Of course we purchased more horsepower in every resource than the benchmark required, but we're still seeing abnormal runtimes from their program.
They suggested that we purchase additional processors, so we did. We now have 8 440 processors, and the run time of their program has remained basically the same.
All perf. data is still the same (CPU maxed out, all else OK).
A couple of other notes: their program is configured to use the processors available, so they are taking advantage of them by spawning more processes.
I'm pretty sure this is a code issue, but I wanted to get the group's opinion on whether I've missed something or not.
thanks,
C
09-06-2001 07:30 AM
Re: Performance question
Doubling the horsepower should roughly halve the execution time, since it sounds like they can keep all the processors busy.
More processes do not necessarily make an application run faster. It sounds like there is a synchronization problem: the processes are all trying to do the same task and running into each other.
Reading data and writing it should produce I/O bottlenecks, not CPU, unless they are computing prime numbers before writing.
09-06-2001 07:31 AM
So all eight CPUs are running at 100%? What does your run queue look like? Did your I/O rates increase in response to the extra CPUs? Fire up Glance and look at your Global Waits (B). Are you blocked on I/O? Sleep? Semaphore?
Spawning more procs in response to more CPUs sounds like a rather crude way of going about multithreading. If all eight of your CPUs are genuinely maxed out, and you're not continually waiting on something like I/O, I'd say the developers have something spinning away in their proc(s) that needs to be fixed.
Cheers,
Jim
09-06-2001 07:40 AM
You are now in my area, and I think I can give you a technique to nail down the problem; I always use this method when trying to solve this type of problem.
You need to start graphing some metric (e.g. insertions/s, updates/s, ...) against the quantity of data. It is usually necessary to plot the log of the metric vs. the log of the quantity of data, e.g. log insertions/s vs. log rows of master data.
The slope of this curve can be very revealing about the nature of the problem. For example, if the slope of the log plot is about 2, then you have an N-squared problem. I have sometimes seen performance degrade with the 4th power of the number of rows. Typically, problems like these arise from poor indexing and badly formed joins. Many times a single index can fix the problem. Graphing the data tends to reveal the point at which no amount of hardware is going to fix the problem.
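Clay's slope test can be sketched in a few lines; here is a minimal, purely illustrative version (the function name and the synthetic N-squared data are made up for the example, not taken from the thread):

```python
import math

def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) vs. log(size).
    Slope ~1 means linear scaling, ~2 means N-squared, and so on."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Synthetic runtimes that grow with the square of the row count
sizes = [1000, 2000, 4000, 8000]
times = [2.0, 8.0, 32.0, 128.0]
print(round(scaling_exponent(sizes, times), 2))  # → 2.0
```

A slope near 2 or higher is the signal that the fix belongs in the SQL or indexing, not in the hardware.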
Regards, Clay
09-06-2001 08:15 AM
Here are some interesting stats from Glance; tell me what you think.
Under the Event column:
Event        %      Time
Pipe         5.4    87.98
Semaphore    3.5    60.41
Sleep       35.4   619.65
Stream      14.6   255.24
Terminal     0.3     5.07
Other       16.1   381.60
Under the Blocked On column:
Blocked On    %      Time   Procs
IO            0.8    13.77    2.7
Priority      2.3    39.84    7.8
System       19.5   340.43   66.5
Virtual Mem   0.0     0.22    0.0
All other rows under both columns were 0.
Looks like things are sleeping waiting on CPU time to me - what do you think?
tx,
C
09-06-2001 11:44 AM
# collect sar data
0 * * * * /usr/lbin/sa/sa1
20,40 8-17 * * 1-5 /usr/lbin/sa/sa1
# reduce the sar data
5 18 * * * /usr/lbin/sa/sa2 -s 8:00 -e 18:01 -i 900 -A
09-06-2001 01:22 PM
Since you have Glance+ installed, I would suggest configuring workloads on the system. Check the file /var/opt/perf/parm. Create one application with your application executables and collect the data. There are some examples in the file itself that will direct you. You need to restart scope/UX to re-read the parm file.
Once the data is collected, you can generate reports on various metrics
* APP_PRI_WAIT_PCT
* APP_DISK_SUBSYSTEM_WAIT_PCT
* APP_MEM_WAIT_PCT
* APP_SEM_WAIT_PCT
* APP_TERM_IO_WAIT_PCT
* APP_OTHER_IO_WAIT_PCT
* APP_NETWORK_SUBSYSTEM_WAIT_PCT
* APP_SLEEP_WAIT_PCT
* APP_IPC_SUBSYSTEM_WAIT_PCT
There are other interesting Application metrics; you can see them in the /var/opt/perf/reptall file.
You will get a very good feel for what the application is doing.
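An application block in the parm file looks roughly like this (a sketch from memory; the application name and executable patterns are made-up placeholders, and the exact keyword syntax should be checked against the examples in /var/opt/perf/parm itself):

```
application = vendor_load
file = loader*, oracle*
```

Once scope/UX is restarted with this in place, the APP_* metrics above are reported per workload instead of globally.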
-Sri
09-06-2001 01:50 PM
Have you checked whether your Oracle is actually using all those CPUs? There is an "init*ora" parameter for the number of CPUs used by Oracle - and the default is 1!
A second attempt could be to reduce the kernel parameter "time_slice", which is 10*10ms per process. In your highly CPU-intensive environment you might gain some advantage by REDUCING it, say to 7 or 8, from its default of 10. Batch-oriented jobs will take longer then, but the I/O-oriented jobs get a time slice more often...
HTH,
Wodisch
09-07-2001 01:28 AM
My application vendor used to create this kind of CPU problem for me.
Case 1: they configured their apps to spawn parallel processes, and I found from Glance that the processes were waiting on something (I can't remember what) - it was a locking problem. When we changed the option so it did not run in parallel, it was much faster; let's say it went from many hours to 5 minutes.
Case 2: they wrote a shell script that consumed lots of CPU just checking and comparing times. I changed that script to run from cron, and CPU usage was reduced by about 40%.
----------------
I don't think buying more CPUs is a good idea.
09-07-2001 05:16 AM
So you went from running multiple processes in parallel (which is what we're doing now) to running only one process? And this helped?
tx,
C
09-07-2001 05:48 AM
This is all very interesting.
I think the problem is indeed too many processes all trying to do locking at the same time. Remember, on any multi-CPU server, HP-UX has to do all critical locking on only ONE CPU, i.e. the first. The kernel carries out locking in a single-threaded way - one process at a time, regardless of which CPU they're running on - they all need to come back to a single CPU when they get around to doing some locking. Check your system call and context switching values.
So, in theory, knowing that we have a single-threaded part of the kernel running on a single CPU to handle all our critical locking: is it better to have more and more CPUs, with more and more processes all trying to access locks on a single CPU,
OR
to have a single CPU that is as fast as possible, where everything runs much more sequentially through the single-threaded part of the kernel? Locking system call totals and context switching should be able to run higher on that configuration.
I think the latter is true. We've already had an example here of an application which ran much faster on a 2-way 550 N-Class than on a 4x440 L-Class!
09-07-2001 05:54 AM
If they are all waiting for synchronization (writing to the same place in memory, so they would all be working on the same set of data), then the extra overhead from all these processes will eat up the system.
Since it is database work, they may be updating the same areas of disk, in which case you would be waiting on I/O to complete - but that does not seem to be the case.
09-07-2001 05:58 AM
I guess I should provide a little more info:
When we were originally testing the vendor code (when we only had 4 processors), we did some tests to gauge run time. We ran the code with 20 parallel processes, then dropped that number by two and re-ran, repeating until we reached 4 parallel processes.
The fastest runtime was seen when running with 8 parallel processes.
After adding the 4 additional processors to the machine, logically the fastest time should be seen when running 16 parallel processes (if 8 was fastest with 4 processors - at least that's my thinking anyway).
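That kind of sweep over process counts can be scripted; here is a minimal sketch of the method (Python multiprocessing, purely illustrative - the task and sizes are made up, not from the thread):

```python
import time
from multiprocessing import Pool

def cpu_task(n):
    # a fixed amount of pure CPU work per task
    return sum(i * i for i in range(n))

def timed_run(workers, tasks=16, size=200000):
    """Wall-clock time to push a fixed batch of CPU work
    through a pool of `workers` processes."""
    with Pool(workers) as pool:
        start = time.time()
        pool.map(cpu_task, [size] * tasks)
        return time.time() - start

if __name__ == "__main__":
    # sweep the worker count and look for the knee in the curve
    for workers in (1, 2, 4, 8):
        print(workers, round(timed_run(workers), 2))
```

If the curve flattens (or worsens) past some worker count, more processes - and more CPUs - won't help.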
Does this change your opinion?
tx,
Charlie
09-07-2001 06:02 AM
I'm still convinced this is poorly written code. Fortunately you should be able to get your software developer to compile everything with -p to enable profiling. You can then use prof to get statistics on which functions are being hammered and zero in on the bad code.
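The same "find the hot function" idea can be demonstrated in any environment; this is a purely illustrative sketch using Python's cProfile rather than the HP-UX cc -p / prof toolchain the post describes (the function names are made up):

```python
import cProfile
import io
import pstats

def busy_loop(n):
    # stands in for a hot spot spinning away in a process
    total = 0
    for i in range(n):
        total += i * i
    return total

def do_work():
    return busy_loop(200000)

profiler = cProfile.Profile()
profiler.enable()
do_work()
profiler.disable()

# print the top entries; the hot function should dominate
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```

Either way, the point is the same: let the profiler name the function that is eating the CPU instead of guessing.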
09-07-2001 06:07 AM
So the optimum was 4 CPUs and 8 parallel processes (roughly 2 per CPU).
Then you upgraded to 8 CPUs. I would not expect this to make your server faster: the overhead of the kernel having to manage an additional 4 CPUs, plus the overhead of squeezing processes running on an additional 4 CPUs through the single-threaded locking part of the kernel, would in fact slow down your application.
Adding 4 more CPUs should allow more users onto the server, especially if you have more than one application running on it, since they won't necessarily compete for the same resources (hopefully :-) ). But in terms of straight performance I would not expect it to speed things up; if anything, it will marginally slow them down.
Instead of upgrading to 8x440s, if you had replaced the existing 4x440s with 4x550s I would expect a 20-25% increase. I think that should be your preferred plan.
09-07-2001 09:20 AM
Solution

I have faced similar issues with vendor programs which import/export data in a data mining environment. The question which needs to be addressed here is the objective of the "tuning" exercise: does the vendor/user feel that the response time has to improve further, or is it a question of pegging the CPU usage below the maximum of 100%?
What are the run_queue and pri_queue values? CPU utilization alone is not a good indicator of system/CPU performance. If the CPU queues (pri_queue is a better indicator than run_queue) are also exceedingly high (anything consistently above 3), then you have a CPU bottleneck.
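That "consistently above 3" rule of thumb is easy to apply to extracted samples; a minimal sketch (the function name and the 90% fraction are my own illustrative choices, with the threshold of 3 taken from the rule of thumb above):

```python
def cpu_queue_bottleneck(queue_samples, threshold=3.0, min_fraction=0.9):
    """Flag a CPU bottleneck when the priority-queue depth stays above
    `threshold` for at least `min_fraction` of the samples."""
    over = sum(1 for q in queue_samples if q > threshold)
    return over / len(queue_samples) >= min_fraction

# a healthy day of samples vs. a day of sustained queueing
print(cpu_queue_bottleneck([0.5, 1.2, 2.0, 1.8]))  # → False
print(cpu_queue_bottleneck([4.2, 5.1, 6.3, 4.8]))  # → True
```

The fraction matters: a brief spike above 3 is normal, while a queue that sits there all day is the bottleneck signature.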
Check the history of these values through MeasureWare as follows:
-----------
Copy the /var/opt/perf/reptall file to /tmp/reptall.
Edit the /tmp/reptall file and enable GBL_PRI_QUEUE, GBL_RUN_QUEUE and the other CPU usage values.
Then run:
extract -xp -v -gp -r /tmp/reptall
------
The problem here is obviously related to the way the application is coded. If they are running multi-stream jobs, there is a chance that these jobs need to access a common file/resource, which can involve contention.
Since there is no disk or memory bottleneck here, the jobs are free to use the CPUs all the time!
The "data conversion" applications which we use are CPU hogs by design. So adding CPUs is just like throwing another rock in the ocean - it may not necessarily help.
Tackle this from the application end. The MeasureWare stats will help you in presenting your case.
Best!
Raj