Operating System - HP-UX

Processor/system overhead related to multi-processors

 
Tom Krocak_1
Advisor

Processor/system overhead related to multi-processors

We have an rp7410 system with 5 processors running HP-UX 11i. It runs the QAD ERP application with a Progress DB, which uses the disk cache but still has a high physical I/O rate in the range of 1,500/sec (logical I/O is in the 10k-20k/sec range). This system was recently upgraded from 4 to 5 processors, but system processor overhead (root-associated processes) appears to have increased 5-15%, so the net productive addition looks more like 85% of a processor.

Has anyone any experience or comments on what the incremental overhead of adding processors is? Does processor speed effectively slow down in a multi-processor configuration due to memory or other contention factors (i.e., will a task measured at 10,000 CPU seconds before the processor addition still take 10,000 CPU seconds)? Is there any way to measure this type of impact using performance tools (such as MeasureWare/OVPA), short of doing a benchmark test?
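One way to quantify the impact without a full benchmark is to average %usr/%sys/%wio over many sar intervals before and after the change and compare the %sys share. A minimal sketch of that averaging; the sample lines are illustrative only, not real data from this system:

```python
# Hypothetical interval data in "sar -u"-like column order:
# timestamp, %usr, %sys, %wio (the numbers below are made up).
sample = """\
10:00:01 35 20 35
10:01:01 37 19 34
10:02:01 33 22 36
"""

def average_utilization(text):
    """Return (avg %usr, avg %sys, avg %wio) over all intervals."""
    rows = [line.split() for line in text.strip().splitlines()]
    n = len(rows)
    usr = sum(float(r[1]) for r in rows) / n
    sys_ = sum(float(r[2]) for r in rows) / n
    wio = sum(float(r[3]) for r in rows) / n
    return usr, sys_, wio

usr, sys_, wio = average_utilization(sample)
print(f"avg %usr={usr:.1f} %sys={sys_:.1f} %wio={wio:.1f}")
```

Collecting the same averages over comparable workload periods before and after the CPU addition would show whether the %sys share really grew.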
6 REPLIES
Patrick Wallek
Honored Contributor

Re: Processor/system overhead related to multi-processors

Well, if you are talking about a single task, then adding additional CPUs will normally not make that task go any faster, unless the task happens to be multi-threaded.

A single-threaded task will likely still take the same number of CPU seconds to complete. The benefit is that with multiple CPUs you can have several different single-threaded tasks executing at once, one per CPU, so each task gets back onto a CPU more quickly.
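To illustrate the distinction: a single task's elapsed time doesn't change, but total throughput does. A small sketch, with made-up task counts and per-task times, and assuming perfect scheduling with no overhead:

```python
import math

def makespan(tasks, cpus, seconds_per_task):
    # Each task is single-threaded, so it always costs the same CPU time;
    # extra CPUs only let more tasks run simultaneously.
    return math.ceil(tasks / cpus) * seconds_per_task

# One task never speeds up with more CPUs:
print(makespan(1, 1, 10), makespan(1, 4, 10))   # 10 10
# But eight such tasks finish much sooner:
print(makespan(8, 1, 10), makespan(8, 4, 10))   # 80 20
```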

There could potentially be some overhead with additional CPUs because HP-UX has to keep track of what is running on each CPU. I don't know if that has been measured anywhere though.
Tom Krocak_1
Advisor

Re: Processor/system overhead related to multi-processors

My question relates more to your 3rd comment. Thanks
Bill Hassell
Honored Contributor

Re: Processor/system overhead related to multi-processors

When the first HP9000 multi-processor machines came out (the Emerald systems such as the 870), there were definitely major issues in adding more processors. The second processor was worth about 0.9 CPU, the third about 0.6 CPU, and the fourth less than half a CPU (4 processors was the max back then). Huge amounts of code were changed to minimize multi-CPU management overhead, as the basic kernel was not originally designed for multi-processors. Linux went through the exact same growing pains as multi-CPU support was added.
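Those early numbers can be summed into effective capacity. A hedged illustration; 0.45 below is just a stand-in for "less than half a CPU", which was only given approximately:

```python
# Per-CPU contribution as each processor is added (first CPU is a full 1.0;
# 0.45 for the fourth is an assumed value for "less than half a CPU").
increments = [1.0, 0.9, 0.6, 0.45]

def effective_cpus(increments):
    """Cumulative effective CPU capacity after each processor is added."""
    totals, running = [], 0.0
    for inc in increments:
        running += inc
        totals.append(round(running, 2))
    return totals

print(effective_cpus(increments))  # a 4-way box delivering under 3 CPUs of work
```

So a fully loaded 4-way Emerald delivered roughly 3 effective CPUs, which is why so much kernel work went into reducing that overhead.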

But all that was back in the late 1980s. 9.04 was the first operating system release with excellent multi-processor code, with overhead measured at less than a couple of percent of total CPU time. Note that this is never a fixed value, because it depends on what is happening on the Monarch (first) processor. There are specific events that must be single-threaded, such as rearranging memory, performing a core dump of a program, and many other kernel tasks that cannot be done in parallel. When one of these happens, a spinlock is taken, which effectively stops all non-Monarch processors from running kernel code until the task is completed. Minimizing spinlocks reduces the overhead of multiple CPUs.

Today, multi-CPU systems run 32 (or more) CPUs and spinlock overhead is quite minimal. I doubt that the increase in processor count significantly increased multi-CPU overhead. Certain types of I/O can create extra spinlocks, so it may be possible to see additional kernel delays (not necessarily compute time) with Glance/MWA.

Your biggest changes in performance figures will come from disk I/O first. The high physical I/O rate needs to be addressed: since the applications are generating massive I/O that apparently isn't using the cache effectively, the DBAs need to look at DB performance figures. It is really easy to write bad queries, and without appropriate indexes, massive (but unnecessary) disk I/O will be generated.

Like all performance projects, you have to create a benchmark task that can be run repeatedly with dependable results. Then, when changes are made, the benchmark provides accurate metrics. Most database programs also benefit from configuring a lot of additional memory (GBs).


Bill Hassell, sysadmin
Ted Buis
Honored Contributor

Re: Processor/system overhead related to multi-processors

First, you never get 100% of a CPU when adding one, just like you never get 100% of network bandwidth. You mention you are doing heavy I/O, so adding CPUs won't help much in that area. There is the possibility of bus contention for the limited memory bandwidth, but contention for I/O bandwidth is more often seen.

However, in your case there may be other factors. The rp7410 can have two cell boards, and a cell can hold up to four CPUs. If you had a single cell board originally, you had uniform access to memory within that cell. To get the fifth CPU, a second cell board is added, with additional memory on that board. Now the average memory latency increases, because it takes a little longer for a processor in cell 1 to fetch memory from cell 2: memory access becomes non-uniform. It isn't until 11i v2 that HP-UX is able to distinguish cell-local memory. Sometimes it is better to use cell-local memory for its slightly lower latency, and sometimes it is better to distribute memory across cells for the greater total bandwidth; different applications respond differently to variations in latency and bandwidth on cell-based machines. In Superdome, there can be boosts in productivity at certain points as processors are added, since more connections between the crossbar switches come into play as the system scales up.

You might also look at how your memory is configured across the two cell boards, and within each cell board, in your rp7410. Some memory configurations perform better than others. The optimal configuration is to half-fill or completely fill the memory slots on a cell board.
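The latency effect Ted describes is just a weighted average of local and remote accesses. A rough sketch; the nanosecond figures and the remote-access fraction are illustrative assumptions, not rp7410 specifications:

```python
def avg_latency(local_ns, remote_ns, remote_fraction):
    """Weighted mean latency when some accesses cross to the other cell."""
    return local_ns * (1 - remote_fraction) + remote_ns * remote_fraction

# Single cell board: every access is local (assumed 250 ns here):
print(f"{avg_latency(250, 400, 0.0):.0f} ns")
# Two cells, with say 40% of accesses going to the remote cell
# (remote latency assumed 400 ns):
print(f"{avg_latency(250, 400, 0.4):.0f} ns")
```

Even a modest remote fraction raises the average latency noticeably, which is why cell-local placement (11i v2 and later) can matter.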
Tom Krocak_1
Advisor

Re: Processor/system overhead related to multi-processors

The added 5th processor was part of ICOD; prior to enabling it, we were using processors 0, 1, 2 and 5 (per sar -M), and now we are using 0, 1, 2, 3 and 5. So I think we were always in the situation of cross-cell memory fetches, as noted by Ted. I will attempt to get the memory configuration of both cell boards (we have a total of 8 GB of RAM).

Bill's comments about system code optimization for multi-processors were interesting, and I feel comfortable ruling out additional system overhead due simply to adding a 5th CPU.

Both Bill and Ted alluded to disk I/O and potential contention. Using sar, typical numbers are %usr = 35, %sys = 20 and %wio = 35. As I noted originally, QAD/Progress uses the disk cache. The cache hit ratio averages in the 89-91% range, using about 800 MB for cache (dbc_max_pct = 10% on an 8 GB system). According to the rule of thumb, 90% is an acceptable hit ratio. We have an EMC disk array with an average response time in the 5-9 ms range (sar -d). Given all this, I assume we simply have an I/O-bound system, driven by the ERP application and its various transaction types from ~400 end-users. Reducing I/O volumes would appear to be a very difficult task. Does %sys (from sar above) reflect system processor time or system wait time?
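The numbers in this thread hang together as simple miss-rate arithmetic: physical I/O is roughly the logical rate times the cache miss fraction. A back-of-envelope sketch using the figures already quoted (15k logical/sec, ~90% hit ratio); the 95% case is hypothetical:

```python
def physical_io(logical_per_sec, hit_ratio):
    # Buffer-cache misses become physical reads/writes.
    return logical_per_sec * (1 - hit_ratio)

# ~15,000 logical I/O/sec at a 90% hit ratio:
print(f"{physical_io(15000, 0.90):.0f}/sec")   # matches the observed ~1500/sec
# What-if: raising the hit ratio to 95% would halve the physical rate:
print(f"{physical_io(15000, 0.95):.0f}/sec")
```

This suggests that even a few points of hit-ratio improvement (more cache memory, better indexing) could cut the physical I/O rate substantially.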

Thanks for your comments; they were informative. I am just trying to understand how this works so we can make good decisions as our processor load grows (a 50+% growth trend in the last year, as measured by the MWA metric GBL_CPU_TOTAL_UTIL).
Steven E. Protter
Exalted Contributor

Re: Processor/system overhead related to multi-processors

The I/O contention may or may not be related to the CPU situation.

If your disk layout puts too much write activity on one disk or disk set, you can get massive I/O problems that have nothing to do with the additional CPUs.

If you have a write-intensive database sitting on a LUN laid out as RAID 5, you can have the same problem.

Bad application code can cause it as well.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com