System Administration

Compilation Intensive Load rx2660 vs rx5670 performance

 
SOLVED
Michael S Costello
Occasional Advisor

Compilation Intensive Load rx2660 vs rx5670 performance

I have an rx2660 dual-core IA64 server running HP-UX 11.23 that is taking an intolerably long time to compile C++ code.

Does anyone have recommendations for tuning the "aCC" and "ecom" programs when they are apparently not memory-bound but bound by CPU performance and perhaps disk I/O?

I've been unable to build on a dual-core 1398 MHz rx2660 in less than 6 hours what a slightly beefier dual-core 1500 MHz rx5670 can build in 1.5 hours. The latter machine actually has less RAM installed.

I guess my question is: is the difference in hardware and I/O performance between these two models really that pronounced?
If not, should I look to tuning and/or hardware failure as a remedy for the difference?
16 REPLIES
Dennis Handly
Acclaimed Contributor

Re: Compilation Intensive Load rx2660 vs rx5670 performance

It depends on your source sizes and what opt levels you are using.
How many sources?

Are your source and object directories on NFS?
Gokul Chandola
Trusted Contributor

Re: Compilation Intensive Load rx2660 vs rx5670 performance

Hi,
No sir, there are many factors; it requires a lot of deep analysis.

Regards,
Gokul Chandola
There is always some scope for improvement.
Michael S Costello
Occasional Advisor

Re: Compilation Intensive Load rx2660 vs rx5670 performance

To answer Dennis:
Number of Source Files: 8218

# find . -type f -name '*.c*' | wc -l
8218

NFS: No, NFS is not used for accessing any of the files involved except when a build completes; compilation is performed on a local filesystem.

Building 64 bit code using appropriate flags for that purpose.
Don Morris_1
Honored Contributor
Solution

Re: Compilation Intensive Load rx2660 vs rx5670 performance

Since there are a bunch of reads and writes going on during compilation -- did you by chance lower dbc_min_pct / dbc_max_pct on the rx2660 and shrink the buffer cache? Throttling the cache could aggravate I/O bottlenecks.
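For reference, the tunables can be inspected (and widened again, if they were lowered) with kctune on 11.23. The 5/50 values below are just the 11.23 defaults mentioned later in this thread, not a recommendation:

```shell
# Show the current buffer cache bounds (percent of physical RAM)
kctune dbc_min_pct dbc_max_pct

# Set them back to the 11.23 defaults (min 5%, max 50%) if they
# had been lowered at some point
kctune dbc_min_pct=5 dbc_max_pct=50
```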
Michael S Costello
Occasional Advisor

Re: Compilation Intensive Load rx2660 vs rx5670 performance

Don, thanks for the response. Here is the range of values I've already tried:
I had been tuning dbc_max_pct and dbc_min_pct on the rx2660 in the following ranges with no difference in compilation time; it stays fixed at 6 hours with each.

* defaults 50/5 max/min
* 20/20 (as configured on the dev machine rx5670)
* 90/5 (most recent setting)

Top-down make time was unaffected in each case, though with the middle setting other tunables were also changed, essentially attempting to slavishly match the rx5670's values. I suspect the rx5670 had been tuned by HP hosts during a 64-bit porting seminar some years ago.

I'm going to reset everything to the default kctune values later today to establish a baseline time, see what information I can get with those values, and then apply whatever different values anyone suggests here.

I'm also seeing about getting a copy of Glance, but it's hard to know when I'll be able to run that tool for this.
Dennis Handly
Acclaimed Contributor

Re: Compilation Intensive Load rx2660 vs rx5670 performance

You might want to look at caliper.
Also what version of aC++ do you have?
You didn't mention how many CPU cores you have or whether you are doing parallel makes.
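A minimal Caliper run against one slow translation unit might look like the sketch below (the file name and flags are illustrative, and I'm not certain whether Caliper follows the child ecom process by default, so treat this as a starting point only):

```shell
# Flat profile of a single compile with HP Caliper, to see where
# the compiler itself is burning CPU time
caliper fprof aCC -c +DD64 some_slow_file.cpp
```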
Michael S Costello
Occasional Advisor

Re: Compilation Intensive Load rx2660 vs rx5670 performance

Dennis:
Caliper is something that is available, but I honestly don't know what to look for with it.

aCC version: HP C/aC++ Developer's Bundle C.11.23.15.2

The builds are non-parallel makes performed by gmake, and enabling extra jobs causes compilation to fail, so it's not running parallel makes on the dev machine either.

Both machines have a single CPU with dual cores. The rx5670 has "beefier" MHz values, but I believe the rx2660 has a family 32 CPU while the rx5670 has a family 31, model 1, revision 5 (an older, pre-9000-series Itanium 2), which makes me wonder if one can downgrade families for better performance. :)

Also, HyperThreading is enabled on both (not 100% sure about the rx5670, though); toggling threading on the rx2660 didn't have a noticeable effect. Since the jobs are not parallel, I don't know that the additional CPUs will help at all; they would just make the system more responsive while one CPU is loaded.

While the compiles are rolling, 100% utilization alternates between CPU 0 and CPU 1, with the idle CPU hovering at 10-30% -- as top reports things, anyway.
Dennis Handly
Acclaimed Contributor

Re: Compilation Intensive Load rx2660 vs rx5670 performance

>aCC version: HP C/aC++ Developer's Bundle C.11.23.15.2

This is the Developer's Bundle version, not the compiler version, and it requires a secret decoder ring to get A.06.20 out of it. What does "aCC -V" show?
And you didn't mention what opt level you are using.

>The builds are non-parallel makes, gmake is performing them and enabling extra jobs causes compilation to fail, so It's not running parallel makes on the Dev machine.

Then you aren't using the resources you have. Why does it fail? (Out of swap or bad makefiles?) You probably need more memory/swap.

>HyperThreading is enabled on both

I didn't think you could do that on 11.23 and you probably don't want it.

>I don't know that the additional CPUs will help at all

You do multiple compiles at once.
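If the makefiles can be fixed to tolerate it, even a modest parallel build should roughly halve wall-clock time on two cores. A sketch (the "all" target name is illustrative):

```shell
# Two jobs to match the two cores; -k keeps going past a failed
# target so all the parallel-unsafe spots show up in one run
gmake -j 2 -k all 2>&1 | tee build.log
```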
Michael S Costello
Occasional Advisor

Re: Compilation Intensive Load rx2660 vs rx5670 performance

aCC -V output:

aCC: HP C/aC++ B3910B A.06.20 [May 13 2008]

Optimization level: the -fast flag is used for most of the .o files produced, along with the -D64 and -W*** flags (which aren't optimizations per se); no -O# optimization flags are being used.
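One thing I could try is timing a single representative source file at different optimization levels, to see how much of the 6 hours is optimizer time versus something else. A rough sketch (file name illustrative; I'd keep our usual flags in place):

```shell
# Compare optimizer cost on one large translation unit
time aCC -c +O1 big_module.cpp
time aCC -c +O2 big_module.cpp
time aCC -c -fast big_module.cpp
```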

Memory/Swap:
Physical Memory: 4079.1 MB
Real memory: active/total ~300 MB/500 MB
Virtual memory: active/total ~700 MB/800 MB
Swap is 8 GB and approximately 500-700 MB of that is used.
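To see whether the box is actually paging during a build (as opposed to just reserving swap), I can watch swapinfo and vmstat while a compile runs; sustained nonzero page-outs in vmstat would indicate real memory pressure:

```shell
# Snapshot swap usage in MB, with totals
swapinfo -tam

# Sample paging activity every 5 seconds, 10 samples;
# watch the pi/po columns
vmstat 5 10
```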

So how do I make the machine forgo using swap, if it is thrashing on swap despite seeming to have enough physical memory to get by just fine?

Threads: initial runs were with threading disabled; threading was enabled later in the game to see if it would affect performance either way, and it hasn't. I (admittedly ignorantly) suspect the 11.23 kernel might be ignoring the EFI-set threading behavior anyway, since toggling it doesn't measurably change behavior or introduce breakage.

Multiple compiles: the product's existing makefiles do not support multiple-job makes (apologies, I thought I had written that into an earlier posting), so that's not an option; but it does rule out the other machine running multiple jobs as the explanation for the difference.