Terrible performance on Itanium

Jim Pennino
Occasional Visitor

I have a C++ application that processes data as it comes in, and its performance on Itanium is terrible, to say the least.

The exact same code is run on SPARC and RISC systems and runs fine.

One processing loop that takes less than a second to execute on the other systems takes 46 seconds on Itanium. This is all calculations.

Using gprof, prof, and caliper, I have not been able to find where the time goes.

The really strange thing is when I run the program under tusc, the loop execution time drops from 46 seconds to 2 seconds.

Top shows the CPUs to be basically idle when this runs.

Any suggestions on how to track this down?


5 REPLIES
Steven Schweda
Honored Contributor

Re: Terrible performance on Itanium

> I have a C++ application [...]

Compiled and linked how?

> [...] that processes data as it comes in
> [...]

I may be dense, but that description tells me
nothing. What sort of data, coming into
where, from where, how, processed how?

I know nothing, but a Forum search for
keywords like, say:
alignment
might find some old threads related to one
possible source of sloth on IA64 systems.
Dennis Handly
Acclaimed Contributor

Re: Terrible performance on Itanium

>Using gprof, prof, and caliper I have not been able to find where the time goes.

Do time(1) and caliper show that system (vs. user) time is taking most of the time? What about gpm?
What caliper reports did you try?
Is your application threaded?
What HP-UX version?

>This is all calculations.

Are you heavily using floating point and getting denorms? If so, try linking with +FPD.
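One way to check for denorms before relinking might be to count subnormal values in the input or intermediate arrays. The following is only a minimal sketch: `count_denormals` is a hypothetical helper, not part of the application, and it assumes a compiler that provides `std::fpclassify` in `<cmath>` (older compilers may only offer the C `fpclassify` macro from `<math.h>`).

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: count how many values in a buffer are
// subnormal (denormal) doubles.
std::size_t count_denormals(const double* values, std::size_t n)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // FP_SUBNORMAL marks nonzero values below the smallest normal double.
        if (std::fpclassify(values[i]) == FP_SUBNORMAL) {
            ++count;
        }
    }
    return count;
}
```

A nonzero count on the data fed to the slow loop would make the denorm theory worth pursuing; a count of zero points elsewhere.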

>The really strange thing is when I run the program under tusc

Hmm, rings a faint bell?

>Steven: Compiled and linked how?

Right, what compiler version and opt level?

>alignment might find some old threads related to one possible source of sloth on IA64 systems.

Except that PA has the same alignment restrictions and the default action on HP-UX is to abort.
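If you want to rule alignment in or out from inside the code, a check along these lines could be dropped in wherever a raw buffer pointer is cast to a wider type. This is a sketch only: `is_aligned` is a hypothetical helper, and it assumes `uintptr_t` is available from `<stdint.h>`.

```cpp
#include <cstddef>
#include <stdint.h>

// Hypothetical helper: true if the address p is a multiple of `alignment`
// bytes. `alignment` must be nonzero.
bool is_aligned(const void* p, std::size_t alignment)
{
    return reinterpret_cast<uintptr_t>(p) % alignment == 0;
}
```

For example, `is_aligned(ptr, sizeof(double))` before reading a `double` through `ptr` would flag the accesses that could be a problem.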
Jim Pennino
Occasional Visitor

Re: Terrible performance on Itanium

Building:

On both Itanium and RISC:

CXXFLAGS = -AA -O $(INCLUDES)
LDFLAGS = -AA -Wl,-O,+n,+s,-s,-z,+k

This program under normal operation runs forever but for running tests I added some code to limit the run to 15 minutes.

time executable gives:

Itanium
real 15m27.43s
user 0m2.48s
sys 0m0.53s

RISC
real 15m7.36s
user 0m6.58s
sys 0m3.28s

Around one of the main calculation loops I added code that looks like:

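start_time = time (0);
for (/* each data point */) {
    /* calculations */
}
/* time_t is not an int, so cast the difference and use %ld */
syslog (LOG_DEBUG, "Loop time %ld", (long) (time (0) - start_time));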

For Itanium with 1194 data points, loop time is around 64 seconds.

For RISC with 2337 data points, loop time is 0 or 1 second.

The numbers for a SPARC machine are similar to the numbers for RISC.

The Itanium system does not have gpm.

In a 15 minute period this loop gets called 30 times.

All the operations in the loop are simple arithmetic, some float, some int.

It seems to me that most of the wall clock time is spent waiting or blocked on something, but I have no clue what would block arithmetic.
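To test the "waiting or blocked" theory on the loop itself, one option is to sample both wall-clock time and process CPU time around it, instead of only `time (0)`. This is a rough sketch, assuming the POSIX calls `gettimeofday` and `getrusage` are available on all the platforms involved; `TimeSample` and `sample_time` are hypothetical names, not anything in the application.

```cpp
#include <sys/time.h>
#include <sys/resource.h>

// Hypothetical struct: wall-clock and CPU time taken at one instant.
struct TimeSample {
    double wall;  // seconds since the epoch, from gettimeofday()
    double cpu;   // user + system CPU seconds consumed by this process
};

// Convert a timeval to fractional seconds.
static double to_seconds(const struct timeval& tv)
{
    return tv.tv_sec + tv.tv_usec / 1e6;
}

// Take one sample of wall-clock time and process CPU time.
TimeSample sample_time()
{
    struct timeval now;
    gettimeofday(&now, 0);

    struct rusage usage;
    getrusage(RUSAGE_SELF, &usage);

    TimeSample s;
    s.wall = to_seconds(now);
    s.cpu = to_seconds(usage.ru_utime) + to_seconds(usage.ru_stime);
    return s;
}
```

Calling `sample_time()` before and after the loop and logging both differences would show whether the wall-clock seconds are matched by CPU seconds. A large wall-clock difference with a CPU difference near zero would suggest the process is not running during that time, rather than computing slowly.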

The system has lots of memory and is not swapping.

One other thing that may make a difference just occurred to me: the machine is virtual, and I have no access to the "real" machine. However, the admin for the real machine has his butt on the line to get this working and will do anything I ask him to.
rick jones
Honored Contributor

Re: Terrible performance on Itanium

Ask him to let you run the experiment on the host rather than the guest.
there is no rest for the wicked yet the virtuous have no pillows
Jim Pennino
Occasional Visitor

Re: Terrible performance on Itanium

It may come down to doing that just to prove it is the virtual part causing the problem, but it won't solve the problem.

To make a long story short, the application in question is but one of many interacting applications that are a giant pain to set up on a system, which is why I haven't done it.