Operating System - Linux

Can obtain higher performance if using +DD64 not +DD32 for f90 on Superdome Itanium2?

 
HM Li
Advisor


Someone said that compiling Fortran files with +DD64 gives higher performance than +DD32 on an HP-UX Superdome Itanium2 system. But in my testing of mpif90 +O2 +DD64 versus mpif90 +O2, the difference between them is smaller than 5%.


HP-UX 64-bit performance considerations
Most applications can remain in 32-bit mode on HP-UX 64-bit systems. However, some
applications, which manipulate very large data sets, are constrained by the 4GB address space
limit in 32-bit mode. These applications can take advantage of the larger address space and larger physical memory of 64-bit systems.
Some I/O bound applications can trade off memory for disk I/O. By restructuring I/O bound
applications to map larger portions of data into memory on large physical memory machines, disk
I/O can be reduced. This reduction in disk I/O can improve performance because disk I/O's are
more time-consuming than memory access.
Memory-constrained applications, such as large digital circuit simulations, may also benefit by
transitioning to 64-bit mode. Some of these simulations have grown to the point where they cannot run without major code modifications in a 32-bit address space.

What impacts performance in 64-bit mode

Typical applications do not require more virtual memory than what is available in 32-bit mode.
When compiled in 32-bit mode on HP-UX 64-bit platforms, these applications usually perform better than when recompiled in 64-bit mode on the same 64-bit platform. Some of the reasons for this include:

64-bit programs are larger. Depending on the application, the increase in the program size can increase cache and TLB misses and place greater demand on physical memory.

64-bit long division is more time-consuming than 32-bit integer division.

64-bit programs that use 32-bit signed integers as array indexes require additional instructions to perform sign extension each time an array is referenced.

By default, 64-bit object modules can be placed into shared and archive libraries and used in main programs. 32-bit code must be compiled with the +z or +Z option if it is used in shared libraries.
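To make the first point above (64-bit programs are larger) concrete, here is a minimal sketch that estimates how a pointer-heavy data structure grows when pointers widen from 4 to 8 bytes. The node layout (8 bytes of payload plus two pointers) and the names `node_bytes` and `footprint` are assumptions chosen for illustration, not anything taken from the HP documentation, and alignment padding is ignored.

```python
import struct

# Standard-size struct codes: "=I" is a 4-byte unsigned int, "=Q" an 8-byte one.
# They stand in here for a 32-bit and a 64-bit pointer respectively.
POINTER_SIZE = {
    "32-bit": struct.calcsize("=I"),
    "64-bit": struct.calcsize("=Q"),
}


def node_bytes(mode, payload_bytes=8, pointers_per_node=2):
    """Bytes for one hypothetical list node: payload plus its pointers."""
    return payload_bytes + pointers_per_node * POINTER_SIZE[mode]


def footprint(mode, n_nodes):
    """Total bytes for n_nodes nodes, ignoring alignment padding."""
    return n_nodes * node_bytes(mode)


n = 1_000_000
small = footprint("32-bit", n)
large = footprint("64-bit", n)
print(f"32-bit: {small} bytes, 64-bit: {large} bytes, growth {large / small:.2f}x")
```

With this assumed layout the same million nodes take 16 MB in 32-bit mode and 24 MB in 64-bit mode, which is the kind of growth that can push a working set out of cache. Code that is dominated by arrays of REAL or DOUBLE PRECISION data, as many Fortran programs are, would see far less growth.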
Torsten.
Acclaimed Contributor


I would ask this question in the HP-UX forum, not here.

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!

rick jones
Honored Contributor


You might want to consider using Caliper to profile your application and see where it is spending all its time and then go from there.

Is this a single-threaded application?

Is there CLM enabled on the Superdome?

Is processor or locality domain affinity being used?

You might look at some of the options used in HP's SPECfp2000 submittals and see if any of them are applicable to your application and if they help improve the performance.
there is no rest for the wicked yet the virtuous have no pillows
HM Li
Advisor


Thank you!

I tested a serial F90 program, an MPI+F90 program and an OpenMP+F90 program, compiled using +DD64 +DSitanium and using +DD32, but the difference between them is not obvious.

For example a serial F90 program:

f90 +DD64 +DSitanium test.f90
real 14:31.4
user 14:26.6
sys 0.3

f90 +DD64 test.f90
real 14:30.2
user 14:26.7
sys 0.1

f90 test.f90
real 14:36.8
user 14:27.1
sys 0.4
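
For anyone who wants the gap as a percentage, here is a small sketch that converts the three sets of timings above into seconds and compares them against the plain `f90` build. It assumes the plain build is the 32-bit one (the default data model) and uses neutral labels instead of the exact command lines; the helper names `to_seconds` and `percent_faster` are made up for this example.

```python
def to_seconds(stamp):
    """Convert a 'MM:SS.s' (or plain seconds) time string into seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def percent_faster(baseline, candidate):
    """How much faster candidate is than baseline, as a percentage of baseline."""
    base = to_seconds(baseline)
    return (base - to_seconds(candidate)) / base * 100


# Timings copied from the three runs above.
timings = {
    "64-bit, +DS tuned": {"real": "14:31.4", "user": "14:26.6"},
    "64-bit": {"real": "14:30.2", "user": "14:26.7"},
    "default": {"real": "14:36.8", "user": "14:27.1"},
}

baseline = timings["default"]
for label, run in timings.items():
    real_gain = percent_faster(baseline["real"], run["real"])
    user_gain = percent_faster(baseline["user"], run["user"])
    print(f"{label:18s} real {real_gain:5.2f}%  user {user_gain:5.2f}%")
```

Both 64-bit builds come out less than 1% faster than the default build in wall-clock time, and the user-time gap is smaller still, so a single run of each cannot distinguish them from measurement noise.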

The reason I ask is that an HP staff member told us that using +DD64 on Itanium2 would give a large improvement, but my tests and what I know from the HP F90 manual both contradict that.
rick jones
Honored Contributor


There is something called "the telephone game" - you get a fairly long line of people, say 20 or so, and inject a message at one end. Each person repeats the message to the next in line and you see what message comes out the end.

Perhaps something like that happened here.

It could be that someone was thinking of doing math operations on 64 bit quantities rather than 32 bit quantities, not whether the addressing was 32 or 64 bit.

Perhaps something else.

"Long ago and far away" some PA-RISC programs could be faster compiled for 64-bit addressing because the shared library linkage mechanism was better than that for 32-bit addressing - some legacy stuff could be avoided.

On some processors (Itanium is not one of them IIRC, and neither are SPARC or Power), when an application is compiled for 64-bit addressing, that lack of legacy means more registers are available - I believe that Opteron is like that, and perhaps EM64T.

In any event, I would suggest moving on to other options to examine and potentially improve the performance of your application - such as the aforementioned use of options from SPECfp2000 "peak" submittals and examining just where time is being spent with the likes of Caliper.
there is no rest for the wicked yet the virtuous have no pillows
HM Li
Advisor


Thank you.
I see.
Steve Lewis
Honored Contributor


There's a lot of hype about 64-bit computing that everybody needs to be wise to.

64 bits gives you address space improvements - more held in memory so less file i/o is required and you can work with large amounts of data in memory all at once. A single i/o request from initial call to getting the data back can take 40,000 machine instructions to execute, plus time taken waiting for the i/o device itself where the delay can be the equivalent of several million instructions.

So, the improvements with 64 bits are more with address and data set space than actual instructions. You are really doing the right thing by testing your code for yourself.

It could be that the majority of your processing time is spent writing debug messages out to files. That's what our tracing found. Don't knock 5% - it's one hour out of 20, and that sort of improvement actually matters to some of us.

There used to be purchasable specialist math libraries for hp-ux on various cpus. Search for math library in all of hp,
http://search.hp.com/query.html?qt=math+library&submit.x=8&submit.y=5&hpl=0&todo=search&searchcriteria=allwords&from=hpmaintainer&searchcategory=ALL&rn=25&presort=rank&source=7000&esc=europe.support.itrc.hp.com&wpa=forums1.itrc.hp.com%3A80&origin=0&chkServStor=on

HM Li
Advisor


Thank you for your advice. I see.
Dennis Handly
Acclaimed Contributor


>32-bit code must be compiled with the +z or +Z option if it is used in shared libraries.

While this is true for PA, the +z/+Z options are ignored on IPF since PIC is the default.