HP Fortran problems

 
Laszlo_Pap
Occasional Contributor

Dear HP-UX/FORTRAN Experts,

I am a user of two HP platforms (A500/PA8500 and rx2800/Itanium), both running HP Fortran on two processors each. So parallelization is possible on both machines, but in my experience it is not a straightforward task. Here I would like to share some strange observations, hoping that some of you can point out where and how I did something wrong. Many thanks for your help in advance!
Parallelization works very well on the older A500 using the corresponding HP compiler directives (e.g. C$DIR LOOP_PARALLEL) inserted directly in the code, with the compiler run like this:
f90 +Oreport +Onoinline=prism +O3 +Oparallel -o../mybin/gfm gfm.f
During compilation the compiler nicely lists (+Oreport) where it is able to parallelize the code, which variables are privatized, etc...
The executable gfm runs on both processors, using nearly 100% of their capacity. Great! The one strange thing is that above N~6100000 (N is the number of iterations of the parallelized loop in which subroutine prism is called), with N read as an input variable at startup of gfm, the operating system starts to use only one processor. ??? This problem, however, can be overcome by replacing the variable with a numeric constant, like this:
DO 100 I=1,7000000 (instead of READ(*,*) N ... DO 100 I=1,N). OK, it works, but it is not nice: every time I have to change N, I have to recompile the code...
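To make the two loop forms concrete, here is a minimal fixed-form sketch of the structure described above. It is not the real gfm.f: the array R and the body of PRISM are placeholders invented for illustration, and only the directive and the loop bounds are taken from the description.

```fortran
C     Minimal sketch only. R and the body of PRISM are placeholders,
C     not the real code from gfm.f. Assumes N >= 1 is given on input.
      PROGRAM SKETCH
      INTEGER N, I
      DOUBLE PRECISION, ALLOCATABLE :: R(:)
      READ(*,*) N
      ALLOCATE(R(N))
C     Variant A: run-time loop bound N (the case that drops to one
C     processor above N ~ 6100000).
C     Variant B would hard-code the bound instead:
C         DO 100 I = 1, 7000000
C$DIR LOOP_PARALLEL
      DO 100 I = 1, N
         CALL PRISM(I, R(I))
  100 CONTINUE
      WRITE(*,*) R(1), R(N)
      DEALLOCATE(R)
      END

C     Placeholder for the real PRISM routine.
      SUBROUTINE PRISM(I, X)
      INTEGER I
      DOUBLE PRECISION X
      X = DBLE(I) * 0.5D0
      END
```

On a compiler that does not know the C$DIR directive, the line is just a comment and the loop runs serially.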
If I use OpenMP directives (e.g. !$omp parallel do private(...etc...) shared(...etc...)) then it works fine too, and I even have NO PROBLEM with defining the loop bound N as a variable. But then there is a slight increase in runtime (~10%). Not a big deal; it is worth using OpenMP if the "problem N" can be avoided. So much for the A500... even if the "problem N" is annoying... Why does it happen?
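The OpenMP form of the same loop can be sketched as follows. Again, R and the body of PRISM are placeholders invented for illustration, and the PRIVATE/SHARED lists shown are only what this toy loop needs, not the clauses used in gfm.f.

```fortran
C     Minimal OpenMP sketch only. R and the body of PRISM are
C     placeholders, not the real code from gfm.f. Assumes N >= 1.
      PROGRAM OMPSK
      INTEGER N, I
      DOUBLE PRECISION, ALLOCATABLE :: R(:)
      READ(*,*) N
      ALLOCATE(R(N))
C     The loop bound N is an ordinary run-time variable here.
!$OMP PARALLEL DO PRIVATE(I) SHARED(N, R)
      DO 200 I = 1, N
         CALL PRISM(I, R(I))
  200 CONTINUE
!$OMP END PARALLEL DO
      WRITE(*,*) R(1), R(N)
      DEALLOCATE(R)
      END

C     Placeholder for the real PRISM routine.
      SUBROUTINE PRISM(I, X)
      INTEGER I
      DOUBLE PRECISION X
      X = DBLE(I) * 0.5D0
      END
```

Each iteration writes only its own element R(I), so the iterations are independent and no further synchronization is needed in this sketch.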

But none of the above holds for the rx2800!

On this platform the HP compiler directives cannot be used (they are obsolete), so
f90 +Oreport +Onoinline=prism +O3 +Oparallel -o../mybin/gfm gfm.f
does not work at all! I could not find a way to use HP-specific parallelization, only OpenMP, which, as far as I understand, is a rather general technique not really optimized for Itanium processors. So the only way to parallelize gfm.f is:
f90 +O3 +Oautopar +Oopenmp -o../mybin/gfm gfm.f
combined with OpenMP compiler directives in the code indicating the program sections to be parallelized. It works, but the load of the cores (8 in the two Itanium processors) hardly reaches 60% (even if no other user program is running...). One could say that this should be enough, but... Why does the native HP way of parallelization not work on the rx2800 platform? Why can I not use more than 60% of the processor power on the rx2800 when it works on the much older A500 platform?

Many thanks for your patience and I hope to get some response from someone having much more experience in FORTRAN code parallelization on HP platforms.

With my best regards,

Laszlo