f90 problems (parallelization)

Hao Yu — Tue, 18 Dec 2001 06:53:21 GMT

Merry Christmas,

I ran into the following two confusing phenomena while trying
to parallelize a loop in a real application.

Some facts:

- The serial loop runs for about 0.02 to 7 seconds
- compiling options: +O3 +Oparallel +Onodynsel
- privatization of arrays was done with duplicated arrays,
and I took care of the last values myself.
- the loop contains 3 nested loops; the maximum nesting depth is 4
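
For readers without the attachment, a minimal sketch of what "privatization by duplicated arrays" with a manual last value can look like. This is not the attached code: the names (SKETCH, WORK, A, LAST), the sizes, and the reduced nesting depth are all made up for illustration. It also assumes the inner index J is private to each thread; if the compiler does not arrange that on its own, J would need the same treatment.

```fortran
      PROGRAM SKETCH
C     Hypothetical names and sizes, not from the attached source.
      INTEGER N, M
      PARAMETER (N = 64, M = 32)
      DOUBLE PRECISION A(M, N), WORK(M, N), LAST
      INTEGER I, J
C     In the serial version WORK would be a 1-D scratch array
C     WORK(M) reused by every iteration of I. Here it is duplicated
C     along a second dimension, so iteration I only touches column I.
C     Assumption: J is private per thread.
C$DIR LOOP_PARALLEL
      DO I = 1, N
         DO J = 1, M
            WORK(J, I) = DBLE(I + J)
         END DO
         DO J = 1, M
            A(J, I) = 2.0D0 * WORK(J, I)
         END DO
      END DO
C     Manual last value: the serial loop would leave the scratch
C     array holding the values of iteration I = N, so take them
C     from column N of the duplicated array.
      LAST = WORK(M, N)
      PRINT *, LAST, A(1, 1)
      END
```

A compiler that does not know the C$DIR directive treats that line as a comment, so the sketch also runs serially and can be used to check that the duplicated-array version gives the same result as the original loop.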

1)
fact update:
- compiling options: +O3 +Oparallel +Onodynsel

phenomenon:

directive         C$DIR LOOP_PARALLEL    C$DIR NO_PARALLEL
-----------------------------------------------------------
execution time    3T ~ 4T                T

( the first case is 3~4 times slower than the second case )

possible reasons?
a) T is about 0.02 to 7 seconds, which should be enough to cover the overhead of
library routines such as fork() or the pthread utilities.
b) when using "C$DIR LOOP_PARALLEL", it makes no difference whether the
"C$DIR LOOP_PRIVATE(...)" entries are attached or not.

2)

fact update:
- compiling options: +O3 (+O2)
- with "C$DIR LOOP_PARALLEL" directive

phenomenon:

compiling options    +O3        +O2
--------------------------------------------
execution time       3T ~ 5T    T

two problems from this phenomenon:
a) "+O3" together with "C$DIR LOOP_PARALLEL" is what causes the slowness
==> the "+Oparallel" switch has no effect on it here
( though it should )

b) "+O3" loses performance dramatically compared to "+O2"
in this case.
( in general, "+O3" is slightly slower than "+O2" too )

The first problem is the one I need to solve. I ran into the
second one while trying to work around the first.

Thanks a bunch, and happy holidays.

The source code is attached, and the makefile is below
( the same makefile is also in the attached source ):

F90 = f90
O3POPTS = +U77 +extend_source +O3 +r8 +Oparallel +cpp=yes
FLDOPTS = -lpthread -lcps -lveclib -lm

.f.o:
	$(F90) $(O3POPTS) -c $<

TI.par: TISI.p1.o
	time $(F90) $(O3POPTS) -o $@ TISI.p1.o $(FLDOPTS)

Re: f90 problems (parallelization)

Hao Yu — Tue, 18 Dec 2001 07:08:54 GMT

The tables got messed up. Here
they are again:

1)
loop_parallel
execution time = 3T ~ 5T

no_parallel
execution time = T

2)
+O3
execution time = 3T ~ 5T
+O2
execution time = T
