f90 problems (parallelization)

Hao Yu · 12-17-2001

Merry Christmas,

I ran into the following two confusing phenomena while trying
to parallelize a loop in a real application.

Some facts:

- The serial loop runs for about 0.02-7 seconds
- compiling options: +O3 +Oparallel +Onodynsel
- arrays were privatized by hand using duplicated arrays,
  and I took care of the last values myself
- the loop contains 3 nested loops; the maximum nesting level is 4
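
To make the privatization point concrete, here is a minimal sketch of what duplicating a scratch array by hand and restoring its last value can look like. It is not the attached source: the names tmp, tmpx, a, n and m and the loop body are made up for illustration. Only the "C$DIR LOOP_PARALLEL" spelling is taken from this post, and to other compilers it is just a comment line.

```fortran
      program privdemo
      implicit none
      integer n, m
      parameter (n = 8, m = 4)
      double precision a(m, n), tmp(m), tmpx(m, n)
      integer i, j

C     In the serial loop, tmp(1:m) is a scratch array reused by every
C     outer iteration. Here it is duplicated by hand into tmpx(:, i),
C     one copy per outer iteration, so iterations share no storage.
C     (Hypothetical loop body, for illustration only.)
C$DIR LOOP_PARALLEL
      do i = 1, n
         do j = 1, m
            tmpx(j, i) = dble(i) * dble(j)
         end do
         do j = 1, m
            a(j, i) = 2.0d0 * tmpx(j, i)
         end do
      end do

C     Manual "last value" step: restore what the serial loop would
C     have left in tmp after its final iteration (i = n).
      do j = 1, m
         tmp(j) = tmpx(j, n)
      end do

      print *, tmp(m), a(m, n)
      end
```

With n = 8 and m = 4 the final print should show the values 32 and 64 (the exact formatting depends on the compiler). The cost of this approach is memory: the scratch array grows by a factor of n, which is one reason a per-thread copy is sometimes used instead.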

1)
fact update:
- compiling options: +O3 +Oparallel +Onodynsel

phenomenon:

directive         "C$DIR LOOP_PARALLEL"    "C$DIR NO_PARALLEL"
-----------------------------------------------------------------
execution time    3T ~ 4T                  T

( the first case is 3~4 times slower than the second case )

possible reasons?
a) T ~ 0.02-7 seconds, which should be enough to cover the overhead of
library routines such as fork() or the pthread utilities.
b) When using "C$DIR LOOP_PARALLEL", it makes no difference whether
"C$DIR LOOP_PRIVATE(...)" entries are attached or not.

2)

fact update:
- compiling options: +O3 (+O2)
- with "C$DIR LOOP_PARALLEL" directive

phenomenon:

compiling options    +O3        +O2
--------------------------------------------
execution time       3T ~ 5T    T

two problems from this phenomenon:
a) "+O3" together with "C$DIR LOOP_PARALLEL" is what causes the slowness
==> the "+Oparallel" switch has no effect here
( though it should )

b) "+O3" loses performance dramatically compared to "+O2"
in this case.
( in general, "+O3" is slightly slower than "+O2" too )

The first problem is the one I need to solve. I ran into the
second one while trying to work around the first.

Thanks a bunch + happy holidays.

The source code is attached + the make file is as below:
( same make file content is in the attached source )

F90 = f90
O3POPTS = +U77 +extend_source +O3 +r8 +Oparallel +cpp=yes
FLDOPTS = -lpthread -lcps -lveclib -lm

.f.o:
	$(F90) $(O3POPTS) -c $<

TI.par: TISI.p1.o
	time $(F90) $(O3POPTS) -o $@ TISI.p1.o $(FLDOPTS)


Hao Yu · 12-17-2001

The tables got messed up. Here
they are again:

1)
loop_parallel
execution time = 3T ~ 5T

no_parallel
execution time = T

2)
+O3
execution time = 3T ~ 5T
+O2
execution time = T

