<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic f90 problems (parallelization) in Operating System - HP-UX</title>
    <link>https://community.hpe.com/t5/operating-system-hp-ux/f90-problems-parallelization/m-p/2632030#M726374</link>
    <description>Merry Christmas,&lt;BR /&gt;&lt;BR /&gt;I ran into the following two confusing phenomena while trying&lt;BR /&gt;to parallelize a loop in a real application.&lt;BR /&gt;&lt;BR /&gt;Some facts:&lt;BR /&gt;&lt;BR /&gt;  - The serial loop runs for about 0.02-7 seconds.&lt;BR /&gt;  - Compiling options: +O3 +Oparallel +Onodynsel&lt;BR /&gt;  - Arrays were privatized by duplicating them, and their&lt;BR /&gt;    last values were handled manually.&lt;BR /&gt;  - The loop contains 3 nested loops; the maximum nesting depth is 4.&lt;BR /&gt;&lt;BR /&gt;1)&lt;BR /&gt;  Fact update:&lt;BR /&gt;  - Compiling options: +O3 +Oparallel +Onodynsel&lt;BR /&gt;&lt;BR /&gt;  Phenomenon:&lt;BR /&gt;&lt;BR /&gt;    "C$DIR LOOP_PARALLEL": execution time = 3T ~ 4T&lt;BR /&gt;    "C$DIR NO_PARALLEL":   execution time = T&lt;BR /&gt;&lt;BR /&gt;    (The LOOP_PARALLEL case is 3-4 times slower than the NO_PARALLEL case.)&lt;BR /&gt;&lt;BR /&gt;  Possible reasons?&lt;BR /&gt;  a)  T is about 0.02-7 seconds, which should be long enough to cover the overhead of&lt;BR /&gt;      library routines such as fork() or the pthread utilities.&lt;BR /&gt;  b)  With "C$DIR LOOP_PARALLEL", it makes no difference whether&lt;BR /&gt;      "C$DIR LOOP_PRIVATE(...)" entries are attached or not.&lt;BR /&gt;&lt;BR /&gt;2)&lt;BR /&gt;&lt;BR /&gt;  Fact update:&lt;BR /&gt;  - Compiling options: +O3 (or +O2)&lt;BR /&gt;  - With the "C$DIR LOOP_PARALLEL" directive&lt;BR /&gt;&lt;BR /&gt;  Phenomenon:&lt;BR /&gt;&lt;BR /&gt;    +O3: execution time = 3T ~ 5T&lt;BR /&gt;    +O2: execution time = T&lt;BR /&gt;&lt;BR /&gt;  Two problems follow from this phenomenon:&lt;BR /&gt;  a)  The combination of "+O3" and "C$DIR LOOP_PARALLEL" causes the slowdown;&lt;BR /&gt;      the "+Oparallel" switch has no effect on it here (though it should).&lt;BR /&gt;&lt;BR /&gt;  b)  "+O3" loses performance dramatically compared to "+O2"&lt;BR /&gt;      in this case.&lt;BR /&gt;      (In general, "+O3" is also slightly slower than "+O2".)&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;The first problem is the one I need to solve. I ran into the&lt;BR /&gt;second one while working around the first.&lt;BR /&gt;&lt;BR /&gt;Thanks a bunch, and happy holidays.&lt;BR /&gt;&lt;BR /&gt;The source code is attached, and the makefile is below&lt;BR /&gt;(the same makefile content is also in the attached source):&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;F90 = f90&lt;BR /&gt;O3POPTS=+U77 +extend_source +O3 +r8 +Oparallel +cpp=yes&lt;BR /&gt;FLDOPTS = -lpthread -lcps -lveclib -lm&lt;BR /&gt;&lt;BR /&gt;.f.o:&lt;BR /&gt;        $(F90) $(O3POPTS) -c $&amp;lt;&lt;BR /&gt;&lt;BR /&gt;TI.par: TISI.p1.o&lt;BR /&gt;        time $(F90) $(O3POPTS) -o $@ TISI.p1.o $(FLDOPTS)&lt;BR /&gt;&lt;BR /&gt;</description>
    <pubDate>Tue, 18 Dec 2001 06:53:21 GMT</pubDate>
    <dc:creator>Hao Yu</dc:creator>
    <dc:date>2001-12-18T06:53:21Z</dc:date>
    <item>
      <title>f90 problems (parallelization)</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/f90-problems-parallelization/m-p/2632030#M726374</link>
      <description>Merry Christmas,&lt;BR /&gt;&lt;BR /&gt;I ran into the following two confusing phenomena while trying&lt;BR /&gt;to parallelize a loop in a real application.&lt;BR /&gt;&lt;BR /&gt;Some facts:&lt;BR /&gt;&lt;BR /&gt;  - The serial loop runs for about 0.02-7 seconds.&lt;BR /&gt;  - Compiling options: +O3 +Oparallel +Onodynsel&lt;BR /&gt;  - Arrays were privatized by duplicating them, and their&lt;BR /&gt;    last values were handled manually.&lt;BR /&gt;  - The loop contains 3 nested loops; the maximum nesting depth is 4.&lt;BR /&gt;&lt;BR /&gt;1)&lt;BR /&gt;  Fact update:&lt;BR /&gt;  - Compiling options: +O3 +Oparallel +Onodynsel&lt;BR /&gt;&lt;BR /&gt;  Phenomenon:&lt;BR /&gt;&lt;BR /&gt;    "C$DIR LOOP_PARALLEL": execution time = 3T ~ 4T&lt;BR /&gt;    "C$DIR NO_PARALLEL":   execution time = T&lt;BR /&gt;&lt;BR /&gt;    (The LOOP_PARALLEL case is 3-4 times slower than the NO_PARALLEL case.)&lt;BR /&gt;&lt;BR /&gt;  Possible reasons?&lt;BR /&gt;  a)  T is about 0.02-7 seconds, which should be long enough to cover the overhead of&lt;BR /&gt;      library routines such as fork() or the pthread utilities.&lt;BR /&gt;  b)  With "C$DIR LOOP_PARALLEL", it makes no difference whether&lt;BR /&gt;      "C$DIR LOOP_PRIVATE(...)" entries are attached or not.&lt;BR /&gt;&lt;BR /&gt;2)&lt;BR /&gt;&lt;BR /&gt;  Fact update:&lt;BR /&gt;  - Compiling options: +O3 (or +O2)&lt;BR /&gt;  - With the "C$DIR LOOP_PARALLEL" directive&lt;BR /&gt;&lt;BR /&gt;  Phenomenon:&lt;BR /&gt;&lt;BR /&gt;    +O3: execution time = 3T ~ 5T&lt;BR /&gt;    +O2: execution time = T&lt;BR /&gt;&lt;BR /&gt;  Two problems follow from this phenomenon:&lt;BR /&gt;  a)  The combination of "+O3" and "C$DIR LOOP_PARALLEL" causes the slowdown;&lt;BR /&gt;      the "+Oparallel" switch has no effect on it here (though it should).&lt;BR /&gt;&lt;BR /&gt;  b)  "+O3" loses performance dramatically compared to "+O2"&lt;BR /&gt;      in this case.&lt;BR /&gt;      (In general, "+O3" is also slightly slower than "+O2".)&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;The first problem is the one I need to solve. I ran into the&lt;BR /&gt;second one while working around the first.&lt;BR /&gt;&lt;BR /&gt;Thanks a bunch, and happy holidays.&lt;BR /&gt;&lt;BR /&gt;The source code is attached, and the makefile is below&lt;BR /&gt;(the same makefile content is also in the attached source):&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;F90 = f90&lt;BR /&gt;O3POPTS=+U77 +extend_source +O3 +r8 +Oparallel +cpp=yes&lt;BR /&gt;FLDOPTS = -lpthread -lcps -lveclib -lm&lt;BR /&gt;&lt;BR /&gt;.f.o:&lt;BR /&gt;        $(F90) $(O3POPTS) -c $&amp;lt;&lt;BR /&gt;&lt;BR /&gt;TI.par: TISI.p1.o&lt;BR /&gt;        time $(F90) $(O3POPTS) -o $@ TISI.p1.o $(FLDOPTS)&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Tue, 18 Dec 2001 06:53:21 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/f90-problems-parallelization/m-p/2632030#M726374</guid>
      <dc:creator>Hao Yu</dc:creator>
      <dc:date>2001-12-18T06:53:21Z</dc:date>
    </item>
    <item>
      <title>Re: f90 problems (parallelization)</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/f90-problems-parallelization/m-p/2632031#M726375</link>
      <description>The tables got messed up. Here&lt;BR /&gt;they are again:&lt;BR /&gt;&lt;BR /&gt;1)&lt;BR /&gt;  loop_parallel&lt;BR /&gt;    execution time = 3T ~ 5T&lt;BR /&gt;&lt;BR /&gt;  no_parallel&lt;BR /&gt;    execution time = T&lt;BR /&gt;&lt;BR /&gt;2)&lt;BR /&gt;  +O3&lt;BR /&gt;    execution time = 3T ~ 5T&lt;BR /&gt;  +O2&lt;BR /&gt;    execution time = T</description>
      <pubDate>Tue, 18 Dec 2001 07:08:54 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/f90-problems-parallelization/m-p/2632031#M726375</guid>
      <dc:creator>Hao Yu</dc:creator>
      <dc:date>2001-12-18T07:08:54Z</dc:date>
    </item>
  </channel>
</rss>

