<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Performance slower on RX8640 than Blade BL860c and rx2620 in Operating System - HP-UX</title>
    <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-slower-on-rx8640-then-blade-bl860c-and-rx2620/m-p/5211275#M650052</link>
    <description>This problem hit 2 production servers running Oracle 10G RAC, and we had Oracle and HP Mission Critical support scratching their heads.&lt;BR /&gt;So we organised to replicate the problem at DR (identical HW, adding a 4th cell board to the rx8640). With 3 cell boards, Oracle ran fine. With 4 cell boards, the RAC slowed enormously and CPUs hit 100%.&lt;BR /&gt;After much testing we found the huge load occurs during the "sqlplus as /" connection, even before any SQL is run (we had to run 100 in parallel to see the problem).&lt;BR /&gt;Finally Oracle identified Bug 9205576: CONNECTION TAKES MORE TIME WITH PRE_PAGE_SGA=TRUE IN 4 CELL COMPARED TO 3 CELLS. The SGA was fully scanned on each sqlplus connection. The problem is solved by changing pre_page_sga to false.&lt;BR /&gt;&lt;BR /&gt;Thanks to all for taking the time to respond to my question.</description>
    <pubDate>Wed, 13 Jan 2010 04:01:35 GMT</pubDate>
    <dc:creator>isaac_loven</dc:creator>
    <dc:date>2010-01-13T04:01:35Z</dc:date>
    <item>
      <title>Performance slower on RX8640 than Blade BL860c and rx2620</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-slower-on-rx8640-then-blade-bl860c-and-rx2620/m-p/5211267#M650044</link>
      <description>Hi All,&lt;BR /&gt;we have 2 RX8640 servers, each with 3 fully populated cell boards (24 CPU/192 GB in one nPar) running vPars and HP-UX 11.23.&lt;BR /&gt;Oracle RAC 10G was running at 60% CPU load.&lt;BR /&gt;To increase capacity HP added a fourth cell board to each RX8640 (taken from a working Dev RX8640 of the same model number).&lt;BR /&gt;&lt;BR /&gt;The servers booted fine, but when users started to use the database, all CPUs hit 100% and I/O dropped to almost zero. We had to delete the cell board from the nPar to restore normal operation. Has anybody else had this problem?&lt;BR /&gt;&lt;BR /&gt;As these are production servers it is hard to reproduce the error, so we created a test 64-bit C program that adds the contents of 2 large arrays and uses 1 GB of memory. Average run times are as follows:&lt;BR /&gt;BL870c 10 seconds&lt;BR /&gt;BL860c 12 seconds&lt;BR /&gt;rx2620 12.2 sec&lt;BR /&gt;rx8640 with one cell board 17.2 sec&lt;BR /&gt;rx8640 with two cell boards 23.5 sec&lt;BR /&gt;rx8640 with three cell boards 28.5 sec&lt;BR /&gt;&lt;BR /&gt;Do you think we have missed something when adding the cell board (parmodify -p 0 -a 3:base:y:ri)? Is this a memory interleaving problem?&lt;BR /&gt;&lt;BR /&gt;This is the code. I am enclosing the executable. Can anybody else benchmark their servers please?&lt;BR /&gt;&lt;BR /&gt;/* memtest.c */&lt;BR /&gt;&lt;BR /&gt;#define BIG     100000000&lt;BR /&gt;&lt;BR /&gt;int a[BIG],b[BIG],c[BIG];&lt;BR /&gt;&lt;BR /&gt;int main()&lt;BR /&gt;{&lt;BR /&gt;        int i,j;&lt;BR /&gt;        for(j=0;j&amp;lt;10;++j)&lt;BR /&gt;                for(i=0;i&amp;lt;BIG;++i)&lt;BR /&gt;                        a[i]=b[i]+c[i];&lt;BR /&gt;        return 0;&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;# cc +DD64 -o memtest memtest.c&lt;BR /&gt;# time ./memtest&lt;BR /&gt;Isaac Loven</description>
      <pubDate>Wed, 25 Nov 2009 05:29:27 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-slower-on-rx8640-then-blade-bl860c-and-rx2620/m-p/5211267#M650044</guid>
      <dc:creator>isaac_loven</dc:creator>
      <dc:date>2009-11-25T05:29:27Z</dc:date>
    </item>
    <item>
      <title>Re: Performance slower on RX8640 than Blade BL860c and rx2620</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-slower-on-rx8640-then-blade-bl860c-and-rx2620/m-p/5211268#M650045</link>
      <description>Hi Isaac,&lt;BR /&gt;&lt;BR /&gt;Check the firmware revision of the MP. Log in to the MP and run the command sysrev.&lt;BR /&gt;&lt;BR /&gt;Rgds-Kranti&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 25 Nov 2009 06:21:42 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-slower-on-rx8640-then-blade-bl860c-and-rx2620/m-p/5211268#M650045</guid>
      <dc:creator>Kranti Mahmud</dc:creator>
      <dc:date>2009-11-25T06:21:42Z</dc:date>
    </item>
    <item>
      <title>Re: Performance slower on RX8640 than Blade BL860c and rx2620</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-slower-on-rx8640-then-blade-bl860c-and-rx2620/m-p/5211269#M650046</link>
      <description>Is it only slow once the cell is added?</description>
      <pubDate>Wed, 25 Nov 2009 06:26:02 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-slower-on-rx8640-then-blade-bl860c-and-rx2620/m-p/5211269#M650046</guid>
      <dc:creator>Torsten.</dc:creator>
      <dc:date>2009-11-25T06:26:02Z</dc:date>
    </item>
    <item>
      <title>Re: Performance slower on RX8640 than Blade BL860c and rx2620</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-slower-on-rx8640-then-blade-bl860c-and-rx2620/m-p/5211270#M650047</link>
      <description>&amp;gt; Is this a memory interleaving problem?&lt;BR /&gt;&lt;BR /&gt;Yes, most likely.  Though "all CPUs hit 100%, and I/O dropped to almost zero" seems rather extreme.&lt;BR /&gt;&lt;BR /&gt;Those blade and non-cell-based systems can run rings around a cell-based system if you don't get interleaving right.</description>
      <pubDate>Wed, 25 Nov 2009 08:28:13 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-slower-on-rx8640-then-blade-bl860c-and-rx2620/m-p/5211270#M650047</guid>
      <dc:creator>Dennis Handly</dc:creator>
      <dc:date>2009-11-25T08:28:13Z</dc:date>
    </item>
    <item>
      <title>Re: Performance slower on RX8640 than Blade BL860c and rx2620</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-slower-on-rx8640-then-blade-bl860c-and-rx2620/m-p/5211271#M650048</link>
      <description>The DB problem is one thing [and I can only make wild guesses given we have no real data -- so I won't], but as far as the raw time issue goes... this isn't interleave being wrong; that's how interleave _works_.&lt;BR /&gt;&lt;BR /&gt;Your program has no threads, and I assume you run it such that only a single instance is instantiated at a time (multiple runs for the average, but as multiple sequential executions, not a massively parallel execution). [I'd also comment that your program uses 100,000,000 * 3 64-bit integers, so you're really looking at 24 * 100,000,000 bytes or 2.235Gb, not 1Gb.]&lt;BR /&gt;&lt;BR /&gt;So -- you have the following scenarios.&lt;BR /&gt;&lt;BR /&gt;Assume a 1- to 4-cell nPartition.&lt;BR /&gt;&lt;BR /&gt;The cost of accessing memory in the same cell as you are running is X.&lt;BR /&gt;&lt;BR /&gt;The cost of accessing memory in an adjacent cell is Y (where X &amp;lt; Y).&lt;BR /&gt;&lt;BR /&gt;For a single-cell rx8640 -- all accesses cost X.&lt;BR /&gt;&lt;BR /&gt;When you add a cell, if all memory is configured to be interleaved, your private object has a 50% chance of getting a cache line in the same cell and a 50% chance of getting a remote cell (assuming balanced ILV, each cell contributes equal cache lines). Hence your application accesses at .5*X + .5*Y (which, since Y is strictly greater than X, is greater than X).&lt;BR /&gt;&lt;BR /&gt;When you add _another_ cell, your accesses become .33*X + .66*Y [approximately; 1/3 and 2/3 really].&lt;BR /&gt;&lt;BR /&gt;And when you add the fourth cell, your accesses become .25*X + .75*Y.&lt;BR /&gt;&lt;BR /&gt;As you can tell -- you approach Y instead of X (and if you went 8-cell this actually gets more interesting, in that there's usually a higher cost for some cells than for others).
That's why you see your run times climbing with additional cells (assuming your three rx8640 lines are one, two, and three cell boards, since otherwise I'm not sure what you're saying).&lt;BR /&gt;&lt;BR /&gt;Now if your application were moving processor context such that your accesses were also spread across the machine, you'd have a better chance of any given access being local -- this is what ILV is meant for: objects shared across the entire platform.&lt;BR /&gt;&lt;BR /&gt;What you would want for your application to perform here is Cell Local Memory (you'd configure each cell to give only 75% [or 50%, etc.] to the interleave). Then all accesses in your program would stay at cost X regardless of how many cells were in the system (assuming sufficient CLM in the cell the program is executing in, of course).&lt;BR /&gt;&lt;BR /&gt;With 64Gb per cell (and a need for 2.235Gb for your program), configuring each cell to have 1/8th of memory cell local would probably be enough [assuming little else is running to steal your CLM in the given execution context] to see better performance. Since Oracle works with a large shared memory set which is typically accessed from everywhere in the partition, you certainly want to leave significant ILV configured, but you may want to consider reducing it. At a bare minimum you'd want enough ILV for the SGA, your SYS memory load (since this is v2), and a reasonable extra for things like binaries, shared libraries, etc. Having some CLM available will help the process-private data accesses.</description>
      <pubDate>Wed, 25 Nov 2009 11:52:41 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-slower-on-rx8640-then-blade-bl860c-and-rx2620/m-p/5211271#M650048</guid>
      <dc:creator>Don Morris_1</dc:creator>
      <dc:date>2009-11-25T11:52:41Z</dc:date>
    </item>
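    <!-- Editor's note: the interleave cost model in the reply above can be checked with a quick numeric sketch. The latency units X and Y below are illustrative assumptions, not measured rx8640 figures. -->

```python
# Sketch of the interleave cost model from the reply above.
# X = cost of a local-cell access, Y = cost of a remote-cell access (Y above X).
# These are hypothetical units, not measured HP hardware latencies.
X, Y = 1.0, 2.5

for cells in range(1, 5):
    local = 1.0 / cells                 # share of interleaved lines landing in the local cell
    cost = local * X + (1 - local) * Y  # expected cost per memory access
    print(cells, round(cost, 3))
```

    <!-- With these units the expected cost climbs from 1.0 (one cell, all local) toward Y as cells are added: 1.75 at two cells, 2.0 at three, 2.125 at four -- matching the .5*X + .5*Y, .33*X + .66*Y, and .25*X + .75*Y terms in the post. -->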
    <item>
      <title>Re: Performance slower on RX8640 than Blade BL860c and rx2620</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-slower-on-rx8640-then-blade-bl860c-and-rx2620/m-p/5211272#M650049</link>
      <description>(No points on this, just a clarification)... sometimes I need to drink coffee first. Even in a 64-bit compile, "int" is 4 bytes, of course -- not 8. One would think I'd remember that, but my brain jumped from "64-bit... int" and glued them together as "64-bit integer" (long).&lt;BR /&gt;&lt;BR /&gt;So yes, 1Gb -- or close enough. Kind of irrelevant, but worth precluding a post where you have to tell me I screwed up.</description>
      <pubDate>Wed, 25 Nov 2009 14:19:31 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-slower-on-rx8640-then-blade-bl860c-and-rx2620/m-p/5211272#M650049</guid>
      <dc:creator>Don Morris_1</dc:creator>
      <dc:date>2009-11-25T14:19:31Z</dc:date>
    </item>
    <item>
      <title>Re: Performance slower on RX8640 than Blade BL860c and rx2620</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-slower-on-rx8640-then-blade-bl860c-and-rx2620/m-p/5211273#M650050</link>
      <description>Nice explanation, Don. But with vPars in use (configuration details???) all this becomes even a bit more complicated, so no solution is possible without knowing all the details.</description>
      <pubDate>Wed, 25 Nov 2009 14:23:55 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-slower-on-rx8640-then-blade-bl860c-and-rx2620/m-p/5211273#M650050</guid>
      <dc:creator>Torsten.</dc:creator>
      <dc:date>2009-11-25T14:23:55Z</dc:date>
    </item>
    <item>
      <title>Re: Performance slower on RX8640 than Blade BL860c and rx2620</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-slower-on-rx8640-then-blade-bl860c-and-rx2620/m-p/5211274#M650051</link>
      <description>Ah... good point -- I missed the "and vPars" in there, since all the talk was about nPar operations.&lt;BR /&gt;&lt;BR /&gt;Well then, the big thing to check would be that your CPUs and memory align reasonably in the vPar.&lt;BR /&gt;&lt;BR /&gt;Most especially, if you have a vPar with only a cell's (or sub-cell's, or really, really close to a cell's) worth of processors, and the I/O hubs are off the same cell -- it would be worth your while to configure CLM such that the vPar can have only a little ILV and almost all memory as CLM. Effectively, you want a vPar on a multi-cell nPar to look like as few cells as possible if you want performance.&lt;BR /&gt;&lt;BR /&gt;And with the caveat that this isn't official "this is supported" doctrine -- I configure IPF vPars with no ILV all the time. I swear every time folks ask this there's some firmware or vPar reason to keep some ILV around -- but in my opinion it is worth it, if you have a vPar running in a single cell or sub-cell, to be 100% local to that cell. And hence it is worth a try to see if you can configure a vPar that way (since if the vPar won't load, you can just add some ILV [keep the nPar with some] and reload the vPar).&lt;BR /&gt;&lt;BR /&gt;For any vPars which require more than a cell or two of resources (say you use 3 vPars, 2 of which fit in less than a cell apiece [maybe the same cell] while the other requires the remaining 3 cells), you can plan your nPar in a way that's good for the vPars but not the usual pattern for nPar mode. (In this case, you could configure the nPar with only 3 cells contributing 50% to the ILV and the 4th cell 100% CLM, and place the two sub-cell vPars in the 4th cell, with the multi-cell vPar getting all the ILV and resources of the other 3 cells.)</description>
      <pubDate>Wed, 25 Nov 2009 16:42:42 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-slower-on-rx8640-then-blade-bl860c-and-rx2620/m-p/5211274#M650051</guid>
      <dc:creator>Don Morris_1</dc:creator>
      <dc:date>2009-11-25T16:42:42Z</dc:date>
    </item>
    <item>
      <title>Re: Performance slower on RX8640 than Blade BL860c and rx2620</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-slower-on-rx8640-then-blade-bl860c-and-rx2620/m-p/5211275#M650052</link>
      <description>This problem hit 2 production servers running Oracle 10G RAC, and we had Oracle and HP Mission Critical support scratching their heads.&lt;BR /&gt;So we organised to replicate the problem at DR (identical HW, adding a 4th cell board to the rx8640). With 3 cell boards, Oracle ran fine. With 4 cell boards, the RAC slowed enormously and CPUs hit 100%.&lt;BR /&gt;After much testing we found the huge load occurs during the "sqlplus as /" connection, even before any SQL is run (we had to run 100 in parallel to see the problem).&lt;BR /&gt;Finally Oracle identified Bug 9205576: CONNECTION TAKES MORE TIME WITH PRE_PAGE_SGA=TRUE IN 4 CELL COMPARED TO 3 CELLS. The SGA was fully scanned on each sqlplus connection. The problem is solved by changing pre_page_sga to false.&lt;BR /&gt;&lt;BR /&gt;Thanks to all for taking the time to respond to my question.</description>
      <pubDate>Wed, 13 Jan 2010 04:01:35 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-slower-on-rx8640-then-blade-bl860c-and-rx2620/m-p/5211275#M650052</guid>
      <dc:creator>isaac_loven</dc:creator>
      <dc:date>2010-01-13T04:01:35Z</dc:date>
    </item>
    <item>
      <title>Re: Performance slower on RX8640 than Blade BL860c and rx2620</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/performance-slower-on-rx8640-then-blade-bl860c-and-rx2620/m-p/5211276#M650053</link>
      <description>Solution found and 4th cell board added to production successfully.</description>
      <pubDate>Wed, 13 Jan 2010 04:03:36 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/performance-slower-on-rx8640-then-blade-bl860c-and-rx2620/m-p/5211276#M650053</guid>
      <dc:creator>isaac_loven</dc:creator>
      <dc:date>2010-01-13T04:03:36Z</dc:date>
    </item>
  </channel>
</rss>

