<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Slow BL645c node through Infiniband in BladeSystem - General</title>
    <link>https://community.hpe.com/t5/bladesystem-general/slow-bl645c-node-through-infiniband/m-p/5177879#M15709</link>
    <description>Good point about interrupts.  Since you're already using collectl it's easy enough to look at the interrupt distribution by CPU by adding 'j' to your -s switch.  While you're at it you might also consider using -sC, which will break out the CPU load by individual CPU and also show the type of load.  Perhaps one system is spending more time processing interrupts than the other?  Or maybe the interrupts are not being distributed across the multiple CPUs.&lt;BR /&gt;-mark&lt;BR /&gt;</description>
    <pubDate>Fri, 29 May 2009 15:21:04 GMT</pubDate>
    <dc:creator>MarkSeger</dc:creator>
    <dc:date>2009-05-29T15:21:04Z</dc:date>
    <item>
      <title>Slow BL645c node through Infiniband</title>
      <link>https://community.hpe.com/t5/bladesystem-general/slow-bl645c-node-through-infiniband/m-p/5177874#M15704</link>
      <description>We have a blade enclosure with 16 BL465c nodes connected through a Voltaire switch.&lt;BR /&gt;One of them had its motherboard replaced, and we have since noticed that it runs around 20-25% slower than the other nodes, but only in multinode jobs.&lt;BR /&gt;So we started experimenting:&lt;BR /&gt;- With jobs that run within a single node, it behaves exactly like the other nodes.&lt;BR /&gt;- iLO &amp;amp; BIOS are the same versions.&lt;BR /&gt;- We swapped mezzanine cards between "the node" and a "normal" one, and the problem stays with the node.&lt;BR /&gt;- We swapped bays to see if the problem was in the enclosure, but the problem stays with the node too.&lt;BR /&gt;&lt;BR /&gt;So it looks like the problem is somehow in the motherboard, or in its connection to the mezzanine card. We've compared the transmission through Infiniband with collectl, and it shows that drop of about 20% in throughput:&lt;BR /&gt;&lt;BR /&gt;#HCA    KBIn   PktIn  SizeIn   KBOut  PktOut SizeOut  Errors&lt;BR /&gt;   0   23730   15848       1   23689   15835       1       0&lt;BR /&gt;   0   23487   15701       1   23522   15718       1       0&lt;BR /&gt;   0   24021   16014       1   23915   15972       1       0&lt;BR /&gt;   0   24138   16078       1   23911   15976       1       0&lt;BR /&gt;   0   23455   15674       1   23556   15726       1       0&lt;BR /&gt;   0   19665   13283       1   19924   13416       1       0&lt;BR /&gt;   0   13640    9066       1   13605    9052       1       0&lt;BR /&gt;   0   19898   13396       1   20180   13548       1       0&lt;BR /&gt;   0   22485   14947       1   22299   14860       1       0&lt;BR /&gt;   0   23293   15452       1   22870   15271       1       0&lt;BR /&gt;   0   23500   15704       1   23599   15761       1       0&lt;BR /&gt;   0   18080   12097       1   18237   12178       1       0&lt;BR /&gt;   0   23663   15778       1   23426   15685       1       0&lt;BR /&gt;   0   23695   15811       1   23611   15787       1       0&lt;BR /&gt;
   0   26494   17822       1   27162   18156       1       0&lt;BR /&gt;   0   26702   17705       1   26042   17408       1       0&lt;BR /&gt;   0   23845   15907       1   23781   15894       1       0&lt;BR /&gt;   0   23065   15486       1   23457   15675       1       0&lt;BR /&gt;   0   22316   14173       1   17785   11985       1       0&lt;BR /&gt;   0   23777   15917       1   24177   16127       1       0&lt;BR /&gt;   0   24563   16537       1   24729   16640       1       0&lt;BR /&gt;   0   22804   15079       1   22585   15004       1       0&lt;BR /&gt;&lt;BR /&gt;   yei20&lt;BR /&gt;   0   29449   19836       1   30230   20180       1       0&lt;BR /&gt;   0   34207   22994       1   34893   23287       1       0&lt;BR /&gt;   0   29751   19953       1   30065   20080       1       0&lt;BR /&gt;   0   26190   17760       1   27606   18423       1       0&lt;BR /&gt;   0   31813   21245       1   31725   21204       1       0&lt;BR /&gt;   0   29496   19792       1   30044   20061       1       0&lt;BR /&gt;   0   34131   22993       1   35194   23510       1       0&lt;BR /&gt;   0   14868    9954       1   14968   10002       1       0&lt;BR /&gt;   0   23639   15904       1   24293   16214       1       0&lt;BR /&gt;   0   34303   23078       1   35023   23427       1       0&lt;BR /&gt;   0   29756   19949       1   30067   20107       1       0&lt;BR /&gt;   0   30751   20801       1   31788   21302       1       0&lt;BR /&gt;   0   32752   21921       1   33397   22228       1       0&lt;BR /&gt;   0   27596   17866       1   23892   16047       1       0&lt;BR /&gt;   0   29323   19719       1   30084   20077       1       0&lt;BR /&gt;   0   34359   23108       1   35005   23418       1       0&lt;BR /&gt;   0   29810   19971       1   30099   20110       1       0&lt;BR /&gt;   0   30319   20550       1   31879   21293       1       0&lt;BR /&gt;   0   27610   18439       1   27461   18351       1       0&lt;BR /&gt;
   0   29517   19839       1   29991   20051       1       0&lt;BR /&gt;   0   31908   21608       1   33411   22315       1       0&lt;BR /&gt;   0   31921   21336       1   31776   21253       1       0&lt;BR /&gt;&lt;BR /&gt;Has anyone experienced anything like this? Solved it? Any hints? Any way to dig deeper? Or should I open a case? &lt;BR /&gt;&lt;BR /&gt;Thanks in advance</description>
      <pubDate>Wed, 27 May 2009 12:48:59 GMT</pubDate>
      <guid>https://community.hpe.com/t5/bladesystem-general/slow-bl645c-node-through-infiniband/m-p/5177874#M15704</guid>
      <dc:creator>Ángel Gutiérrez Rodrígu</dc:creator>
      <dc:date>2009-05-27T12:48:59Z</dc:date>
    </item>
    <item>
      <title>Re: Slow BL645c node through Infiniband</title>
      <link>https://community.hpe.com/t5/bladesystem-general/slow-bl645c-node-through-infiniband/m-p/5177875#M15705</link>
      <description>Were there perhaps some BIOS tweaks lost when the motherboard was swapped?</description>
      <pubDate>Wed, 27 May 2009 23:35:53 GMT</pubDate>
      <guid>https://community.hpe.com/t5/bladesystem-general/slow-bl645c-node-through-infiniband/m-p/5177875#M15705</guid>
      <dc:creator>rick jones</dc:creator>
      <dc:date>2009-05-27T23:35:53Z</dc:date>
    </item>
    <item>
      <title>Re: Slow BL645c node through Infiniband</title>
      <link>https://community.hpe.com/t5/bladesystem-general/slow-bl645c-node-through-infiniband/m-p/5177876#M15706</link>
      <description>Oh, yes! We've already checked that too. As far as we were able to check, both BIOSes are the same and have the same setup too.</description>
      <pubDate>Thu, 28 May 2009 08:13:15 GMT</pubDate>
      <guid>https://community.hpe.com/t5/bladesystem-general/slow-bl645c-node-through-infiniband/m-p/5177876#M15706</guid>
      <dc:creator>Ángel Gutiérrez Rodrígu</dc:creator>
      <dc:date>2009-05-28T08:13:15Z</dc:date>
    </item>
    <item>
      <title>Re: Slow BL645c node through Infiniband</title>
      <link>https://community.hpe.com/t5/bladesystem-general/slow-bl645c-node-through-infiniband/m-p/5177877#M15707</link>
      <description>Then unless other, more fruitful suggestions are forthcoming here, you should go ahead and exercise your support contract(s) and open a case.</description>
      <pubDate>Thu, 28 May 2009 15:47:46 GMT</pubDate>
      <guid>https://community.hpe.com/t5/bladesystem-general/slow-bl645c-node-through-infiniband/m-p/5177877#M15707</guid>
      <dc:creator>rick jones</dc:creator>
      <dc:date>2009-05-28T15:47:46Z</dc:date>
    </item>
    <item>
      <title>Re: Slow BL645c node through Infiniband</title>
      <link>https://community.hpe.com/t5/bladesystem-general/slow-bl645c-node-through-infiniband/m-p/5177878#M15708</link>
      <description>This is IPoIB, yes?  Are the interrupt assignments the same between a fast system and the now slow one?</description>
      <pubDate>Thu, 28 May 2009 15:48:44 GMT</pubDate>
      <guid>https://community.hpe.com/t5/bladesystem-general/slow-bl645c-node-through-infiniband/m-p/5177878#M15708</guid>
      <dc:creator>rick jones</dc:creator>
      <dc:date>2009-05-28T15:48:44Z</dc:date>
    </item>
    <item>
      <title>Re: Slow BL645c node through Infiniband</title>
      <link>https://community.hpe.com/t5/bladesystem-general/slow-bl645c-node-through-infiniband/m-p/5177879#M15709</link>
      <description>Good point about interrupts.  Since you're already using collectl it's easy enough to look at the interrupt distribution by CPU by adding 'j' to your -s switch.  While you're at it you might also consider using -sC, which will break out the CPU load by individual CPU and also show the type of load.  Perhaps one system is spending more time processing interrupts than the other?  Or maybe the interrupts are not being distributed across the multiple CPUs.&lt;BR /&gt;-mark&lt;BR /&gt;</description>
      <pubDate>Fri, 29 May 2009 15:21:04 GMT</pubDate>
      <guid>https://community.hpe.com/t5/bladesystem-general/slow-bl645c-node-through-infiniband/m-p/5177879#M15709</guid>
      <dc:creator>MarkSeger</dc:creator>
      <dc:date>2009-05-29T15:21:04Z</dc:date>
    </item>
    <item>
      <title>Re: Slow BL645c node through Infiniband</title>
      <link>https://community.hpe.com/t5/bladesystem-general/slow-bl645c-node-through-infiniband/m-p/5177880#M15710</link>
      <description>OK! My (our) bad. After checking everything that was suggested here, we went back to the BIOS again and saw that there was a discrepancy in the first tab. &lt;BR /&gt;&lt;BR /&gt;I can't remember the exact name of the tab or option now, but it was in the first tab and it was some kind of power management option. I don't know exactly what it did, but we changed it to something like "OS controlled" and now the node appears to be running at the same speed as the other nodes again.&lt;BR /&gt;&lt;BR /&gt;Thank you all!</description>
      <pubDate>Mon, 15 Jun 2009 17:00:39 GMT</pubDate>
      <guid>https://community.hpe.com/t5/bladesystem-general/slow-bl645c-node-through-infiniband/m-p/5177880#M15710</guid>
      <dc:creator>Ángel Gutiérrez Rodrígu</dc:creator>
      <dc:date>2009-06-15T17:00:39Z</dc:date>
    </item>
    <item>
      <title>Re: Slow BL645c node through Infiniband</title>
      <link>https://community.hpe.com/t5/bladesystem-general/slow-bl645c-node-through-infiniband/m-p/5177881#M15711</link>
      <description>Oops, explained above...</description>
      <pubDate>Mon, 15 Jun 2009 17:03:55 GMT</pubDate>
      <guid>https://community.hpe.com/t5/bladesystem-general/slow-bl645c-node-through-infiniband/m-p/5177881#M15711</guid>
      <dc:creator>Ángel Gutiérrez Rodrígu</dc:creator>
      <dc:date>2009-06-15T17:03:55Z</dc:date>
    </item>
    <item>
      <title>Re: Slow BL645c node through Infiniband</title>
      <link>https://community.hpe.com/t5/bladesystem-general/slow-bl645c-node-through-infiniband/m-p/5177882#M15712</link>
      <description>Making the not-always-true assumption that iLO is iLO, looking at the iLO on a DL785 suggests that it would be the "Power Management" tab at the top followed by "Settings" on the left.  IIRC the default is "HP Dynamic Power Savings Mode", and often when one is in a "Damn the Watts! Full Speed Ahead!" mindset :) it can be set to "HP Static High Performance Mode" or, as in this case, when the OS is smart enough, to "OS Control Mode".&lt;BR /&gt;&lt;BR /&gt;This all controls who sets the processor "p-states", and how (see the "Processor States" selection on the left of the "Power Management" page).  P0 means highest performance and highest power consumption, P3 means lowest performance and lowest power consumption.&lt;BR /&gt;&lt;BR /&gt;My experience as an end-user is that in Dynamic Power Savings Mode the BIOS takes its best guess as to which state should be selected for each core, and it will switch between P0 and P3, making no stops at either P1 or P2.  Here and there under OS Control Mode I've seen cores in all four p-states.  In Static High Performance Mode they are locked into the P0 state.&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 15 Jun 2009 17:17:45 GMT</pubDate>
      <guid>https://community.hpe.com/t5/bladesystem-general/slow-bl645c-node-through-infiniband/m-p/5177882#M15712</guid>
      <dc:creator>rick jones</dc:creator>
      <dc:date>2009-06-15T17:17:45Z</dc:date>
    </item>
  </channel>
</rss>

