05-27-2009 05:48 AM
Slow BL645c node through Infiniband
One of our nodes had its motherboard replaced, and we have since noticed that it runs around 20-25% slower than the other nodes, but only in multi-node jobs.
So we started experimenting:
- With jobs that run inside a single node, it behaves exactly like the other nodes.
- iLO & BIOS are at the same versions as the other nodes.
- We swapped Mezzanine cards between "the node" and a "normal" one, and the problem stays with the node.
- We swapped bays to see if the problem was in the enclosure, but the problem stays with the node there too.
So it looks like the problem is somewhere in the motherboard, or in its connection to the Mezzanine card. We compared InfiniBand traffic with collectl, and it shows the ~20% drop in throughput:
#HCA KBIn PktIn SizeIn KBOut PktOut SizeOut Errors
0 23730 15848 1 23689 15835 1 0
0 23487 15701 1 23522 15718 1 0
0 24021 16014 1 23915 15972 1 0
0 24138 16078 1 23911 15976 1 0
0 23455 15674 1 23556 15726 1 0
0 19665 13283 1 19924 13416 1 0
0 13640 9066 1 13605 9052 1 0
0 19898 13396 1 20180 13548 1 0
0 22485 14947 1 22299 14860 1 0
0 23293 15452 1 22870 15271 1 0
0 23500 15704 1 23599 15761 1 0
0 18080 12097 1 18237 12178 1 0
0 23663 15778 1 23426 15685 1 0
0 23695 15811 1 23611 15787 1 0
0 26494 17822 1 27162 18156 1 0
0 26702 17705 1 26042 17408 1 0
0 23845 15907 1 23781 15894 1 0
0 23065 15486 1 23457 15675 1 0
0 22316 14173 1 17785 11985 1 0
0 23777 15917 1 24177 16127 1 0
0 24563 16537 1 24729 16640 1 0
0 22804 15079 1 22585 15004 1 0
yei20
0 29449 19836 1 30230 20180 1 0
0 34207 22994 1 34893 23287 1 0
0 29751 19953 1 30065 20080 1 0
0 26190 17760 1 27606 18423 1 0
0 31813 21245 1 31725 21204 1 0
0 29496 19792 1 30044 20061 1 0
0 34131 22993 1 35194 23510 1 0
0 14868 9954 1 14968 10002 1 0
0 23639 15904 1 24293 16214 1 0
0 34303 23078 1 35023 23427 1 0
0 29756 19949 1 30067 20107 1 0
0 30751 20801 1 31788 21302 1 0
0 32752 21921 1 33397 22228 1 0
0 27596 17866 1 23892 16047 1 0
0 29323 19719 1 30084 20077 1 0
0 34359 23108 1 35005 23418 1 0
0 29810 19971 1 30099 20110 1 0
0 30319 20550 1 31879 21293 1 0
0 27610 18439 1 27461 18351 1 0
0 29517 19839 1 29991 20051 1 0
0 31908 21608 1 33411 22315 1 0
0 31921 21336 1 31776 21253 1 0
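To compare the two nodes numerically rather than by eye, a small helper can average the KBIn/KBOut columns of output like the above. This is a sketch written for this thread (the function name and the fixed column positions are assumptions based on the sample shown), not part of collectl itself:

```python
def mean_kb(lines):
    """Average the KBIn and KBOut columns of collectl-style interconnect output.

    Each data line is expected to have 8 numeric fields:
      HCA  KBIn PktIn SizeIn KBOut PktOut SizeOut Errors
    Header lines and hostname labels are skipped.
    """
    kin = kout = n = 0
    for line in lines:
        parts = line.split()
        if len(parts) == 8 and all(p.isdigit() for p in parts):
            kin += int(parts[1])
            kout += int(parts[4])
            n += 1
    # Return (0.0, 0.0) when no data lines were found
    return (kin / n, kout / n) if n else (0.0, 0.0)
```

Feeding each node's block through `mean_kb` and taking the ratio of the averages would put a precise number on the slowdown.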
Has anyone experienced anything like this? Did you solve it? Any hints, or any way to dig deeper? Or should I open a support case?
Thanks in advance
05-27-2009 04:35 PM
Re: Slow BL645c node through Infiniband
05-28-2009 01:13 AM
Re: Slow BL645c node through Infiniband
05-28-2009 08:47 AM
Re: Slow BL645c node through Infiniband
05-28-2009 08:48 AM
Re: Slow BL645c node through Infiniband
05-29-2009 08:21 AM
Re: Slow BL645c node through Infiniband
-mark
06-15-2009 10:00 AM
Re: Slow BL645c node through Infiniband
I can't remember the exact tab or option name now, but it was in the first tab and it was some kind of power management option. I don't know exactly what it did, but we changed it to something like "OS controlled" and now it looks like the node is running at the same speed as the other nodes again.
Thank you all!
06-15-2009 10:03 AM
Re: Slow BL645c node through Infiniband
06-15-2009 10:17 AM
Re: Slow BL645c node through Infiniband
This all controls how the processor "p-states" (the "Processor States" selection on the left of the "Power Management" page) get set. P0 means highest performance and highest power consumption; P3 means lowest performance and lowest power consumption.
My experience as an end user is that in Dynamic Power Savings Mode the BIOS takes its best guess at which state each core should be in, and it jumps between P0 and P3 without stopping at P1 or P2. Under OS Control Mode I have occasionally seen cores in all four p-states. In Static High Performance Mode the cores are locked into P0.