
Poor Infiniband performance on HP blade system.

 
ciopsa

Hello all,
we are experiencing very poor InfiniBand performance on our small HP blade server cluster.

 

The cluster is composed of a blade enclosure with 8 blades plus 3 additional external nodes (1 master node and 2 I/O servers that provide a Lustre FS over the InfiniBand network).

 

The HP blades are BL685c servers (4-socket motherboards with four AMD Opteron 6376 processors),

while the external nodes are DL385p Gen8 systems.

 

The InfiniBand components are the following:
- InfiniBand cards: QLogic 4X QDR IB PCIe G2 HCA (HP part number: 583211-B21)
- InfiniBand switch: BLc QLogic 4X QDR IB Switch (HP part number: 505958-B21)

 

We are running the CentOS 6.4 Linux distribution on all machines.
The blades did not like the standard kernel coming with CentOS 6.4: on those machines the InfiniBand card was detected, but its "Physical state" was DOWN.

We therefore downgraded to kernel 2.6.32-220.el6.x86_64, since it is the kernel officially supported by IntelIB 7.2 (the latest version, released in September 2013) as provided on the Intel web site.

We then installed that package on all blades; the InfiniBand card is now correctly detected and its physical state is ACTIVE.
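For reference, this is roughly how we verify the link on each blade (a minimal sketch; the device name qib0 and port 1 are assumptions, adjust to whatever ibv_devices reports):

# list the detected HCAs
ibv_devices

# port state, physical state and rate for the assumed device qib0
ibstat qib0

# the same information is exposed through sysfs
cat /sys/class/infiniband/qib0/ports/1/state
cat /sys/class/infiniband/qib0/ports/1/phys_state
cat /sys/class/infiniband/qib0/ports/1/rate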

 

With this configuration, the MPI performance obtained with OpenMPI version 1.6.7 compiled against the PSM libraries provided by the IntelIB 7.2 package is satisfactory.

 

We reach 2.6-2.7 GB/s on the standard Intel MPI Benchmarks, and latency is also fine (~2 microseconds).
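The kind of run behind those numbers looks roughly like this (a sketch; the host names node01/node02 and the IMB-MPI1 binary path are placeholders, and the MCA options simply force OpenMPI to use the PSM transport):

# two ranks on two different blades, forcing the PSM MTL
mpirun -np 2 -host node01,node02 --mca pml cm --mca mtl psm ./IMB-MPI1 PingPong SendRecv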

 

The poor performance shows up when using RDMA directly (measured by means of the ib_read_bw/ib_write_bw tools), i.e. through the ibverbs interface.

Performance here is really bad: ib_read_bw reports something between 700 and 800 MB/s. This hinders all the performance we can get out of our Lustre FS; a typical invocation is sketched just below.
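A minimal sketch of how we run such a measurement (node01 is a placeholder for the host running the server side):

# on one node, start the server side
ib_read_bw -a

# on a second node, connect to it and sweep all message sizes
ib_read_bw -a node01
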
Indeed, the awful LNET performance figures we measure on our Lustre installation are reported below:

- between the two I/O servers we got about 400 MB/s;
- between two blades we just got about 280 MB/s.
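In case it helps to reproduce these numbers, an LNET bulk-read test between two nodes can be run roughly as follows with the lnet_selftest module (a sketch; the o2ib NIDs below are placeholders for our addresses, and lnet_selftest must be loaded on every node involved):

# on all nodes taking part in the test
modprobe lnet_selftest

# on the node driving the test
export LST_SESSION=$$
lst new_session read_bw
lst add_group servers 10.0.0.1@o2ib
lst add_group clients 10.0.0.2@o2ib
lst add_batch bulk
lst add_test --batch bulk --from clients --to servers brw read size=1M
lst run bulk
lst stat clients servers
lst end_session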

 

We tried some tuning as indicated in the True Scale documentation, but so far we have not seen any significant improvement.

 

What we did was the following:

 

At the BIOS level we set:
- the PCI Express slot forced to Gen2
- C-states disabled

 

At boot time we load the ib_qib module with the following options:
"options ib_qib singleport=1 pcie_caps=0x51 cache_bypass_copy=1 ibmtu=5"

 

There are several questions we are posing:

 

1. Are we missing something fundamental to get acceptable ibverbs performance over the Intel True Scale software stack?

2. Are there any special tricks to improve ibverbs performance (as reported by the ib_read_bw/ib_write_bw tools) on the multi-socket AMD motherboards we are using? For instance, does NUMA placement of the benchmark matter (see the sketch after this list of questions)?

 

3. Is there any way to know/upgrade the HCA firmware on the InfiniBand cards? We tried to work out how to upgrade the firmware of the HCAs and got lost: ibv_devinfo reports 0.0.0 as the firmware version, and the "iba_hca_firmware_tool" says there is no need to upgrade the firmware, but looking at its code we discovered that it only supports Mellanox cards (it relies on mstflint).

 

4. Is there anybody out there able to measure decent performance on a similar installation (i.e. QLogic cards and AMD CPUs)?
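Regarding question 2, the NUMA-pinned run we have in mind looks roughly like this (a sketch; the PCI address and NUMA node 0 are just examples, and node01 is a placeholder host name):

# find which NUMA node the HCA is attached to
cat /sys/bus/pci/devices/0000:21:00.0/numa_node

# server side, bound to the CPUs and memory of that node
numactl --cpunodebind=0 --membind=0 ib_read_bw -a

# client side on another blade, bound the same way
numactl --cpunodebind=0 --membind=0 ib_read_bw -a node01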

 

Thanks in advance for any info you can provide.


Stefano