DL380 G10/G9 Low I/O performance (Disk R/W bottleneck)

 
kartong22

Hello, I have a problem with two physical servers (specifications below). Both run the same operating system and kernel, but the hardware differs. The problem is the same on both: a bottleneck when reading from or writing to disk.

I ran disk tests with iperf3, following this guide:
https://fasterdata.es.net/performance-testing/network-troubleshooting-tools/iperf/disk-testing-using-iperf/
The transfer rate dropped from 1 GB/s to 200-300 MB/s in the file upload/download test when the file is not already in memory. More information about the test results: https://pastebin.ubuntu.com/p/Z45y5zPqPs/
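
To separate the disk from the network in a test like this, a local write test that forces data to the device can help. The following is only a minimal sketch, not the iperf3 method from the guide above: the function name `measure_write_throughput` and its default sizes are assumptions chosen for illustration, and the result still depends on the controller cache and filesystem.

```python
import os
import tempfile
import time


def measure_write_throughput(directory, total_bytes=64 * 1024 * 1024,
                             block_size=1024 * 1024):
    """Write total_bytes sequentially into a temp file in `directory`,
    fsync it, and return the observed throughput in MB/s (10^6 bytes/s).

    Hypothetical helper for illustration; sizes are arbitrary defaults.
    """
    # Random data so the result is not flattered by compression or zero detection.
    block = os.urandom(block_size)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        written = 0
        start = time.perf_counter()
        # buffering=0 gives a raw file object, so write() may be partial.
        with os.fdopen(fd, "wb", buffering=0) as f:
            while written < total_bytes:
                chunk = block[: total_bytes - written]
                written += f.write(chunk)
            # Force the data out of the page cache before stopping the clock.
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return written / elapsed / 1e6
```

Calling `measure_write_throughput("/path/on/the/array")` with a `total_bytes` value well above the controller cache size gives a rough number to compare with the iperf3 figures.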

I checked the contents of /proc/interrupts and saw that on the DL380 G10 server the smartpqi driver uses only one interrupt line, handled on CPU#0. The network adapters, in turn, have their interrupts handled only by CPUs in NUMA node 0.
[Image: irq]
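
For anyone who wants to reproduce this check without reading the raw table, here is a small sketch that sums the per-CPU counts in /proc/interrupts for every line whose description contains a given name (for example `smartpqi`). The function names `irq_counts_by_cpu` and `busy_cpus` are made up for this example, and the parsing assumes the usual layout: a header row of CPU names followed by rows of `label: count ... description`.

```python
def irq_counts_by_cpu(text, name):
    """Sum interrupt counts per CPU for all /proc/interrupts rows whose
    description contains `name`. `text` is the full file content."""
    lines = text.splitlines()
    if not lines:
        return {}
    cpus = lines[0].split()            # header row: CPU0 CPU1 ...
    totals = {cpu: 0 for cpu in cpus}
    for line in lines[1:]:
        _label, _, rest = line.partition(":")
        fields = rest.split()
        # Rows such as "ERR: 0" have no per-CPU columns; skip them.
        if len(fields) <= len(cpus):
            continue
        counts = fields[: len(cpus)]
        description = " ".join(fields[len(cpus):])
        if name not in description:
            continue
        if not all(c.isdigit() for c in counts):
            continue
        for cpu, count in zip(cpus, counts):
            totals[cpu] += int(count)
    return totals


def busy_cpus(totals):
    """Return the CPUs that handled at least one matching interrupt."""
    return [cpu for cpu, count in totals.items() if count > 0]
```

Usage would look like `busy_cpus(irq_counts_by_cpu(open("/proc/interrupts").read(), "smartpqi"))`; a single entry in the result corresponds to the situation described above.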


On the DL380 G9, hpsa distributes interrupts better, but the result still does not look good. The network adapters behave the same as on the G10.
[Image: irq g9]
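
To tell which NUMA node a given CPU from /proc/interrupts belongs to, the CPU list strings printed by lscpu (the same format used by /sys/devices/system/node/node<N>/cpulist and /proc/irq/<N>/smp_affinity_list) can be expanded into sets. This is a sketch only; `parse_cpu_list` and `node_of` are hypothetical helper names, and the example strings in the note after the code are the G10 values from the lscpu output in this post.

```python
def parse_cpu_list(spec):
    """Expand a kernel-style CPU list such as "0-21,44-65" into a set of ints."""
    cpus = set()
    for part in spec.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def node_of(cpu, nodes):
    """Return the NUMA node id whose CPU set contains `cpu`, or None.

    `nodes` maps node id -> set of CPU numbers.
    """
    for node, cpus in nodes.items():
        if cpu in cpus:
            return node
    return None
```

With `nodes = {0: parse_cpu_list("0-21,44-65"), 1: parse_cpu_list("22-43,66-87")}`, `node_of(0, nodes)` returns 0 and `node_of(22, nodes)` returns 1, which makes it easy to check whether all IRQs of an adapter land on one node.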

The lshw command reports all devices with the tag "UNCLAIMED".

Does anybody have an idea what is configured wrong and how to resolve the bottleneck?
Thanks


Operating System: Red Hat Enterprise Linux
CPE OS Name: cpe:/o:redhat:enterprise_linux:7.7:GA:server
Kernel: Linux 3.10.0-1062.26.1.el7.x86_64
Architecture: x86-64


DL380 G10:
-2x HPE DL380 Gen10 6252 Xeon-G 2.1GHz 24 Core Processor Kit,
-6x HPE 128GB 8Rx4 PC4-2933Y-L 3DS Smart Kit,
-2x HPE 900GB SAS 12G 15k SFF SC DS HDD,
-6x HPE 3.2TB SAS MU SFF DS SSD,
-1x HPE Smart Array P408i-a SR Gen10 Controller,
-1x HPE DL38X 12G SAS Expander,
-2x HPE Ethernet 10Gb 2-port 562FLR-SFP+ Adapter

 

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 88
On-line CPU(s) list: 0-87
Thread(s) per core: 2
Core(s) per socket: 22
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz
Stepping: 7
CPU MHz: 3001.574
CPU max MHz: 3700.0000
CPU min MHz: 1000.0000
BogoMIPS: 4200.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-21,44-65
NUMA node1 CPU(s): 22-43,66-87
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_pkg_req pku ospke avx512_vnni md_clear spec_ctrl intel_stibp flush_l1d arch_capabilities


DL380 G9:
-2x HPE DL380 Gen9 E5-2680v4, 2.4GHz 14 Core Processor,
-12x HPE 32GB 2Rx4 PC4-2400T-R Memory Kit,
-2x HPE 900GB 12G SAS 15K SFF SC DS HDD,
-10x HPE 1.92TB SATA 6G MU SFF SC DS SSD,
-1x HPE Smart Array P440ar/2G FIO Controller,
-1x HPE 12Gb DL380 Gen9 SAS Expander Card,
-1x HPE Ethernet 10Gb 2P 560FLR-SFP+ Adapter,
-1x HPE Ethernet 10Gb 2P 560SFP+ Adapter,

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 56
On-line CPU(s) list: 0-55
Thread(s) per core: 2
Core(s) per socket: 14
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
Stepping: 1
CPU MHz: 2400.000
CPU max MHz: 2400.0000
CPU min MHz: 1200.0000
BogoMIPS: 4794.48
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 35840K
NUMA node0 CPU(s): 0-13,28-41
NUMA node1 CPU(s): 14-27,42-55
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d