
Clarification regarding NUMA group size optimization on ProLiant Gen10

 
vitduck
Senior Member


Hello, 

The document 'Red Hat Enterprise Linux NUMA support for HPE ProLiant servers' contains this excerpt (page 7):

NUMA group size optimization

Use this option to configure how the System BIOS reports the size of a NUMA node (number of logical processors), which assists the operating system in grouping processors for application use (referred to as Kgroups). The default setting of Clustered provides better performance due to optimizing the resulting groups along the NUMA boundaries. However, some applications might not be optimized to take advantage of processors spanning multiple groups. In such cases, selecting the Flat option might be necessary for those applications to utilize more logical processors.

The introduction of this option can be traced back to the following bug report on Gen9 and HPE's subsequent patch note:

https://stackoverflow.com/questions/28098082/unable-to-use-more-than-one-processor-group-for-my-threads-in-a-c-sharp-app 

https://support.hpe.com/hpesc/public/docDisplay?sp4ts.oid=7271227&docLocale=en_US&docId=emr_na-c04650594

  • Does this option affect the performance of HPC applications on Linux with EPYC 7543?

If I understand correctly, this option does not apply to Linux. Unlike Windows, Linux does not need Kgroups (processor groups) to support more than 64 logical processors.

This means that whether the option is set to 'Flat' or 'Clustered', it does not alter the Linux kernel's view of the number of logical processors.
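
One way to check the kernel's view from the OS side is sketched below. It assumes a Linux system that exposes `/sys/devices/system/cpu/online`; `parse_cpulist` and `online_cpus` are illustrative helper names, not part of any HPE or Red Hat tool. If Linux presents all logical processors as one flat set, the online count and the affinity mask of an unpinned process should both cover every logical CPU under either BIOS setting.

```python
import os
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8,10-11' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def online_cpus(path="/sys/devices/system/cpu/online"):
    """Return the logical CPUs the kernel reports as online (path is the usual sysfs location)."""
    return parse_cpulist(Path(path).read_text())


if __name__ == "__main__":
    sysfs = Path("/sys/devices/system/cpu/online")
    # Only attempt the check where the Linux-specific interfaces exist.
    if sysfs.exists() and hasattr(os, "sched_getaffinity"):
        print("online logical CPUs:", len(online_cpus()))
        print("CPUs this process may run on:", len(os.sched_getaffinity(0)))
```

Running this once with 'Flat' and once with 'Clustered' and comparing the two counts is a simple sanity check; it says nothing about performance by itself.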

  • In case it does affect Linux, how can I verify the difference in NUMA partitioning?

For instance, numactl --hardware produces exactly the same output regardless of whether 'Flat' or 'Clustered' is selected:

available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 128291 MB
node 0 free: 51701 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 129021 MB
node 1 free: 40302 MB
node 2 cpus: 16 17 18 19 20 21 22 23
node 2 size: 129021 MB
node 2 free: 44506 MB
node 3 cpus: 24 25 26 27 28 29 30 31
node 3 size: 128993 MB
node 3 free: 32118 MB
node 4 cpus: 32 33 34 35 36 37 38 39
node 4 size: 129021 MB
node 4 free: 51282 MB
node 5 cpus: 40 41 42 43 44 45 46 47
node 5 size: 129021 MB
node 5 free: 33823 MB
node 6 cpus: 48 49 50 51 52 53 54 55
node 6 size: 129021 MB
node 6 free: 42897 MB
node 7 cpus: 56 57 58 59 60 61 62 63
node 7 size: 129020 MB
node 7 free: 45067 MB
node distances:
node   0   1   2   3   4   5   6   7 
  0:  10  12  12  12  32  32  32  32 
  1:  12  10  12  12  32  32  32  32 
  2:  12  12  10  12  32  32  32  32 
  3:  12  12  12  10  32  32  32  32 
  4:  32  32  32  32  10  12  12  12 
  5:  32  32  32  32  12  10  12  12 
  6:  32  32  32  32  12  12  10  12 
  7:  32  32  32  32  12  12  12  10
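
To make the comparison less dependent on eyeballing, the two captures can be parsed and compared programmatically. The following is a minimal sketch, assuming the output of `numactl --hardware` was saved to a text file under each BIOS setting; `parse_numactl_hardware` and `topology_diff` are hypothetical helper names introduced here for illustration. It compares only the CPU-to-node mapping and the distance matrix, since the free-memory figures change between boots anyway.

```python
import re


def parse_numactl_hardware(text):
    """Parse `numactl --hardware` text into CPU lists and distance rows per node."""
    cpus, distances = {}, {}
    for line in text.splitlines():
        line = line.strip()
        # Lines such as "node 0 cpus: 0 1 2 3"
        m = re.match(r"node (\d+) cpus:(.*)", line)
        if m:
            cpus[int(m.group(1))] = [int(c) for c in re.findall(r"\d+", m.group(2))]
            continue
        # Distance rows such as "0:  10  12  12  12"
        m = re.match(r"(\d+):\s+(.*)", line)
        if m:
            distances[int(m.group(1))] = [int(d) for d in re.findall(r"\d+", m.group(2))]
    return {"cpus": cpus, "distances": distances}


def topology_diff(a, b):
    """Return (section, node) pairs whose values differ between two parsed captures."""
    diffs = []
    for section in ("cpus", "distances"):
        for node in sorted(set(a[section]) | set(b[section])):
            if a[section].get(node) != b[section].get(node):
                diffs.append((section, node))
    return diffs
```

An empty list from `topology_diff` for the 'Flat' and 'Clustered' captures would mean that the node membership and distances the kernel sees are identical under both settings.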

I would appreciate it if someone could help clear up the confusion about this option.

Regards. 

5 REPLIES
support_s
System Recommended

Query: Clarification regarding NUMA group size optimization on ProLiant Gen10

System recommended content:

1. Configuring and Tuning HP ProLiant Servers for Low-Latency Applications White Paper

 


ksram
HPE Pro

Re: Clarification regarding NUMA group size optimization on ProLiant Gen10

Hi @vitduck,

 

Thank you for the Post.

 

You can try running commands that show load and usage, such as "top" or "vmstat", to see whether there is any difference.

 

From the hardware side, there may not be tests to verify the above. Checking the Linux / operating system vendor's portal may help.

 

Thank you
RamKS

 

 

 


I work for HPE.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]


vitduck
Senior Member

Re: Clarification regarding NUMA group size optimization on ProLiant Gen10

Hi @ksram

Thanks for taking the time to answer my question.

However, tools such as 'top' or 'vmstat' do not provide a way to verify the effect of this BIOS option.

As shown in my opening post, numactl reports exactly the same output in both cases, Flat or Clustered. Benchmarks also show no meaningful difference.

This option confuses us because no other vendor (Dell, Lenovo) has implemented it in their BIOS.

We are striving to provide the best performance to our users in time-critical projects. If possible, we would appreciate a concrete answer.

The answer is either 'yes' or 'no', and the BIOS team that implemented this feature in Gen9 should have a definite one.

Once again, thanks for your time.

Regards. 

Suman_1978
HPE Pro

Re: Clarification regarding NUMA group size optimization on ProLiant Gen10

Hi,

As far as I know, Kgroups is a Windows concept. I don't know whether any specific Linux distributions support this.

The AMD EPYC 7543 should be a 3rd Gen EPYC processor. Here is some information on NUMA for this processor:
https://support.hpe.com/hpesc/public/docDisplay?docId=sd00002435en_us&page=GUID-D348723B-00F6-49F0-A791-C484A05DAB25.html

https://support.hpe.com/hpesc/public/docDisplay?docLocale=en_US&docId=a00019612en_us

Thank You!
I work with HPE but opinions expressed here are mine.



vitduck
Senior Member

Re: Clarification regarding NUMA group size optimization on ProLiant Gen10

@Suman_1978 

We simply want to confirm with HPE that this option is only for Windows.

I will report to upper management that the "NUMA Group Size Optimization" option has no effect on Gen10 servers running Linux.

Thanks for your reply.

Regards.