01-12-2015 02:08 PM
HP ProLiant Gen9 & NUMA support
A question from Rob:
I can’t seem to find any documentation for Gen9 servers that indicates support for NUMA.
I find it on prior generations.
Do we not support NUMA on our Gen9 servers?
Reply from Kai:
NUMA stands for Non-Uniform Memory Access. Basically, any system in which a processor cannot reach the whole memory address space over a uniform path and at a uniform speed can, in theory, be categorized as a NUMA system.
In the E5 v1/v2/v3 processors, Intel has integrated the memory controller into the processor die. Accessing memory attached to CPU2 from CPU1 has to go through the QPI link between the two processors, and vice versa. The latency and bandwidth are of course different from accessing locally attached DIMMs. So it is NUMA. I guess it is so common now that not much is said about it for Gen9.
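To see the local versus remote difference Kai describes from the OS side, a minimal sketch along these lines might help. It assumes a Linux host, where the kernel exposes one directory per NUMA node under `/sys/devices/system/node`; the function names are made up for illustration. The `distance` file holds relative access costs, with 10 meaning local by ACPI convention and larger values meaning a remote node. The exact remote values depend on the platform firmware.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a sysfs CPU list such as "0-5,12-17" into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # a node without CPUs has an empty list
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_numa_topology(node_root="/sys/devices/system/node"):
    """Return {node_id: {"cpus": [...], "distances": [...]}} read from sysfs.

    node_root is a parameter only so the function can be pointed at a
    copy of the directory tree; the default is the usual Linux location.
    """
    topology = {}
    for node_dir in Path(node_root).glob("node[0-9]*"):
        node_id = int(node_dir.name[len("node"):])
        distances = (node_dir / "distance").read_text().split()
        topology[node_id] = {
            "cpus": parse_cpulist((node_dir / "cpulist").read_text()),
            "distances": [int(d) for d in distances],
        }
    return topology
```

On a two-socket server with interleaving disabled, calling `read_numa_topology()` would be expected to return two entries, each listing its own CPUs and a distance row with 10 in its own position.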
Input from Duncan:
If you dig into the UEFI documentation for Gen9 you will find options to enable/disable node interleaving. Enabling interleaving means memory is spread across all NUMA nodes in the system; it is usually turned off so that the memory in each NUMA node stays local. Most modern operating systems are NUMA-aware, so you leave interleaving disabled, but there may be cases where you want it turned on.
Point is – node interleaving is a way of configuring the memory in a NUMA system, and is present in Gen9 UEFI as an option – ergo, we do support NUMA – as others have said, all Intel/AMD server processors are NUMA now, so it generally doesn’t get mentioned – but it’s there.
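As a rough illustration of what node interleaving changes, here is a toy model, not how the Gen9 firmware actually maps memory. It assumes equal memory per node and a hypothetical 4 KiB interleave granule; real hardware picks its own granularity. With interleaving off, each node owns one contiguous address range; with it on, consecutive chunks rotate across the nodes.

```python
def node_for_address(addr, node_sizes, interleave, granule=4096):
    """Return the NUMA node backing physical address `addr` in this toy model.

    node_sizes: bytes of memory attached to each node.
    interleave: True models node interleaving, False models one
                contiguous address range per node.
    granule:    hypothetical interleave chunk size (an assumption).
    """
    total = sum(node_sizes)
    if not 0 <= addr < total:
        raise ValueError("address outside installed memory")
    if interleave:
        # Consecutive chunks rotate across all nodes.
        return (addr // granule) % len(node_sizes)
    # Each node owns one contiguous range, in node order.
    for node, size in enumerate(node_sizes):
        if addr < size:
            return node
        addr -= size
    raise AssertionError("unreachable: addr was checked against total")
```

In this model, a NUMA-aware OS with interleaving off can keep a process on node 0 and hand it addresses that all resolve to node 0, whereas with interleaving on roughly half of any large buffer lands on the other node regardless of where the process runs.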
And from Dan:
Ivy Bridge 12c and now Haswell make this a little bit more complicated as well because of the Dual Ring design and the placement of the memory controllers. There isn’t anything your customer needs to do about it per se right now, but here is the technical background on what I mean.
Warning, geeky content ahead.
Sandy Bridge had a single Die design whether you had 2 cores or 8. Some cores were just disabled in HW if you didn’t have an 8 core model.
Ivy and Haswell actually have 3 different Die designs depending on how many Cores you have.
In the 12 core Ivy and the 10+ core Haswell designs, the cores are on separate internal Ring buses, with each main ring bus owning 2 of the 4 memory channels as well.
In Haswell there is some Intel function (Cluster on Die if I remember right) that allows the CPU to cache data from other rings to give the absolute lowest latency to the app.
The Snooping mode is supposed to be BIOS configurable, but I haven’t played with Gen9 enough to know if we’ve exposed that or not.
My point in all this is simply, with a 12c Gen9 CPU, a single thread on a single CPU might be pulling memory from 2 different rings within the same die and the remote ring will have just a hair of extra latency unless the data is already in the Snoop buffer.
There is also a published Intel erratum (CPU bug) that says that, in CoD mode, your 2 x 12-core Haswell processors might show up in the OS as 4 x 6-core instead. This lets the OS better understand the separation happening under the hood of each CPU, but for obvious reasons it absolutely kills any per-socket licensing (vSphere).
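One way to spot the situation Dan describes is to compare how many physical packages and how many NUMA nodes the OS reports. The sketch below assumes a Linux host and the usual sysfs layout (`topology/physical_package_id` per CPU, one `nodeN` directory per NUMA node); the function names are illustrative. It also assumes the kernel still reports the true package id for each core, which may not hold on a system affected by the erratum.

```python
from pathlib import Path


def count_sockets_and_nodes(cpu_root="/sys/devices/system/cpu",
                            node_root="/sys/devices/system/node"):
    """Return (physical_sockets, numa_nodes) as reported by Linux sysfs."""
    packages = set()
    pattern = "cpu[0-9]*/topology/physical_package_id"
    for pkg_file in Path(cpu_root).glob(pattern):
        packages.add(pkg_file.read_text().strip())
    nodes = [d for d in Path(node_root).glob("node[0-9]*") if d.is_dir()]
    return len(packages), len(nodes)


def describe(sockets, nodes):
    """Summarize the counts; more nodes than sockets hints at split sockets."""
    if sockets and nodes > sockets:
        return (f"{nodes} NUMA nodes on {sockets} sockets: "
                "each socket appears to be split into several nodes")
    return f"{nodes} NUMA nodes on {sockets} sockets"
```

On a 2-socket machine, a result of `(2, 4)` would be consistent with each processor being presented as two NUMA nodes, while `(2, 2)` is the plain one-node-per-socket layout.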
If you want to see more on the Ring bus design, here are some articles that spell it out nicely: