HPE AMD Rome DL3x5 Gen10 Plus Reinforcements Hit The Field
In my last blog, Roma Victor, I covered how AMD has emerged ascendant with its "Rome" EPYC 7002 series server processors. Here at HPE we enabled "Rome" immediately on our shipping AMD platforms, taking advantage of socket compatibility so that enterprise customers had no need to requalify systems.
With that immediate strengthening of our portfolio and a ton of benchmark world records, we could have sat back and rested on our laurels. Sadly for the competition, this was only the vanguard. The legions are still marching, with the very first HPE Gen10 Plus servers hitting the field and bringing PCIe Gen4 to mainstream ProLiant for the first time.
PCIe Gen4 is enabled by the AMD EPYC "Rome" processor, but it cannot be retrofitted into PCIe Gen3 platforms for electrical signal integrity reasons. So whilst HPE AMD Gen10 servers get the full benefit of Rome's increased and improved cores, they cannot leverage the faster I/O capabilities. That is where Gen10 Plus comes in.
For applications requiring maximum bandwidth, typically NVMe storage based solutions linked to very fast networking or GPUs, PCIe Gen4 is ideal. I always like a proof point to that effect, and it's telling that Cray ClusterStor E1000 "Zero Bottleneck" controllers leverage AMD EPYC "Rome" to support 80GB/s read / 60GB/s write capability per controller. Cray is renowned as the HPC market leader and if it's good enough for them, well, I'd happily say that if you are designing any high I/O NVMe or GPU solution you are going to want to take a long hard look at HPE AMD Gen10 Plus.
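To put rough numbers on the Gen3 versus Gen4 difference, here is a minimal sketch of the theoretical per-direction link bandwidth. It assumes the published signalling rates (8 GT/s for Gen3, 16 GT/s for Gen4) and 128b/130b line encoding, and it ignores protocol overhead, so real-world throughput will be lower. The function names are mine, purely for illustration, and the 80GB/s figure is used only as a target to size against, not as a description of how any particular controller is built.

```python
import math

# Published per-lane signalling rates in gigatransfers per second.
TRANSFER_RATE_GT_S = {3: 8.0, 4: 16.0}

# Gen3 and Gen4 both use 128b/130b encoding: 128 payload bits per 130 sent.
ENCODING_EFFICIENCY = 128 / 130


def lane_bandwidth_gb_s(generation):
    """Theoretical GB/s per lane, per direction, before protocol overhead."""
    return TRANSFER_RATE_GT_S[generation] * ENCODING_EFFICIENCY / 8


def link_bandwidth_gb_s(generation, lanes):
    """Theoretical GB/s for a link of the given width (e.g. x4, x16)."""
    return lane_bandwidth_gb_s(generation) * lanes


def lanes_needed(target_gb_s, generation):
    """Smallest whole number of lanes whose theoretical rate meets the target."""
    return math.ceil(target_gb_s / lane_bandwidth_gb_s(generation))


if __name__ == "__main__":
    for gen in (3, 4):
        print(f"Gen{gen} x16: {link_bandwidth_gb_s(gen, 16):.1f} GB/s, "
              f"lanes for 80 GB/s: {lanes_needed(80, gen)}")
```

Under these assumptions a Gen4 x16 slot tops out at roughly 31.5 GB/s per direction against roughly 15.8 GB/s for Gen3, and sustaining 80GB/s would take at least 41 Gen4 lanes or 82 Gen3 lanes.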
I was really happy to see us move so fast with the Gen10 Plus models, and even more so when I realised how we'd ensured the HPE part of the bargain leverages the potential of the faster interconnect. Let's look at how the first two (yep, these Legions aren't done yet...more incoming soon) Gen10 Plus models differ from their Gen10 namesakes.
Firstly, the DL325 Gen10, which has had the greatest uplift. Already unique in its single socket, full I/O value proposition, it is now even more differentiated with the addition of PCIe Gen4 and shedloads more storage capacity. This is a 1U server that can support 24 NVMe drives off a single CPU; that's four more than a dual socket Xeon Cascade Lake system. The simplest way to look at this is that a 1S DL325 Gen10 Plus can now meet, and even exceed, most of the requirements a dual socket server would have fulfilled just a few months ago. This of course gives huge benefits when working with software licensed per physical CPU, along with power, density and infrastructure cost advantages.
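A rough lane budget shows why a single socket can carry that many drives. The sketch below assumes four PCIe lanes per NVMe drive, 128 lanes on a single EPYC 7002 socket and 48 lanes per Cascade Lake socket, which are the published CPU figures. How many of those lanes a given server actually routes to its drive bays is a platform design decision that this blog doesn't cover, so treat the function name and the arithmetic as an illustration only.

```python
# Assumption: each NVMe drive is attached with a x4 link.
LANES_PER_NVME_DRIVE = 4

# Published CPU lane counts (per socket).
EPYC_7002_LANES_PER_SOCKET = 128
CASCADE_LAKE_LANES_PER_SOCKET = 48


def lanes_left_after_drives(total_cpu_lanes, drive_count,
                            lanes_per_drive=LANES_PER_NVME_DRIVE):
    """CPU lanes remaining for NICs, GPUs etc. once the drives are attached."""
    remaining = total_cpu_lanes - drive_count * lanes_per_drive
    if remaining < 0:
        raise ValueError("not enough CPU lanes to attach every drive directly")
    return remaining


# One EPYC socket with 24 drives versus two Xeon sockets with 20 drives.
single_epyc_spare = lanes_left_after_drives(EPYC_7002_LANES_PER_SOCKET, 24)
dual_xeon_spare = lanes_left_after_drives(2 * CASCADE_LAKE_LANES_PER_SOCKET, 20)

if __name__ == "__main__":
    print("1S EPYC 7002, 24 drives, spare lanes:", single_epyc_spare)
    print("2S Cascade Lake, 20 drives, spare lanes:", dual_xeon_spare)
```

On those assumptions the single socket system still has 32 lanes spare after 24 drives, whereas the dual socket Xeon has 16 left after 20 drives, and the EPYC lanes are Gen4 rather than Gen3.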
Then we have the DL385 Gen10 Plus, which in my view has a slightly different position with the launch of the DL325 Gen10 Plus. Storage capacities are now closer between the 1S and 2S platforms, meaning that in many cases one can now choose based on core count and memory capacity rather than perhaps having to go 2S when 1S would do computationally. Where the DL385 Gen10 Plus really shines is in the amount of PCIe Gen4 x16 capacity that is available. This is the ideal system for GPU compute (especially as the NVIDIA and AMD PCIe Gen4 accelerators arrive) and 100Gb/s+ networking.
For simplicity I'll compare the HPE Gen10 and Gen10 Plus platforms in a table. I'm not including competitors in there to avoid too much bun throwing, but I'll point out that all our NVMe slots are U.3 for maximum flexibility and that you might want to check that before accidentally selecting a competing platform (of course you won't, given all the additional management and security features in ProLiant, now will you, dear reader?).
Now that we've had a look at the high level differences between the Gen10 and Gen10 Plus AMD platforms, it's worth considering use cases. For those already on the AMD Gen10 platforms, it really comes down to how important the additional latency and I/O benefits are to you. Should you have no requirement for the absolute best I/O and latency, the Gen10 models remain available alongside Gen10 Plus as the entry tier to AMD on the HPE DL series.
For those who are still purely on Intel but have seen the power of the 7002 series processors and want in on the action (or who have seen the news of Intel's armies being stretched a tad thin this year and want some peace of mind), I'd suggest going straight to HPE AMD Gen10 Plus. You get the absolute best performance for AMD "Rome" with respect to CPU SKUs and memory, plus you get the significant benefits of PCIe Gen4 that are simply not available on your Intel platforms today. Couple that with the possibility of significant cost optimisation, particularly with high performance single socket systems, and it makes sound sense to diversify server estates now.
Hopefully this blog has given a good overview of the recent changes to the HPE AMD portfolio, even if it does feel a little like kicking a man whilst he is down (don't feel too bad, The Man is fine). We've got a lot more coming, and pretty soon too! Channel Partners, make sure you attend HPE TSS Paris March 2020, as you'll certainly get a lot of information and hands-on experience there. For our valued customers, please contact your HPE account team and we can arrange NDA updates for you upon request. For now all I'll say is that with the excellent HPC performance of the EPYC 7002 series and the GPU compute benefits of PCIe Gen4 to NVIDIA and AMD, the Temple of Apollo Sosianus may become a little busier soon...
I've recently had some enquiries around using Gen10 Plus in areas where typically the uncore (Intel's term for items outside of the core itself, such as cache) has not been an AMD strong point. This, and features such as AVX, have been significantly improved in the EPYC 7002 "Rome" series CPUs, and the following may be helpful:
General Benchmarks and Comparisons
Some cache and latency tests: note that there are separate but very fast 16MB L3 cache blocks (detailed in the Serve The Home article above)
GROMACS AVX2 / AVX512 – Benchmark
Note that whilst EPYC doesn't support AVX512, it carries out an AVX2 instruction in one clock with no boost transitions and thus drastically reduces jitter. The lack of AVX512 is compensated for by higher core counts and lower power consumption, providing comparable performance even with early non-optimised code. *As a tip, when using Intel MKL on EPYC, see the MKL_DEBUG_CPU_TYPE=5 environment variable setting to get the best AVX2 performance.
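For anyone wanting to try that tip, here is a minimal sketch of launching a workload with the variable set. MKL_DEBUG_CPU_TYPE is an undocumented switch that MKL reads from the environment at load time; as far as I'm aware it only has an effect on older MKL builds and was dropped in later releases, so check against the MKL version you actually run. The helper names below are hypothetical and simply wrap Python's standard subprocess module.

```python
import os
import subprocess
import sys


def mkl_avx2_env(base_env=None):
    """Return a copy of the environment with MKL_DEBUG_CPU_TYPE=5 added.

    The variable must be in place before MKL is loaded, so it is set on the
    child process environment rather than changed after the fact.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MKL_DEBUG_CPU_TYPE"] = "5"
    return env


def run_with_mkl_avx2(command):
    """Run a command (list of arguments) with the MKL tip applied."""
    return subprocess.run(command, env=mkl_avx2_env(), check=True)


if __name__ == "__main__":
    # Stand-in workload: the child just echoes the variable it was given.
    run_with_mkl_avx2([
        sys.executable, "-c",
        "import os; print(os.environ['MKL_DEBUG_CPU_TYPE'])",
    ])
```

Swap the stand-in command for your own MKL-linked benchmark and compare runs with and without the variable before relying on it.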