Disclaimer June 25th: The benchmark figures in this review have been superseded by our second follow-up Milan review article, where we observe improved performance figures on a production platform compared to AMD’s reference system in this piece.

SPEC - Multi-Threaded Performance

Picking up from the power efficiency discussion, let’s dive directly into the multi-threaded SPEC results. As usual, because these are not officially submitted scores to SPEC, we’re labelling the results as “estimates” as per the SPEC rules and license.

We compile the binaries with GCC 10.2 on their respective platforms, with simple -Ofast optimisation flags and the relevant architecture and machine tuning flags (-march/-mtune=neoverse-n1 ; -march/-mtune=skylake-avx512 ; -march/-mtune=znver2). Because GCC 10.2 did not yet support the new Zen3 microarchitecture at the time of testing, the Milan parts use the same Zen2 binaries as Rome. Otherwise we would have had to rerun all numbers on all platforms with a newer GCC 11 baseline, something we will do in the future but which is outside the scope of this piece.

In terms of data-points, we’re comparing the new 7763 against a 7742, as well as the top socketed SKUs from the competition, including a Xeon 8280 (equivalent to a Xeon 6258R) and Ampere’s Arm-based Altra Q80-33. It’s to be noted that the 7763 is a 280W part and thus lands at a higher TDP than the other three chips, but we wanted to compare the top of the stack against the best parts we had available for testing.

SPECint2017 Rate-N Estimated Scores (1 Socket)

Generationally, the new 7763 does outperform the 7742 across the board, but generally the magnitude isn’t quite as large as you’d expect given the 40W higher TDP of the chip – a performance delta that gets even tighter when configuring the 7742 to a 240W cTDP.

Against the competition, AMD’s traditional adversary Intel doesn’t really stand a chance as the Milan chip is posting well over double the performance in almost all workloads.

The newer competition AMD should worry about is Ampere’s new Altra system, which currently still outperforms the top-end Milan-based 7763 in several compute-heavy benchmarks by notable margins. In more memory-heavy workloads, the EPYC more easily beats the Altra due to having essentially 8x the total cache per chip, at 256MB vs 32MB.

SPECfp2017 Rate-N Estimated Scores (1 Socket)

We’re seeing a similar story in SPECfp2017, which is more oriented towards HPC workloads. The one result that stands out here is 511.povray, in which the 7763 loses out to the previous-generation 7742 because the workload is more core-bound and the Milan chip has a smaller effective thermal envelope available for the cores, even at the higher 280W TDP.

Intel’s 8280 again really isn’t viable competition for AMD’s chips; the Ampere Altra is a closer match for AMD’s EPYC, winning some, losing some.

SPECint2017 Base Rate-N Estimated Performance 

With Milan, AMD is now retaking the performance lead in SPECint2017, although by only a small margin.

Generationally, the EPYC 7763 is 12.8% faster than the 7742; unfortunately we don’t have figures for the corresponding 280W 7H12 Rome part.

Comparing the 7713 against the 7742, things aren’t looking as great, as the new Milan part sees a 4% performance regression. It’s to be noted again that AMD had claimed the 7713 is a direct successor to the 7662, where it does fare 10% better, however that is a $900 cheaper part.

Amongst the rather lukewarm results of the top-of-stack Milan SKUs, the result that stands out most is the 32-core frequency-optimised 75F3 SKU. Featuring only half the cores, the part still manages to easily compete amongst its 64-core siblings, showcasing 82% of the performance of a 7713. This has rather large implications for the per-thread performance of this part, which we’ll cover on a later page.
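
To make the per-core implication concrete, here is a minimal sketch that turns the quoted 82% figure into a rough per-core throughput ratio. The helper name per_core_ratio is hypothetical, and the calculation assumes total rate score divides evenly across cores, which ignores clock, cache and power differences between the SKUs.

```python
def per_core_ratio(rel_score: float, cores: int, ref_cores: int) -> float:
    """Per-core throughput of a part relative to a reference part.

    rel_score is the part's total rate score as a fraction of the
    reference part's score (e.g. 0.82 for "82% of a 7713").
    Assumption: throughput is spread evenly over all cores.
    """
    return rel_score / (cores / ref_cores)


# 75F3 (32 cores) reaches 82% of a 7713 (64 cores) in SPECint2017 rate.
ratio_75f3_vs_7713 = per_core_ratio(0.82, cores=32, ref_cores=64)
print(f"75F3 per-core throughput vs 7713: {ratio_75f3_vs_7713:.2f}x")  # 1.64x
```

Under that simplifying assumption, each 75F3 core delivers roughly 1.64 times the rate throughput of a 7713 core.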

Although AMD’s presentation slides use totally different SPEC result numbers due to very different compiler and optimisation settings, the relative positioning in our internal results actually exceeds what AMD is presenting, with the 7763 coming in with a +128% advantage over the Intel 8280, a part that’s performance-equivalent to the 6258R.

SPECfp2017 Base Rate-N Estimated Performance

In the SPECfp2017 suite, which is more representative of HPC workloads and focuses more on memory performance, AMD has always retained its performance leadership, and has now widened it with the new Milan generation. The 7763 performs 14.4% better than the Rome 7742, while the 7713 outperforms it only by an amount within the margin of error.

It’s again the 75F3 which is actually stealing the show, as it manages 97.8% of a 7713, and 85.8% of a 7763 even though it has only half the cores.
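
The SPECfp2017 percentages quoted above can be chained together as a quick consistency check. The sketch below uses only the rounded figures from the text (the variable names are illustrative), so the result is approximate rather than a measured value.

```python
# Relative SPECfp2017 rate estimates quoted in the text (rounded figures).
fp_75f3_vs_7713 = 0.978   # 75F3 manages 97.8% of a 7713
fp_75f3_vs_7763 = 0.858   # 75F3 manages 85.8% of a 7763
fp_7763_vs_7742 = 1.144   # 7763 is 14.4% faster than the 7742

# The two 75F3 ratios imply where the 7713 sits relative to the 7763.
fp_7713_vs_7763 = fp_75f3_vs_7763 / fp_75f3_vs_7713

# Chaining through the 7763 gives the 7713 relative to the Rome 7742.
fp_7713_vs_7742 = fp_7713_vs_7763 * fp_7763_vs_7742

print(f"7713 vs 7763: {fp_7713_vs_7763:.3f}")  # 0.877
print(f"7713 vs 7742: {fp_7713_vs_7742:.3f}")  # 1.004
```

The implied lead of the 7713 over the 7742 works out to roughly 0.4%, which matches the description of the two parts as separated only by the margin of error.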

Against the competition and the figures AMD is showing, we’re measuring the 7763 outperforming the 8280 by 108%, close to the 106% the presentation material is showcasing. Intel should be able to make a larger leap with the next generation Ice Lake-SP server chips as the company moves from 6-channel to 8-channel memory, though we’ll have to see if that’s actually enough to catch up to AMD.
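
For readers who prefer multipliers to percentage advantages, the following sketch converts the quoted figures. The function name advantage_to_ratio is hypothetical, and the memory-channel ratio assumes identical per-channel speed on both Intel generations, which the text does not state.

```python
def advantage_to_ratio(advantage_pct: float) -> float:
    """Convert a '+X% advantage' into a performance multiplier."""
    return 1.0 + advantage_pct / 100.0


measured_int = advantage_to_ratio(128)  # 7763 vs 8280, SPECint2017 (our estimate)
measured_fp = advantage_to_ratio(108)   # 7763 vs 8280, SPECfp2017 (our estimate)
amd_claim_fp = advantage_to_ratio(106)  # AMD's presentation figure for SPECfp2017

print(f"SPECint2017: {measured_int:.2f}x")                       # 2.28x
print(f"SPECfp2017:  {measured_fp:.2f}x vs {amd_claim_fp:.2f}x")  # 2.08x vs 2.06x

# Channel-count ratio for the 6-channel to 8-channel move mentioned above.
# Assumption: same per-channel speed; real bandwidth gains may differ.
channel_ratio = 8 / 6
print(f"Memory channel ratio: {channel_ratio:.2f}x")             # 1.33x
```

On channel count alone, the move to 8 channels is a 1.33x step, well short of the roughly 2x gap measured here, which is why the outcome remains an open question.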


120 Comments


  • Oxford Guy - Tuesday, April 6, 2021

    PSP, as far as I know.
  • Linustechtips12#6900xt - Monday, March 15, 2021

    I understand that the "zen" architecture is for x86, but with modifications could it be transplanted to the ARM instruction set? As I see it, it definitely could, so the real question is when the transition will really start. I think around the theoretical Zen 5th or 6th gen there's going to be a lot of ARM around here, especially with Apple. And yes, it will definitely start with servers; it always does.
  • Gomez Addams - Monday, March 15, 2021

    There are really two things at work : the instruction set of the processor and its topology. AMD has been improving both quite a bit. The instruction set enhancements won't transfer quite so well to ARM but the topology certainly can. Since ARM processors are much smaller, they could probably work in chiplets with possibly 32 cores in each or maybe 16 cores and 4-way SMT. That could make for a very impressive server processor. Four chiplets would give 64 cores and 256 threads. Yikes!
  • rahvin - Monday, March 15, 2021

    So much wrong.
  • mode_13h - Monday, March 15, 2021

    There are pieces of it that can be reused (on the same manufacturing node, at least), but making a truly-competitive ARM chip is probably going to involve some serious tinkering with the pipeline stages & architecture. And there are significant parts of an x86 chip that you'd have to throw out and redo, most notably the instruction decoder.

    In all, it's a different core that you're talking about. Not like CPU vs. GPU level of difference, but it's a lot more than just cosmetics.
  • coder543 - Monday, March 15, 2021

    "For this launch, both the 16-core F and 24-core F have the same TDP, so the only reason I can think of for AMD to have a higher price on the 16-core processor is that it only has 2 cores per chiplet active, rather than three? Perhaps it is easier to bin a processor with an even number of cores active."

    If I were to speculate, I would strongly guess that the actual reason is licensing. AMD knows that more people are going to want the 16 core CPUs in order to fit into certain brackets of software licensing, so AMD charges more for those to maximize profit and availability of the 16 core parts. For those customers, moving to a 24 core processor would probably mean paying *significantly* more for whatever software they're licensing.
  • SarahKerrigan - Monday, March 15, 2021

    Yep.

    Intel sold quad-core Xeon E7's for impressively high prices for a similar reason.
  • Mikewind Dale - Monday, March 15, 2021

    Why couldn't you run a 16 core software license on a 24 core CPU? I run a 4 core licensed version of Stata MP on an 8 core Ryzen just fine.
  • Ithaqua - Monday, March 15, 2021

    Compliance and lawsuits.
    You have to pay for all the cores you use for some software.

    Yes, if you're only running 4 cores on your 8 core Ryzen then you're fine, but if Stata MP is using all 8, there could be a lawsuit.

    Now for you I'm sure they wouldn't care. For a larger firm with 10,000+ machines, then that's going to be a big lawsuit.
  • arashi - Wednesday, March 17, 2021

    Some licenses charge for ALL cores, regardless of how many cores you would actually be using.
