The Ampere Altra Review: 2x 80 Cores Arm Server Performance Monster
by Andrei Frumusanu on December 18, 2020 6:00 AM EST
As we’re wrapping up 2020, one last large review item for the year is Ampere’s long-promised new Altra Arm server processor. This has indeed been the year in which Arm servers had their breakthrough; Arm’s new Neoverse-N1 CPU core is the IP designer’s first true dedicated server core, promising focused performance and efficiency for the datacentre.
Earlier in the year we had the chance to test the first Neoverse-N1 silicon in the form of Amazon’s Graviton2, inside AWS’s EC2 cloud compute offering. The Graviton2 seemed like a very impressive design, but was rather conservative in its goals, and it’s also a piece of hardware that the general public cannot access outside of Amazon’s own cloud services.
Ampere Computing, founded in 2017 by former Intel president Renée James, built upon initial IP and design talent of AppliedMicro’s X-Gene CPUs, and with Arm Holdings becoming an investor in 2019, is at this moment in time the sole “true” merchant silicon vendor designing and offering up Neoverse-N1 server designs.
To date, the company has had a few products out in the form of the eMAG chips, but with rather disappointing performance figures - understandable given that those were essentially legacy products based on the old X-Gene microarchitecture.
Ampere’s new Altra product line, on the other hand, is the culmination of several years of work and close collaboration with Arm, and is the company’s first “true” product that can be viewed as having Ampere pedigree.
Today, with hardware in hand, we’re finally taking a look at the very first publicly available high-performance Neoverse based Arm server hardware, designed for nothing less than maximum achievable performance, aiming to battle the best designs from Intel and AMD.
Mount Jade Server with Altra Quicksilver
Ampere has supplied us with the company’s server reference design, dubbed “Mount Jade”, a 2-socket 2U rack unit server. The server came supplied with two Altra Q80-33 processors, Ampere’s top-of-the-line SKU, each featuring 80 cores running at up to 3.3GHz, with a TDP of up to 250W per socket.
The server was designed in close collaboration with Wiwynn for this dual-socket variant, and with GIGABYTE for the single-socket variant, as previously hinted by the two companies’ announcements of leading hyperscale deployments of the Altra platforms. The Ampere-branded Mount Jade DVT reference motherboard comes in a typical server blue colour scheme and features 2 sockets with up to 16 DIMM slots per socket, reaching up to 4TB DRAM capacity per socket, although our review unit came equipped with 256GB per socket across 8 DIMMs to fully populate the chip’s 8-channel memory controllers.
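As a quick sanity check on those memory figures, the per-DIMM sizes they imply can be worked out directly. This is only a sketch: the constants are the numbers quoted above, while the variable and function names are made up for illustration, and it assumes capacity is split evenly across identical DIMMs.

```python
# Per-socket memory figures quoted in the text above.
DIMM_SLOTS_PER_SOCKET = 16
MAX_CAPACITY_GB_PER_SOCKET = 4 * 1024   # "up to 4TB" per socket
MEMORY_CHANNELS_PER_SOCKET = 8
REVIEW_DIMMS_PER_SOCKET = 8
REVIEW_CAPACITY_GB_PER_SOCKET = 256


def gb_per_dimm(total_gb, dimm_count):
    """Capacity of each DIMM, assuming identical DIMMs (an assumption)."""
    return total_gb // dimm_count


# Maximum configuration: all 16 slots populated.
max_dimm_gb = gb_per_dimm(MAX_CAPACITY_GB_PER_SOCKET, DIMM_SLOTS_PER_SOCKET)
max_dimms_per_channel = DIMM_SLOTS_PER_SOCKET // MEMORY_CHANNELS_PER_SOCKET

# Review unit: 8 DIMMs, one per memory channel.
review_dimm_gb = gb_per_dimm(REVIEW_CAPACITY_GB_PER_SOCKET, REVIEW_DIMMS_PER_SOCKET)
review_dimms_per_channel = REVIEW_DIMMS_PER_SOCKET // MEMORY_CHANNELS_PER_SOCKET

print(f"Max config:  {max_dimm_gb} GB per DIMM, {max_dimms_per_channel} DIMMs per channel")
print(f"Review unit: {review_dimm_gb} GB per DIMM, {review_dimms_per_channel} DIMM per channel")
```

In other words, the 4TB ceiling would require 256GB DIMMs at two per channel, while the review unit's configuration works out to 32GB DIMMs at one per channel.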
This is also our first look at Ampere’s first-generation socket design. The company doesn’t really market any particular name for the socket, but it’s a massive LGA4926 socket with a pin count in excess of any other commercial server socket from AMD or Intel. The holding mechanism is somewhat similar to that of AMD’s SP3 system, tensioned by a 5-point screw system.
The chip itself is absolutely humongous and amongst the current publicly available processors is the biggest in the industry, out-sizing AMD’s SP3 form-factor packaging, coming in at around 77 x 66.8mm – about the same length but considerably wider than AMD’s counterparts.
Although it’s a massive chip with a huge IHS, the Mount Jade server surprised me with its cooling solution as the included 250W type cooler only made contact with about 1/4th the surface area of the heat spreader.
Unlike AMD or Intel packages, Ampere’s IHS has no recessed “lip” around its edge for the mounting bracket to hold onto, so the IHS surface itself sits recessed relative to the bracket, which means a flat-based cooler cannot make contact across the whole of the chip surface.
Instead, the included 250W design cooler uses a huge vapour chamber design with a “pedestal” to make contact with the chip. Ampere explains that they’ve experimented with different designs and found that a smaller area pedestal actually worked better for heat dissipation – siphoning heat off from the actual chip die which is notably smaller than the IHS and chip package.
The cooler design is quite complex, with vertical fin stacks dissipating heat directly off the vapour chamber, with additional large horizontal fins dissipating heat from 6 U-shaped heat pipes that draw heat from the vapour chamber. It’s definitely a more complex and high-end design than what we’re used to in server coolers.
Although the Mount Jade server is definitely a very interesting piece of hardware, our focus today is on the new Altra processors themselves, so let’s dive into the new Q80-33 80-core chip next.
Wilco1 - Saturday, December 19, 2020
SMT gives very little benefit (only 15% faster on SPECINT and 3.5% *slower* on SPECFP), adds a lot of area and complexity, and results in very bad per-thread performance.
It's always better to use 2 real cores instead of 1 SMT core. So if you have small cores like Neoverse N1, adding SMT makes no sense at all.
mode_13h - Sunday, December 20, 2020
Whether SMT makes sense depends on the workload. Many tasks have greater branch-density and less computational density, in which case SMT is a massive win. I'm compiling code all the time and see huge speedups from SMT.
That said, of course I'll take two real cores instead of 2-way SMT, all else being equal, but that always costs more. If SMT really made as little sense as you say, then it wouldn't be nearly so widespread.
Wilco1 - Monday, December 21, 2020
There are certainly cases where SMT helps, but having some wins doesn't mean it is worth adding SMT. All too often people talk up the upsides and ignore the downsides. Let's see how Altra Max does vs Milan next year, that should answer which is best.
Note almost none of the billions of CPUs sold every year have SMT (even if we exclude embedded). Adding another core is simpler, cheaper and gives more performance.
mode_13h - Monday, December 21, 2020
> Let's see how Altra Max does vs Milan next year, that should answer which is best.
I disagree. That's a bit apples and oranges. The differences between Zen2/Skylake and N1 are too big. You need to look at the overhead of SMT for a particular core vs. the benefits for that core.
These x86 cores are large not just because of SMT, but also the x86 tax, their wide vector units, and other things. It could be that SMT adds just 5% overhead, and that's not enough to increase your core count hardly at all, if you drop it.
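The area argument here can be put into rough numbers. The sketch below is purely illustrative: the 5% overhead is the hypothetical figure from this comment, the 15% throughput uplift is the SPECINT figure quoted earlier in the thread, and the 64-core baseline and the helper name `cores_without_smt` are assumptions added for the example. It also assumes every core has the same area and that throughput scales linearly with core count.

```python
def cores_without_smt(cores_with_smt, smt_area_overhead_pct):
    """Whole cores that fit in the same core-area budget once SMT is removed.

    Assumes each SMT core costs (100 + overhead)% of a non-SMT core's area.
    Integer arithmetic is used to avoid floating-point rounding surprises.
    """
    return cores_with_smt * (100 + smt_area_overhead_pct) // 100


BASELINE_CORES = 64          # assumed SMT design, for illustration only
SMT_AREA_OVERHEAD_PCT = 5    # hypothetical figure from the comment
SMT_UPLIFT_PCT = 15          # SPECINT uplift quoted earlier in the thread

no_smt_cores = cores_without_smt(BASELINE_CORES, SMT_AREA_OVERHEAD_PCT)

# Relative throughput in arbitrary units (one non-SMT core = 100 units).
smt_throughput = BASELINE_CORES * (100 + SMT_UPLIFT_PCT)
no_smt_throughput = no_smt_cores * 100

print(f"Cores without SMT in the same area: {no_smt_cores}")
print(f"Throughput with SMT:    {smt_throughput}")
print(f"Throughput without SMT: {no_smt_throughput}")
```

Under these particular assumptions, dropping SMT buys only three extra cores, which does not make up for the lost 15%. With a larger area overhead or a smaller uplift the balance shifts the other way, which is exactly the point of contention in this thread.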
Wilco1 - Monday, December 21, 2020
It's never going to be a perfect comparison - nobody will design SMT and non-SMT variants of the same core! There are many differences because they use different design principles. However it will clearly show which of these designs works out best for top-end server performance.
Spunjji - Monday, December 21, 2020
SMT is a clear win where you already have large cores with a lot of execution resources, whereby the extra resources required aren't a large proportion of overall die area. It also helps if your tasks are focused on overall performance for a given number of threads, rather than performance-per-thread.
Where the cores are this small, though, simply adding more of them seems to be the better option.
mode_13h - Monday, December 21, 2020
Well said.
mode_13h - Sunday, December 20, 2020
Plus, per-thread performance is only an issue if that's how you're paying for CPU time, and then what you actually care about is thread performance per dollar, which would compensate for any cost differences due to SMT, as well.
Wilco1 - Monday, December 21, 2020
And Graviton 2 shows which is cheaper and faster.
mode_13h - Monday, December 21, 2020
That comparison is only valid for Amazon customers and in the short term. It can't be used to support a broader conclusion about SMT, because we lack transparency into the cost structure of Amazon's hosting, like whether they're subsidizing Graviton2 servers or even just charging enough for them to simply break even on the hardware.