AMD Releases Milan-X CPUs With 3D V-Cache: EPYC 7003 Up to 64 Cores and 768 MB L3 Cache
by Gavin Bonshor on March 21, 2022 9:00 AM EST
There's been a lot of focus on how both Intel and AMD are planning for the future, packaging their dies to increase overall performance and mitigate higher manufacturing costs. For AMD, that next step has been V-Cache, an additional L3 cache (SRAM) chiplet that's designed to be 3D die stacked on top of an existing Zen 3 chiplet, tripling the total amount of L3 cache available. Today, AMD's V-Cache technology is finally available to the wider market, as AMD is announcing that their EPYC 7003X "Milan-X" server CPUs have now reached general availability.
As first announced late last year, AMD is bringing its 3D V-Cache technology to the enterprise market through Milan-X, an advanced variant of its current-generation 3rd Gen Milan-based EPYC 7003 processors. AMD is launching four new processors ranging from 16-cores to 64-cores, all of them with Zen 3 cores and 768 MB L3 cache via 3D stacked V-Cache.
AMD's Milan-X processors are an upgraded version of its current 3rd generation Milan-based processors, EPYC 7003. Compared to the preexisting Milan-based EPYC 7003 line-up, which we reviewed back in June last year, the most significant advancement in Milan-X is its large 768 MB of L3 cache, enabled by AMD's 3D V-Cache stacking technology. The 3D V-Cache die uses TSMC's N7 process node – the same node Milan's Zen 3 chiplets are built upon – and measures 36 mm², stacking a 64 MB chip on top of the existing 32 MB found on each Zen 3 chiplet.
Focusing on the key specifications and technologies, the latest Milan-X AMD EPYC 7003-X processors have 128 PCIe 4.0 lanes, which can be allocated to full-length PCIe 4.0 slots and onboard controllers depending on how motherboard and server vendors choose to use them. There are also four memory controllers, each supporting two DIMMs, which allows the use of eight-channel DDR4 memory.
The overall chip configuration for Milan-X is a giant, nine-chiplet MCM, with eight CCD dies and a large I/O die, and this goes for all of the Milan-X SKUs. Critically, AMD has opted to equip all of their new V-Cache EPYC chips with the maximum 768 MB of L3 cache, which in turn means all 8 CCDs must be present, from the top SKU (EPYC 7773X) to the bottom SKU (EPYC 7373X). Rather than varying the cache, AMD will be varying the number of CPU cores enabled in each CCD. Drilling down, each CCD includes 32 MB of L3 cache, with a further 64 MB of 3D V-Cache layered on top, for a total of 96 MB of L3 cache per CCD (8 x 96 MB = 768 MB).
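The cache arithmetic above can be sanity-checked with a few lines of Python; the constants simply restate the figures quoted in this article (8 CCDs, 32 MB native L3 per CCD, 64 MB of stacked V-Cache per CCD):

```python
# Cache math for a fully populated Milan-X package, per the figures in this article.
CCDS = 8                     # every Milan-X SKU ships with all eight CCDs present
BASE_L3_PER_CCD_MB = 32      # native L3 on each Zen 3 CCD
VCACHE_PER_CCD_MB = 64       # 3D V-Cache die stacked on top of each CCD

l3_per_ccd_mb = BASE_L3_PER_CCD_MB + VCACHE_PER_CCD_MB  # 96 MB per CCD
total_l3_mb = CCDS * l3_per_ccd_mb                      # 768 MB per package

print(f"{l3_per_ccd_mb} MB per CCD, {total_l3_mb} MB total")  # 96 MB per CCD, 768 MB total
```

This also makes clear why cutting cores while keeping all eight CCDs leaves the full 768 MB intact on even the 16-core SKU.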
In terms of memory compatibility, nothing has changed from the previous Milan chips. Each EPYC 7003-X chip supports eight DDR4-3200 memory modules per socket, with capacities of up to 4 TB per chip and 8 TB across a 2P system. It's worth noting that the new Milan-X EPYC 7003-X chips share the same SP3 socket as the existing line-up and, as such, are compatible with current LGA 4094 motherboards through a firmware update.
AMD EPYC 7003 Milan/Milan-X Processors

| Model | Cores | Threads | Base (MHz) | Boost (MHz) | L3 Cache | PCIe | Memory | TDP (W) | Price (1KU) |
|---|---|---|---|---|---|---|---|---|---|
| EPYC 7773X | 64 | 128 | 2200 | 3500 | 768 MB | 128 x 4.0 | 8 x DDR4-3200 | 280 | $8800 |
| EPYC 7763 | 64 | 128 | 2450 | 3400 | 256 MB | 128 x 4.0 | 8 x DDR4-3200 | 280 | $7890 |
| EPYC 7573X | 32 | 64 | 2800 | 3600 | 768 MB | 128 x 4.0 | 8 x DDR4-3200 | 280 | $5590 |
| EPYC 75F3 | 32 | 64 | 2950 | 4000 | 256 MB | 128 x 4.0 | 8 x DDR4-3200 | 280 | $4860 |
| EPYC 7473X | 24 | 48 | 2800 | 3700 | 768 MB | 128 x 4.0 | 8 x DDR4-3200 | 240 | $3900 |
| EPYC 74F3 | 24 | 48 | 3200 | 4000 | 256 MB | 128 x 4.0 | 8 x DDR4-3200 | 240 | $2900 |
| EPYC 7373X | 16 | 32 | 3050 | 3800 | 768 MB | 128 x 4.0 | 8 x DDR4-3200 | 240 | $4185 |
| EPYC 73F3 | 16 | 32 | 3500 | 4000 | 256 MB | 128 x 4.0 | 8 x DDR4-3200 | 240 | $3521 |
Looking at the new EPYC 7003 stack with 3D V-Cache technology, the top SKU is the EPYC 7773X. It features 64 Zen 3 cores with 128 threads, a base frequency of 2.2 GHz, and a maximum boost frequency of 3.5 GHz. The EPYC 7573X has 32 cores and 64 threads, with a higher base frequency of 2.8 GHz and a boost frequency of up to 3.6 GHz. Both the EPYC 7773X and 7573X have a base TDP of 280 W, although AMD specifies that all four EPYC 7003-X chips have a configurable TDP of between 225 and 280 W.
The lowest-spec chip in the new line-up is the EPYC 7373X, which has 16 cores with 32 threads, a base frequency of 3.05 GHz, and a boost frequency of 3.8 GHz. Moving up the stack, the 24-core/48-thread EPYC 7473X has a base frequency of 2.8 GHz and a boost frequency of up to 3.7 GHz. Both have a base TDP of 240 W, but like the bigger parts, AMD has confirmed that both the 16-core and 24-core models will have a configurable TDP of between 225 W and 280 W.
Notably, all of these new Milan-X chips have some kind of clockspeed regression relative to their regular Milan (max core performance) counterparts. In the case of the 7773X it is only the base clockspeed, while the other SKUs all drop a bit on both base and boost clockspeeds. The drop is necessitated by the V-Cache, which at roughly 26 billion extra transistors for a full Milan-X configuration, eats into the chips' power budget. So with AMD opting to keep TDPs consistent, clockspeeds have been dialed down a bit to compensate. As always, AMD's CPUs will run as fast as heat and TDP headroom allow, but the V-Cache-equipped chips are going to reach those limits a bit sooner.
AMD's target market for the new Milan-X chips is customers who need to maximize per-core performance; specifically, the subset of workloads that benefit from the extra cache. This is why the Milan-X chips aren't replacing the EPYC 70F3 chips entirely, as not all workloads are going to respond to the extra cache. So both lineups will be sharing the top spot as AMD's fastest-per-core EPYC SKUs.
For their part, AMD is particularly pitching the new chips at the CAD/CAM market, for tasks such as finite element analysis and electronic design automation. According to the company, they've seen upwards of a 66% increase in RTL verification speeds on Synopsys' VCS verification software in an apples-to-apples comparison between Milan processors with and without V-cache. As with other chips that incorporate larger caches, the greatest benefits are going to be found in workloads that spill out of contemporary-sized caches, but will neatly fit into the larger cache. Minimizing expensive trips to main memory means that the CPU cores can remain working that much more often.
Microsoft found something similar last year, when it unveiled a public preview of its Azure HBv3 virtual machines back in November. At the time, the company published some performance figures from its in-house testing, mainly on HPC workloads. Comparing Milan-X directly to Milan, Microsoft used data from both EPYC 7003 and EPYC 7003-X chips inside its HBv3 VM platforms. It's also worth noting that the testing was done on dual-socket systems; all of the EPYC 7003-X processors announced today can be used in both 1P and 2P deployments.
The performance data published by Microsoft Azure is encouraging, and based on its in-house testing, it looks as though the extra L3 cache is playing a big part. In Computational Fluid Dynamics, Microsoft noted that the speed-up was greater with fewer elements, so that has to be taken into consideration. Microsoft stated that with the updated HBv3 series, its customers can expect gains of up to 80% in Computational Fluid Dynamics performance compared to the previous Milan-based HBv3 VMs.
Wrapping things up, AMD's EPYC 7003-X processors are now generally available to the public. With prices listed on a 1K unit order basis, AMD says the EPYC 7773X with 64C/128T will be available for around $8800, while the 32C/64T model, the EPYC 7573X, will cost about $5590. Moving down, the EPYC 7473X with 24C/48T will cost $3900, and the entry-level EPYC 7373X with 16C/32T will, somewhat counterintuitively, cost more at $4185.
Given that these prices are based on 1K unit orders, the retail price for a single unit is likely to be slightly higher. That said, with the majority of AMD's customers being server and cloud providers, AMD will no doubt have plenty of customers buying in bulk. Many of AMD's major server OEM partners are also slated to begin offering systems using the new chips, including Dell, Supermicro, Lenovo, and HPE.
Finally, consumers will get their chance at an AMD V-Cache-enabled CPU next month, when AMD's second V-Cache product, the Ryzen 7 5800X3D, is released. The desktop processor is based around a single CCD with a whopping 96 MB of L3 cache, a nice contrast to the much bigger EPYC chips.
Comments
back2future - Monday, March 21, 2022
(if mostly reasonable) translates to 1.5-3W/(1 billion transistors of AMD 3D V-Cache) on full load performances for TSMC's N7 process node including interconnector resistances (?)
nandnandnand - Monday, March 21, 2022
BTW, there's talk of RDNA3 GPUs having a 3D implementation of Infinity Cache, so we could see a similar technology with a different amount of SRAM (up to 512 MiB total) on a new node.
back2future - Wednesday, March 23, 2022
sRAM write endurance (for sustained 4GHz cpu cache clock speeds, 24/7) is about 10-30yrs, and somewhat verified 7nm power requirements for 8T sRAM cells (4kB cell arrangement), transferred to this 6T sRAM V-cache, might top out at a comparable 120-130W/512MB for absolute maximum power dissipation. This, around 1-4W(?)/mm² of V-cache chiplet, seems like a pretty high number?
(From the 0.25um-65nm high-power CMOS era, these cache sizes would require 1-2(-3) digit kW power transfer and heat dissipation, and/but 3nm for sRAM (gate length) will be getting a tough challenge)
back2future - Wednesday, March 23, 2022
Correction: "This, around 1-4W(?)/mm² of V-cache chiplet, seems pretty high numbers?"
This, around 0.15-0.45W/mm² for the V-cache chiplet (since each is 36mm² per 64MB, so divided by 8), sounds reasonable and within public numbers
back2future - Wednesday, March 23, 2022
for comparison: NVIDIA H100 (gpu GH100, area 814mm²) is ~0.86W/mm²
myself248 - Monday, March 21, 2022
Is there any hint that the thicker die may also impact thermals, because heat from the CCX has to pass through the additional silicon on its way to the heatspreader? I know it's very thin, but the impact can't be truly zero...
nandnandnand - Monday, March 21, 2022
A hint like the lower clock speeds and disabled overclocking on the 5800X3D?
WaltC - Monday, March 21, 2022
I think the lower clocks and clock speed locks are to maintain the same TDP between the V-Cache and standard versions.