A Rocket Lake Retrofit: 10nm onto 14nm

The new-generation Rocket Lake processor family is the combination of two different backported technologies. Intel took the Sunny Cove core from its 10nm Ice Lake processors and rebuilt it on 14nm, now calling it Cypress Cove. Intel also took the Xe graphics from 10nm Tiger Lake and rebuilt those on 14nm, but these are still called Xe graphics, albeit labelled UHD 750.

We can see that the new design is an amalgam of new technologies by comparing Rocket Lake to Comet Lake, Ice Lake, Tiger Lake, and Ryzen 5000:

Microarchitecture Comparison
AnandTech     Comet Lake   Rocket Lake   Ice Lake    Tiger Lake   Ryzen 5000
Form Factor   Desktop      Desktop       Laptop      Laptop       Desktop
Max Cores     10           8             4           4            16
TDP           125 W        125 W         28 W        35 W         105 W
uArch         Comet        Cypress       Sunny       Willow       Zen 3
IGP           Gen 9        Xe-LP         Gen 11      Xe           -
IGP Cores     24           32            64          96           -
L1-D          32 KB/c      48 KB/c       48 KB/c     48 KB/c      32 KB/c
L2 Cache      256 KB/c     512 KB/c      512 KB/c    1280 KB/c    512 KB/c
L3 Cache      20 MB        16 MB         8 MB        12 MB        64 MB
PCIe          3.0 x16      4.0 x20       3.0 x8      4.0 x4       4.0 x24
DDR4          2 x 2933     2 x 3200      2 x 3200    2 x 3200     2 x 3200
LPDDR4X       -            -             4 x 3733    4 x 4266     -

There are obviously some differences between the notebook and desktop parts, most notably that the new platform tops out at eight cores at the high end, two fewer than Comet Lake.

Additional improvements over Comet Lake include AVX512 units, support for 20 PCIe 4.0 lanes, and faster memory. With the new chipsets, Intel has already disclosed that the Rocket Lake platform will have native USB 3.2 Gen 2x2 (20 Gbps), and with the Z590 motherboards, a double bandwidth link from CPU to the chipset, moving from DMI x4 to DMI x8, effectively a PCIe 3.0 x8 link.
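To put the wider chipset link into perspective, the raw bandwidth of a PCIe 3.0-style DMI link can be estimated from the per-lane signaling rate. The figures below are back-of-the-envelope arithmetic from the PCIe 3.0 spec values, not Intel-published throughput numbers:

```python
# Rough DMI bandwidth estimate (illustrative arithmetic only).
# PCIe 3.0 signals at 8 GT/s per lane with 128b/130b line encoding.
GTS = 8e9                   # transfers per second, per lane
EFFICIENCY = 128 / 130      # 128b/130b encoding overhead
BYTES_PER_TRANSFER = 1 / 8  # one bit per transfer, converted to bytes

def dmi_bandwidth_gbs(lanes):
    """Raw one-direction bandwidth of a PCIe 3.0-style link, in GB/s."""
    return GTS * EFFICIENCY * BYTES_PER_TRANSFER * lanes / 1e9

print(f"DMI x4: {dmi_bandwidth_gbs(4):.2f} GB/s")   # ~3.94 GB/s
print(f"DMI x8: {dmi_bandwidth_gbs(8):.2f} GB/s")   # ~7.88 GB/s

# A single USB 3.2 Gen 2x2 port is 20 Gbps, i.e. 2.5 GB/s, so one
# such port alone could eat most of a DMI x4 link's headroom.
print(f"USB 3.2 Gen 2x2: {20 / 8:.2f} GB/s")
```

This makes it clearer why doubling DMI to x8 matters once the chipset hosts 20 Gbps USB ports alongside storage and networking.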

Rocket Lake on 14nm: The Best of a Bad Situation

The delays around the viability of Intel’s 10nm manufacturing have been well documented. To date, the company has launched several products on its 10nm process for notebooks, such as Cannon Lake, Ice Lake, Jasper Lake, Elkhart Lake, and Tiger Lake. There have been other non-consumer products, such as Agilex FPGAs and Snow Ridge 5G SoCs, and Intel has confirmed that its 10nm server product, ‘Ice Lake Xeon Scalable’, is currently in volume production for an early Q2 launch on April 6th.

The one product line missing from that list covers the desktop and enthusiast segments, which typically use socketed processors paired with discrete graphics. Intel has repeatedly committed to launching desktop processors on its 10nm process, but we have yet to see the results of those efforts. The issues Intel is having with 10nm have not been fully disclosed at this time, with Intel instead happy to promote some of the improvements made, such as its new SuperFin technology, which is in Tiger Lake and the next-generation server platform beyond Ice Lake Xeon Scalable (for those keeping track, that would be Sapphire Rapids). The 10nm improvements so far have enabled Intel to launch notebook and server processors, both of which have lower power-per-core than a typical desktop offering.

As 10nm has not been able to meet the standards required for desktop-level performance, rather than leave a potential three-year gap in the desktop product family, Intel has been in a holding pattern, releasing slightly upgraded versions of Skylake on slightly improved variants of 14nm. The first two members of the Skylake family, Skylake and Kaby Lake, were released as expected. While waiting, we saw Intel release Coffee Lake, Coffee Lake Refresh, and Comet Lake. Each of these afforded minor updates in frequency, core count, or power, but very little in the way of fundamental microarchitectural improvement. The goal all along was to move to 10nm with the same architecture as the mobile Ice Lake processors, but that wasn’t feasible, with manufacturing limitations restricting how well the processors scaled to desktop-level power.

  • Skylake, Core 6th Gen in August 2015
  • Kaby Lake, Core 7th Gen in January 2017 (+17mo)
  • Coffee Lake, Core 8th Gen in October 2017 (+9mo)
  • Coffee Lake Refresh, Core 9th Gen in October 2018 (+12mo)
  • Comet Lake, Core 10th Gen in April 2020 (+18mo)
  • Rocket Lake, Core 11th Gen in March 2021 (+11mo)

With each generation, Intel traditionally has either upgraded the process node technology or updated the microarchitecture – a cadence that Intel called Tick-Tock. Originally Intel was set to perform a normal ‘Tick’ after Kaby Lake, and have Cannon Lake move the same effective Skylake microarchitecture to 10nm. Cannon Lake ended up only as a laptop processor with no working graphics in a small number of notebooks in China, as it was a hot mess (as shown in our review). As a result, Intel refocused its 10nm efforts on notebook processors, hoping that the advances would also be applicable to desktop, but the company had to release minor upgrades on desktop from Coffee Lake onwards to keep the product line going.

This meant that at some level Intel knew it would have to combine a new architecture and a new process node jump into one product cycle. At some point, however, Intel realized that the intercept point between a new microarchitecture and the desktop’s jump to 10nm was blurry and kept receding, at a time when its main competitor was starting to make noise about a new product that could reach parity in single-core performance. In order to keep these important product lines going, drastic measures would have to be taken.

After many meetings with many biscuits, we presume, the decision was made that Intel would take the core microarchitecture design from 10nm Ice Lake, which couldn’t reach high enough frequencies under desktop power, and repackage that design for the more dependable 14nm node which could reach the required absolute performance numbers. This is known as a ‘backport’.

Sunny Cove becomes Cypress Cove


The new Core 11th Gen processor we are looking at today has the codename Rocket Lake. That’s the name for the whole processor, which consists of cores, graphics, interconnect, and other accelerators and IP blocks, each of which has its own codename, making it easier for the engineers to identify which parts are in use. We use these codenames a lot, and the one to focus on here is the CPU core.

Intel’s 10nm Ice Lake notebook processor family uses Sunny Cove cores in the design. It is these cores that have been backported to 14nm for use in the Rocket Lake processors, and because it is on a different process node and there are some minor design changes, Intel calls them Cypress Cove cores.

The reason for this is that taking a design for one manufacturing process and rebuilding it for a second is no easy task, especially if it’s a regressive step – transistors are bigger, which means logic blocks are bigger, and all the work done with respect to signaling and data paths in the silicon has to be redone. Even with a rework, signal integrity needs to be maintained over longer distances, or additional path delays and buffers need to be implemented. Any way you cut it, a 10nm core is bigger when redesigned for 14nm, consumes more power, and has the potential to be fundamentally slower at the execution level.

Intel’s official disclosures to date on the new Cypress Cove cores and Rocket Lake stem from a general briefing back in October, as well as a more product-oriented announcement at CES in January. Intel is promoting that the new Cypress Cove core offers ‘up to a +19%’ instructions per clock (IPC) generational improvement over the cores used in Comet Lake, which are higher-frequency variants of Skylake from 2015. However, the underlying microarchitecture is promoted as being identical to that of the Ice Lake mobile processors in areas such as caches and execution, and overall the new Rocket Lake SoC has a number of other generational improvements new to Intel’s desktop processors.
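Because single-thread performance scales roughly with IPC times frequency, an IPC gain and a clock change combine multiplicatively. A quick sketch of that arithmetic, using hypothetical clock speeds purely for illustration (these are not measured Rocket Lake frequencies):

```python
# Single-thread performance ~ IPC x frequency.
# The clock figures below are hypothetical, to show how a +19% IPC
# gain interacts with a frequency change from a backport.
def relative_perf(ipc_gain, old_ghz, new_ghz):
    """Net single-thread speedup from an IPC gain plus a clock change."""
    return (1 + ipc_gain) * (new_ghz / old_ghz)

# +19% IPC at an unchanged clock is simply +19%:
print(f"{relative_perf(0.19, 5.3, 5.3):.3f}x")   # 1.190x
# If the backport had cost 300 MHz of peak clock, the net gain shrinks:
print(f"{relative_perf(0.19, 5.3, 5.0):.3f}x")   # ~1.123x
```

This is why a backport only makes sense if 14nm can hold frequency close to its predecessor: lose too much clock, and the IPC uplift evaporates.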

Eight Cores, Not Ten?

Enabling core designs through this backporting process is more complex than simply photocopying the design into the larger format. Every process node improvement introduces different density scalers and features that might not be available elsewhere. Undoubtedly the original 10nm Sunny Cove design had these in mind, so re-architecting the same floorplan on 14nm requires a lot of extra work. This in itself adds transistors, buffers, and ways to manage voltage differences and signal integrity, increasing the die size.

Note that Intel has in the past said that its 10nm process node offers a 2.7x transistor density increase over 14nm. Naturally, doing the reverse with a design, going from 10nm to 14nm, hasn’t made the core suddenly 2.7x bigger, mainly because those numbers often refer to the densest transistors, and a high-performance microprocessor core often uses less dense transistors in its logic to enable high frequency, with enough inactive silicon (dark silicon) to assist with power and thermals. We are still waiting on official numbers for core sizes, so the comparison between Sunny Cove and Cypress Cove will be an interesting one.
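For a sense of scale, Intel's quoted 2.7x density factor can be translated into linear dimensions. This is simple arithmetic on the quoted figure only; the real area change of a ported core depends on which cell libraries are used:

```python
import math

# Intel quotes ~2.7x transistor density going from 14nm to 10nm.
# Density scales with area, so 2.7x density corresponds to roughly
# sqrt(2.7) ~ 1.64x shrink per linear dimension. Run in reverse, an
# identical layout would grow 2.7x in area - the worst case, since
# the quoted figure reflects the densest cells rather than the
# sparser high-performance logic a desktop core actually uses.
density_factor = 2.7
linear_factor = math.sqrt(density_factor)

print(f"Linear scaling: {linear_factor:.2f}x per side")  # ~1.64x
print(f"Area scaling (worst case): {density_factor:.1f}x")
```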

Nonetheless, there is a core size increase, and this has to be factored into what silicon is produced. Designing a mass-production silicon layout requires balancing overall die size against expected yields, expected retail costs, required profit margins, and final product performance. Intel could easily make a 20+ core processor with these Cypress Cove cores; however, the die size would be too large to be economical, and the power consumption when all the cores are loaded would likely necessitate a severe reduction in frequency to keep the power under control. To that end, Intel finalised its design at eight cores.

For die sizes, even with only eight cores enabled, the new Rocket Lake design is substantially bigger than the 10-core Comet Lake variant.

Intel Consumer Die Size Comparison
All on Intel 14nm
AnandTech        uArch         Cores   Dimensions       Die Size
Core i7-8700K    Coffee Lake   6 C     9.2 x 16.7 mm    153.6 mm2
Core i9-9900K    CFL Refresh   8 C     9.2 x 19.6 mm    180.3 mm2
Core i9-10900K   Comet Lake    10 C    9.2 x 22.4 mm    206.1 mm2
Core i9-11900K   Rocket Lake   8 C     11.5 x 24.0 mm   276.0 mm2
HEDT for Comparison
Core i9-7900X    Skylake-X     10 C    14.6 x 22.3 mm   325.4 mm2
Core i7-6950X    Broadwell-E   10 C    -                246.3 mm2

It's worth noting that Intel's new eight-core Rocket Lake processor is actually bigger than the 10-core Broadwell-E processor from 2016. One major difference between the two, however, is AVX-512, which does add some die area. Nonetheless, Intel is approaching its HEDT platform die sizes with Rocket Lake, but can't sell these chips for as much as HEDT parts have historically sold for. The Core i7-6950X sold for $1723, while the Core i9-7900X was $999. The bulk of Intel's interest with this silicon is going to be the Core i7-11700K, a $420 processor.
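Dividing each whole-die area in the table above by its core count puts the backport's area cost in rough per-core terms. Note that these are whole-die areas, so graphics and uncore are included, making this only an approximation:

```python
# mm^2 per core from the die-size table (whole-die area, so graphics
# and uncore are included; treat these as rough comparisons only).
dies = {
    "Core i7-8700K (Coffee Lake)":   (153.6, 6),
    "Core i9-9900K (CFL Refresh)":   (180.3, 8),
    "Core i9-10900K (Comet Lake)":   (206.1, 10),
    "Core i9-11900K (Rocket Lake)":  (276.0, 8),
}

for name, (area_mm2, cores) in dies.items():
    print(f"{name}: {area_mm2 / cores:.1f} mm^2/core")

# Rocket Lake lands at ~34.5 mm^2 per core vs ~20.6 for Comet Lake,
# illustrating the area cost of the 14nm backport plus Xe graphics.
```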

Backport vs Co-Design

One of the critical elements of Rocket Lake is what it means for Intel going forward. With this project, Intel has taken a core designed for 10 nm and recreated its performance on 14 nm, with additional implications for power and efficiency. Intel has stated that in the future it will design cores for multiple process nodes at the same time, so given Rocket Lake’s efficiency struggles at high frequencies, doesn’t this mean the experiment has failed?

I say no, because it teaches Intel a lot about how it designs its silicon. The issue with Rocket Lake is that the core was originally designed for 10 nm only, and that won’t necessarily happen again.

Future cores from Intel are going to be designed, from the ground up, for multiple process node technologies. Given Intel’s announcements about building chips at external manufacturing facilities, as well as licensing out its core designs, this means Intel might have to design a core that works at both Intel and TSMC. The point is that if Intel is going to do this, it will design for both from the start. The core will have been built taking into account the different elements of the process nodes in advance, and will likely cater to the intricacies of both.

Rocket Lake, by contrast, was an ‘after the fact’ redesign, with all of its special features built for 10 nm and then retrofitted to 14 nm. Rocket Lake shows it can be done, but the way Intel went about this is unlikely to happen again. All future cores that require multiple process nodes, even across multiple foundry partners, are going to be co-designed from day one.

Ultimately, how and when Intel will initiate additional co-design, even given suggested roadmaps, is likely to be in flux based on Intel’s own ability to produce high single-core frequency desktop processors. Cypress Cove, by most measures, is a reflex response to a widening gap in Intel’s desktop roadmap, taking a core specifically designed for a different process. Intel has likely learned a lot from this exercise, but in the future we can expect specific cores to be co-designed with both process nodes in mind. This is akin to Intel’s new stance of ‘enabling the right product on the right node at the right time’. A co-designed approach, rather than a post-production realisation that a backport is required, will mean that future core designs straddling two process nodes are likely to be more similar and optimized on both processes at the same time.


278 Comments


  • SystemsBuilder - Wednesday, March 31, 2021 - link

    and you are a Computer Science graduate? What Linus T. is saying is that AVX-512 is a power hog, and he is right about that. Linus T. is not saying that "a couple dozen or so people" are able to program it. Power requirements and programming difficulty are two different things.
    On the second point, I 100% stand by the claim that any decent Computer Science/Engineering graduate should be able to program AVX-512 effectively (overcoming the difficulty, not the power requirements).
    Also, I do program AVX-512 and I 100% stand by what I said. You just need to know what you are doing and vectorize algorithms. If you use the good old sequential algorithms you will not achieve anything with AVX-512, but if you vectorize your classical algorithms you will achieve >100% benefits in many inner loops in so-called mainstream programming. AVX-512 can give you a 2x uplift if you know how to utilize both FMA units on ports 0+1 and 5, and it's not hard.
    Lastly, with decent negative AVX-512 offsets in BIOS, you can bring the power utilization down to OK levels AND still get 2x improvements in the inner loops (because of the vectorized algorithmic improvement).
    Reply
  • Hifihedgehog - Wednesday, March 31, 2021 - link

    > and you are a Computer Science graduate?

    No, I am a Computer Engineering graduate. Sorry, but you are grasping at straws. Plus you are overcomplicating the obvious to try to be an Intel apologist. Just see this and this. Read it and weep. Intel flopped big time this release:

    https://i.imgur.com/HZVC03T.png

    https://i.imgflip.com/53vqce.jpg
    Reply
  • SystemsBuilder - Wednesday, March 31, 2021 - link

    So, fellow CS/CE grad. I'm not arguing that AVX-512 is a power hog (it is) or that the AVX-512 offsets slow down the rest of the CPU (they do). I am arguing that the premise that AVX-512 is supposed to be so incredibly hard that only a "couple dozen or so people" can do it is wrong today - Skylake-X with AVX-512 was launched in 2017, for heaven's sake. Surely I can't be the only CS/CE guy who has figured it out by now. I mean, really? When Ian wrote what Keller said (and keeps on writing it), that AVX-512 is sooo hard to do that only a few guys on the planet can do it well, my reaction was "let's see about that". I mean come on guys, really! Reply
  • SystemsBuilder - Wednesday, March 31, 2021 - link

    More specifically, Linus is concerned that because you need to use negative offsets to keep the power utilization down when engaging AVX-512, it slows down everything else going on, i.e. AVX-512 power requirements have an overall CPU impact. The new core designs (already Cypress Cove maybe? but Sapphire Rapids definitely!) will allow AVX-512 workloads to run at one frequency (with lower negative offsets than, for instance, Skylake-X) and non-AVX-512 workloads at a different frequency on various cores, and keep within the power budget. This is ideal. Reply
  • arashi - Wednesday, March 31, 2021 - link

    This belongs in r/ConfidentlyIncorrect and r/IAmVerySmart, anyone who thinks coding for AVX512 PROPERLY is doable by "any CS/CE major graduate worth their salt" would be laughed out of the industry. Reply
  • Hifihedgehog - Wednesday, March 31, 2021 - link

    Exactly. The real reason for the nonsensical wall of text is that SystemsBuilder is trying desperately to overexplain things to put lipstick on a pig. And he repeats himself too, like I am listening to an automated bot caught in a recursive loop, which is quite funny actually. Reply
  • SystemsBuilder - Wednesday, March 31, 2021 - link

    So you are a CE major; have you actually tried to program in AVX-512? If not, try a matrix-by-matrix multiplication of 16x16 FP32 matrices, for instance, and come back. You'll notice an incredible performance increase. It's not lipstick on a pig, it actually is very powerful, especially when computing through large volumes of related data SIMD style. Reply
  • Meteor2 - Saturday, April 17, 2021 - link

    Disappointing response. You throw insults but not rebuttals.

    Methinks SB has a point.
    Reply
  • SystemsBuilder - Wednesday, March 31, 2021 - link

    Really? And you are a CS graduate? Have you tried? Reply
  • MS - Tuesday, March 30, 2021 - link

    What the hell is that supposed to mean, that you can't get the frequency at 10 nm and therefore you have to stick with the 14 nm node? That's pure nonsense. AMD is at 7 nm and they are getting the target frequencies. Maybe stop spreading the Kool-Aid and call a spade a spade.... Reply
