Rocket Lake Redux: 0x34 Microcode Offers Small Performance Gains on Core i7-11700K
by Dr. Ian Cutress on March 14, 2021 12:00 PM EST
One of the leading questions about our original Core i7-11700K review was the validity of those results, given that, as usual with launches, motherboard vendors push BIOS updates as the official launch approaches. At the time, we were testing on Intel’s microcode 0x2C, the latest version available to the motherboard vendor. Intel has since released microcode 0x34, and we have retested our results on this new update.
The March on Microcode
What is microcode? Most consider it to be the underlying firmware for an Intel system. Microcode ultimately controls processor operation as well as the translation between the user-exposed instruction set and the underlying processor design. Within this microcode, the processor can control and react to what code is flowing into the processor, where to send it, how to adjust voltage/frequency, as well as a number of optimizations or fixes that are needed.
For example, some of the initial variants of Spectre and Meltdown were ‘fixed’ by adjusting the microcode to no longer make the processor vulnerable. This required those systems to be updated to the new microcode, and there was a performance penalty. Future processors were built with these fixes in hardware, and so incur less of a performance drop. Microcode allows for after-production adjustments in the name of security, or performance.
Leading up to a launch of a new processor family, microcode is King. Intel takes its microcode and couples it with updated power management control, initialization code, management engine firmware, device drivers/keys and UEFI drivers. This package creates an update candidate, which is sent to the motherboard manufacturers. On top of this, motherboard vendors sprinkle their own garnish, but can also modify elements such as turbo or memory training, which we’ve seen extensively done in the past. As we’ve discussed previously, a number of suggested settings that Intel makes can be (and often are) ignored by the motherboard vendors. But the microcode is fairly rigid in what it does.
The key point to note here is that motherboard vendors do not always update their BIOS offerings each time a new microcode package is made available to them. So, for example, a motherboard vendor might only deploy one new BIOS update a month, even if Intel is supplying new updates every week.
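For readers who want to check which revision their own system is actually running: on x86 Linux, the kernel reports the loaded revision in the `microcode` field of /proc/cpuinfo. The following is a minimal sketch in Python that assumes that format; the function name and the sample text are ours, purely for illustration.

```python
import re

def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions (as ints) found in cpuinfo text."""
    # Lines look like "microcode\t: 0x34"; there is one per logical CPU.
    pattern = re.compile(r"^microcode\s*:\s*(0x[0-9a-fA-F]+)\s*$", re.MULTILINE)
    return {int(match.group(1), 16) for match in pattern.finditer(cpuinfo_text)}

# Hypothetical excerpt, for illustration only (not captured from our test bed).
sample = "processor\t: 0\nmicrocode\t: 0x34\nprocessor\t: 1\nmicrocode\t: 0x34\n"
print([hex(rev) for rev in sorted(microcode_revisions(sample))])

# On a real Linux system, read the live file instead:
# with open("/proc/cpuinfo") as f:
#     print(microcode_revisions(f.read()))
```

Returning a set rather than a single value is deliberate: every logical CPU reports its own revision, and more than one entry in the set would indicate a mismatch worth investigating.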
Moving Closer To Launch
So you might imagine that, as a new processor family and motherboard stack gets closer to launch, there is an impetus to enable the latest microcode in a new BIOS every time one is made available. Motherboards are manufactured up to 3 months in advance with very early microcode, and they have to be updated continuously, both for Intel’s latest updates and for the motherboard vendor’s own optimizations.
Even boards for retail can have super early editions of microcode. Z590 boards are already on the market, with the latest versions at the time of manufacture. So far in our Rocket Lake testing we have been sent three different motherboards, with the following microcode versions:
- Board 1: BIOS from January 8th, microcode 0x1B (version 27)
- Board 2: BIOS from February 6th, microcode 0x24 (version 36)
- Board 3: BIOS from February 7th, microcode 0x23 (version 35)
These numbers are all in hexadecimal; converted to decimal, the first motherboard sits nine and eight revisions behind the other two respectively. Since then, up to March 6th, there have been at least 16 more revisions (0x34, version 52), with likely another few to come.
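The hexadecimal arithmetic is easy to check. A short Python sketch, using only the revisions quoted above (the variable names are ours):

```python
# Microcode revisions quoted above, as hexadecimal strings.
boards = {"Board 1": "0x1B", "Board 2": "0x24", "Board 3": "0x23"}
latest = "0x34"

# int(text, 16) parses a hexadecimal string, with or without the 0x prefix.
decimal = {name: int(rev, 16) for name, rev in boards.items()}
latest_decimal = int(latest, 16)

for name, value in decimal.items():
    gap = value - decimal["Board 1"]
    print(f"{name}: {boards[name]} = version {value} (+{gap} vs Board 1)")

newest_board = max(decimal.values())
print(f"Latest: {latest} = version {latest_decimal} "
      f"(+{latest_decimal - newest_board} vs the newest board above)")
```

Note that a gap in revision numbers is only an upper bound on how many builds vendors actually received, since not every revision is necessarily distributed.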
Not all motherboard manufacturers will release BIOSes for all of their motherboards with every microcode release, just due to time and personnel, plus they don’t envisage the systems being updated every 36 hours. After each new revision, they would have to wrap the microcode package into a BIOS and perform regression tests, make sure all of the features work as expected, and if any new adjustments have been added, test those too. That being said, Intel might not fully release every microcode revision to the motherboard vendors, but we can confirm that vendors do pick and choose what to use.
Our Testing, and This Update Today
When we reviewed our Core i7-11700K prior to launch, we stated that at the time we had used the latest BIOS available from the motherboard vendor. In our communications with that vendor we were told that there was no indication if/when the next BIOS would drop. At the time, we were using microcode 0x2C (version 44), part of a February 18th BIOS package. To date, this still remains the latest BIOS for that motherboard, and we’ve been told another update is to come in a week or so.
Since our review, we have obtained a second motherboard. This motherboard was sent to us with microcode 0x1B, from January 8th – super old in a rapid launch cycle. We did some testing on 0x1B, to get a baseline of where the performance was at that time. After that testing, we updated to the latest BIOS, which was compiled on March 6th (one day after our review) and carries the 0x34 microcode.
In our testing today, we will be focusing on the performance delta between the first motherboard on microcode 0x2C (still the latest BIOS for that board), and the second motherboard on microcode 0x34, but also refer back to some of those raw 0x1B numbers. There is some slight variation between the two boards when it comes to AVX-512 response, which we will also cover.
This article will have a few choice results, however our original review with the Core i7-11700K will be updated to showcase both sets of 0x2C and 0x34 numbers.
A Side Note about Motherboard Defaults
One of the comments about our original review was the state of our main memory frequency ratios compared to the memory controller frequency. Historically, these ratios have been fixed in a 1:1 arrangement and are not user configurable.
Some commentary has suggested that the default setting for these ratios changes between Core i9 and Core i7 – specifically that the Core i7 should default to 2:1 mode when run at DDR4-3200, effectively halving the frequency of the memory controller, which has historically been a limiting factor in DRAM overclocking. We cannot confirm if those are the official specifications at this point. However, we can confirm that the motherboards we are testing do offer the user the choice of selecting a 1:1 ratio or a 2:1 ratio.
It should be noted that on all of the motherboards we have tested, all BIOS versions, the actual default operation for a Core i7 running at DDR4-3200 does appear to be the 1:1 mode. For the avoidance of doubt, in our testing on every microcode to date, all of our motherboards were running at a 1:1 ratio.
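To put numbers on the two modes, here is a small Python sketch of what each ratio means for the memory controller clock at DDR4-3200. It relies on the standard fact that DDR memory transfers data twice per clock; the function name is ours.

```python
def controller_clock_mhz(data_rate_mts, ratio):
    """Memory controller clock for a DDR data rate and a DRAM:controller ratio."""
    dram_clock_mhz = data_rate_mts / 2  # DDR transfers data twice per clock
    return dram_clock_mhz / ratio

for ratio in (1, 2):
    clock = controller_clock_mhz(3200, ratio)
    print(f"DDR4-3200 in {ratio}:1 mode -> memory controller at {clock:.0f} MHz")
```

In 1:1 mode the controller keeps pace with the DRAM clock, while in 2:1 mode it runs at half that rate, which is why the choice of mode matters for latency-sensitive results.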
Performance in Microcode 0x34
So this is the bit that Intel never tells anyone externally. There may be updates for internal frequency adjustments, how aggressive to enable voltage/frequency changes, or even changes to default settings. Motherboard vendors very rarely publish changelogs from BIOS to BIOS – at best we get ‘better CPU support’ or ‘updated memory QVL’. So from that standpoint, we have a blank page.
On the performance side, the article title says it all: small performance gains.
In our CPU performance tests, we are seeing an average +1.8% performance gain across all workloads. This varies between a -4.3% loss in some workloads (Handbrake) up to a +9.7% gain (Compilation). SPEC2017 single thread saw a bigger than average +3.4% gain, however SPEC2017 multi thread saw a -2.1% adjustment, making it slower.
In our GPU performance tests, using our RTX 2080 Ti, we are seeing an average +3% performance gain across all configurations. In one case at low resolution settings it was as high as +12%, however +2-3% was typical at 1080p maximum quality.
Our Core i7-11700K review has all the updated benchmark numbers, and microcode versions 0x2C vs 0x34 are clearly marked.
The reasons for this seem to come down to two main areas of updates that we can determine.
Indirect Cache and Memory Updates
In our original review, we posted that the memory of Rocket Lake with our setup was underperforming, with a regression compared to the previous generation Comet Lake. This was a ‘to be expected’ effect of backporting the design and losing some efficiency in that migration, however the original review results showed that the memory latency increase was bigger than expected. Through the new microcode, Intel has fixed this to a degree – we’re still seeing a cache structure performance regression, however it is not as severe.
The L1 cache structure remains at 5 cycle vs 4 cycle, as expected, and the L2 is also as expected, with 13 cycles, similar to the Ice Lake design. In our 0x2C test, the L3 latency was 50.9 cycles, but with the new microcode it is now at 45.1 cycles, and is now more in line with the L3 cache on Comet Lake. Despite this change, however, we saw no adjustment in core-to-core latency.
Out at DRAM, our 128 MB point reduced from 82.4 nanoseconds to 72.8 nanoseconds, which is a 12% reduction, and more in line with the Comet Lake memory latency response.
It is worth noting that our 12% reduction in DRAM latency is not the +40% reduction that other media outlets are reporting. Whereas others use commercial tools, we feel our internal tools are more accurate. Similarly, for overall DRAM bandwidth, we are seeing a +12% memory bandwidth increase between 0x2C and 0x34, and not the +50% bandwidth others are claiming. (We do see a +50% bandwidth increase from 0x1B to 0x34, but between 0x2C and 0x34 it is only 12%.)
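For transparency on how those percentages are derived, the relative reductions can be recomputed from the figures quoted above. A minimal Python sketch (the helper name is ours):

```python
def percent_reduction(before, after):
    """Relative reduction from before to after, in percent."""
    return (before - after) / before * 100

l3_cycles = percent_reduction(50.9, 45.1)  # L3 latency, in cycles
dram_ns = percent_reduction(82.4, 72.8)    # DRAM latency at 128 MB, in ns

print(f"L3 latency:   {l3_cycles:.1f}% lower on 0x34")
print(f"DRAM latency: {dram_ns:.1f}% lower on 0x34")
```

Both land in the 11-12% range, which is the basis for the 12% DRAM figure quoted above.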
AVX-512 Performance and Power Regression: Microcode, or Motherboard?
One of the big talking points about our initial review was the power draw and temperature of our AVX-512 testing. In our test with the 0x2C BIOS in our first motherboard, we saw a 292 W peak power draw, along with a 104°C peak temperature. In our metrics gathering, this setup ran the processor at 4.6 GHz during AVX-512 workloads.
On our second motherboard with the 0x34 BIOS, the frequency of the processor under AVX-512 started at 4.6 GHz, but within two seconds dropped by 200 MHz, running at 4.4 GHz. This meant that the chip experienced 276 W peak power, which very quickly dropped to 225 W. At the peak, this processor has saved 16 W, but in the rest of the test, 60 W was saved for 200 MHz. When you are pushing the silicon to those limits, it really is at the inefficient end of the spectrum.
Despite this reduction in frequency and power, the processor still recorded 103°C at peak, although it levelled off to 90°C after hitting that limit.
The reduction in frequency and power does come at the cost of 3% performance between the 292 W and 276 W peak power modes.
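As a rough illustration of how inefficient the top of the frequency curve is, the second board's own figures can be compared directly. The Python sketch below uses only numbers quoted above (4.6 GHz at 276 W peak, then 4.4 GHz at 225 W); treating the peak and settled power readings as directly comparable is a simplifying assumption on our part.

```python
def percent_drop(before, after):
    """Relative drop from before to after, in percent."""
    return (before - after) / before * 100

freq_drop = percent_drop(4.6, 4.4)   # GHz, AVX-512 frequency on the second board
power_drop = percent_drop(276, 225)  # W, peak vs settled package power
watts_per_100mhz = (276 - 225) / 2   # 200 MHz is two steps of 100 MHz

print(f"Frequency: {freq_drop:.1f}% lower")
print(f"Power:     {power_drop:.1f}% lower")
print(f"Roughly {watts_per_100mhz:.1f} W saved per 100 MHz given up")
```

A frequency cut of a few percent buying a power cut several times larger is consistent with the small performance cost noted above.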
The question here, though, is whether this change is related to microcode or motherboard. We know that different motherboard vendors apply different strategies when it comes to turbo and to how the system responds to high power or temperature, and it may well be that they also have different attitudes when it comes to applying AVX-512 turbo frequencies. This second board may have decided to go down the lower AVX-512 frequency route in order to increase the longevity of the components, for example. Exactly how much is down to microcode vs motherboard is difficult to determine, and your mileage may vary.
Conclusions to Draw
Given the historically volatile nature of early processor microcode releases, some of our readers were quick to question whether future BIOS updates would heavily change Rocket Lake's performance. The answer, in a nutshell, is no, not significantly (thus far).
Reviews, as always, are just snapshots in time. A conclusion today may be different in several weeks, depending on updates for increased performance through optimization, decreased performance due to security or longevity adjustments, or a simple increase in validated peripheral support. There are motherboards out on the market today with earlier BIOSes than the one we tested, and not everyone updates the BIOS. There are advantages to updating, and motherboard vendors do try to make it as easy as possible – one vendor has implemented an updater in both its AMD and Intel motherboards that starts automatically with a fresh Win10 install.
With the latest BIOS, comparing the 0x2C microcode to the 0x34 microcode, while the needle has been moved in the positive direction overall, it hasn’t changed much, and the conclusions don't change either. CPU performance shows an overall uptick, especially for compilation and web workloads, and gaming performance pushes the i7-11700K nearer towards the 8-core Ryzen 7 5800X, now going beyond the Comet Lake version where before there was a trade-off. This means that the i7-11700K now stands as the best Intel processor in that span, but until we get pricing, there will be question marks over its recommendation to the rest of the market.
Our Core i7-11700K review is being updated with the new benchmark test results. This may take up to an hour.
CiccioB - Sunday, March 14, 2021
HT was born as a "hack" of the basic design that allowed Intel to execute two threads on the same core.
At the time Intel cited that the HT support in their core cost them "5% of transistor count" with respect to the transistor used for the core.
So with a cost of 5% of the core size they gave the feature to support a second thread that could give up to twice the performance. Even though it was quite rare it could do so, with an average contribution nearer 30-40%.
Now things have surely changed and core front and back ends complexity probably does not require only 5% of core transistor count to give support to a second thread.
So the general idea is how many transistors (and design complexity) are required to have that 30-40% average performance improvement.
As you can see ARM has never designed a core with a dual thread support. They rely on smaller, simpler, faster, less power hungry designs with more predictable performance. And they are not shy of performance (as a powerful implementation done by Apple demonstrates) and even less shy of available cores. Being smaller they can put more in the same area, even though each of them performs a little less than a bigger core which may use up to 3 or 4 times the amount of transistors.
Given that Alder Lake small core design is not an "Atom design" as you may intend as being "the poor slow useless brother" of the bigger one, but it is reported by Intel itself to have an IPC near Skylake core (so near to the same core IPC you are using today), it may be that these little cores can just be a hindrance for Windows/Linux schedulers (that have to decide how to manage them for which job and instruction support) more than real performance limiters if well managed.
They are also the first try to have real heterogeneous computation in X86 systems with support to different cores that the OS has to support. A preview of what may be needed when Intel will use their MCM solutions with possibly different chiplet mix (high single thread performance cores, low single thread performance cores but at higher number for better multi threading performance tasks, GPU and on Xeon maybe even some sort of FPGA ASICS or highly specialized cores for particular tasks, also improving, I may say finally, customization).
Silver5urfer - Sunday, March 14, 2021
No Apple M1 is not powerful as it shows. Ryzen already beat it and now with upcoming U series BGA Renoir it will be left in dust.
And too many assumptions out of thin air, MCM and Intel ? Chiplet ? FPGA ? do these exist or even remotely have any product showcase, no they do not. AMD already reinforced they are not chasing this dumb trend of the Big Little joke on the Desktop. EPYC and Icelake do you see them ? Nope we do not. Until the big enterprises get this improvement and massive change in efficiency and fast design, it is nothing. Agilex (Intel) and Xilinx acq by AMD yeah we all know but there has to be products. ADL is simply trying to achieve 8C16T on their new cores whatever lake and add a bunch of trash cores, to bloat up the lost SMT performance, 8C16T+8C of low power cores, that will at max beat the 11700K or a 5800X but no way it's going to beat the 5950X, that is a big thing if Intel can really do that. Esp on a troubled node history 10nm on top. The fact that these do not exist on Server market clearly shows how Intel is literally in shambles, if they had them, which proves their confidence (like how AMD did with MCM)
Silver5urfer - Monday, March 15, 2021
Cezanne* Zen 3, Renoir was old Zen 2 design, which already beat M1.
GeoffreyA - Monday, March 15, 2021
My view is that this big/little is a mistake on Intel's part, and they will learn the hard way. Their efforts are better spent working on a single design but toning down its power, dynamically, as much as possible.
GeoffreyA - Monday, March 15, 2021
On the other hand, leaving aside big/little for a moment, Gracemont seems to have some novel ideas. It's possible this lower-power design, taken further, could become their chief one in the future. Pentium M, anyone? Or, some of the tricks tested out in GM, could be worked into their main architecture.
CiccioB - Monday, March 15, 2021
So Apple M1 is not fast because Zen3 in its monolithic form is (probably) faster?
A F1 car that arrive in second place is not fast because it was not first? Maybe with half the cylinders and a smaller tank?
As power consumption seems to be the base for comparing AMD vs Intel, what are the power consumption of the new Zen3 dies vs the consumption of the M1?
How many full featured cores does the Zen3 have, and how many the M1? And in which configuration? Ouch, yes they are.. drum roll, 4 big and 4 small! Not that bad, are they?
And in M1 size and power consumption you have to add all those specialized parts that are not present on AMD APU and, if ever present, are in the external chipset (like Thunderbolt, you know.. the fast interface that AMD doesn't provide at all but through... drum rolls... Intel chip! Plus the SATA and USB interfaces).
Despite your try to have a point, M1 cores ARE FAST, and they are not SMT//CMP/HT or whatever.
I think you believe Intel approach to MCM will be the same AMD did, that is one single chiplet for everyone but APUs (And they are not chiplets at all. Guess why?) but it may surprise you as Intel has other more complex and also some simpler (less power hungry) ways to connect dies.
AMD Xilinx acquisition was not something made by chance. It will take time for AMD to get something good out of it (if ever, seen the poor skills AMD has ever shown in anything outside x86 design, often also failing in that) but they are probably pointing to the same target, that is providing some sort of better and easier customization (than to embed it into the chip for each customer = costs in developing, masks and all the QA chain just for few thousands pieces each or bigger designs with all in but disabled part depending on the customer = more costs and worse yields).
You know that Intel already provides Xeon with integrated FPGA, don't you? What is so strange to think they could provide that by a separate chiplet in a MCM architecture? Just because AMD can't develop more than 2 dies per generation without going bankrupt does not mean others cannot create more (and different) ways to exploit the MCM designs.
By introducing (real) heterogeneous computational parts that Linux/Windows have to handle they just pave the road for the future (some years since now).
I think Intel is thinking beyond the simple "more cores for everyone" even if they do not really need them or even when more core = slower frequencies to not melt everything and with scaling advantage even going slim and slim as you add cores (and make them slower).
When you reach a big enough number of cores in the mainstream market you have two chances to sell more:
1. hope that all applications become so multi threaded that the number of cores you have already sold are not enough to handle them, and at the same time they provide faster performance (not that obvious) so you can sell a bigger CPU
2. hope to continuously increase IPC/frequencies to increase the performance to the same user with the same thread limited apps
I can't see any of those become real, so guess who is preparing for the next era?
RanFodar - Sunday, March 14, 2021
You don't like Intel doing something, don't ya? As long as they bring competition with ADL, then I'm fine with it.
ET - Sunday, March 14, 2021
Thanks for the update, Ian.
Makaveli - Sunday, March 14, 2021
Thanks for the update Ian.
What AGESA version are you running on the AM4 platform?
Can we make sure that is AM4 AGESA V2 PI 184.108.40.206 for the March 30th review?
JoeDuarte - Sunday, March 14, 2021
As usual, there appear to be several errors and typos in your results:
"The L1 cache structure remains at 5 cycle vs 4 cycle, as expected, and the L3 is also as expected, with 13 cycles..."
Did you mean L2 above where you said L3?
"In our 0x2C test, the L3 latency was 50.9 cycles, but with the new microcode is now at 45.1 cycles..."
The graph shows 46.2, not 45.1. Which is correct?
"Out at DRAM, our 128 MB point reduced from 82.4 nanoseconds to 72.8 nanoseconds..."
The graph shows 257.5, not 72.8. Which is correct?
You only have one latency graph here – is it the wrong graph? The numbers don't match your graphs from the initial review article either.