Investigating Performance of Multi-Threading on Zen 3 and AMD Ryzen 5000
by Dr. Ian Cutress on December 3, 2020 10:00 AM EST - Posted in
- CPUs
- AMD
- Zen 3
- X570
- Ryzen 5000
- Ryzen 9 5950X
- SMT
- Multi-Threading
Gaming Performance (Discrete GPU)
For our gaming tests, we are using our AMD Ryzen 9 5950X paired with an NVIDIA RTX 2080 Ti graphics card. Our standard test suite consists of 12 titles, tested at four configurations:
- Stage 1: Actual Gaming (1080p Maximum Quality, or equivalent)
- Stage 2: All About Pixels (‘4K Minimum’ Quality)
- Stage 3: Medium Low (‘1440p Minimum’)
- Stage 4: Lowest Lows (720p Minimum or lower)
The final three settings form a set of CPU-limited gaming tests, and help find the point at which we move from CPU limited to GPU limited. Some users baulk at this testing, finding it irrelevant; however, these configurations have been widely requested over the years. The counterpoint to this testing is the first setting, 1080p Maximum: this was requested because 1080p is the most popular gaming resolution, and Maximum Quality was chosen because this graphics card should be able to handle almost everything at that resolution at very playable framerates.
All the details for our gaming tests can be found in our #CPUOverload article.
Stage 1: Actual Gaming AMD Ryzen 9 5950X, SMT On vs SMT Off |
AnandTech | Settings | Average FPS | 95th Percentile |
Chernobylite | 1080p Max | 100% | - |
Civilization 6 | 1080p Max | 103% | - |
Deus Ex: MD | 1080p Max | 99% | 100% |
Final Fantasy 14 | 1080p Max | 102% | - |
Final Fantasy 15 | 8K Standard | 100% | 99% |
World of Tanks | 1080p Max | 100% | 102% |
World of Tanks | 4K Max | 103% | 102% |
Borderlands 3 | 1080p Max | 101% | 103% |
F1 2019 | 1080p Ultra | 103% | 106% |
Far Cry 5 | 1080p Ultra | 104% | 104% |
GTA V | 1080p Max | 99% | 100% |
RDR 2 | 1080p Max | 100% | 100% |
Strange Brigade | 1080p Ultra | 101% | 101% |
In real-world gaming situations, there’s very little to pick between having SMT enabled or disabled. Almost universally, enabling it either makes no difference or is a smidgen better, with F1 2019, Civilization 6, and Far Cry 5 seemingly the biggest beneficiaries. I’ve also added in the Stage 3 result from World of Tanks, just because that benchmark doesn’t really have a proper settings menu.
Stage 2: All About Pixels AMD Ryzen 9 5950X, SMT On vs SMT Off |
AnandTech | Settings | Average FPS | 95th Percentile |
Chernobylite | 4K Low | 99% | - |
Civilization 6 | 4K Min | 105% | - |
Deus Ex: MD | 4K Min | 98% | 100% |
Final Fantasy 14 | 4K Min | 102% | - |
Final Fantasy 15 | 4K Standard | 100% | 100% |
Borderlands 3 | 4K Very Low | 101% | 104% |
F1 2019 | 4K Ultra Low | 100% | 100% |
Far Cry 5 | 4K Low | 101% | 100% |
GTA V | 4K Low | 100% | 101% |
RDR 2 | 8K Min | 100% | 100% |
Strange Brigade | 4K Low | 100% | 100% |
At our high-resolution, minimal-quality settings, the only outlier is Civilization 6, whose average frame rate is about 5% higher when SMT is enabled.
Stage 3: Medium Low AMD Ryzen 9 5950X, SMT On vs SMT Off |
AnandTech | Settings | Average FPS | 95th Percentile |
Chernobylite | 1440p Low | 100% | - |
Civilization 6 | 1440p Min | 105% | - |
Deus Ex: MD | 1440p Min | 97% | 96% |
Final Fantasy 14 | 1440p Min | 102% | - |
Final Fantasy 15 | 1080p Standard | 101% | 105% |
World of Tanks | 1080p Standard | 101% | 101% |
Borderlands 3 | 1440p Very Low | 103% | 105% |
F1 2019 | 1440p Ultra Low | 99% | 99% |
Far Cry 5 | 1440p Low | 99% | 99% |
GTA V | 1440p Low | 100% | 99% |
RDR 2 | 1440p Low | 100% | 100% |
Strange Brigade | 1440p Low | 100% | 100% |
At the more medium settings, we’re starting to see more variation: Borderlands gains a few percent from SMT, while Deus Ex: MD starts to drop off a bit with SMT enabled.
Stage 4: Lowest Lows AMD Ryzen 9 5950X, SMT On vs SMT Off |
AnandTech | Settings | Average FPS | 95th Percentile |
Chernobylite | 360p Low | 106% | - |
Civilization 6 | 480p Min | 102% | - |
Deus Ex: MD | 600p Min | 91% | 91% |
Final Fantasy 14 | 768p Min | 102% | - |
Final Fantasy 15 | 720p Standard | 99% | 102% |
World of Tanks | 768p Min | 101% | 100% |
Borderlands 3 | 360p Very Low | 108% | 110% |
F1 2019 | 768p Ultra Low | 102% | 105% |
Far Cry 5 | 720p Low | 100% | 101% |
GTA V | 720p Low | 99% | 98% |
RDR 2 | 384p Low | 100% | 103% |
Strange Brigade | 720p Low | 95% | 95% |
This is perhaps our most varied set of results, with Deus Ex: MD showing an almost 10% drop with SMT enabled. DEMD is usually considered a CPU-limited title, but so is Chernobylite, which sees a 6% gain. Borderlands, which is more of a modern game, is +8-10% with SMT enabled. However, I doubt anyone is playing at these resolutions.
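As a rough illustration of how such outliers could be picked out, the sketch below filters the Stage 4 average FPS figures from the table above for titles that sit 5 percentage points or more away from parity. The 5-point threshold and the name `find_outliers` are assumptions made for illustration only; they are not part of the test methodology.

```python
# Stage 4 average FPS with SMT on, as a percentage of SMT off
# (values copied from the Stage 4 table above).
STAGE4_AVG_FPS = {
    "Chernobylite": 106,
    "Civilization 6": 102,
    "Deus Ex: MD": 91,
    "Final Fantasy 14": 102,
    "Final Fantasy 15": 99,
    "World of Tanks": 101,
    "Borderlands 3": 108,
    "F1 2019": 102,
    "Far Cry 5": 100,
    "GTA V": 99,
    "RDR 2": 100,
    "Strange Brigade": 95,
}


def find_outliers(results, threshold=5):
    """Return {title: delta} for titles at least `threshold` points from 100%.

    `threshold` is a hypothetical cut-off chosen for this example.
    """
    return {
        title: pct - 100
        for title, pct in results.items()
        if abs(pct - 100) >= threshold
    }


OUTLIERS = find_outliers(STAGE4_AVG_FPS)

# Print from the largest loss to the largest gain.
for title, delta in sorted(OUTLIERS.items(), key=lambda item: item[1]):
    print(f"{title}: {delta:+d}%")
```

With this cut-off, the titles called out in the text (Deus Ex: MD, Chernobylite, Borderlands 3) are flagged, along with Strange Brigade at -5%.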
Overall Gaming Performance
If we take full averages from all the data points, then we’re seeing a rough +1% gain in performance in the more complex scenarios across the board.
Resolution Average Comparison AMD Ryzen 9 5950X, SMT On vs SMT Off |
AnandTech | Setting | aka | Average FPS | 95th Percentile |
Stage 1 | 1080p Max | Actual Gaming | 101% | 101% |
Stage 2 | 4K+ Min | All About Pixels | 101% | 101% |
Stage 3 | 1440p Min | Medium Lows | 101% | 101% |
Stage 4 | < 768p Min | Lowest Lows | 100% | 101% |
In reality, any loss or gain is highly dependent on the title in question, and can swing from one side of the line to the other. It’s clear that Deus Ex prefers SMT off, and F1 2019 and Borderlands prefer SMT on, but we are talking fine margins here.
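For readers who want to check the stage-level figures, the sketch below recomputes the Average FPS column of the summary table from the per-title results listed earlier. It assumes a plain arithmetic mean rounded to the nearest percent; the article does not state which averaging method was used, so treat this as one plausible reconstruction. The names `AVG_FPS` and `stage_averages` are illustrative.

```python
from statistics import mean

# Average FPS with SMT on, as a percentage of SMT off, copied row by row
# from the four stage tables above.
AVG_FPS = {
    "Stage 1": [100, 103, 99, 102, 100, 100, 103, 101, 103, 104, 99, 100, 101],
    "Stage 2": [99, 105, 98, 102, 100, 101, 100, 101, 100, 100, 100],
    "Stage 3": [100, 105, 97, 102, 101, 101, 103, 99, 99, 100, 100, 100],
    "Stage 4": [106, 102, 91, 102, 99, 101, 108, 102, 100, 99, 100, 95],
}


def stage_averages(table):
    """Arithmetic mean per stage, rounded to a whole percent (an assumption)."""
    return {stage: round(mean(values)) for stage, values in table.items()}


SUMMARY = stage_averages(AVG_FPS)

for stage, pct in SUMMARY.items():
    print(f"{stage}: {pct}%")
```

Under that assumption the results come out at 101%, 101%, 101%, and 100% for Stages 1 to 4, in line with the summary table.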
126 Comments
Bomiman - Saturday, December 5, 2020 - link
That common knowledge is a few years old now. It was once common knowledge that games only used one thread. Consoles now have 3 times as many threads as before, and that's in a situation where 4T CPUs are barely usable and 4C/8T CPUs are obsolete.
MrPotatoeHead - Tuesday, December 15, 2020 - link
Xbox360 came out in 2005. 3C/6T. Even the PS3 had a 1C/2T PowerPC PPE and 6 SPEs, so a total of 8T. PS4/XO is 8C/8T. Though I guess we could still blame the lack of CPU utilization on this last generation using pretty weak cores from the get go. IIRC 8 core Jaguar would be on par with an Intel i3 at the time of these console releases.
Though, the only other option AMD had was Piledriver. Piledriver was still a poor performer and a power hog, and it would likely only have been worth it over 8 Jaguar cores if they went with a 3 or 4 module chip.
It is nice that this generation MS and Sony both went all out on the CPU. Just too bad they aren't Zen 3 based. :(
Dolda2000 - Friday, December 4, 2020 - link
It should be kept in mind that at the time AMD criticized Intel for that, AMD had actual dual-cores (A64x2) while Intel still had single-cores with HT, which makes the criticism rather fair.
Xajel - Sunday, December 6, 2020 - link
"Intel's HT approach proved superior".
Intel's approach wasn't that much superior. In fact, in the early days of Intel's HTT processors, many applications, even ones that were supposed to be optimised for a multi-core (MC) code path, were getting lower scores with HTT enabled than with HTT disabled.
The main culprit was that applications were designed to handle each thread on a real core, not two threads on a single core; the threads were fighting for resources that weren't there.
Intel knew this and worked hard with developers to make them aware of the difference and apply this change to the code path. It actually took some time until multi-core applications were SMT aware and had a code path for this.
In AMD's case, AMD couldn't work as hard as Intel with developers to get them to add a new code path just for AMD CPUs. Not to mention that Intel was playing dirty, starting with its famous compiler, which was (and still is) used by most developers to compile applications. The compiler will optimise the code for Intel's CPUs and has an optimised code path for every CPU and CPU feature Intel has, but when the application detects a non-Intel CPU, including AMD's, it will select the slowest code path and will not try to test for the feature and choose a code path accordingly.
This also applied to AMD's CPUs. Sure, the CPUs lacked FPU performance and were not competitive enough (even when the software was optimised), but the whole optimisation situation made AMD's CPUs inefficient. The idea should have worked better than Intel's, because there is actual real hardware there (at least for integer), but developers didn't put in the work, and the Intel compiler also played a major role for smaller developers.
TL;DR: the main issues were the Intel compiler and the lack of developer interest; the actual cores were also not that much stronger than Intel's (on the IPC side). AMD's idea should have worked, but things weren't on their side.
And by the time AMD came out with its design, they were already late: applications were already optimised for Intel HTT, which had become very good as almost all applications became SMT aware. AMD acknowledged this and knew that they had to take what developers already had and work on it. They also worked hard on their SMT implementation, to the point that it is now touted as better than Intel's own SMT implementation (HTT).
Keljian - Sunday, January 10, 2021 - link
Urm no, Intel’s compiler isn’t used often these days unless you’re doing really heavy maths. Microsoft’s compiler is used much more often, though clang is taking off
pogsnet - Tuesday, December 29, 2020 - link
During P4, HT gave no difference in performance compared to AMD64, but on Core2Duo it showed better performance. Probably because we had only 2-4 cores, which was not enough for our multitasking needs. Now we have 4-32 cores, plus much more powerful and efficient cores; hence SMT may not be that significant anymore, which is why most tests show no big performance lift.
willis936 - Thursday, December 3, 2020 - link
5%? I think more than 5% is needed for a whole second set of registers plus the logic needed to properly handle context switching. Everything in between the cache and pipeline needs to be doubled.
tygrus - Thursday, December 3, 2020 - link
Register renaming means they already have more registers, which don't need to be copied: there are more physical registers than the logical registers exposed to the programmer. Say you have 16 logical registers exposed to the coder per thread and 128 rename registers in HW; with SMT at 2 threads/core you have the same 16 logical registers, but each thread has 64 rename registers instead of 128.
Compare mixing the workloads, e.g. 8 int/branch-heavy with 8 FP-heavy on 8 cores, or OS background tasks like indexing/search/antivirus.
MrSpadge - Thursday, December 3, 2020 - link
The 5% is from Intel for the original Pentium 4. At some point in the last 10 years I think I read a comparable number, probably here on AT, regarding a more modern chip.
Wilco1 - Friday, December 4, 2020 - link
There is little accurate info about it, but the fact is that x86 cores are many times larger than Arm cores with similar performance, so it must be a lot more than 5%. Graviton 2 gives 75-80% of the performance of the fastest Rome at less than a third of the area (and half the power).