DDR5 Demystified - Feat. Samsung DDR5-4800: A Look at Ranks, DPCs, and Do Manufacturers Matter?
by Gavin Bonshor on April 7, 2022 8:00 AM EST
CPU Performance Benchmarks: DDR5-4800
To show the performance of DDR5 memory in different configurations, we've opted for a more selective and short-form selection of benchmarks from our test suite. This ranges from tests on application opening, rendering, web, and compression.
All of the tests were run with the memory at default (JEDEC) settings, which means DDR5-4800 CL40 regardless of the configuration, e.g., 2 x 16, 2 x 32, and 4 x 16 GB.
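For context on what DDR5-4800 CL40 means in absolute terms, the transfer rate and CAS latency can be converted into a theoretical peak bandwidth and a latency in nanoseconds. The sketch below is a back-of-the-envelope calculation, not a measurement. It assumes a dual-channel platform with a combined 128-bit data bus, and the function names are illustrative only.

```python
def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int = 128) -> float:
    """Theoretical peak bandwidth in GB/s (decimal) for a transfer rate in MT/s.

    The 128-bit default is an assumption: two 64-bit memory channels.
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9


def cas_latency_ns(transfer_rate_mts: float, cas_cycles: int) -> float:
    """CAS latency in nanoseconds; the memory clock runs at half the transfer rate."""
    clock_mhz = transfer_rate_mts / 2
    return cas_cycles / clock_mhz * 1e3


if __name__ == "__main__":
    print(f"Peak bandwidth: {peak_bandwidth_gbs(4800):.1f} GB/s")
    print(f"CAS latency:    {cas_latency_ns(4800, 40):.2f} ns")
```

Measured read, write, and copy figures will land below the theoretical peak, so the number is best treated as a ceiling rather than a target.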
Web: Speedometer 2
Our test goes through the list of frameworks and produces a final score indicative of ‘rpm’, one of the benchmark's internal metrics.
We run the benchmark for a dozen loops, taking the average of the last five.
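The averaging step can be sketched as follows. This is a minimal illustration of "run twelve loops, average the last five", not the actual test harness; the function name and the sample scores are hypothetical.

```python
from statistics import mean


def trailing_mean(scores: list[float], keep_last: int = 5) -> float:
    """Average the final `keep_last` scores, discarding the earlier runs."""
    if len(scores) < keep_last:
        raise ValueError("not enough runs to average")
    return mean(scores[-keep_last:])


if __name__ == "__main__":
    # Hypothetical scores for twelve loops; these are NOT measured results.
    runs = [250.0, 255.0, 258.0, 260.0, 261.0, 262.0,
            262.0, 263.0, 261.0, 262.0, 264.0, 260.0]
    print(trailing_mean(runs))
```

Discarding the early loops keeps warm-up effects such as cold caches out of the reported figure.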
In Speedometer, the 2Rx8/1DPC DDR5-4800 kit performed best out of the Samsung memory, with the 1Rx8/1DPC kit performing closely behind the 2 x 32 GB kit. The 1Rx8/2DPC (4 x 16 GB) kit from Samsung technically performed the slowest of all, but the performance difference from top to bottom was within a 3% margin of error.
The Micron 2 x 32 GB proved the best out of all the memory we tested, albeit without much difference from the rest of the 2 x 32 GB kits tested.
AIDA64 6.60
AIDA64 Extreme has a hardware detection engine unrivaled in its class. It provides detailed information about installed software and offers diagnostic functions and support for overclocking. As it is monitoring sensors in real-time, it can gather accurate voltage, temperature, and fan speed readings, while its diagnostic functions help detect and prevent hardware issues. It also offers a couple of benchmarks for measuring either the performance of individual hardware components or the whole system. It is compatible with all 32-bit and 64-bit Windows editions, including Windows 11 and Windows Server 2022.
We are using AIDA64 in this instance to gather memory bandwidth data based on read speed, write speed, copy speed, and memory latency.
Looking at raw memory benchmarks from AIDA64, all of the 2 x 32 GB kits perform competitively against each other. Meanwhile, the Samsung 4 x 16 GB kit experienced drops in performance across the board, with both read bandwidth and write bandwidth being impacted. There's also a notable latency penalty to consider when using four DIMMs (2DPC) versus two DIMMs (1DPC).
The most interesting result here may very well be the Samsung 2 x 16 GB (1Rx8) kit. While it's fully competitive on read speeds, it loses just a little bit of ground on write speeds, and a little more ground on all-out copies. In what's admittedly a memory-focused test, it's a very early indicator that dual-ranked DIMMs are the sweet spot in terms of performance, and that losing a rank does incur penalties. All of this is then further exacerbated by going to 2DPC.
WinRAR 5.90
Our WinRAR test from 2013 is updated to the latest version of WinRAR at the start of 2014. We compress a set of 2867 files across 320 folders totaling 1.52 GB in size – 95% of these files are small typical website files, and the rest (90% of the size) are small 30-second 720p videos.
Looking at performance in WinRAR, this is where the higher density 2Rx8 memory showed its dominance. The kits with 16 Gb chips in 2Rx8 outperformed the 16 Gb 1Rx8, with the 2 x 16 GB Samsung kit notably outperforming the same memory running with four UDIMMs in a 2DPC configuration.
Rendering - Blender 2.79b: 3D Creation Suite
A high-profile rendering tool, Blender is open-source, allowing for massive amounts of configurability, and is used by a number of high-profile animation studios worldwide. The organization recently released a Blender benchmark package, a couple of weeks after we had narrowed our Blender test for our new suite; however, their test can take over an hour. For our results, we run one of the sub-tests in that suite through the command line - a standard ‘bmw27’ scene in CPU only mode, and measure the time to complete the render.
Focusing on rendering, the difference between the 2 x 32 GB and 2 x 16 GB kits was marginal. The 4 x 16 GB Samsung kit was technically the worst performer out of the bunch, but for all practical purposes, all five kits may as well be tied.
Rendering - Cinebench R23
Maxon's real-world and cross-platform Cinebench test suite has been a staple in benchmarking and rendering performance for many years. Its latest installment is the R23 version, which is based on Maxon's latest Release 23 code and uses updated compilers. It acts as a real-world system benchmark that incorporates common tasks and rendering workloads, as opposed to less diverse benchmarks which only take measurements based on certain CPU functions. Cinebench R23 can also measure both single-threaded and multi-threaded performance.
Using Cinebench R23, there wasn't much difference between the 2 x 32 GB kits in the single-threaded test. In the multi-threaded test, the Samsung 2 x 16 GB kit actually performed better than the 2 x 32 GB kits, underscoring how all of the kits are essentially tied in this workload.
Rendering – POV-Ray 3.7.1: Ray Tracing
The Persistence of Vision Ray Tracer, or POV-Ray, is a freeware package for, as the name suggests, ray tracing. It is a pure renderer, rather than modeling software, but the latest beta version contains a handy benchmark for stressing all processing threads on a platform. We have been using this test in motherboard reviews to test memory stability at various CPU speeds to good effect – if it passes the test, the IMC in the CPU is stable for a given CPU speed. As a CPU test, it runs for approximately 1-2 minutes on high-end platforms.
In our POV-Ray testing, the Micron kit performed slightly better than the rest, with Samsung's 2 x 32 GB kit coming a close second. Both configurations tested with the 16 GB sticks were slightly behind their higher-density counterparts. There was around a 0.36% hit in performance when using four 16 GB memory sticks versus using two.
repoman27 - Thursday, April 14, 2022
But what if you left Chrome running with more than say 4 tabs open while you're gaming?
No, I totally get what you're saying, and I'm fine with the gaming focus in general. But I'm sure there are plenty of regular visitors to this site that are more likely to be running a bunch of VMs or some other workload that might be memory bound in ways that differ from gaming scenarios.
RSAUser - Tuesday, April 19, 2022
A case where you care about this, you're probably a power user, at that point in time it would make sense to also test 64GB/memory exhaustion, as people are not taking old sticks with this, they'd directly buy as much as they need since DDR5.
I can't run my work stack on 32GB RAM, and at home I often enough hit 32GB if I work on a hobby project as I like running my entire stack at once.
Jp7188 - Wednesday, April 13, 2022
4x16 (64GB) performed worse in every test vs. 32GB. That's reasonable assurance mem exhaustion wasn't much of a factor.
Dolda2000 - Thursday, April 7, 2022
I have to admit I don't quite understand the results. I'd expect the disadvantage of 2DPC to be that they may not be able to sustain the same frequencies as 1DPC, but clearly that's not the case here since all kits are in fact running at the same frequency. That being the case, I would expect 1R, 2DPC memory to behave functionally identically to 2R, 1DPC memory, since, at least in my understanding, that's basically the same thing as far as the memory controller is concerned.
What would account for the differences? Were the secondary and/or tertiary timings controlled for?
MrCommunistGen - Thursday, April 7, 2022
I've seen passing comments that running 2DPC really messes with signal integrity on current setups but didn't read into it any further. Since DDR5 has SOME built in error handling, even on non-ECC chips, it could be that signal losses are causing transmission retries which slow things down.
Assuming that signal integrity is the issue, I'm wondering if rev2 or next gen DDR5 motherboards will try to improve the DDR5 memory traces to combat this or if it's something that needs to happen on the memory controller side.
Also, even though the clockspeeds and primary timings are listed as being the same, the motherboard may be automatically adjusting some of the tertiary timings behind the scenes when using 2DPC, which can also have a measurable impact.
Dolda2000 - Thursday, April 7, 2022
>Since DDR5 has SOME built in error handling, even on non-ECC chips, it could be that signal losses are causing transmission retries which slow things down.
I had that thought as well, but as far as I understand, DDR5's builtin error-handling is limited entirely to what happens on the die. I don't think there are any error-handling mechanisms on the wire that would allow the memory system to detect errors in transfer and retransmit.
thomasg - Thursday, April 7, 2022
As far as I know, there are no error correction techniques (such as forward error correction) used for the transmission paths of DDR ram, apart from ECC, thus there are no automatic retransmissions.
The reason why frequencies or timings will suffer for multiple DIMMs per channel may be as simple as signal runtime.
Electrical signals theoretically travel at the speed of light, but high frequency signals exhibit significant propagation delay, depending on trace design and PCB material. About half the speed of light (~150,000 km/s) is a fair assumption for typical PCB traces with DIMM sockets.
With DDR5-4800, we're talking about clock cycles of 2400 MHz, which translates to 1 cycle per 400 femtoseconds.
In 400 femtoseconds, the electrical high-frequency signal can travel 6 centimeters.
Thus, with 3 centimeters longer traces between DIMM_A and DIMM_B their signals would be 180° out of phase.
Since we're talking DDR, the rising and falling edge of the clock is used to transmit data, which means the signal timings need to be a lot tighter than 180°, likely below 90°, which limits the difference to 1.5 cm.
It's not hard to imagine that this is a significant constraint to PCB layout.
Traces can be length matched, but with wide parallel channels (64/72 traces), this is very tricky and cannot be done exactly, as it would be for narrower channels (i.e. 4 or 8 traces).
As you might have noticed, I'm a radio guy and don't have the slightest clue about DDR memory, so take this with a grain of salt.
repoman27 - Friday, April 8, 2022
Just to add a few grains of salt...
DDR5 actually does support cyclical redundancy check (CRC) for read and write operations.
Depending on the material used for the PCB, the signal speed for microstrips might be slightly better than 1/2 c, maybe closer to 1/1.7 c or 58.5% of the speed of light.
And according to my calculator at least, 1 ÷ 2,400,000,000 = 0.000000000416667 = 416.667 picoseconds for the unit interval.
And not to downplay the issues you point out in designing DDR5 memory systems, but Alder Lake also supports PCI Express Gen5, which involves routing 64 traces operating at 16.0 GHz for an x16 slot. Serial point-to-point using differential signaling, so not the same thing, but still bonkers nonetheless.
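As a quick check of the arithmetic in the two comments above, the cycle time of a 2400 MHz clock and the distance a signal covers in that time can be worked out directly. This is only a rough sketch: the 2400 MHz clock, the half-speed-of-light estimate, and the 1/1.7 c estimate are taken from the comments, while the function names are made up for illustration.

```python
C_KM_PER_S = 299_792.458  # speed of light in vacuum


def cycle_time_ps(clock_hz: float) -> float:
    """Duration of one clock cycle in picoseconds."""
    return 1e12 / clock_hz


def distance_cm(time_ps: float, velocity_fraction: float) -> float:
    """Distance in cm travelled in `time_ps` at a given fraction of c."""
    velocity_cm_per_s = C_KM_PER_S * 1e5 * velocity_fraction
    return velocity_cm_per_s * time_ps * 1e-12


if __name__ == "__main__":
    period = cycle_time_ps(2.4e9)  # 2400 MHz clock for DDR5-4800
    print(f"Cycle time: {period:.3f} ps")
    print(f"Distance at 0.5 c:   {distance_cm(period, 0.5):.2f} cm")
    print(f"Distance at c / 1.7: {distance_cm(period, 1 / 1.7):.2f} cm")
```

The result lands in the picosecond range for the cycle time and at a few centimeters for the distance per cycle, so the roughly 6 cm estimate holds once the unit is read as picoseconds rather than femtoseconds.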
Jp7188 - Wednesday, April 13, 2022
Correct me if I'm wrong, but crc without fec = chance of retransmission = increased latency?
repoman27 - Thursday, April 14, 2022
Yes, but if your BER is even close to reasonable, the additional latency from retries should be negligible. And it's not like ECC or FEC are exactly free. You want to do whatever you can to keep the error rate within acceptable tolerances before resorting to the additional overhead / complexity of error correction.