DDR5 Demystified - Feat. Samsung DDR5-4800: A Look at Ranks, DPCs, and Do Manufacturers Matter?
by Gavin Bonshor on April 7, 2022 8:00 AM EST
The hottest advancement in desktop memory technology in recent years is undoubtedly DDR5, which arrived alongside Intel's 12th Gen Core series of processors. Not only does DDR5 memory yield higher memory bandwidth for many different use cases, but DDR5 also offers a generational increase in memory capacity, allowing for higher-capacity UDIMMs over time.
But, as always, the memory market is anything but homogeneous. Even with just three actual DRAM manufacturers, DIMM vendors are offering DDR5 at a slew of clockspeeds, both official JEDEC speeds and XMP profile memory that essentially comes overclocked out of the box. There are also notable differences in today's common DDR5 DIMM configurations, including single-sided UDIMMs (1Rx8) and dual-sided memory (2Rx8), as well as UDIMMs of different capacities.
In today's piece, we're looking at DDR5-4800 memory from Samsung, including 2 x 32 GB, 2 x 16 GB, and 4 x 16 GB, to measure the performance differences between single and dual rank memory, as well as any differences between running DDR5 in one DIMM Per Channel (DPC) or two. Finally, as we have DDR5-4800 DIMMs with DRAM from Micron and SK Hynix, too, we'll also be looking at these in our results, to see if there are any performance differences among the three memory manufacturers.
Scaling With DDR5 Memory: The Story So Far
In December 2021, we tested the performance scalability of DDR5 memory using G.Skill's Trident Z5 2 x 16 GB kit of DDR5-6000 CL36 memory at a range of different frequencies. Our findings showed that going from the JEDEC frequency of DDR5-4800 (at CL36) up to DDR5-6400 CL36 yielded a performance gain of 14% when using one of our most memory-sensitive benchmarks, WinRAR 5.90. The consensus here was that using faster memory did allow for an uplift in practically all of the scenarios we tested. Still, the caveat was that due to the high prices of faster kits, there currently isn't any kind of price/performance sweet spot beyond current DDR5-4800 JEDEC kits – the price premium for high-speed kits is currently greater than the performance benefits.
Samsung DDR5-4800B CL40 memory (2 x 32 GB) (2Rx8)
Today's Test: Ranks, DPCs, and Do Memory Manufacturers Matter?
Since that initial article was focused on memory frequencies and latencies, we wanted to take a look at the other elements of the DDR5 performance equation. This includes DIMM ranks, the number of DIMMs in a single memory channel, and even the memory manufacturers themselves. We've seen what happens when we play with the first two variables; now let's see what happens when we play with the rest.
Focusing on DDR5 DIMM configurations, the DDR5 memory modules currently available for consumers using Intel's 12th Gen Core series come in four different combinations. This includes single rank (1Rx8) and dual rank (2Rx8) DIMMs, which in turn typically come in kits of two or four, making for 1 DIMM Per Channel (1DPC) or 2 DIMMs Per Channel (2DPC) respectively. And, as we'll see in our testing, both ranks and DPCs do impact DDR5 performance, so there is much more to getting the most out of DDR5 memory than just frequency and latencies.
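To keep these combinations straight, here's a quick sketch (a hypothetical helper, not part of our test suite) of how the Samsung kits tested in this article map onto total capacity and DPC on Alder Lake's dual-channel memory controller:

```python
# The Samsung kits tested in this article, on a dual-channel (four-slot) Alder Lake
# board. Total capacity is DIMMs x per-DIMM capacity; DPC is DIMMs / channels.
MEMORY_CHANNELS = 2

kits = [
    (2, 16, "1Rx8"),  # 2 x 16 GB
    (4, 16, "1Rx8"),  # 4 x 16 GB
    (2, 32, "2Rx8"),  # 2 x 32 GB
]
for dimms, gb_per_dimm, organization in kits:
    total_gb = dimms * gb_per_dimm
    dpc = dimms // MEMORY_CHANNELS
    print(f"{dimms} x {gb_per_dimm} GB ({organization}): {total_gb} GB total, {dpc}DPC")
```

Note that the two 64 GB configurations reach that capacity in different ways – two dual-rank DIMMs at 1DPC versus four single-rank DIMMs at 2DPC – which is exactly the comparison we're after.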
The fundamental questions we want to answer in this article are:
- Is there a difference in performance between 1Rx8 and 2Rx8 with DDR5 memory?
- Is there a difference in performance when using 1DPC versus 2DPC (2x32GB vs. 4x16GB)?
- Is there a difference in performance between memory from different manufacturers at identical timings?
To explore the performance differences of DDR5 operating in different rank and DPC configurations, Samsung has sent over a collection of their latest DDR5-4800B DIMMs in two different configurations/capacities: 16GB 1Rx8 DIMMs, and 32GB 2Rx8 DIMMs. As one of the Big Three DRAM manufacturers, Samsung has an enormous presence in the memory market, but this is actually the first time the company has ever sampled consumer-grade UDIMMs. So we're excited to see how their own in-house DIMMs do in this respect.
With Samsung's DIMMs in hand, we've been able to test between the two different configurations, to see if 1Rx8 versus 2Rx8 is better from a performance perspective. We're also able to measure the impact of moving from 1DPC to 2DPC, an always interesting matter as DDR memory signaling has become increasingly difficult with each generation.
Crucial (Micron) DDR5-4800B CL40 memory (2 x 32 GB) (2Rx8)
Finally, as we already have 32 GB (2Rx8) kits from SK Hynix and Micron, this has also allowed us to do a true apples-to-apples comparison among different DIMM kits at stock JEDEC speeds. With JEDEC timings all memory should be equal, so this is the perfect opportunity to test and confirm that notion. Plus the additional DIMMs give us a good control to compare the performance of the Samsung DIMMs to, to make sure there aren't any Samsung-specific oddities going on.
Ranks & DPCs: A Quick Refresher
Looking at the configurations of DDR5 UDIMMs, there are two main types, 1Rx8 and 2Rx8. The R in the 1Rx8 stands for rank, so 1Rx8 means it has one rank with eight memory chips per rank. 2Rx8, in turn, means there are two ranks of eight memory chips. In practice, with today's DDR5 DIMMs, 1Rx8 will be a single-sided DIMM, whereas 2Rx8 is always dual-sided. We should note that there is also quad rank (4Rx8) memory as well, but this is typically found in servers and not usually meant for consumer platforms (where it may not even be supported to begin with).
Because the number of chips per rank is fixed – and so is the density of the first-gen DDR5 dies available today – the capacity of today's DDR5 DIMMs is directly proportional to the number of ranks. 32GB DIMMs are exclusively 2Rx8, using a total of 16 16Gb dies, while 16GB DIMMs will drop down to 8 of those dies.
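As a back-of-the-envelope check, per-DIMM capacity falls straight out of the rank count once the x8 organization and 16 Gb die density are fixed; a minimal sketch, assuming one die per package as with today's DDR5 UDIMMs:

```python
# UDIMM capacity from rank count, chips per rank, and die density.
# Assumes one 16 Gb die per package, as on current 1Rx8/2Rx8 DDR5 UDIMMs.
def udimm_capacity_gb(ranks, chips_per_rank=8, die_gbit=16):
    total_gbit = ranks * chips_per_rank * die_gbit
    return total_gbit / 8  # gigabits -> gigabytes

print(udimm_capacity_gb(ranks=1))  # 1Rx8 -> 16.0 GB
print(udimm_capacity_gb(ranks=2))  # 2Rx8 -> 32.0 GB
```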
And, of course, on most motherboards it's possible to populate either one or two DIMMs Per Channel (DPC). Adding a second DIMM to a channel allows for doubling the total memory capacity of a system – going up to 128GB with today's systems – but it also comes with a potential performance tradeoff. Driving two DIMMs in a single memory channel is harder on the memory controller than driving a lone DIMM, which means going to 2DPC can be a tradeoff between capacity and performance, rather than a means of increasing both. This makes for a particularly interesting question right now when trying to build a 64GB system: is it better to go with two 2Rx8 DIMMs, or four 1Rx8 DIMMs?
Samsung, SK Hynix, and Micron DDR5 Manufacturing: The Differences
We have dedicated many column inches to DDR5 memory over the last couple of years. While all of these DIMMs target the same JEDEC specification, each of the three DRAM manufacturers uses its own process node and die design to produce its memory ICs, which is where the physical differences between these kits come from.
DDR5 Manufacturing Differences/Specifications (Samsung, SK Hynix, Micron)

| Manufacturer | Samsung | SK Hynix | Micron |
|---|---|---|---|
| Speed (MT/s) | DDR5-4800 | DDR5-4800 | DDR5-4800 |
| Bandwidth | 38.4 GB/s | 38.4 GB/s | 38.4 GB/s |
| Die Area | 73.58 mm² | 75.21 mm² | 66.26 mm² |
| Die Capacity | 16 Gb | 16 Gb | 16 Gb |
| Process Node | Samsung D1y | SK Hynix D1y | Micron D1z |
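The bandwidth row is simply the theoretical peak for a single DDR5-4800 DIMM: the transfer rate multiplied by the 64-bit data width of a non-ECC UDIMM (organized on DDR5 as two 32-bit subchannels). A quick sketch of the arithmetic:

```python
# Theoretical peak bandwidth of a DDR5-4800 UDIMM: 4.8 billion transfers per
# second across a 64-bit data bus, i.e. 8 bytes per transfer.
transfers_per_second = 4800e6
bytes_per_transfer = 64 / 8
print(f"{transfers_per_second * bytes_per_transfer / 1e9:.1f} GB/s")  # 38.4 GB/s
```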
All of the DDR5-4800B memory we are testing in this article is what I like to call 'early adopter DDR5', which means it's using the first iteration of DDR5 memory dies from each of the vendors.
On this early DDR5 silicon, each manufacturer's packaged die measures up slightly differently, as each uses its own process. The Samsung UDIMMs use the company's D1y node, with a die package area of 73.58 mm². In contrast, the SK Hynix memory ICs measure 75.21 mm² and use that company's D1y node, while the smallest of the three is Micron's, with a package size of 66.26 mm² made on Micron's D1z node.
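Since all three dies hold the same 16 Gb, the differing areas translate directly into differing bit densities. A rough sketch, treating the quoted package area as the die area (which slightly understates the true on-die density):

```python
# Approximate areal density of each vendor's first-generation 16 Gb DDR5 die,
# using the package areas quoted above, in Gbit per square millimeter.
die_capacity_gbit = 16
die_area_mm2 = {"Samsung (D1y)": 73.58, "SK Hynix (D1y)": 75.21, "Micron (D1z)": 66.26}
for vendor, area in die_area_mm2.items():
    print(f"{vendor}: {die_capacity_gbit / area:.2f} Gb/mm^2")
# Samsung ~0.22, SK Hynix ~0.21, Micron ~0.24 Gb/mm^2
```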
Looking forward, each of the Big Three is planning to use the latest in Extreme Ultraviolet (EUV) lithography to produce denser, higher-capacity, and faster dies, many of which are already in early production. This includes Samsung's 14 nm node with EUV lithography, which should allow the mass production of DDR5-7200 memory (and hopefully, a JEDEC standard to go with it). Though as we're also dealing with the first generation of DDR5 memory controllers, the speeds attainable on Intel's 12th Gen Core, for example, are limited. As memory controllers advance in turn, this should allow for faster DDR5 memory to be used on future PC platforms.
ADATA (SK Hynix) DDR5-4800B CL40 memory (2 x 32 GB) (2Rx8)
Test Bed and Setup
Given that DDR5 is a premium product, we have opted to use a premium platform for our test bed. This pairs Intel's highest-performing Core series processor, the Core i9-12900K, with a premium motherboard from MSI, the MPG Z690 Carbon Wi-Fi. For our testing, we've left the Core i9-12900K at default settings as per the firmware, with no changes made to CPU core frequency, memory frequency, or memory latencies.
For our testing, we are using the following:
DDR5 Memory Test Setup (Alder Lake)

| Component | Details |
|---|---|
| Processor | Intel Core i9-12900K, $589, 125 W, 8+8 Cores, 24 Threads, 3.2 GHz Base, 5.2 GHz P-Core Turbo |
| Motherboard | MSI MPG Z690 Carbon Wi-Fi |
| Cooling | MSI Coreliquid 360mm AIO |
| Power Supply | Corsair HX850 |
| Memory | Samsung DDR5-4800B CL40 (2 x 16 GB) - 1Rx8/1DPC |
| | Samsung DDR5-4800B CL40 (4 x 16 GB) - 1Rx8/2DPC |
| | Samsung DDR5-4800B CL40 (2 x 32 GB) - 2Rx8/1DPC |
| | SK Hynix DDR5-4800B CL40 (2 x 32 GB) - 2Rx8/1DPC |
| | Micron DDR5-4800B CL40 (2 x 32 GB) - 2Rx8/1DPC |
| Video Card | NVIDIA RTX 2080 Ti, Driver 496.49 |
| Hard Drive | Crucial MX300 1TB |
| Case | Open Benchtable BC1.1 (Silver) |
| Operating System | Windows 11, up to date |
Head on to the next page as we test the different DDR5-4800B memory kits and give our analysis.
66 Comments
repoman27 - Thursday, April 14, 2022 - link
But what if you left Chrome running with more than say 4 tabs open while you're gaming?

No, I totally get what you're saying, and I'm fine with the gaming focus in general. But I'm sure there are plenty of regular visitors to this site that are more likely to be running a bunch of VMs or some other workload that might be memory bound in ways that differ from gaming scenarios.
RSAUser - Tuesday, April 19, 2022 - link
A case where you care about this, you're probably a power user, at that point in time it would make sense to also test 64GB/memory exhaustion, as people are not taking old sticks with this, they'd directly buy as much as they need since DDR5.

I can't run my work stack on 32GB RAM, and at home I often enough hit 32GB if I work on a hobby project as I like running my entire stack at once.
Jp7188 - Wednesday, April 13, 2022 - link
4x16 (64GB) performed worse in every test vs. 32GB. Thats reasonable assurance mem exhaustion wasn't much of a factor.

Dolda2000 - Thursday, April 7, 2022 - link
I have to admit I don't quite understand the results. I'd expect the disadvantage of 2DPC to be that they may not be able to sustain the same frequencies as 1DPC, but clearly that's not the case here since all kits are in fact running at the same frequency. That being the case, I would expect 1R, 2DPC memory to behave functionally identically to 2R, 1DPC memory, since, at least in my understanding, that's basically the same thing as far as the memory controller is concerned.

What would account for the differences? Were the secondary and/or tertiary timings controlled for?
MrCommunistGen - Thursday, April 7, 2022 - link
I've seen passing comments that running 2DPC really messes with signal integrity on current setups but didn't read into it any further. Since DDR5 has SOME built in error handling, even on non-ECC chips, it could be that signal losses are causing transmission retries which slow things down.

Assuming that signal integrity is the issue, I'm wondering if rev2 or next gen DDR5 motherboards will try to improve the DDR5 memory traces to combat this or if it's something that needs to happen on the memory controller side.
Also, even though the clockspeeds and primary timings are listed as being the same, the motherboard may be automatically adjusting some of the tertiary timings behind the scenes when using 2DPC, which can also have a measurable impact.
Dolda2000 - Thursday, April 7, 2022 - link
>Since DDR5 has SOME built in error handling, even on non-ECC chips, it could be that signal losses are causing transmission retries which slow things down.

I had that thought as well, but as far as I understand, DDR5's builtin error-handling is limited entirely to what happens on the die. I don't think there are any error-handling mechanisms on the wire that would allow the memory system to detect errors in transfer and retransmit.
thomasg - Thursday, April 7, 2022 - link
As far as I know, there are no error correction techniques (such as forward error correction) used for the transmission paths of DDR ram, apart from ECC, thus there are no automatic retransmissions.

The reason why frequencies or timings will suffer for multiple DIMMs per channel may be as simple as signal runtime.
Electrical signals theoretically travel at the speed of light, but high frequency signals exhibit significant propagation delay, depending on trace design and PCB material. About half the speed of light (~150,000 km/s) is a fair assumption for typical PCB traces with DIMM sockets.
With DDR5-4800, we're talking about clock cycles of 2400 MHz, which translates to 1 cycle per 400 femtoseconds.
In 400 femtoseconds, the electrical high-frequency signal can travel 6 centimeters.
Thus, with 3 centimeters longer traces between DIMM_A and DIMM_B, their signals would be 180° out of phase.
Since we're talking DDR, the rising and falling edge of the clock is used to transmit data, which means the signal timings need to be a lot tighter than 180°, likely below 90°, which limits the difference to 1.5 cm.
It's not hard to imagine that this is a significant constraint to PCB layout.
Traces can be length matched, but with wide parallel channels (64/72 traces), this is very tricky and cannot be done exactly, as it would be for narrower channels (i.e. 4 or 8 traces).
As you might have noticed, I'm a radio guy and don't have the slightest clue about DDR memory, so take this with a grain of salt.
repoman27 - Friday, April 8, 2022 - link
Just to add a few grains of salt...

DDR5 actually does support cyclical redundancy check (CRC) for read and write operations.
Depending on the material used for the PCB, the signal speed for microstrips might be slightly better than 1/2 c, maybe closer to 1/1.7 c or 58.5% of the speed of light.
And according to my calculator at least, 1 ÷ 2,400,000,000 = 0.000000000416667 = 416.667 picoseconds for the unit interval.
And not to downplay the issues you point out in designing DDR5 memory systems, but Alder Lake also supports PCI Express Gen5, which involves routing 64 traces operating at 16.0 GHz for an x16 slot. Serial point-to-point using differential signaling, so not the same thing, but still bonkers nonetheless.
Jp7188 - Wednesday, April 13, 2022 - link
Correct me if I'm wrong, but crc without fec = chance of retransmission = increased latency?

repoman27 - Thursday, April 14, 2022 - link
Yes, but if your BER is even close to reasonable, the additional latency from retries should be negligible. And it's not like ECC or FEC are exactly free. You want to do whatever you can to keep the error rate within acceptable tolerances before resorting to the additional overhead / complexity of error correction.