Enmotus MiDrive: Rethinking SLC Caching For QLC SSDs
by Billy Tallis on January 30, 2020 8:00 AM EST
For consumer storage, CES 2020 brought a new wave of competition for PCIe 4.0 SSDs and the promise of faster portable SSDs, but the most intriguing product demo was from Enmotus. They are planning a profound change to how consumer SSDs work, ditching drive-managed SLC caching in favor of host-managed tiered storage.
Enmotus is a well-established provider of storage management software. The product most familiar to consumers is probably FuzeDrive, a limited edition of which is bundled with recent generations of AMD motherboards as AMD StoreMI. This serves as AMD's answer to Intel's Smart Response Technology (SRT) and Optane Memory storage caching systems. Enmotus also has enterprise-oriented products in the same vein. Their new MiDrive technology builds on their existing tiering software to manage a combination of SLC and QLC NAND on a single consumer SSD.
Caching and Tiering Challenges
All software-driven caching or tiering solutions tend to have limited consumer appeal due to the complexity of setting up the system. At least two physical drives are required, and the OS needs to load an extra driver to manage data placement. Any compatibility issue or other glitch can easily render a PC unbootable, and data recovery isn't as straightforward as for a single drive. These hurdles don't scare off enthusiasts and power users, but PC OEMs aren't eager to market and support these configurations. But without some form of caching or tiering, consumer SSDs would be limited to the raw performance of TLC or QLC NAND. SLC caching managed transparently by the SSD's firmware has been adopted by almost all consumer SSDs in order to improve burst performance, and it has proven to be very effective for consumer workloads. The fundamental limitation of this strategy is that the SSD must work with limited information about the nature and purpose of the user data it is reading and writing.
Most SSDs rely on fairly simple procedures for managing their SLC caches: sending all writes to the cache unless it's full, and using idle time to fold data from SLC into a more compact TLC representation, freeing up cache space for future bursts of writes. There are still some choices to be made when implementing SLC caching for consumer SSDs: whether to use a fixed-size or a dynamically sized cache, and whether to stall when the cache fills up or divert writes straight to TLC/QLC. As QLC drives become more common, we're also seeing drives that prefer to keep data in the SLC cache long-term until the drive starts to fill up, so that the cache can help with read performance in addition to write performance.
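As a rough illustration, here is a minimal sketch of that drive-managed policy; the class, block counts, and divert-on-full behavior are simplifications of what real firmware does, not an actual implementation:

```python
# Minimal sketch of a drive-managed SLC cache policy; names and numbers are
# illustrative, not real firmware.

class SlcCachingSsd:
    def __init__(self, slc_cache_blocks):
        self.slc_cache_blocks = slc_cache_blocks
        self.slc = []   # data currently held in the SLC cache
        self.qlc = []   # data stored in native QLC form

    def write(self, block):
        # Send writes to the SLC cache while it has room; once it fills,
        # divert straight to QLC (other drives stall and flush instead).
        if len(self.slc) < self.slc_cache_blocks:
            self.slc.append(block)   # fast path: one bit programmed per cell
        else:
            self.qlc.append(block)   # slow path: native QLC program speed

    def idle_fold(self):
        # Idle-time folding: rewrite cached data in denser QLC form and
        # erase the SLC blocks, freeing cache space for the next burst.
        while self.slc:
            self.qlc.append(self.slc.pop(0))

drive = SlcCachingSsd(slc_cache_blocks=2)
for block in ["a", "b", "c"]:
    drive.write(block)   # "c" arrives after the cache is full and lands in QLC
drive.idle_fold()        # the cache drains to QLC during idle time
```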
Enmotus FuzeDrive manual data placement controls
Host-managed caching or tiering opens the door to more intelligent management of data placement, since the host OS has better information: about which chunks of data belong to what file, and about the processes and users that interact with those files. It is easier for the host OS to accurately track the history of access patterns for hot vs. cold files. It is also possible to expose manual control of data placement directly to the user.
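For a flavor of what that extra context looks like, here is a hypothetical sketch of host-side hot/cold tracking; the window and threshold values are invented for illustration:

```python
# Hypothetical sketch of host-side context a tier manager can use: per-file
# access history, which drive firmware never sees. The window and threshold
# values are invented for illustration.
import time
from collections import defaultdict

access_log = defaultdict(list)   # file path -> access timestamps

def record_access(path):
    access_log[path].append(time.time())

def is_hot(path, window_seconds=86_400, threshold=5):
    # A file counts as "hot" if it was touched more than `threshold`
    # times within the last `window_seconds`.
    cutoff = time.time() - window_seconds
    return sum(1 for t in access_log[path] if t >= cutoff) > threshold

record_access("C:/Users/me/project.db")
print(is_hot("C:/Users/me/project.db"))   # False until it sees more traffic
```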
Two Drives In One
The Enmotus MiDrive technology allows one SSD to present the host with access to two separate pools of flash storage: QLC and SLC managed by the same SSD controller. To implement this, they have partnered with Phison to modify SSD controller firmware. For server products, a single NVMe SSD would expose two separate NVMe namespaces that Linux treats as different block devices. But for consumers, Enmotus has chosen to maximize backwards compatibility by having the MiDrive present itself as a single block device, with the first 32 or 64 GB initially mapped to SLC NAND and the rest of the drive mapped to QLC NAND. This makes it possible (and fast!) to install an OS to a MiDrive without needing any special Enmotus software or drivers. Once the Enmotus driver has been loaded, it takes over the management of data placement using vendor-specific commands that instruct the SSD to promote or demote ranges of Logical Block Addresses (LBAs) between the QLC and SLC pools of flash. (The initialization process for this tiering currently takes about a quarter of a second, because very little data needs to be moved until there's history indicating what should be in QLC vs SLC.)
Enmotus MiDrive 800GB appearing as a single device
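To make the division of labor concrete, here is a hypothetical sketch of what the host-side promote/demote calls could look like; the opcodes, parameter layout, and function names are all invented, since Enmotus has not published its vendor-specific command set:

```python
# Hypothetical sketch of the promote/demote interface. The opcodes and
# parameter layout are invented; Enmotus has not published its actual
# vendor-specific command set.

def send_vendor_command(dev, opcode, start_lba, lba_count):
    # Placeholder for the real NVMe admin-command plumbing (e.g. an ioctl
    # against the drive's device node); here we just show what would be sent.
    print(f"{dev}: opcode={opcode:#x} start_lba={start_lba} count={lba_count}")

def promote_to_slc(dev, start_lba, lba_count):
    """Ask the drive to remap an LBA range onto its SLC pool."""
    send_vendor_command(dev, 0xC1, start_lba, lba_count)

def demote_to_qlc(dev, start_lba, lba_count):
    """Ask the drive to remap an LBA range back onto its QLC pool."""
    send_vendor_command(dev, 0xC2, start_lba, lba_count)

# With 512-byte LBAs, one 4MB chunk spans 8192 LBAs.
promote_to_slc("/dev/nvme0n1", start_lba=1_048_576, lba_count=8192)
```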
This is a lot simpler for the host side than the strategy Intel uses for their Optane Memory H10, which is two separate PCIe devices on one M.2 card and requires special motherboard support to properly detect both halves before the caching software can even get involved. Enmotus is working to make MiDrive even simpler by having Microsoft distribute the Enmotus driver with Windows, so that a MiDrive will be automatically detected and managed by the Enmotus software without requiring any user intervention. For now, Windows will default to using its standard NVMe driver for a MiDrive, but that should change by the time products hit the shelves.
Example of how MiDrive LBA allocation will change with use
(for illustration purposes only, not based on real testing)
Enmotus supports assigning data to SLC or QLC in 4MB chunks, which is probably the size of a single NAND flash erase block in SLC mode, and thus the smallest chunk size that can easily be remapped between the QLC and SLC portions of the drive without contributing to unnecessary write amplification. That 4MB block size means that a small file moved to SLC is likely to bring along other nearby files, which will often contain related data that may also benefit from being in SLC. It also means that large files can be partially resident in SLC and partially in QLC. Since this process doesn't change the logical block addresses a file occupies, Enmotus MiDrive doesn't need to change anything about how NTFS organizes data, and it doesn't need to behave like an advanced disk defragmenter that tries to move important data toward the beginning of the disk. The MiDrive software only needs to look up what LBAs are used by a file and tell the SSD whether to move that data to SLC or QLC blocks. The only side-effect visible to the rest of the OS is a change in the performance characteristics for accessing that part of the SSD.
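The chunk math itself is simple. Here is a sketch assuming 512-byte LBAs and the 4MB granularity described above; real file-to-LBA extents would come from the filesystem (e.g. NTFS cluster maps), but one is supplied directly here:

```python
# Sketch of the 4MB chunk math, assuming 512-byte LBAs.

LBA_SIZE = 512
CHUNK_LBAS = (4 * 1024 * 1024) // LBA_SIZE   # 8192 LBAs per 4MB chunk

def chunks_for_extent(start_lba, lba_count):
    """Return the 4MB chunk indices touched by one file extent."""
    first = start_lba // CHUNK_LBAS
    last = (start_lba + lba_count - 1) // CHUNK_LBAS
    return list(range(first, last + 1))

# A 16KB file at LBA 12000 drags its whole 4MB chunk (index 1) along with it,
# including any neighboring files that live in the same chunk.
print(chunks_for_extent(12000, 32))   # -> [1]
```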
The SLC portion of an Enmotus MiDrive differs from a traditional SLC cache not only by being host-managed, but also in how the SSD treats it for wear leveling purposes. A typical SSD's SLC cache may have a static or dynamic size, but in either case when new write commands arrive the SSD will write the data to whatever NAND flash block is currently empty. When the cache is flushed, data from several SLC blocks will be rewritten in TLC or QLC mode to a different empty block, and the SLC blocks are then free to be erased and put back into the pool of available blocks. Managing just one pool of empty blocks means that the actual physical location of the SLC cache can move around over time, and a block that was last used as TLC might end up being used as SLC the next time data is written to it.
By contrast, Enmotus MiDrive technology has the SSD track two entirely separate pools. When the drive is manufactured, the SLC portion is permanently allocated for the lifetime of the drive. Any physical NAND pages and blocks that are used as SLC will always be treated as SLC for the lifetime of the drive, and the same for the QLC portion. The two pools of flash are subject to completely independent wear leveling, even though SLC and QLC portions will exist side by side on each physical flash chip on the drive. This means that the QLC blocks will never be subjected to the short-term Program/Erase cycles of SLC cache filling and flushing. For the SLC blocks, the error correction can be tuned specifically to SLC usage, and that allows Enmotus to achieve around 30k Program/Erase cycles for the SLC portion of the drive (based on Micron QLC NAND). MiDrives will expose separate SMART indicators for the SLC and QLC portions of the drive, so monitoring software will need to be updated to properly interpret this information.
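A toy model of the two fixed pools helps illustrate the difference; the block counts below are invented, the ~30k SLC P/E rating is Enmotus's figure, and the QLC rating is a typical industry number rather than a confirmed spec for this drive:

```python
# Toy model of the two permanently separate pools described above.

class Pool:
    def __init__(self, name, block_count, pe_limit):
        self.name = name
        self.erase_counts = [0] * block_count
        self.pe_limit = pe_limit   # rated Program/Erase cycles

    def pick_block_to_erase(self):
        # Wear leveling happens entirely within this pool; blocks never
        # migrate to the other pool or change their SLC/QLC role.
        idx = self.erase_counts.index(min(self.erase_counts))
        self.erase_counts[idx] += 1
        return idx

    def percent_worn(self):
        # Feeds this pool's own SMART wear indicator.
        return 100 * max(self.erase_counts) / self.pe_limit

slc = Pool("SLC", block_count=1_000, pe_limit=30_000)
qlc = Pool("QLC", block_count=20_000, pe_limit=1_000)
slc.pick_block_to_erase()
print(f"{slc.name}: {slc.percent_worn():.3f}% worn")
```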
In principle, it would be possible for either the SLC or QLC portion of the drive to be worn out prematurely, but in practice Enmotus is confident that their tiered storage management software will lead to longer overall drive lifespans than drive-managed SLC caching. Files that are known to be frequently modified will permanently reside on SLC and not be automatically flushed out to QLC during idle time. If the Enmotus software is smart enough, it will also be able to determine which files should skip the SLC and go straight to QLC until it becomes clear that a file is frequently accessed. For example, a file download coming into the machine over gigabit Ethernet will not initially need SLC performance because raw QLC can generally handle sequential writes at that speed (especially with no background SLC cache flushing to slow things down). And if that file is a movie which is infrequently accessed and only read sequentially, there's no reason for it to ever be promoted up to SLC. In general, the tiered storage management done by Enmotus should result in less data movement between SLC and QLC, rather than the increased write amplification that traditional SLC caching causes.
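Enmotus has not detailed its actual heuristics, but the reasoning above suggests placement rules along these lines; every threshold here is purely illustrative:

```python
# Illustrative placement rules following the reasoning above; not Enmotus's
# actual policy, and every threshold is invented.

def initial_tier(is_sequential_stream, qlc_keeps_up_with_source):
    # e.g. a gigabit download: sequential, and native QLC can absorb it at
    # line rate, so there is no benefit to landing it in SLC first.
    if is_sequential_stream and qlc_keeps_up_with_source:
        return "QLC"
    return "SLC"

def should_promote(writes_per_day, reads_per_day, mostly_random):
    # Promote only once history shows the file is hot: frequently modified,
    # or frequently read with small/random accesses that QLC handles poorly.
    return writes_per_day > 10 or (reads_per_day > 50 and mostly_random)

# A movie downloaded over the network starts in QLC and, read sequentially a
# few times a year, never earns promotion to SLC.
print(initial_tier(is_sequential_stream=True, qlc_keeps_up_with_source=True))
print(should_promote(writes_per_day=0, reads_per_day=0.05, mostly_random=False))
```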
Since the SLC portion of an Enmotus MiDrive is a slice carved out of regular QLC NAND, it cannot offer all the benefits of specialized low-latency SLC NAND like Samsung's Z-NAND or Kioxia/Toshiba XL-Flash. The SLC portion of a MiDrive won't be appreciably faster than the SLC cache of a traditional consumer SSD, but that performance will be more consistent and predictable when working with files that are kept entirely on the SLC portion of the drive.
The Business Model
Enmotus MiDrive is currently implemented as a combination of Windows driver software and custom SSD firmware for Phison NVMe controllers, but it does not require any custom hardware. This means that any vendor currently selling Phison E12 NVMe SSDs can make a MiDrive-based product by licensing and shipping Enmotus firmware. PC OEMs can adopt MiDrives by switching to drives with Enmotus firmware and ensuring that they either include the Enmotus drivers in their Windows images or rely on them to be distributed through Windows Update. No motherboard firmware or hardware modifications are required, nor any changes to the process of provisioning a machine and preparing it for delivery to the end user. Enmotus is engaging with both PC OEMs and vendors of retail SSDs, so we can expect both pre-built systems with Enmotus MiDrive technology and upgrade options usable on any Windows 10 PC that already supports standard M.2 NVMe SSDs. Enmotus is optimistic about uptake from PC OEMs, expecting MiDrive to get a much better reception than Intel's Optane H10 did.
The basic MiDrive products will be fully automatic, with the Enmotus driver pre-installed or installed automatically when a MiDrive is detected, and data placement decisions made completely behind the scenes. For enthusiasts, there will also be a premium tier similar to their current FuzeDrive software, which includes Windows Explorer shell integration so that individual files can be manually promoted or demoted, either permanently or for a limited period of time. Enmotus will also be providing a drive health monitoring tool that will include their estimate of how much extra drive lifetime has been gained by using their tiering instead of ordinary SLC caching.
Mockup of Enmotus MiDrive SSD health monitoring tool
Enmotus expects SSDs with MiDrive technology to mostly use either 32GB or 64GB SLC portions and offer total capacities from about 400GB up to around 2TB, but the exact configurations will be determined by what their partners want to bring to market. Enmotus is also planning enthusiast-oriented solutions supporting RAID-0 style striping across multiple physical drives, and solutions for single-package BGA SSDs that go into small form factor and embedded devices.
Enmotus MiDrive technology will add to the price of SSDs, but since the underlying storage is QLC, that premium is relative to the cheapest NVMe SSDs available, and the final sticker prices should still be competitive for consumer SSDs. In return, users should get better real-world performance and enough effective write endurance to justify a 5-year warranty. We're looking forward to testing this technology later this year, even though it will further complicate our benchmarking process. Enmotus is already sampling to interested OEMs.
42 Comments
azazel1024 - Monday, February 3, 2020 - link
It isn't simply a case of endurance. For the vast majority of users, yes, QLC endurance is just fine. The issue is performance. SLC caching of some sort hides the true performance of QLC drives for a lot of use cases, other than very large writes. The issue though is, if you exhaust the cache, the performance for large writes is significantly slower than a HARD DRIVE. You are looking at ~70MB/sec performance, compared to a recent hard drive which is going to be in the 140-180MB/sec range. TLC drives at least can hit around 170-200MB/sec for large writes once their cache is exhausted.
Of course QLC drives have better small file performance compared to a HDD once the cache is exhausted, but even there, they perform massively worse than a TLC drive in small writes if SLC caching cannot be used.
I DO think QLC drives are fine for 95% of consumers and okay for bulk storage for about half of the folks who are left, at least if they can be made enough cheaper than TLC to make sense (at least a 15-20% discount).
My concern though is things like PLC flash, which we know is coming. TLC->QLC at least nets a 33% increase in storage and seems to be around 20% or so cheaper.
PLC only increases storage density over QLC by 25%. I don't think I've seen anything definitive on endurance or performance, but if TLC->QLC is any guide at all, endurance will end up actually being an issue for some common consumer end users, and performance is going to be horrible once the SLC cache is exhausted, which some common users might end up hitting with not-unrealistic workloads.
One other consideration is that the lower the bare-metal performance of the NAND, the longer it takes to flush the SLC cache to TLC/QLC/PLC (at ~70MB/sec native QLC speed, draining a full 64GB cache would take roughly 15 minutes). Also, depending on how that flushing is managed, it can create pretty drastic swings in performance if the drive sees access during the process.
If QLC is ~70MB/sec writes for the underlying flash, PLC is likely in mid-2010s eMMC territory of 30-40MB/sec. Endurance is likely to be less than half of QLC's. A lot of consumers end up using a device like a laptop for 4 or 5 years or longer. Also, entry-level storage capacities are typically the most commonly purchased...
DyneCorp - Friday, February 7, 2020 - link
"TLC drives at least can hit around 170-200MB/sec"64-layer 3D TLC, yes. Not 32-layer 3D TLC nor planar TLC.
"The issue though is, if you exhaust the cache, the performance for large writes is significantly slower than a HARD DRIVE. You are looking at ~70MB/sec performance, compared to a recent hard drive which is going to be in the 140-180MB/sec range."
For large writes? No. Not automatically. Why not give the entire story instead of just half-assing it? In sequential file transfers DRIVE TO DRIVE you MAY surpass the pSLC cache IF you can copy to it fast enough. IF. Have you even read the reviews?
Additionally, I'd love to see a modern hard drive that can write/ read sequential and random data faster than the 660p. Even modern hard disks can't write random data faster than 1MB/s, and sequential MAY hit above 100MB/s IF the drive isn't full and IF there aren't a lot of requests.
You WILL NOT hit the QLC NAND under the majority of workloads, period. End of story.
extide - Thursday, January 30, 2020 - link
Go buy MLC today... OH CRAP, you can't.
R0H1T - Thursday, January 30, 2020 - link
Technically these are MLC; what you are referring to is probably "DLC" 😅
yetanotherhuman - Thursday, January 30, 2020 - link
Samsung 970 Pro.
Guspaz - Tuesday, February 4, 2020 - link
The MLC 970 Pro has lower performance specs in every category than the TLC 970 Evo Plus, except when the SLC cache on the Evo Plus is full. MLC with no SLC cache is not a clear win over TLC with an SLC cache.
Dragonstongue - Thursday, January 30, 2020 - link
I had a quite old OCZ Agility 3 drive (gave it to mom, but it was starting to "forget" things... then again, mom did not use the system that often, so it likely was not powered on as often as it should have been, which does not help). (I fib, I also have a 970 Pro M.2, which is either 2-bit or 3-bit MLC... confusing Samsung information they post about it, LOL.) The others (Crucial MX100 (256GB), MX200 (500GB), MX500 (1TB)) all use varying SLC/MLC/TLC styles.
TLC works well; pretty much all of them use a fancier cache method to speed up what they do. Wouldn't it be great if we could have "modern" MLC or SLC at the pricing they offer TLC at?
QLC, no thanks: higher chance of failure in a shorter amount of time (robustness), unless the drive in question is intelligent enough to bounce around what is "written" to avoid data loss. I am sure the makers take all the time in the world to design properly to avoid data loss, but then again, at the price point of pretty much all of the QLC drives, you might as well stick with MLC or TLC.
^.^
Tomatotech - Thursday, January 30, 2020 - link
I don’t think you understand how hard drives work. HDDs suck at random access, and that’s what slows down a computer. I’ve never seen a modern SSD anything less than 10x faster at random r/w than an HDD. To be fair, most are around 100x faster, even the QLC ones you seem to hate so much.
Tell you what, I’ll happily take these horrible QLC SSDs off your hands. I’ll even give you a wonderful 1TB HDD for free in exchange for each one (250GB+ please) you send me. I’ll even pay postage both ways!
How’s that for a bargain? I have a stack of HDDs from work computers just waiting to sent to you and several dozen staff who will be overjoyed to have these nasty old QLCs you don’t want and will happily donate their speedy HDDs to you. Happy yet?
Farfolomew - Sunday, February 2, 2020 - link
+1
People get obsessed with seq. read/write numbers and forget how terrible HDDs are for a main OS disk, especially when you start adding in resource-intensive applications like aggressive antivirus programs or the ridiculous number of services running in Win10 now.
linuxgeex - Thursday, January 30, 2020 - link
My personal view of QLC is that it provides 2 key functional roles. First, it's an alternative technology to HDD for long-term storage, i.e. backups of critical files; using 2 different technologies is highly recommended to avoid a single point of failure (SPOF). To get the same reliability for 20 years of storage on HDD you would need to use a NAS or rely on regular human interaction, both of which will escalate the HDD costs beyond the QLC SSD cost. The second role is as a frontcache for a cold storage NAS where you expect less than 1 frontcache drive write per week. At that rate it will last 6 years, which is right on target.