How to Enable NVMe Zoned Namespaces

Hardware changes for ZNS

At a high level, in order to enable ZNS, most drives on the market only require a firmware update. ZNS doesn't put any new requirements on SSD controllers or other hardware components; this feature can be implemented for existing drives with firmware changes alone.

The critical element in hardware comes down to when an SSD is designed to support only zoned namespaces. First and foremost, a ZNS-only SSD doesn't need anywhere near as much overprovisioning as a traditional enterprise SSD. ZNS SSDs are still responsible for performing wear leveling, but this no longer requires a large spare area for the garbage collection process. Used properly, ZNS allows the host software to avoid almost all of the circumstances that would lead to write amplification inside the SSD. Enterprise SSDs commonly use overprovisioning ratios up to 28% (800GB usable per 1024GB of flash on typical 3 DWPD models) and ZNS SSDs can expose almost all of that capacity to the host system without compromising the ability to deliver high sustained write performance. ZNS SSDs still need some reserve capacity (for example, to cope with failures that crop up in flash memory as it wears out), but Western Digital says we can expect ZNS to allow roughly a factor of 10 reduction in overprovisioning ratios.

Better control over write amplification also means QLC NAND is a more viable option for use cases that would otherwise require TLC NAND. Enterprise storage workloads often lead to write amplification factors of 2-5x. With ZNS, the SSD itself causes virtually no write amplification and clever host software can avoid causing much write amplification, so the overall effect is a boost to drive lifespan that offsets the lower endurance of QLC compared to TLC (or beyond QLC). Even in a ZNS SSD, QLC NAND is still fundamentally slower than TLC, but that same near-elimination of background data management within the SSD means a QLC-based ZNS SSD can probably compete with TLC-based traditional SSDs on QoS metrics even if the total throughput is lower.


The other major hardware change enabled by ZNS is a drastic reduction in DRAM requirements. The Flash Translation Layer (FTL) in traditional block-based SSDs requires about 1GB of DRAM for every 1TB of NAND flash. This is used to store the address mapping or indirection tables that record the physical NAND flash memory address that is currently storing each Logical Block Address (LBA). The 1GB per 1TB ratio is a consequence of the FTL managing the flash with a granularity of 4kB. Right off the bat, ZNS gets rid of that requirement by letting the SSD manage whole zones that are hundreds of MB each. Tracking which physical NAND erase blocks comprise each zone now requires so little memory that it could be done with on-controller SRAM even for SSDs with tens of TB of flash. ZNS doesn't completely eliminate the need for SSDs to include DRAM, because the metadata that the drive needs to store about each zone is larger than what a traditional FTL needs to store for each LBA, and drives are likely to also use some DRAM for caching writes - more on this later.

NVMe Zoned Namespaces Explained The Software Model
Comments Locked


View All Comments

  • FreckledTrout - Thursday, August 6, 2020 - link

    Like most things its the cost. I bet the testing alone is prohibitive to back port this into older SSD drives.
  • xenol - Thursday, August 6, 2020 - link

    Bingo. Testing and support costs something. Though I suppose they could release it for older drives under a no-support provision.

    Except depending on who tries this, I'm sure it's inevitable someone will break something and complain that they're not getting support.
  • DigitalFreak - Thursday, August 6, 2020 - link

    Why spend the money to make a retroactive firmware, when you can just sell the user a new drive with the updated spec? If someone cares enough about this, they'll shell out the $$$ for a new drive.
  • IT Mamba - Monday, December 14, 2020 - link

    Easier said then done.
  • Grizzlebee11 - Thursday, August 6, 2020 - link

    I wonder how this will affect Optane performance.
  • Billy Tallis - Thursday, August 6, 2020 - link

    Optane has no reason to adopt a zoned model, because the underlying 3D XPoint memory supports in-place modification of data.
  • name99 - Saturday, August 8, 2020 - link

    Does it really? I know Intel made a big deal about this, but isn't the reality (not that it changes your point, but getting the technical details right)
    - the minimum Optane granularity unit is a 64B line (which, admittedly, is the effective same as DRAM, but DRAM could be smaller if necessary, Optane???)

    - the PRACTICAL Optane granularity unit (which is what I am getting at in terms of "in-place"), giving 4x the bandwidth, is 256B.

    Yeah, I'm right. Looking around I found this
    which says "the 3D-XPoint physical media access granularity is 256 bytes" with everything that flows from that: need for write combining buffers, RMW if you can't write-combine, write amplification power/lifetime concerns, etc etc.

    So, sure, you don't have BIG zones/pages like flash -- but it's also incorrect (both technically, and for optimal use of the technology) to suggest that it's "true" random access, as much so as DRAM.

    It remains unclear to me how much of the current disappointment around Optane DIMM performance, eg
    derives from this. Certainly the Optane-targeted algorithms and new file systems I was reading say 5 years ago, when Intel was promising essentially "flash density, RAM performance" seemed very much optimized for "true" random access with no attempts at clustering larger than a cache line.
    Wouldn't be the first time Intel advertising department's lies landed up tanking a technology because of the ultimate gap between what was promised (and designed for) vs what was delivered...
  • MFinn3333 - Sunday, August 9, 2020 - link

    Um... Optane DIMM's have not disappointed anybody in their performance. Shows just how
  • brucethemoose - Thursday, August 6, 2020 - link

    Optane is byte addressable like DRAM and fairly durable, isn't it? I don't think this "multi kilobyte zoned storage" approach would be any more appropriate than the spinning rust block/sector model.

    Then again, running Optane over PCIe/NVMe always seemed like a waste to me.
  • FunBunny2 - Friday, August 7, 2020 - link

    "Optane is byte addressable like DRAM and fairly durable, isn't it?"

    yes, and my first notion was that Optane would *replace* DRAM/HDD/SSD in a 'true' 64 bit address single level storage space. although slower than DRAM, such an architecture would write program variables as they change direct to 'storage' without all that data migration. completely forgot that current cpu use many levels of buffers between registers and durable storage. iow, there's really no byte addressed update in today's machines.

    back in the 70s and early 80s, TI (and some others, I think) built machines that had no data registers in/on the cpu, all instructions happened in main memory and all data was written directly in memory and then to disc. the morphing to load/store architectures with scads of buffering means that optimum use of an Optane store with such an architecture looks to be a waste of time until/if cpu architecture writes data based on transaction scope of applications, not buffer fill.

Log in

Don't have an account? Sign up now