Guest blog post sponsored by Lightbits Labs
If you’ve kept abreast of evolving NVMe technology, then you’ve no doubt heard about NVMe over Fabrics (NVMe-oF). This new storage networking protocol promises to deliver the same latencies as internal PCIe-based direct attached storage (DAS) for shared, high performance NVMe-based storage. What makes it even more interesting is that shared storage brings a number of advantages that DAS lacks no matter how fast it is: better storage capacity utilization (through sharing), much higher capacities, and access to enterprise-class storage management (thin provisioning, RAID, in-line data reduction, snapshots, encryption, quality of service, replication, etc.) that generally isn’t available for DAS. This is welcome news for businesses undergoing digital transformation that need NVMe performance but don’t want to be limited to DAS architectures.
Administrators deploying NVMe-oF have three fabric options: InfiniBand, Fibre Channel (FC) and Ethernet. InfiniBand has not really caught on in commercial environments and requires costly dedicated InfiniBand adapters and switches. FC is more popular in the enterprise than InfiniBand but can also be expensive to deploy. Every business using information technology (IT) already has Ethernet deployed (although maybe not for storage networking). Ethernet components cost less than either FC or InfiniBand, and businesses already have a working knowledge of them. Ethernet-based storage networking has been available for years (iSCSI), and each successive increase in network bandwidth has narrowed the performance gap between it and FC, to the point where it can handle even performance-sensitive storage workloads.
Most FC sales these days are to existing customers who already have FC networks and FC management expertise and aren’t yet ready to move to Ethernet. Most customers deploying high performance storage networks for the first time now choose Ethernet because of lower component costs, pre-existing management expertise and the fact that Ethernet storage networking can, in most cases, meet their performance requirements. The long-term trend in enterprise storage networking is away from FC and towards Ethernet, but administrators need to keep in mind that there is a huge difference between the performance of iSCSI and that of NVMe-oF over Ethernet. NVMe is a much more efficient protocol, supporting an order of magnitude more bandwidth, one to two orders of magnitude lower latencies (depending on the specific solid-state media used), and more than three orders of magnitude more parallelism. NVMe-oF was designed to preserve all of these advantages across the network.
When Ethernet is chosen as the transport for NVMe-oF networks, there are three protocol options: RDMA over Converged Ethernet (RoCE), iWARP, and TCP. RoCE and iWARP both support Remote Direct Memory Access (RDMA) and require special adapters and switch configurations, but they deliver shared storage access latencies in the 20 microsecond (µsec) range. TCP uses standard Ethernet adapters and switching components to provide a lower cost solution and delivers storage access latencies in the 40 µsec range. Because it does not require custom hardware, software or management expertise to configure, NVMe/TCP costs less than the other two options, yet it still provides significantly lower storage latencies than iSCSI. For those who need RoCE-like latencies while remaining on industry-standard TCP networking, network adapters supporting Application Device Queues (an advanced traffic steering technology that improves response time predictability and scalability) are worth considering.
Early customer choices in this market already indicate the future direction. When NVMe-based All-Flash Arrays (NAFAs) first started to ship in 2017, the NVMe/TCP option did not yet exist. Back then, the NAFAs that supported NVMe-oF mostly used RoCE, and only a limited number of them were sold. When I talked to those customers, it was clear that the performance advantage outweighed the cost disadvantage for them, but many of them were eager for a lower cost, more industry-standard Ethernet option that would be easier to deploy, manage and upgrade. I also spoke to a number of other customers, particularly those with larger webscale infrastructure, who had experimented with NVMe-oF but were waiting for a more industry-standard solution before actually buying and deploying it. Discussion about this option, which would run the NVMe protocol over industry-standard TCP networks, started in 2018 as the NVM Express standard was being written. Intel Corporation and Lightbits Labs were the main promoters of the standard, with Lightbits contributing the kernel code for the initiator and target that was later incorporated and released in the Linux 5.0 kernel.
Fast forward to 2020, and NVMe/TCP is here. IDC predicted that RoCE would remain a relatively niche play for customers who valued low access latencies above all else, and that has in fact been the case. For other NAFA customers, NVMe/TCP was an attractive draw because its drivers shipped with popular Linux distributions and it could use the standard Ethernet adapters already shipping on their x86 servers. Most customers will deploy dedicated Ethernet networks for NVMe/TCP, so moving to it won’t be “free,” but they will not have to buy and qualify new drivers and switching hardware, learn the intricacies of managing RoCE networks, or deal with maintenance and upgrade activities outside of industry-standard Ethernet. NVMe/TCP is simply an easier, high performance choice for networking NAFAs.
Primary research performed by IDC in 2020 indicates that many customers are already thinking about how they will leverage NVMe technologies in their shops. In that survey, 62.5% of respondents indicated that they see an increasing need over the next 12 to 24 months to support the type of real-time workloads that require NVMe performance, and 23.3% of them have already deployed NAFAs (although not all of them are using NVMe-oF yet). NVMe/TCP lowers the overall cost of end-to-end NVMe, and that will only accelerate the rapid transition from SCSI-based to NVMe-based enterprise storage that is already under way. By 2021, over 50% of “external storage array for performance-sensitive workloads” revenues will be driven by NAFAs, and that increased penetration will drive interest in NVMe-oF as well. IDC is not shy about predicting that NVMe/TCP will dominate all other NVMe-oF implementations by the end of 2023. All of this growth will come at the expense of SCSI-based array revenues, cannibalizing the SCSI-based All Flash Array (SAFA), Hybrid Flash Array (HFA), and Hard Disk Drive (HDD)-based Array market segments.
In closing, there’s one more point to mention. Deployment models have become an important purchase criterion in storage infrastructure decisions over the last two to three years. There are five different deployment models: appliances, software-only (or software-defined), converged infrastructure, hyperconverged infrastructure, and cloud-based (as-a-service). Today, NVMe/TCP is available in only two of those deployment models: appliance-based and software-defined. Each model offers its own set of advantages, so customers will want to weigh those as they decide how to deploy NVMe/TCP going forward. From my discussions with customers interested in the software-defined model, the primary advantages of that approach are hardware flexibility (the option to run the software on their hardware of choice, which removes vendor hardware lock-in), ease of management (these are newer systems designed around more of a self-managed storage model than older, more hardware-defined offerings), and economics (software-only options are just plain less expensive to buy than appliance-based offerings).