NVMe/TCP Conquers the UNH-IOL Plugfest

Entering the University of New Hampshire InterOperability Laboratory (UNH-IOL) in November 2018, I was surrounded by servers, network cards, Ethernet cables hanging from the ceiling, and a lot of cool nerdy lab rats, like me. We were there to run the first-ever NVMe over TCP (NVMe/TCP) plugfest.

This Proof of Concept (PoC) plugfest for the new NVMe/TCP protocol would bring together multiple vendors for the first time to validate their hardware and software implementations for specification compliance and for interoperability with each other. As one of the pioneers of this protocol, my team at Lightbits Labs was thrilled to find out that the NVMe/TCP code was going to be broadly tested against implementations from other vendors who share the same enthusiasm for Ethernet-based storage protocols.

I am a software engineer who has worked with RDMA standards, so I knew that storage and networking interoperability is hard. Now, our team was going to prove how easily our NVMe/TCP solution could be integrated with the work of other vendors, and help the protocol take another step toward becoming as common as its better-established sibling standards: NVMe and NVMe over Fabrics.

Our plugfest preparation began when my team started to learn how the plugfest was going to work, what the other vendors were bringing, and whether there would be enough coffee and snacks. Although we were working with a conventional TCP-based protocol, you can never be too careful with network topologies that involve different vendors' switches and network cards.

First, we had to understand the code base that the plugfest was going to use. Because we were testing our full Lightbits disaggregated open storage solution, it was essential for us to confirm that our NVMe/TCP version was fully compatible with the versions run by all the other participants and by the host, UNH-IOL. The NVMe/TCP protocol was finally ratified by the NVMe working group a few weeks after the plugfest.

Next, UNH-IOL provided a full test suite, and we ran rounds of in-house testing, simulation, and bug fixing. Although we already had decent mileage with our full NVMe/TCP storage solution, we found some issues and corner cases that could have gotten us into nasty places.

Finally, we prepared our box for shipment. Ironically, one of our biggest shocks was the FedEx shipment cost to get the equipment to New Hampshire.

After all those long hours of preparing, testing, and debugging, we finally reached our destination: Durham, New Hampshire. As a new plugfest participant, I was excited to meet the lab’s staff and all the other plugfest partners face to face, mostly engineers from different vendors who share my passion for network and storage protocols.

The first few hours were taken up with general information: introductions, sharing schedules, and all the participants reviewing the various network configurations. It was all about setting goals for the entire event and collaborating to make it a big success.

The actual games began with the initial bring-ups, as each team started to get its connections up and running. NVMe/TCP hosts were communicating with NVMe/TCP targets over different network configurations.

What followed was a real plugfest party, meshing systems with off-the-shelf network devices, all interconnected with commodity Ethernet switches. The days were all about compliance, compliance, compliance, and at night we tried to decompress together over beers.

We devoted ourselves to thoroughly checking each nasty corner case, the kind that is usually ignored when protocols get set for broad deployment. Only then did we understand how important and valuable it had been to run all those test suites in-house before the plugfest. Every checkpoint during the preparations became essential to our success.

I would be lying if I gave the impression that everything went smoothly right from the beginning. Every time we hit a problem, I kept recalling something that a colleague once told me: “If it works on the first try, it probably doesn’t work at all.”

It isn’t possible to describe all the ups and downs we experienced every day. The plugfest was an adventure and another big step forward for NVMe/TCP. We will be at future plugfests to test our LightOS open storage solution again, and I can only recommend that other software engineers try to join one of these awesome plugfests.

Finally, a big thank you to all of the talented people at UNH-IOL who made us visitors feel at home and taught us more about our product and the entire ecosystem. Hope to see you at the next plugfest!

About the Writer:

Senior Solutions Engineer