Disaggregation isn’t a new concept. At its core, it’s simply about sharing server resources across applications: hard drives have long been disaggregated with storage-area networks (SANs), and CPUs with virtualization. In the datacenter, many companies are now separating storage from compute to increase utilization and lower infrastructure cost.
I recently participated in a panel discussion on this topic at the Open Compute Project’s Virtual Summit. The other panelists and I discussed disaggregation of the datacenter. At one point an audience member asked me whether disaggregation wasn’t already happening with storage. It’s a good question.
Disaggregated Storage Adoption
To get to the answer, it’s important to know there are different types of storage. First, there is high-latency, low-bandwidth storage, the kind you might associate with spinning disks. Today, those are already largely disaggregated. Commercial products like SANs and open source projects like Ceph and Hadoop’s HDFS disaggregate slower storage.
But when it comes to high-speed, high-throughput, low-latency storage, disaggregation is a more recent phenomenon. Adoption is happening now with solutions like Lightbits, as well as open source NVMe/TCP drivers that enable high-speed, easy-to-manage networking between storage and compute.
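To make that concrete, here is a minimal sketch of attaching a remote NVMe/TCP volume on a Linux host by driving the standard nvme-cli tool from Python. The target address, ports, and NQN below are hypothetical placeholders; the host needs root privileges, nvme-cli installed, and the nvme-tcp kernel module loaded.

```python
# Sketch: discover and attach a remote NVMe/TCP namespace with nvme-cli.
# All target details below are placeholders, not a real deployment.
import subprocess

TARGET_IP = "192.168.10.5"                      # hypothetical storage server
DISCOVERY_PORT = "8009"                         # common NVMe/TCP discovery port
IO_PORT = "4420"                                # IANA-assigned NVMe/TCP port
TARGET_NQN = "nqn.2016-01.com.example:subsys1"  # hypothetical subsystem NQN

# Ask the target which NVMe subsystems it exports.
subprocess.run(
    ["nvme", "discover", "-t", "tcp", "-a", TARGET_IP, "-s", DISCOVERY_PORT],
    check=True,
)

# Attach the remote namespace. Once connected, it shows up locally as
# /dev/nvmeXnY and can be partitioned, formatted, and mounted like any
# local flash device; that transparency is a big part of what makes
# NVMe/TCP easy to manage.
subprocess.run(
    ["nvme", "connect", "-t", "tcp", "-a", TARGET_IP, "-s", IO_PORT,
     "-n", TARGET_NQN],
    check=True,
)
```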
Disaggregating even faster storage that resides on the server, such as RAM, is another area to ponder. Because RAM access latency is typically measured in tens of nanoseconds, it’s difficult to disaggregate over the network today. That’s one we’ll list under future possibilities.
Disaggregated Storage Benefits
There are lots of benefits to disaggregated storage. For one, it helps enterprises save money, and it can even simplify the supply chain while allowing product development teams to focus on features and performance rather than infrastructure.
Infrastructure and product growth often lead the push for disaggregation. Companies with only a few applications typically build their infrastructure to suit those applications, optimizing the configuration of each server for a particular workload. A database server, for example, might get powerful CPUs, lots of memory, and fast storage to serve that specific need. In doing so, they’re optimizing their server design per application.
But as these companies grow and products become more complex, the number of applications they rely upon grows as well. Whereas previously they may have had 10 key applications, each with its own server design, their growth may mean they now run 1,000 different applications. It’s simply not practical to manage a different server configuration for each application type.
Even knowing this, many operators still manage far more server configurations than they should. I worked with a company that had 30 different server configurations. They finally said, “Enough! We need to simplify,” and moved to an all-purpose, single server configuration (with some exceptions here and there). Even that was problematic, though, because they had no way to disaggregate. What they found was that with a single server configuration across their fleet, flash and HDD utilization was just 3% to 4%. CPU usage was just 4% to 5% in most cases. DRAM utilization was just 10%. Those numbers are shockingly low. This company was spending a lot on infrastructure and only using about 10% of it. The financial ramifications were huge.
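A back-of-the-envelope calculation shows why numbers like those translate into huge financial ramifications. The budget and the spend breakdown below are illustrative assumptions, not figures from that company:

```python
# Back-of-the-envelope: how much of an infrastructure budget sits idle
# at the utilization levels described above. All figures are illustrative.
fleet_cost = 10_000_000  # hypothetical annual infrastructure spend, in dollars

# resource: (assumed share of spend, observed utilization)
resources = {
    "flash/HDD": (0.40, 0.04),
    "CPU":       (0.35, 0.05),
    "DRAM":      (0.25, 0.10),
}

idle_spend = sum(
    fleet_cost * share * (1 - utilization)
    for share, utilization in resources.values()
)
print(f"Idle spend: ${idle_spend:,.0f} of ${fleet_cost:,.0f} "
      f"({idle_spend / fleet_cost:.0%})")
# With these assumptions, roughly 94% of the budget pays for capacity
# that is never used.
```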
Emerging Technologies in Disaggregated Storage
Today, emerging technologies, including our own Lightbits scale-out software-defined storage solution, allow companies to disaggregate storage from compute, making it easier to tailor the number of commodity servers in use and improving utilization. Disaggregation ensures that storage can be shared across multiple applications and scaled independently from compute. Whether it is disaggregating spinning drives or flash, or virtualizing CPUs, the company can start building servers that are optimized around those commodities: CPUs, drives, flash, and maybe DRAM in the future. That simplifies the supply chain and gives the company more flexibility.
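A simple capacity-planning sketch illustrates the effect of scaling storage and compute independently. All of the per-server capacities and demand figures below are hypothetical:

```python
# Sketch: converged vs. disaggregated fleet sizing. When CPU and flash
# are bundled in one server design, the scarcer resource forces you to
# over-buy the other; disaggregation sizes each independently.
import math

CORES_PER_SERVER = 64      # hypothetical compute capacity per node
FLASH_TB_PER_SERVER = 30   # hypothetical flash capacity per node
demand_cores = 4_000       # fleet-wide compute demand
demand_flash_tb = 600      # fleet-wide storage demand

# Converged: every server carries both resources.
converged_servers = max(math.ceil(demand_cores / CORES_PER_SERVER),
                        math.ceil(demand_flash_tb / FLASH_TB_PER_SERVER))
converged_flash = converged_servers * FLASH_TB_PER_SERVER

# Disaggregated: compute and storage nodes are purchased separately.
compute_nodes = math.ceil(demand_cores / CORES_PER_SERVER)
storage_nodes = math.ceil(demand_flash_tb / FLASH_TB_PER_SERVER)

print(f"Converged: {converged_servers} servers, provisioning "
      f"{converged_flash} TB of flash for {demand_flash_tb} TB of demand "
      f"({demand_flash_tb / converged_flash:.0%} flash utilization)")
print(f"Disaggregated: {compute_nodes} compute nodes + {storage_nodes} "
      f"storage nodes, with flash provisioned to match demand")
```

With these assumptions, the converged fleet buys roughly three times the flash it needs just to reach its compute target, which is exactly the kind of stranded capacity disaggregation eliminates.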
Web scalers have implemented such technologies for years. If you have used public clouds like Amazon’s AWS or Microsoft’s Azure, then the odds are that you have used a disaggregated storage system. Amazon’s EBS (Elastic Block Store) io1, io2, gp2, and the newly announced io2 Block Express volume types are great examples of disaggregated flash. They are built with large numbers of servers optimized for flash throughput and capacity. Software ties them together for the purposes of redundancy, compression, security, and provisioning. Ultimately, the optimized software and hardware infrastructure work together to serve remote block devices to millions of AWS compute instances. While implementing such a system for a private cloud or a container/virtualized environment may seem daunting, it’s actually easy with LightOS from Lightbits.
With Lightbits, you can build your own disaggregated storage system for private cloud, hybrid cloud, and edge cloud. You can operate like the hyperscalers: buying fewer types of server configurations and planning compute and performance capacity around commodities like DRAM and CPUs, rather than being constrained by the local flash capacity available to applications. Servers are designed around specific hardware commodities, not the particular applications that will run on them. When a disaggregated storage system has built-in reliability and redundancy, application developers can ignore these lower-level concerns and focus instead on what’s important: applications.
Additional Resources
NVMe over TCP
Kubernetes Persistent Storage
Edge Cloud Storage
Ceph Storage
Disaggregated Storage for Private and Edge Clouds