Lightbits Block Storage for OpenStack Accelerates Live Migrations

Introduction

OpenStack live migration is a critical feature for ensuring high availability, load balancing, and seamless maintenance in virtualized environments. However, live migration often faces challenges such as prolonged migration times, storage bottlenecks, and network congestion, particularly when dealing with large datasets.

In this blog post, we explore how Lightbits NVMe over TCP (NVMe/TCP) block storage for OpenStack transforms live migrations, delivering high performance, reduced downtime, and enhanced scalability.

The Challenges of OpenStack Live Migration

Live migration transfers a running VM’s memory and CPU state from one compute node to another, along with its disk contents when the disks are not on shared storage. While memory and CPU state transfer relatively efficiently, storage often becomes the bottleneck. Key challenges include:

  1. Slow Storage Performance: Traditional storage solutions struggle to deliver the IOPS and low latency required for fast migrations.
  2. Large Disk Transfers: Migrating VMs with large disk sizes can significantly extend migration times.
  3. Downtime Risks: Longer migrations increase the risk of service disruption or timeout failures.
  4. Network Congestion: Data transfer during migration can impact other workloads that are sharing the network.

How Lightbits Block Storage for OpenStack Transforms Live Migrations

Lightbits NVMe/TCP storage is purpose-built to address these challenges. By delivering NVMe performance over standard TCP/IP networks, Lightbits eliminates the need for expensive proprietary storage hardware while ensuring high performance and low latency.

Key Benefits of Lightbits for OpenStack Live Migration:

    • High-Performance Storage Backend: Lightbits NVMe/TCP volumes provide the speed and throughput needed to transfer large disk images quickly.
    • Seamless Integration: Lightbits integrates natively with OpenStack using Cinder for block storage, simplifying deployment.
    • Reduced Downtime: Faster disk transfers mean shorter migration windows and minimal service disruption.
    • High Availability (HA): Lightbits supports storage redundancy, ensuring no disruption during migrations or failovers.
    • Scalable Performance: Lightbits storage scales seamlessly to meet growing workload demands.
    • Cost Efficiency: NVMe/TCP runs over existing TCP/IP networks, reducing infrastructure costs.

Supported Features with Lightbits in OpenStack

Before deploying, it is important to understand which capabilities Lightbits NVMe/TCP block storage provides in OpenStack:

  1. Block Storage (Cinder): Lightbits integrates seamlessly with OpenStack Cinder to provide NVMe/TCP-based block storage.
  2. Volume-Based Storage: Lightbits provides high-performance volumes that can be attached to VMs and used with live migration.
  3. High Availability (HA): Lightbits supports storage redundancy, ensuring no disruption during migrations or failovers.
  4. Standard Networking: Lightbits can be deployed over standard TCP/IP networks without requiring proprietary hardware.

By focusing on its strengths in block storage and high-performance NVMe/TCP volumes, Lightbits ensures the best possible performance for OpenStack live migration workloads.

Real-World Performance Results

We tested OpenStack live migration with Lightbits NVMe/TCP storage on a Kolla-based OpenStack Yoga deployment. To evaluate migration performance, we used the built-in “nova live-migration” command to trigger live migrations under two conditions: with load and without load. For the load tests, we used “fio” to generate disk activity and stress the system during migration.
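The post does not list the fio job parameters that were used. As a minimal sketch, assuming a 4 KiB random read/write mix with libaio and direct I/O (illustrative values, not the settings from these tests), the two kinds of data-volume load in the result tables (RAW and File System) could be generated like this. The device and mount paths are placeholders, and writing to a raw device destroys any data on it.

```python
import shlex
from typing import List


def fio_load_cmd(target: str, raw: bool, runtime_s: int = 300,
                 size: str = "10G") -> List[str]:
    """Build an fio command line that keeps a volume busy during a migration.

    The job parameters (4 KiB random I/O, 70% reads, queue depth 32) are
    hypothetical examples, not the settings used in the published tests.
    """
    cmd = [
        "fio",
        "--name=migration-load",
        "--ioengine=libaio",
        "--direct=1",            # bypass the guest page cache
        "--rw=randrw",
        "--rwmixread=70",
        "--bs=4k",
        "--iodepth=32",
        "--time_based",          # run for the full runtime, not just one pass
        f"--runtime={runtime_s}",
    ]
    if raw:
        # RAW: I/O goes straight to the attached block device.
        cmd.append(f"--filename={target}")
    else:
        # File System: fio creates its own files in a directory on a mounted
        # file system, so it needs to know how large to make them.
        cmd += [f"--directory={target}", f"--size={size}"]
    return cmd


if __name__ == "__main__":
    # /dev/vdb and /mnt/data1 are placeholder names for a data volume in the guest.
    print(shlex.join(fio_load_cmd("/dev/vdb", raw=True)))
    print(shlex.join(fio_load_cmd("/mnt/data1", raw=False)))
```

Building the command as a list keeps it easy to run via subprocess inside the guest without shell quoting issues.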

Flavor      vCPUs   RAM (GB)   OS Disk (GB)
m1.xtiny    1       2          20
m1.tiny     2       4          40
m1.mid      6       8          60
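To reproduce the test flavors, they can be created with the standard `openstack flavor create` command. The sketch below only builds the command lines from the table above; note that `--ram` takes megabytes, so the GB values are converted. Some deployments already ship a flavor named m1.tiny with different sizes, in which case a different name is needed.

```python
from typing import Dict, List, Tuple

# Flavors from the table above: name -> (vCPUs, RAM in GB, OS disk in GB)
FLAVORS: Dict[str, Tuple[int, int, int]] = {
    "m1.xtiny": (1, 2, 20),
    "m1.tiny": (2, 4, 40),
    "m1.mid": (6, 8, 60),
}


def flavor_create_argv(name: str, vcpus: int, ram_gb: int, disk_gb: int) -> List[str]:
    """Return the `openstack flavor create` argv for one flavor."""
    return [
        "openstack", "flavor", "create",
        "--vcpus", str(vcpus),
        "--ram", str(ram_gb * 1024),   # the client expects RAM in MB
        "--disk", str(disk_gb),        # disk is given in GB
        name,
    ]


if __name__ == "__main__":
    for flavor, (vcpus, ram_gb, disk_gb) in FLAVORS.items():
        print(" ".join(flavor_create_argv(flavor, vcpus, ram_gb, disk_gb)))
```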

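We ran each flavor with one, three, and five attached volumes (the OS volume plus zero, two, or four 100 GB data volumes), both without load and with fio load. In the columns for load on data volumes only, fio ran against the data volumes alone, either directly on the raw block device (RAW) or through a file system on the volume (File System). The migration times for each flavor are shown below: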

For m1.xtiny (time in seconds):
1 vCPU, 2 GB RAM, 20 GB OS disk, 100 GB per data disk

Configuration              No Load  Load  Data-volume load (RAW)  Data-volume load (File System)
1 volume (OS only)         30       35    -                       -
3 volumes (OS + 2 x data)  30       46    30                      43
5 volumes (OS + 4 x data)  35       47    35                      49

For m1.tiny (time in seconds):
2 vCPUs, 4 GB RAM, 40 GB OS disk, 100 GB per data disk

Configuration              No Load  Load  Data-volume load (RAW)  Data-volume load (File System)
1 volume (OS only)         30       50    -                       -
3 volumes (OS + 2 x data)  31       56    56                      56
5 volumes (OS + 4 x data)  31       61    60                      64

For m1.mid (time in seconds):
6 vCPUs, 8 GB RAM, 60 GB OS disk, 100 GB per data disk

Configuration              No Load  Load  Data-volume load (RAW)  Data-volume load (File System)
1 volume (OS only)         32       87    -                       -
3 volumes (OS + 2 x data)  35       95    92                      93
5 volumes (OS + 4 x data)  35       98    98                      98

Migration time depends on several factors. Under load, it grew with the size of the VM (vCPUs, RAM, and disk); network congestion, CPU contention, and storage backend performance can also play a role. Load had the largest effect: without load, every configuration migrated in 30 to 35 seconds, while under load the times rose to 35 to 47 seconds for m1.xtiny, 50 to 61 seconds for m1.tiny, and 87 to 98 seconds for m1.mid.
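A simple way to quantify the load effect is to divide each "Load" time by the matching "No Load" time. The sketch below does only that arithmetic on values copied from the tables; it adds no new measurements.

```python
# Migration times in seconds, copied from the "No Load" and "Load" columns
# of the three tables above: (flavor, volumes attached) -> (no load, load).
RESULTS = {
    ("m1.xtiny", 1): (30, 35), ("m1.xtiny", 3): (30, 46), ("m1.xtiny", 5): (35, 47),
    ("m1.tiny", 1): (30, 50), ("m1.tiny", 3): (31, 56), ("m1.tiny", 5): (31, 61),
    ("m1.mid", 1): (32, 87), ("m1.mid", 3): (35, 95), ("m1.mid", 5): (35, 98),
}


def load_factor(no_load_s: int, load_s: int) -> float:
    """How many times longer the migration took under load."""
    return load_s / no_load_s


if __name__ == "__main__":
    for (flavor, volumes), (idle_s, busy_s) in RESULTS.items():
        print(f"{flavor:8} {volumes} volume(s): {idle_s}s -> {busy_s}s "
              f"({load_factor(idle_s, busy_s):.2f}x)")
```

The factor ranges from about 1.2x (m1.xtiny, OS volume only) to 2.8x (m1.mid, five volumes), so the larger, busier VMs are the ones that benefit most from the tuning described next.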

Note that the observed migration times are driven primarily by the migration process itself (memory transfer, CPU state synchronization, and scheduling delays) rather than by storage. Because Lightbits volumes are shared across the Nova compute nodes, no disk data needs to be copied, and the storage handover itself completes within a second, highlighting the efficiency of NVMe/TCP in OpenStack environments.

Impact of Tuning Migration Performance

During our tests, we observed that the m1.mid instance initially took 90 seconds to complete live migration under load. However, by applying tuning parameters such as post-copy migration and auto-convergence, we were able to significantly optimize the migration process.

  • Post-Copy Migration brought the migration time down to 22 seconds. It resumes the VM on the destination node early and transfers the remaining memory pages afterward, which avoids repeated pre-copy iterations.
  • Auto-Convergence throttled the VM’s vCPUs so that the guest dirtied memory more slowly, reducing the number of pre-copy iterations and bringing the migration time down to 50 seconds.
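Why both options help can be illustrated with a deliberately simplified model of pre-copy migration. This is a sketch, not how the tests above were run, and all inputs are hypothetical: a VM with 8 GiB of RAM that dirties memory at 1.2 GiB/s over a 1 GiB/s migration link, so plain pre-copy can never catch up. Auto-convergence is modeled loosely on QEMU's defaults (20% initial vCPU throttle, raised by 10% per round, capped at 99%), and post-copy simply switches over after the first full memory pass.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    total_s: float      # time until the VM runs only on the destination
    downtime_s: float   # length of the final stop-and-copy pause
    rounds: int         # memory copy rounds, including the final one
    converged: bool


def simulate(ram_gib: float, dirty_gib_s: float, bw_gib_s: float,
             max_downtime_s: float = 0.5, max_rounds: int = 30,
             auto_converge: bool = False, postcopy_after: int = 0,
             throttle_initial: float = 0.2, throttle_step: float = 0.1) -> Outcome:
    """Toy model of pre-copy live migration with optional tuning.

    Assumes constant dirty rate and bandwidth, and that every page dirtied
    while a round is being sent must be resent in the next round.
    postcopy_after=N switches to post-copy after N pre-copy rounds (0 = off).
    """
    remaining = ram_gib   # the first round sends all of guest RAM
    elapsed = 0.0
    throttle = 0.0        # fraction of vCPU time taken away by auto-converge
    for rnd in range(1, max_rounds + 1):
        send_s = remaining / bw_gib_s
        if send_s <= max_downtime_s:
            # Small enough: pause the guest and send the rest (stop-and-copy).
            return Outcome(elapsed + send_s, send_s, rnd, True)
        if postcopy_after and rnd > postcopy_after:
            # Post-copy: the VM resumes on the destination right away and the
            # remaining pages are pulled afterwards (switchover pause ignored).
            return Outcome(elapsed + send_s, 0.0, rnd, True)
        elapsed += send_s
        # Pages dirtied while this round was on the wire; never more than RAM.
        remaining = min(ram_gib, dirty_gib_s * (1.0 - throttle) * send_s)
        if auto_converge:
            # Throttling slows the guest, and with it the dirty rate.
            throttle = min(0.99, throttle_initial if throttle == 0.0
                           else throttle + throttle_step)
    return Outcome(elapsed, 0.0, max_rounds, False)


if __name__ == "__main__":
    scenarios = {
        "pre-copy only": {},
        "auto-converge": {"auto_converge": True},
        "post-copy": {"postcopy_after": 1},
    }
    for label, options in scenarios.items():
        r = simulate(8.0, 1.2, 1.0, **options)
        print(f"{label:14} total={r.total_s:6.1f}s downtime={r.downtime_s:.2f}s "
              f"rounds={r.rounds} converged={r.converged}")
```

With these inputs, plain pre-copy does not converge within 30 rounds, auto-convergence completes in about 39 seconds with a final pause under 0.5 seconds, and post-copy completes in 16 seconds. The ordering mirrors the measurements above, but the absolute numbers come only from the made-up inputs.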

Multiple VM Migration Performance

In addition to testing single VM migration, we evaluated how migration behaves when multiple VMs are migrated simultaneously. To simulate a real-world workload, we started live migrations of multiple VMs concurrently under no-load conditions. All migrations completed successfully within 40 seconds, close to the 30 to 35 seconds measured for single-VM migrations without load, which indicates that Lightbits handled the concurrent migrations without storage-induced delays.
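A concurrent test like this can be driven by a small harness along the following lines. It is a sketch: `migrate_and_wait` is a hypothetical callback you would implement yourself, for example by running `nova live-migration <server>` and polling the server until it is ACTIVE on its new host. Nova's `max_concurrent_live_migrations` option (default 1) limits how many live migrations each compute host runs at once, so VMs leaving the same source host may be queued rather than run in parallel.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence, Tuple


def migrate_concurrently(
    servers: Sequence[str],
    migrate_and_wait: Callable[[str], None],
) -> Tuple[Dict[str, float], float]:
    """Start one live migration per server at the same time.

    migrate_and_wait is a hypothetical hook supplied by the caller: it must
    trigger the live migration of one server and block until that server
    is running on its destination host (raising an exception on failure).
    Returns the per-server durations and the overall wall-clock time.
    """
    def timed(server: str) -> float:
        start = time.monotonic()
        migrate_and_wait(server)
        return time.monotonic() - start

    wall_start = time.monotonic()
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        # A failed migration surfaces here as the exception its hook raised.
        durations = list(pool.map(timed, servers))
    wall = time.monotonic() - wall_start
    return dict(zip(servers, durations)), wall


if __name__ == "__main__":
    def fake_migration(server: str) -> None:
        time.sleep(0.1)  # stand-in for a real migration

    per_vm, wall = migrate_concurrently(["vm1", "vm2", "vm3"], fake_migration)
    print(per_vm, f"wall={wall:.2f}s")
```

Reporting both the per-VM durations and the overall wall time makes it easy to tell whether migrations really ran in parallel or were serialized somewhere along the way.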

Best Practices for Optimizing Live Migration

To get the most out of OpenStack live migration with Lightbits, follow these best practices:

  1. Leverage NVMe/TCP Storage: Use Lightbits NVMe/TCP for low-latency, high-performance disk transfers.
  2. Optimize Your Network: Deploy high-bandwidth TCP/IP networks (e.g., 25 Gbps or higher).
  3. Pre-Migration Checks: Verify storage connectivity and resource availability before migration.
  4. Tune Migration Parameters:
    • Post-Copy Migration: Use post-copy migration to reduce downtime by transferring memory pages after the VM is resumed on the destination.
    • Auto-Convergence: Enable auto-convergence to dynamically throttle the VM’s CPU during migration, slowing the rate at which the guest dirties memory so that the migration can complete even under high load.
    • Compression: Enable compression to reduce the amount of data transferred over the network.
    • Network Bandwidth Tuning: Allocate sufficient bandwidth for migration traffic to avoid congestion.
  5. Ensure High Availability: Configure Lightbits HA to guarantee uninterrupted storage access.
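On the Nova side, post-copy and auto-convergence are opt-in settings in the `[libvirt]` section of `nova.conf`. The sketch below renders an override snippet with Python's `configparser`; the option names are standard Nova options, but the values are examples only, not the configuration used in the tests above. As we understand the Nova configuration reference, auto-convergence is used only when post-copy is not permitted or not available, and with post-copy permitted Nova switches to post-copy when `live_migration_completion_timeout` is reached and `live_migration_timeout_action` is `force_complete`.

```python
import configparser
import io


def nova_migration_overrides(post_copy: bool = True,
                             auto_converge: bool = True,
                             bandwidth_mib_s: int = 0) -> str:
    """Render a [libvirt] override snippet for nova-compute.

    Option names are real nova.conf options; the values are examples
    and should be tuned for each environment.
    """
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option names exactly as written
    cfg["libvirt"] = {
        "live_migration_permit_post_copy": str(post_copy),
        # Only used by Nova when post-copy is not permitted or unavailable.
        "live_migration_permit_auto_converge": str(auto_converge),
        # On timeout, force completion (post-copy if permitted) instead of aborting.
        "live_migration_timeout_action": "force_complete",
        # Cap for migration traffic in MiB/s; 0 lets the hypervisor decide.
        "live_migration_bandwidth": str(bandwidth_mib_s),
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(nova_migration_overrides(bandwidth_mib_s=2000))
```

In a Kolla-Ansible deployment, a snippet like this typically goes into a service override file such as `/etc/kolla/config/nova/nova-compute.conf`, followed by `kolla-ansible reconfigure`; check the Kolla-Ansible documentation for your release before relying on that path.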

Conclusion

Lightbits NVMe/TCP storage revolutionizes OpenStack live migration by addressing storage bottlenecks and reducing migration times. By delivering high performance, reliability, and cost efficiency, Lightbits enables organizations to achieve seamless VM migrations with minimal disruption.

For organizations that prioritize high-speed, resilient block storage, Lightbits NVMe/TCP provides efficient live migrations that support business continuity.

Learn more about Lightbits NVMe/TCP storage at www.lightbitslabs.com.

About the Writer: