Live migration of VMs has been available on the market for some time, but for containers it has always been a harder problem to solve. At Virtuozzo we aim to provide live migration for any virtualization technology, whether it is VM-based or Virtuozzo Container™-based. Other container implementations (such as LXC or Docker) can also benefit greatly from live migration. That is why we started the CRIU project, which aims to provide a toolset for implementing live migration of any Linux-based container. This technology is now being widely accepted by the Linux community, and it is also used in our brand-new Virtuozzo 7 product. So let’s take a look at the most advanced features of CRIU, which will likely make it the de facto standard for container live migration in the near future.
Live migration: VMs vs containers
Let’s take a deeper look at how live migration is implemented, and at why container migration is technically a harder task. The diagram below shows what exactly has to be live-migrated for both VMs and containers:
The most significant difference is that when migrating a container, we need to migrate all the processes associated with the container along with the specific environment on top of which the container is built. Let’s take a look at the set of actions performed for migration:
- Freeze processes
- Save state
- Copy state
- Restore state
- Unfreeze processes
In addition to these steps, live migration adds two more: memory pre-copy at the very beginning and post-copy at the end. Both can significantly reduce the time the VM or container stays frozen during migration. During pre-copy, all memory pages are copied from the source to the destination while the VM/CT is still running on the source. Pages that change (become ‘dirty’) in the meantime are copied again in further rounds, until the set of dirty pages is small enough or stops shrinking because pages are being dirtied about as fast as they can be copied; only the remaining pages then have to be transferred while the VM/CT is frozen. Post-copy is used when the VM/CT has already been resumed on the destination while some of its memory has not been transferred yet: it runs unfrozen, and the missing pages are copied from the source on the fly, typically when they are first accessed.
The steps are basically the same for both CTs and VMs, but the difference in handling those steps can be seen from the diagram below:
Clearly, migrating a container requires much more complex processing. A VM can be treated as a “black box” with all of its memory allocated inside this very box, whereas a container is represented by a number of processes with memory distributed among them. In addition, a container requires more kernel objects to be saved, and more work to retrieve the list of processes and objects to be migrated. On top of that, for some pieces of this state the kernel offered no API to retrieve them at all. So how are all these problems solved in Virtuozzo 7?
CRIU and CPT (pre-CRIU)
Initially, nobody believed that live migration of a container was possible. Then a Virtuozzo developer decided to tackle this challenge and came up with a solution. All the procedures and functions required for live migration of containers were implemented in CPT, an extension to the Virtuozzo kernel that could save the full state of a running VE (Virtual Environment) and restore it later on the same or a different host, transparently to running applications and network connections. With CPT, migrating a VE was essentially invisible both to the users of the container and to external clients of the network services running inside it. It still introduced a short service delay, needed for the actual checkpoint/restore of the processes, but this delay was indistinguishable from a brief interruption in network connectivity.
As part of CPT, we provided both kernel patch sets and our own API for the checkpoint and restore functions that underpin any container live migration. This solved the problem for Virtuozzo but not for other containers: the implementation was so complex that there was no way to get the whole patch set accepted into the mainline Linux kernel. To make the technology open and ensure fast development by the open-source community, we started the CRIU (Checkpoint/Restore In Userspace) project, which provides functionality very similar to CPT but, unlike CPT, implements most of it in userspace. In Virtuozzo 7 we finally switched to CRIU as the backend for live migration.
CRIU makes use of many existing kernel APIs, but some new ones also had to be added. Here is an example of how CRIU uses such kernel APIs.
Consider an application that uses a kernel resource. For example, it has called the open() system call and received a file descriptor.
To save the state of the application, CRIU has to ask the kernel about the running processes and the resources associated with them. In this example, it has to ask about the opened files.
Then, on the destination, the same system call can be used to restore the resource and obtain the descriptor again.
Unlike with VMs, where the hypervisor generally knows nothing about the files opened by the guest kernel, in containers all of this state is visible to the host kernel, and all of it has to be handled by the migration. The example above illustrates how the CRIU utility handles one particular object type, and the C/R (Checkpoint/Restore) procedure performed in userspace. There is, however, much more to it than restoring file descriptors after migration. Here is an incomplete list of particularly challenging problems that CRIU had to solve:
- Migration of process subtrees and multi-threaded applications
- Migration of files that were opened and then deleted (on Linux, an open file can be deleted: it is no longer visible in the file system, but it remains accessible to the process that opened it and is finally removed when it is closed)
- Migration of shared memory segments and shared file descriptors
- Migration of live TCP connections
- Optimizations that significantly reduce freeze time (memory pre- and post-copy)
To overcome these challenges, the kernel API often had to be extended. One example is the mmap() system call, which applications use to allocate and configure segments of virtual memory. Originally the kernel reported only very limited information about these memory segments, which was not sufficient to restore them. To solve this, Virtuozzo engineers proposed an extension to the proc file system, the “/proc/<PID>/map_files/” directory, which was accepted by the kernel development community and became part of the upstream kernel.
All of these kernel features, and therefore CRIU itself, are supported by the vanilla kernel since version 3.11. The features added to the kernel as part of the CRIU project include, but are not limited to:
- Code injection to read task states
- The /proc/<PID>/map_files directory, mentioned above, to determine exactly which file is mapped and which mappings are shared
- The kcmp() system call, which checks whether two processes identified by their PIDs share a kernel resource such as virtual memory, file descriptors, and so on
- TCP repair mode, to reconstruct TCP sockets after migration
- The last-pid sysctl (ns_last_pid), to restore a task with the desired PID
This functionality, together with its open-source nature, allowed CRIU to grow beyond the C/R utility used in the Virtuozzo platform: CRIU is also the underlying technology for live migration in both Ubuntu LXD and Docker. To sum up, CRIU is a powerful, open tool on which to build live migration. It already offers a superset of the live migration features of the legacy CPT solution, and the project evolves quickly with the support of engineers from Google, Red Hat, Canonical and, of course, Virtuozzo. The current version of CRIU is available for the x86_64, ARM, AArch64 and Power architectures and supports the kinds of server software typically run in containers.
Real life application of CRIU: live migration in Virtuozzo 7
Live migration has certain prerequisites that help the process go smoothly. Here are the most important ones:
- Time on the source and destination servers should be in sync; NTP is a good way to achieve that. Some processes running in virtual machines and containers rely on system time being steady and may behave unpredictably when resumed on a destination server whose clock differs.
- The network must support data transfer rates of at least 1 Gbps.
- The source and destination servers must belong to the same subnet, so that network traffic can still be delivered after migration is completed.
- The CPUs on the source and destination servers must be made by the same vendor, and the destination CPU must support all the capabilities of the source CPU.
- Virtual machine and container disks can be located on local disks, NFS, or Virtuozzo Storage.
To start the migration procedure to a remote server, one can run this command on the local server:
# prlctl migrate MyVM root:<password>@<destination_server>
To migrate a VM from a remote server to the local server, run this command on the local server:
# prlctl migrate destserver.com/MyVM localhost
If the credentials for the remote server are not provided in the command, you will be prompted for them during migration.
During migration the following actions are performed by Virtuozzo 7:
- Virtuozzo checks whether the destination server meets all the migration requirements and the virtual machine or container can be migrated to it.
- All virtual memory and disks of the virtual machine or container are migrated to the destination server.
- The virtual machine or container on the source server is suspended.
- The changed memory pages and virtual disk blocks, if any, are migrated to the destination server.
- The virtual machine or container is resumed on the destination server.
Once migration is complete, the original virtual machine or container is removed from the source server.
Live migration with Virtuozzo 7 is fast and easy to use. It works with both virtual machines and containers, and it is based on the fast-evolving CRIU utility, which is open and provides C/R capabilities in userspace. If you are interested in the CRIU project and live migration, visit the official CRIU website, CRIU.org, for details on the current project status. You are also welcome to contact the CRIU community directly by subscribing to the official mailing list: CRIU Mailing List.