The performance of containers is often referred to as being “as good as physical servers,” meaning the additional performance overhead of containers is almost zero, as opposed to the significant performance impact related to virtual hardware emulation for virtual machines.
While it is true that hypervisors and virtual hardware are not a required layer for OS containers, this claim can be quite misleading, especially if it leads one to assume that real-life performance ranks physical servers first, then containers, then virtual machines.
First, containers (and sometimes even virtual machines) can, and often do, perform better than physical servers. We have seen cases where running multiple instances of an application directly on a physical server shows worse total performance than splitting them across the same number of separate containers or VMs, with one application instance in each.
Several factors contributing to that are:
- pfcache (described below)
- locality of execution—when grouped by containers, applications within a single container have a better chance of running on a limited set of CPUs working with local memory under NUMA, thus showing better performance
- better efficiency of the page cache—multiple potentially resource-greedy applications sitting in the same “sandbox” can make page cache usage inefficient, forcing the operating system to deal with memory and cache shortage. Partitioning/resource management features of either VMs or OS containers will help to ensure that enough resources are available for an operating system to manage cache more effectively.
Second, modern hypervisors have very little overhead. Thanks to virtualization support in the hardware, the number of extra operations that a hypervisor needs to perform as “virtualization overhead” is small. In fact, if one runs a performance test designed to benchmark an individual subsystem (such as CPU, memory, storage, or network) and compares the results between running directly on a physical server and running inside a virtual machine that is the only one on the same hardware, then, with proper tuning of the hypervisor, the difference will most likely be negligible. Of course, this is not a proper benchmark to evaluate real-life performance of virtualized vs non-virtualized applications, but it still shows there are cases when VM performance can be close to that of a physical server.
That does not mean, however, that VMs and containers are equally well-suited to run any kind of workload.
There are a number of techniques and features in OS container technology that may increase overall system performance. Here are a few examples:
- Containers offer extremely fast startup times (fractions of seconds), which is an important factor when building application architecture that relies on fast startup and shutdown (such as microservices).
- Virtuozzo OS containers offer a special feature called “pfcache,” which identifies and merges identical file mappings in memory in different containers. This has a positive effect in two ways:
  - It decreases overall memory consumption by reducing the number of files loaded into memory.
  - It improves IO performance, because there is a much higher chance that a particular file will be cached (because fewer files need to be cached, and individual files are used by a greater number of readers). IO bandwidth is a very common bottleneck for many applications, so an improvement here goes a long way.
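The memory effect of merging identical files can be illustrated with a small sketch. This is not how pfcache is implemented; its internals are not described here. The sketch assumes, purely for illustration, that files are matched by a SHA-256 digest of their content, and the function name `cache_footprint` and the sample data are hypothetical.

```python
import hashlib


def cache_footprint(containers):
    """Compare cached bytes with and without merging identical files.

    `containers` maps a container name to a dict of {path: file content}.
    Returns (naive_bytes, merged_bytes). Illustrative only: files are
    matched by a SHA-256 content digest, which is an assumption of this
    sketch and not a description of pfcache internals.
    """
    naive_bytes = 0
    unique_sizes = {}  # content digest -> size of one cached copy
    for files in containers.values():
        for content in files.values():
            naive_bytes += len(content)
            digest = hashlib.sha256(content).hexdigest()
            unique_sizes[digest] = len(content)
    merged_bytes = sum(unique_sizes.values())
    return naive_bytes, merged_bytes


# Hypothetical data: two containers share a library but differ in config.
example = {
    "ct1": {"/lib/shared.so": b"A" * 1000, "/etc/app.conf": b"x" * 10},
    "ct2": {"/lib/shared.so": b"A" * 1000, "/etc/app.conf": b"y" * 10},
}
naive, merged = cache_footprint(example)
```

In this example the shared library is counted once in the merged figure, while the two differing configuration files are still counted separately. The more similar the containers are, the larger the gap between the two figures.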
Besides, there is another factor, which we can illustrate with the following graph:
This particular test is a performance benchmark. It deploys one or more groups of virtual appliances that run certain applications working together as a single group, called a Consolidation Stack Unit (CSU). Each server in the group generates output results, such as transactions per second, and the aggregated result is used to compare different virtualization solutions. By increasing the number of CSUs, it is possible to compare how different virtualization solutions behave: which produce more transactions on the same hardware with the same number of CSUs, and which are able to run more CSUs effectively (before overall system performance begins to decrease).
This particular run compared virtual machines and containers.
It is easy to see that before the CPU becomes overcommitted, container and VM scores are neck and neck; the difference is within a single-digit percentage.
Around the CPU overcommit threshold, the difference becomes obvious. With no idle CPU time left, all the extra cycles that hypervisors use are no longer available to the applications, and thus the VM score stops increasing well before the same happens with containers. With more CSUs added, we soon hit memory overcommit. This is where the difference becomes most apparent. VMs use more memory due to individual kernel and OS components, and their content is more or less a black box for the host (hypervisor) kernel, which limits the memory management techniques that can be used. With containers, properly set resource management parameters restrict applications from overusing resources, but at the same time do not force the system to pre-allocate these resources (which is especially true for memory). As a result, more memory is available to the kernel and applications.
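The CPU part of this argument can be expressed as a toy model. It is a deliberate simplification and not the benchmark itself. It assumes each CSU needs a fixed amount of CPU for useful work plus a fixed fractional overhead, it treats scores as identical before saturation (the measured difference was small but not zero), and it ignores memory entirely. All names and numbers are hypothetical.

```python
def aggregate_score(csus, demand_per_csu, cpu_capacity, overhead):
    """Toy model of the aggregated score as CSUs are added.

    Each CSU needs `demand_per_csu` units of CPU for useful work, plus
    `overhead` (a fraction of that) for virtualization. While idle CPU
    remains, the overhead is absorbed and the score equals the useful
    work. Once the CPU is saturated, the overhead comes out of the
    cycles that applications would otherwise get.
    """
    cpu_needed = csus * demand_per_csu * (1.0 + overhead)
    if cpu_needed <= cpu_capacity:
        return csus * demand_per_csu
    return cpu_capacity / (1.0 + overhead)


# Hypothetical parameters, chosen only to show the shape of the curves.
CAPACITY = 100.0
DEMAND = 10.0
for n in (5, 10, 12):
    low = aggregate_score(n, DEMAND, CAPACITY, overhead=0.0)
    high = aggregate_score(n, DEMAND, CAPACITY, overhead=0.1)
    print(n, low, round(high, 1))
```

With these made-up parameters, both curves coincide at 5 CSUs, and the higher-overhead curve flattens at a lower score once the CPU is fully used.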
Depending on the usage pattern, the difference may be more dramatic. For example, here is the result of a test called “DVD-store.” It’s conceptually similar to vConsolidate, but uses a different application—an online e-commerce system.
Similar patterns are seen in most of the tested workloads. Not every test shows as dramatic a difference as DVD-store, but the following conditions reveal the greater benefits of containers:
- Servers are working at maximum capacity, with CPU and memory utilization approaching 100 percent. In production, this is a common pattern for batch processing or data analytics workloads, where the goal is to get maximum processing power from the servers without needing to reserve capacity to ensure better performance during “peak” loads.
- Containers run similar or identical applications, thus pfcache can do a good job finding a lot of identical files across containers.
- Context switching latency is a significant factor. An example would be several multithreaded web servers, with an application and database backend, passing user requests through several components before returning a response to the user. The smaller an individual request is, the greater the toll that added latency takes on overall performance.
- Multiple virtual CPUs (exceeding the number of physical CPUs) are assigned to multiple VMs. This condition forces frequent context switches between environments, which, in the case of a VM, is a more expensive operation. When triggered often, it can significantly decrease overall system performance.
- Small individual workloads with a large number of them running concurrently. VMs carry certain memory overhead for each individual instance (for kernel footprint and hypervisor structures); the greater the number of instances (and smaller individual workload), the greater the amount of memory consumed by those footprints.
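The last two conditions come down to simple arithmetic, sketched below. The function names and all figures are hypothetical and serve only to show how the ratios behave. Real per-instance footprints depend on the guest OS and hypervisor and are not given here.

```python
def vcpu_overcommit_ratio(vcpus_per_vm, physical_cpus):
    """Ratio of assigned virtual CPUs to physical CPUs.

    A value above 1.0 means more vCPUs are assigned than there are
    physical CPUs, so environments must be switched on and off CPUs.
    """
    return sum(vcpus_per_vm) / physical_cpus


def footprint_share(instances, workload_mb, footprint_mb):
    """Memory spent on fixed per-instance footprints.

    Returns (total_footprint_mb, share_of_total_memory), where each
    instance uses `workload_mb` for the application and `footprint_mb`
    for the fixed per-instance overhead.
    """
    total_footprint = instances * footprint_mb
    total_memory = instances * (workload_mb + footprint_mb)
    return total_footprint, total_footprint / total_memory


# Hypothetical comparison: the same total application memory (20,000 MB)
# split into a few large instances or many small ones.
few_large = footprint_share(instances=10, workload_mb=2000, footprint_mb=100)
many_small = footprint_share(instances=100, workload_mb=200, footprint_mb=100)
ratio = vcpu_overcommit_ratio([4, 4, 4, 4], physical_cpus=8)
```

With the same assumed footprint per instance, splitting the same application memory into ten times as many instances multiplies the total footprint by ten and raises its share of total memory accordingly.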
Examples of real-life applications that show better performance in containers include:
- Heavily loaded, multi-component and multi-instance web and application servers
- Data analytics software (especially running in a SaaS model and using VM/container partitioning for tenancy)
- Applications built on a microservices architecture
- Multiple concurrent batch-processing workloads that use all available resources