Cloud resources can be expensive, especially when you are forced to pay for resources that you don’t need; on the other hand, resource shortages cause downtime. What’s a developer to do? In this article (full version at InfoQ) we’ll go through five steps that let you pay for just the resources you actually consume, without being limited as your application’s capacity requirements grow.
Admit That You Overpay for VMs
Almost every cloud vendor offers the ability to choose from a range of different VM sizes. Choosing the right VM size can be a daunting task; too small and you can trigger performance issues or even downtime during load spikes. Over-allocate? Then during normal load or idle periods all unused resources are wasted. Does this scenario look familiar from your own cloud-hosted applications?
In addition, if you need to add just a few more resources to the same VM, the only way out with most current cloud vendors is to double your VM size. To make matters worse, the move itself incurs downtime: you have to stop the current VM, perform all the steps of an application redeploy or migration, and then deal with the inevitable associated challenges.
Find How to Scale Up and Down Efficiently
Vertical scaling optimizes the memory and CPU usage of any instance according to its current load. If configured properly, this works well for both monoliths and microservices.
Setting up vertical scaling inside a VM by adding or removing resources on the fly without downtimes is a difficult task. VM technologies provide memory ballooning, but it’s not fully automated, requiring tooling for monitoring the memory pressure in the host and guest OS, and then activating up or down scaling as appropriate. But this doesn't work well in practice, as the memory sharing should be automatic in order to be useful.
Container technology unlocks a new level of flexibility thanks to its out-of-the-box automatic resource sharing among containers on the same host, with the help of cgroups. Resources that are not consumed within the limit boundaries are automatically shared with other containers running on the same hardware node. And unlike VMs, the resource limits in containers can be easily scaled without rebooting the running instances.
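To make the idea of limit boundaries concrete, here is a minimal Java sketch that reads the memory limit cgroups expose inside a container. It assumes a cgroup v2 host, where the limit appears in /sys/fs/cgroup/memory.max as either a byte count or the word max (meaning no limit). The class and method names are illustrative only and not part of any library.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalLong;

class CgroupMemoryLimit {

    // Parses the content of a cgroup v2 memory.max file.
    // "max" (or an empty file) means no limit is set.
    static OptionalLong parseLimit(String raw) {
        String value = raw.trim();
        if (value.isEmpty() || value.equals("max")) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(value));
    }

    // Reads the limit from a file; a missing or unreadable file is
    // treated as "no limit known" rather than as an error.
    static OptionalLong readLimit(Path file) {
        try {
            return parseLimit(Files.readString(file));
        } catch (IOException | NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        // Assumption: cgroup v2 layout as seen from inside the container.
        Path memoryMax = Path.of("/sys/fs/cgroup/memory.max");
        OptionalLong limit = readLimit(memoryMax);
        if (limit.isPresent()) {
            System.out.println("Container memory limit: " + limit.getAsLong() + " bytes");
        } else {
            System.out.println("No cgroup v2 memory limit found");
        }
    }
}
```

Because the orchestrator can rewrite this limit while the container keeps running, re-reading the file later may return a different value without any restart of the process.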
Migrate from VMs to Containers
There is a common misconception that containers are good only for greenfield applications (microservices and cloud-native). Experience and real use cases show that it is possible to migrate existing workloads from VMs to containers without rewriting or redesigning the applications.
For monolithic and legacy applications it is preferable to use system containers, so that you can reuse the architecture, configuration, etc., that were implemented in the original VM design. With system containers you can use standard network configurations like multicast, run multiple processes inside a container, avoid issues with incorrectly detected memory limits, write to the local file system and keep it safe across container restarts, troubleshoot issues and analyze logs in an already established way, use a variety of SSH-based configuration tools, and rely freely on other important “old school” practices.
To proceed with migration, you need to prepare the required container images. For system containers, that process might be a bit more complex than for application containers, so either build it yourself or use an orchestrator like Jelastic with pre-configured system container templates.
Each application component should be placed inside an isolated container. This approach can simplify the application topology in general, as some specific parts of the project may become unnecessary within a new architecture.
Enable Garbage Collector with Memory Shrink
For scaling Java vertically, it is not sufficient to just use containers; you also need to configure the JVM properly. Specifically, the garbage collector you select should be able to shrink memory at runtime.
Such a GC packs all the live objects together, removes garbage objects, and uncommits and releases unused memory back to the operating system. By contrast, with a non-shrinking GC or non-optimal JVM start options, Java applications hold on to all committed RAM and cannot be scaled vertically according to the application load. Unfortunately, the JDK 8 default Parallel garbage collector (-XX:+UseParallelGC) does not shrink the heap and so does not solve the problem of inefficient RAM usage by the JVM. Fortunately, this is easily remedied by switching to Garbage-First (-XX:+UseG1GC).
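One way to confirm which collector a running JVM actually uses is to list its garbage collector MXBeans. The sketch below assumes HotSpot naming, where G1 collectors report names beginning with "G1" and the Parallel collector reports names such as "PS Scavenge"; other JVMs may use different names, so treat the check as a heuristic. The class name is illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

class ActiveGcCheck {

    // Names of the collectors registered in this JVM.
    static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    // Heuristic, based on HotSpot naming: G1 beans start with "G1".
    static boolean looksLikeG1(List<String> names) {
        return names.stream().anyMatch(name -> name.startsWith("G1"));
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("Collectors: " + names);
        System.out.println(looksLikeG1(names)
                ? "G1 appears to be active"
                : "G1 does not appear to be active; consider -XX:+UseG1GC");
    }
}
```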
The following two parameters configure the vertical scaling of memory resources:
- set -Xms (the initial heap size) - a scaling step
- set -Xmx (the maximum heap size) - a maximum scaling limit
Also, the application should periodically invoke a Full GC, for example by calling System.gc(), during a low-load or idle stage. This can be implemented inside the application logic or automated with the help of the external Jelastic GC Agent.
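If you implement this inside the application, a scheduled task is one possible shape. The following is a minimal sketch, not the Jelastic GC Agent: the IdleGcTrigger class, the idle check based on the system load average, and the 0.5 threshold are all assumptions chosen for illustration. Note that System.gc() is only a request, and it is ignored entirely when the JVM runs with -XX:+DisableExplicitGC.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

class IdleGcTrigger {
    private final BooleanSupplier isIdle; // hypothetical load check supplied by the application
    private final Runnable gcAction;      // normally System::gc

    IdleGcTrigger(BooleanSupplier isIdle, Runnable gcAction) {
        this.isIdle = isIdle;
        this.gcAction = gcAction;
    }

    // Runs the GC action once if the application is idle; returns whether it ran.
    boolean runIfIdle() {
        if (!isIdle.getAsBoolean()) {
            return false;
        }
        gcAction.run();
        return true;
    }

    // Checks for idleness at a fixed period on a daemon thread.
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-gc");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleAtFixedRate(this::runIfIdle, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: "idle" means a system load average below 0.5.
        // getSystemLoadAverage() returns a negative value where unsupported.
        BooleanSupplier idle = () -> {
            double load = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
            return load >= 0 && load < 0.5;
        };
        ScheduledExecutorService scheduler = new IdleGcTrigger(idle, System::gc).start(1);
        Thread.sleep(3_000);
        scheduler.shutdownNow();
    }
}
```

Passing the GC action in as a Runnable keeps the idle logic testable without forcing real collections; in production you would pass System::gc and a much longer period than the one-second demo value.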
In the graph below, we show the result of activating the following JVM start options with delta time growth of about 300 seconds:
-XX:+UseG1GC -Xmx2g -Xms32m
As you can see, the reserved RAM (orange) increases slowly, following the growth in real usage (blue). All unused resources within the Max Heap limit remain available to other containers or processes running on the same host, rather than being wasted by standing idle.
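You can observe the same two curves from inside your own application with the standard memory MXBean. In this sketch we assume that the "reserved" RAM in the graph corresponds to the heap memory the JVM has committed, and that the real usage corresponds to used heap; the HeapSnapshot class name is illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class HeapSnapshot {

    // Current heap figures as reported by the JVM.
    static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    static long toMiB(long bytes) {
        return bytes / (1024 * 1024);
    }

    // used      = memory occupied by objects
    // committed = memory the JVM currently holds from the OS
    // max       = upper bound (-Xmx); reported as -1 when undefined
    static String describe(MemoryUsage usage) {
        String max = usage.getMax() < 0 ? "undefined" : toMiB(usage.getMax()) + " MiB";
        return String.format("used=%d MiB, committed=%d MiB, max=%s",
                toMiB(usage.getUsed()), toMiB(usage.getCommitted()), max);
    }

    public static void main(String[] args) {
        System.out.println(describe(heap()));
    }
}
```

With a shrinking collector, logging this line periodically should show the committed figure falling back toward the used figure after a load spike, instead of staying pinned at its peak.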
Choose a Cloud with Pay-as-You-Use Model
Cloud computing is very often compared to electricity usage, in that it provides resources on demand and offers a “pay as you go” model. But there is a major difference - your electric bill doesn’t double when you use a little more power!
Most cloud vendors provide a “pay as you go” billing model, which means that it is possible to start with a smaller machine and then add more servers as the project grows. But as we described above, you cannot simply choose the size that precisely fits your current needs and will scale with you, without some extra manual steps and possible downtime. So you keep paying for the limits: for a small machine at first, then for one double in size, and ultimately for horizontal scaling across several underutilized VMs.
In contrast, a “pay as you use” billing approach considers the current load on the application instances and provides or reclaims any required resources on the fly, which is made possible by container technology. As a result, you are charged based on actual consumption and are not required to make complex reconfigurations to scale up.
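A small worked comparison shows how the two models diverge. Everything in this sketch is a simplifying assumption made for illustration: the hourly usage figures, the price per GiB-hour, and the premise that a VM is sized for peak usage by doubling from a base size and billed at that size for the whole period. No real vendor's pricing is implied.

```java
class BillingComparison {

    // Smallest VM size, reached by doubling from baseGiB, that covers the peak.
    static double doubledTierFor(double peakGiB, double baseGiB) {
        if (baseGiB <= 0) {
            throw new IllegalArgumentException("baseGiB must be positive");
        }
        double size = baseGiB;
        while (size < peakGiB) {
            size *= 2;
        }
        return size;
    }

    // Pay for the limit: the peak-sized VM is billed for every hour.
    static double payForLimit(double[] hourlyUsageGiB, double baseGiB, double pricePerGiBHour) {
        double peak = 0;
        for (double usage : hourlyUsageGiB) {
            peak = Math.max(peak, usage);
        }
        return doubledTierFor(peak, baseGiB) * hourlyUsageGiB.length * pricePerGiBHour;
    }

    // Pay as you use: only the memory consumed in each hour is billed.
    static double payForUsage(double[] hourlyUsageGiB, double pricePerGiBHour) {
        double total = 0;
        for (double usage : hourlyUsageGiB) {
            total += usage;
        }
        return total * pricePerGiBHour;
    }

    public static void main(String[] args) {
        double[] usage = {1.0, 1.5, 2.5, 1.0}; // hypothetical GiB consumed per hour
        double price = 0.01;                   // hypothetical price per GiB-hour
        System.out.printf("Pay for limit: %.2f%n", payForLimit(usage, 1.0, price));
        System.out.printf("Pay as you use: %.2f%n", payForUsage(usage, price));
    }
}
```

In this example a single 2.5 GiB spike forces the jump from a 2 GiB to a 4 GiB VM, so the limit-based bill covers 16 GiB-hours while actual consumption is only 6 GiB-hours.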
Realizing the benefits of vertical scaling helps you quickly eliminate a set of performance issues, avoid the unnecessary complexity of rashly implemented horizontal scaling, and decrease cloud spend regardless of application type, whether monolith or microservice.