Ensuring 24/7 availability is essential to providing high-quality web services. Different solutions may be applied to achieve this, depending on the particular application pattern: for example, horizontal scaling, software clustering and session replication.
However, even a properly configured cluster can occasionally suffer downtime if the entire data center fails. So today we’d like to showcase a simple, cheap and effective HA solution: distributing your workloads across several data centers. This recently became possible with the newly added Jelastic feature, Multiple Hardware Regions.
To achieve high availability, we suggest configuring two application clusters with the same content in different data centers, so that if one of them becomes unavailable, all incoming requests are served by the other region. To accomplish this, we’ll use DNS round-robin load balancing as the most affordable option for such an HA implementation.
1. To start with, you need at least two environments located in different hardware regions (the easiest way to do this is to clone the environment with your application and migrate the clone to another region).
2. Next, we’ll make use of the DNS servers’ ability to bind multiple IP addresses (i.e. entry points) to a single domain name. To do so, add the external IPs of both of your balancers as separate A records for your custom domain, following the corresponding guide.
To make sure everything is configured properly, you can check the DNS settings by running the following command for your domain in a terminal:
Both A records should be listed in the ANSWER SECTION of the output, in the following format (the exact values for your domain will differ):
example.com 9999 IN A first_IP
example.com 9999 IN A second_IP
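The same check can also be scripted. Below is a minimal Python sketch, not part of the original setup, that resolves a domain through the system resolver and reports which of the expected entry points are missing. The helper names and the documentation-range IP addresses are hypothetical placeholders, and the system resolver may return cached results.

```python
import socket


def resolve_ipv4(hostname, port=80):
    """Return the sorted, de-duplicated IPv4 addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})


def missing_entry_points(resolved, expected):
    """Return the expected IPs that do not appear among the resolved ones."""
    return sorted(set(expected) - set(resolved))


if __name__ == "__main__":
    # Hypothetical values: replace with your domain and your balancers' external IPs.
    domain = "example.com"
    expected_ips = ["192.0.2.10", "198.51.100.20"]
    missing = missing_entry_points(resolve_ipv4(domain), expected_ips)
    print("all entry points published" if not missing else f"missing: {missing}")
```

If the script reports a missing address, the corresponding A record has either not been added or has not propagated yet.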
The Implied Workflow
As a result, the DNS server sends back the whole list of available addresses when it receives a request for the stated domain. The web browser then tries them one by one and establishes the connection with the first one that responds. Usually, this is the first public IP address in the list. If the corresponding data center is unavailable, the next address is tried.
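To illustrate the client-side behaviour described here, the following Python sketch tries a list of addresses in order and keeps the first one that accepts a connection. It is a simplified model under stated assumptions, not the actual logic of any particular browser; the function name, the timeout value and the injectable `connect` parameter are choices made for this example.

```python
import socket


def connect_first_available(addresses, port=80, timeout=3.0,
                            connect=socket.create_connection):
    """Try each IP in order; return (ip, connection) for the first that answers.

    Raises ConnectionError if none of the addresses can be reached.
    """
    errors = []
    for ip in addresses:
        try:
            return ip, connect((ip, port), timeout)
        except OSError as exc:
            # Unreachable or refusing entry point: remember why, try the next one.
            errors.append((ip, exc))
    raise ConnectionError(f"no entry point reachable: {errors}")
```

With the default `connect`, the returned object is an open socket that the caller should close. Note that an unresponsive data center costs up to `timeout` seconds per attempt before the next address is tried, so the chosen timeout directly affects how long a failover takes for a given client.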
In addition, DNS servers automatically provide round-robin distribution for domain names with multiple A records. After each request is processed, the order of addresses is cyclically shifted, moving the first IP to the end of the list, which results in a roughly even workload distribution among your environments in different regions.
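The cyclic shift can be modelled in a few lines of Python. This is only an illustration of the rotation idea, not the implementation of any real DNS server; the function name and the placeholder record names are hypothetical.

```python
from collections import Counter, deque


def round_robin_answers(records, queries):
    """Return the answer list a rotating server would give for each successive query."""
    ring = deque(records)
    answers = []
    for _ in range(queries):
        answers.append(list(ring))
        ring.rotate(-1)  # move the first record to the end of the list
    return answers


if __name__ == "__main__":
    answers = round_robin_answers(["first_IP", "second_IP"], 4)
    for answer in answers:
        print(answer)
    # Count how often each record comes first, i.e. is tried first by clients.
    print(Counter(answer[0] for answer in answers))
```

Over four queries with two records, each record appears first twice, which is where the even split of the load comes from. In practice, caching by resolvers and clients makes the real distribution less exact than this model.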
Since it is very unlikely that both data centers hosting your environments will go down at the same time, this already gives you a sufficient level of redundancy. Nevertheless, the number of regions can easily be increased further by applying similar configurations.
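As a rough back-of-the-envelope check of this claim, the sketch below estimates the chance that at least one region is up. It assumes that regions fail independently of each other and uses a purely hypothetical per-region availability of 99%; neither the figure nor the independence assumption comes from the test described in this article, and the estimate ignores the time clients need to switch to another address.

```python
def combined_availability(region_availability, regions):
    """Probability that at least one of `regions` independent regions is up."""
    if not 0.0 <= region_availability <= 1.0:
        raise ValueError("region_availability must be between 0 and 1")
    if regions < 1:
        raise ValueError("regions must be at least 1")
    # All regions must be down at once for the service to be unavailable.
    return 1.0 - (1.0 - region_availability) ** regions


if __name__ == "__main__":
    for count in (1, 2, 3):
        # 0.99 is a hypothetical figure chosen only for illustration.
        print(count, f"{combined_availability(0.99, count):.6f}")
```

Under these assumptions, each extra region multiplies the expected downtime by the per-region failure probability, which is why two regions are already a large step up from one.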
Performance & Failover Testing
To check the effectiveness of this solution, we ran a dedicated load test combined with a failover check. A dedicated domain was put under steady load using Apache JMeter, while changes on both instances were tracked using the built-in Jelastic statistics module.
To start with, we made sure that all incoming requests were processed without errors under a continuous, sustained load. After a period of about 10 minutes without a single failure, one of the clusters was manually shut down to simulate an outage.
Below you can see the CPU & Network statistics for both of our load balancers while running the described scenario:
As you can see, both environments in different regions handle a steady, evenly divided load until the first cluster receives the shutdown command (at approximately 12:10; this moment is marked with the red dotted line). After that, its activity drops to zero (as expected for an unavailable instance), while resource consumption in the second environment starts to rise, since it has become the only entry point and now handles all incoming traffic.
After the first cluster was halted, all new incoming requests were directed to the active instance, so none of them was dropped. This can be seen in the automatically generated JMeter report below, which shows the processing results at the moment of the simulated failure and the overall test summary:
You can check the efficiency (i.e. response time and number of errors) of the described HA approach yourself in a similar way, using any other load-generating tool (such as Load Impact, Load UI, WebLoad, etc.).
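For a quick smoke test without a dedicated tool, the two metrics mentioned here (response time and number of errors) can be collected with a short Python script. This is a minimal sketch, not a replacement for JMeter and not the setup used in the test above; the function names, request count, worker count and URL are hypothetical.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_once(url, timeout=5.0):
    """Fetch the URL once; raises on connection failures and HTTP error statuses."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
        return response.status


def timed_call(fetch):
    """Run fetch() and return (succeeded, elapsed_seconds)."""
    started = time.perf_counter()
    try:
        fetch()
        succeeded = True
    except Exception:
        succeeded = False
    return succeeded, time.perf_counter() - started


def run_load(fetch, total_requests=100, workers=10):
    """Call fetch() total_requests times from a thread pool and summarize the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: timed_call(fetch), range(total_requests)))
    errors = sum(1 for succeeded, _ in results if not succeeded)
    mean = sum(elapsed for _, elapsed in results) / len(results) if results else 0.0
    return {"requests": len(results), "errors": errors, "mean_seconds": mean}


if __name__ == "__main__":
    # Hypothetical target: replace with the domain bound to both entry points.
    print(run_load(lambda: fetch_once("http://example.com/")))
```

Running the script before and after shutting down one environment gives a simple comparison of the error count and the mean response time. Only run load tests against domains you own.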
The presented method is simple and affordable, as it doesn’t require any additional software or hardware integration to cope with a disaster as major as the failure of an entire data center. So don’t waste time: increase your services’ availability and uptime with Jelastic right now! Give it a try and experience all the benefits of the Jelastic DevOps PaaS for yourself.
Stay tuned for upcoming publications on our blog to discover another, more advanced HA solution, which ensures automatic DNS record management by tracking the availability of hardware regions and provides smart geo-distribution of incoming traffic by means of the Azure Traffic Manager. And remember that applying preventive measures in advance always takes less effort than recovering lost data and restoring your customers’ trust after trouble strikes.