Often we run into architecting an application with unpredictable load, which makes it nearly impossible to estimate resource and infrastructure requirements while designing the system.
Additional design factors come into the picture if the application caters to the ever-changing hospitality industry, where we have to take care of dynamic content changes.
Various solutions to these problems have been discussed elsewhere; we chose to solve it by running multiple Docker containers on the same machine, thus utilizing the full power of the hardware resources. Let’s delve into the use case in detail for a better understanding of its design.
The hospitality business web app under consideration ran on a legacy technology framework: a single on-prem server on the LAMP stack.
Over time, the system became prone to security attacks and performance bottlenecks. There were many single points of failure, such as the Redis server, Nginx, and HAProxy; this was dangerous, given that we only had one on-prem server.
The business wanted to scale, and thus the technology had to be upgraded. The need of the hour was to work in an agile mode and get things done on short notice with high quality, reliability, and accuracy.
To cater to these technological challenges, we needed to overhaul the existing framework without impacting business operations.
The old systems carried a substantial maintenance cost; the business objective was therefore cost reduction and reliability to support business scalability.
A few options that we considered:
- Going with AWS, which provides most of the services out of the box
- Going with stand-alone servers and using Docker
- Caching at the HAProxy or Nginx level
- Using a Redis master-slave or Sentinel architecture
We decided to go with multiple Docker containers hosted on the same server, for the following reasons:
- Moving to AWS was shot down given the humongous existing on-prem setup
- We scaled up the number of servers and used Docker Swarm, which provides an automated management tool for all the containers
- Docker Swarm provides better fault tolerance and high availability, since a single server crash will not take the application down. There is one catch: because Swarm managers rely on a Raft majority (quorum), fault tolerance effectively improves only with odd numbers of servers. With 3 servers the application keeps working if one server goes off (a 1:3 ratio); adding a 4th server does not raise that tolerance, since a majority must still remain available.
- Horizontal scalability: Swarm lets us add hosts to handle increased load without making changes to the application
- No Downtime during server maintenance/upgrades.
- The overall server footprint, and with it the maintenance cost, was brought down by using Docker
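As a sketch of what such a Swarm deployment can look like, here is a minimal stack file; the image name, replica count, and ports are illustrative assumptions, not the client’s actual configuration:

```yaml
version: "3.8"
services:
  webapp:
    image: registry.example.com/hospitality-webapp:latest   # hypothetical image
    ports:
      - "8080:80"
    deploy:
      replicas: 6                # multiple containers spread across the hosts
      update_config:
        parallelism: 2
        delay: 10s               # rolling updates => no downtime during upgrades
      restart_policy:
        condition: on-failure    # Swarm restarts crashed containers automatically
```

A stack like this would be deployed with `docker stack deploy -c docker-compose.yml webapp`; Swarm then keeps the requested replicas running across the hosts and performs rolling updates during maintenance.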
In addition to using Docker Swarm, multiple other components were rethought.
- We moved away from the traditional Redis setup to a Redis Sentinel setup, removing another single point of failure from the system.
- We implemented two levels of load balancing: one at the HAProxy level using replicas, and a second at the Swarm load balancer, so load is distributed across the servers.
- We also replicated micro apps within each container to ensure that resource utilization stayed as efficient as possible at all times.
- We implemented a centralized Nginx with ETags for caching data. API responses cached at the Nginx layer reduce response time and increase performance. The ETag guards against caching duplicates of the same API data, since it changes only when the data differs from what is already cached in Nginx.
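To illustrate the ETag mechanics described above, here is a minimal, framework-free Python sketch; the payload and handler are hypothetical, not the production code:

```python
import hashlib

def make_etag(body: bytes) -> str:
    # Strong ETag derived from the response body:
    # it changes only when the data itself changes.
    return '"%s"' % hashlib.sha256(body).hexdigest()[:16]

def respond(body: bytes, if_none_match=None):
    # Mimics what a caching layer does: if the client's ETag
    # (If-None-Match header) still matches, send 304 and no body.
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, b""
    return 200, body

data = b'{"rooms": 12}'
status, body = respond(data)                    # first request: full response
status2, body2 = respond(data, make_etag(data)) # revalidation: data unchanged, 304
```

The same comparison happens inside Nginx; a 304 response saves both bandwidth and a round trip to the backend when the cached data is still valid.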
Results: The current application runs on more than 50,000 devices placed across the United States, handling more than 2 million requests with zero downtime and a 4x performance improvement.
Fig 1: High-level Multi-Host Architecture for our client
Fig 2: The detailed layered internal architecture
Tech Stack Choices:
Our tech stack choices flow directly from the architectural priorities above. The team evaluated and designed around React for the UI implementation and relies on Python for backend integration.
Software portability, maintainability, and ease of deployment define the tools we use. Currently, Redis is used for in-memory caching of data and sessions.
Redis is an in-memory data structure store that persists our data and hence reduces external DB calls for fetching it. In design cases where two-way communication is a priority along with compatibility and security, Firebase is our choice. Firebase, a mobile and web application development platform, provides data storage and real-time synchronization. Firebase works with both React and Python and provides malleable rule sets with security features.
Software Deployment and Testing –
Given the need to be first-time-right in this business use case, we follow a test-driven approach: the team starts with test cases and then proceeds to the actual implementation.
To reduce tester dependency, the team has implemented automated unit and integration test cases. These test cases reduce the tester’s effort by around 40%, improving both delivery speed and quality.
“Writing test cases is Worth the time”
For deployment, a continuous deployment setup is in place where Jenkins triggers a build automatically at a specified time.
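As a hedged sketch, a declarative Jenkinsfile for such a timed continuous-deployment trigger might look like the following; the stage contents, image name, and schedule are assumptions, not the actual pipeline:

```groovy
// Jenkinsfile — illustrative sketch; stages and schedule are assumptions
pipeline {
    agent any
    triggers {
        cron('H 2 * * *')   // build automatically at a specified (nightly) time
    }
    stages {
        stage('Test')   { steps { sh 'pytest' } }
        stage('Build')  { steps { sh 'docker build -t hospitality-webapp .' } }
        stage('Deploy') { steps { sh 'docker stack deploy -c docker-compose.yml webapp' } }
    }
}
```

Because the deploy stage hands the stack file to Swarm, the rolling-update settings there keep the application serving traffic while new containers replace old ones.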
With production and development comes performance. To make sure product performance is stable, our team used multiple tools: JMeter, ApacheBench (ab), and Gatling. In each of these tools, you define the API to test and the number of users, then gradually increase the users to recreate the real-world scenario. Such robustness testing has earned praise from our clients.
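The ramp-up pattern those tools implement can be sketched in a few lines of stdlib-only Python; `call_api` here is a stand-in that simulates latency rather than hitting a real endpoint:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def call_api():
    # Stand-in for a real HTTP call to the API under test (hypothetical endpoint);
    # returns the observed latency in seconds.
    start = time.perf_counter()
    time.sleep(0.001)  # simulated service latency
    return time.perf_counter() - start

def ramp_up(max_users, step, requests_per_user=5):
    # Gradually increase the number of concurrent users, as JMeter/ab/Gatling do,
    # recording the mean latency observed at each load level.
    results = {}
    for users in range(step, max_users + 1, step):
        with ThreadPoolExecutor(max_workers=users) as pool:
            latencies = list(pool.map(lambda _: call_api(),
                                      range(users * requests_per_user)))
        results[users] = sum(latencies) / len(latencies)
    return results

profile = ramp_up(max_users=8, step=4)  # mean latency per concurrency level
```

In a real test the stand-in would be replaced by an HTTP request, and the per-level latencies would reveal the point where the system starts to degrade.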
The solutions deployed across our client base are testimonials to our design efforts. Our applications run on more than 50,000 devices placed across the United States, handling more than 2 million requests with zero downtime and a 4x performance improvement.
To go ahead with a multi-host implementation, we need to keep track of the following things:
- Choose the framework depending on the use case
- In Docker Swarm, fault tolerance grows only with odd numbers of servers (e.g. FT will be 1 with 3 servers, 2 with 5 servers, and so on)
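That last rule is just Raft quorum arithmetic: a majority of manager nodes must stay up, so a swarm of N managers tolerates the loss of floor((N − 1) / 2) of them. A one-line check:

```python
def swarm_fault_tolerance(managers: int) -> int:
    # Raft quorum: a majority of managers must remain available,
    # so N managers tolerate the loss of floor((N - 1) / 2).
    return (managers - 1) // 2

assert swarm_fault_tolerance(3) == 1   # 3 servers -> survive 1 failure
assert swarm_fault_tolerance(4) == 1   # a 4th manager adds no extra tolerance
assert swarm_fault_tolerance(5) == 2   # 5 servers -> survive 2 failures
```

This is why manager counts are kept odd: stepping from 3 to 4 servers costs hardware without buying any additional fault tolerance.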