Scalable Application Architecture

Architecting scalable system is more challenging and when it’s done right its more rewarding.  Top tech companies like Google and Amazon are world-famous for their versatile high performing systems. They are all highly versatile and famous because they are all highly scalable.  To simply put, scalability is known as the capability of a system to efficiently meet its growing demand/users without adversely affecting its performance, cost, maintainability etc.

Performance – capability of a system on how efficiently it performs its tasks in a given time and utilizes its resources, again a most important factor/feature of a system.  In a non-profit public applications, it’s somehow tolerable to perform a task in two to three seconds whereas in commercial and mission critical applications it’s not at all tolerable when time taken to perform certain task exceeds even two to three seconds. When a commercial system just hit market and gaining users/popularity is not vulnerable to scalability issues as the system is just gaining users. Performance of the system can be considerably high. Once application picks up market and system demand grows drastically over time, there is a lot of chances that an application might highly vulnerable to performance and scalability issues. When the system is not designed with future performance and scalability issues in mind, the system will surely lose its hard-earned potential customers over time.

General techniques for scalable system design

Performance and scalability design and implementation is done in almost all of the phases of system development starting from planning/requirement analysis to even after deploying to production environment. So we will discuss here on how to make decision on performance and scalable capabilities of a system in three stages of a system life-cycle.

  1. Planning and Requirement analysis phase
  2. Design and Development phase
  3. Production and Maintenance phase

Planning and Requirement analysis phase:

Major software design decisions and system modeling are made at this stage. Making right and futuristic decision at this stage is crucial for the success of the project.  System and performance model can be created to evaluate the system design before investing time and resources to implement a flawed design.

The time, effort and money invested upfront in performance modeling should be proportional to project risk. For a project with significant risk, where performance is critical, you may spend more time and energy up front developing the model. For a project where performance is less of a concern, modeling might be simple.

Budget represents the constraints and enables us to specify how much we can spend (resource-wise) and how we plan to spend it. Constraints govern our total spending, and then we can decide where to spend to get to the total. We assign budget in terms of response time, throughput, latency and resource utilization.

The performance model we develop helps us capture the following important information upfront:

Category Description
Application Description The design of the application in terms of its layers and its target infrastructure.
Scenarios Critical and significant use cases, sequence diagrams, and user stories relevant to performance.
Performance objectives Response time, throughput, resource utilization
Budgets Constraints we set on the execution of use cases, such as maximum execution time and resource utilization levels, including CPU, memory, disk I/O and network I/O.
Workload goals Goals for the number of users, concurrent users, data volumes, and information about the desired use of the application.
Quality-of-service (QoS) requirements QoS requirements, such as security, maintainability, and interoperability, may impact the performance. So We should have an agreement across software and infrastructure teams about QoS restrictions and requirements.

2. Design and Development phase:

Design and development phase is another important phase of software development where we have to focus on performance and scalability requirements. Typically any website application will have static contents and business logic and data storage. It’s really a wise idea to layer them separately so that each layer can scale independently from one another. Following flexible architecture is important so that when there is any new change comes the architecture should be able to adapt easily to the change without affecting performance or scalability.

Static content –  consists of Images(JPEG, GIF or other), HTML, JavaScript and CSS, can be layered separately  so that it can be hosted independently from other part of the system leading to efficient scaling. Since static contents are stateless, replica can be easily created in a cluster environment for high availability. Content Delivery Network can be formed in geographically different locations for the static content which will promote high scalability and availability of static content.

Business logic layer – consists of application logic written in C#, Java or other programming language, should also be layered separately and should be able to be hosted in its own server to promote scalability.  A great approach to implement business logic is to implement and expose them as Web Service, SOAP based or REST based. REST based services are highly scalable and interoperable compared to SOAP based services. Parallelism can also be applied easily on business logic when it’s designed and implemented highly modular. Parallel processing applications can perform really well when user demands are more and they try to access the system more concurrently.  Another interesting idea to apply on business logic layer is that when various logical parts of the system are grouped/layered separately based on its functional scenarios, like order processing module from report processing module, it’s really easy to decide to create more nodes for highly accessible module like order processing compare to less accessible module like report processing in a cluster environment.

4. Production and Maintenance phase:

In this final phase of SDLC, we will see the actual result of all the efforts we put for bringing up the system. During the initial launch of the system there may not be issues about availability of the system as the users are just growing and system is very well capable to cope up with the low or medium demand. When user counts shooting up to a level beyond the capability of the system, that’s when all the availability, performance and scalability issues are starting to pour in.

Suppose we reached the threshold limit of the system capability beyond which system tends to experience issues, then following are the areas we have to target to sort out the issues.

  1. Performance tuning.
  2. Scaling up (vertical scaling).
  3. Scaling out (Horizontal scaling).

Performance tuning.

This step would consist of refactoring application’s source code, analyzing an application’s configuration settings, attempting to further parallelize application’s logic and implementing caching strategies. When application matures over time, performance tuning in the above said areas might not be possible as there won’t be any scope for further performance tuning.

Scaling up (vertical scaling)

Scaling up is adding more resource to the existing web server in order to increase the performance of the system when demand grows. Resources can be adding more core processors, more physical memory. Vertical scaling is relatively simpler to do because it requires no changes to an application, a web application simply goes from running on a single node to running on a single node with more resources.

Even though vertical scaling is the simplest of scaling techniques, there are limitations to it. Limitations on vertical scaling can be due to the operating system itself or an operational constraint like security, management or a provider’s architecture.

Operating system OS type Physical memory
Windows server 2008 standard 32-bits 4 GB
Windows server 2008 standard 64-bits 32 GB
Linux 32-bits 1 GB to 4 GB
Linux 64-bits 4GB to 32 GB

As you can see operating system have limitations on increasing memory beyond that the system cannot be upgraded, so performance cannot be increased after that limit.

Scaling out (Horizontal scaling)

Scaling out refers to adding more nodes forming a cluster environment and making the application to run those nodes to provide high availability. In scaling terminology, this implies that the node on which a web application is running cannot be equipped with more CPU, memory, Bandwidth or I/O capacity and thus a web application is split to run on multiple boxes or nodes.

When we are planning to horizontally scale our web application, we have to concentrate on how to scale out the three major layers of our web application, Static content layer, business logic layer and data storage layer.

Scaling out static content layer is always easy as it’s stateless, because no matter which node of a cluster we use to retrieve our static content it’s going to be the same content. So now the question is when to scale static content layer? Well, the reason is when we experience increased latency in loading the static content, such as HTML files, CSS files, JavaScript files, on the browser.

When we plan to horizontally scale static content, the first thing we need to address is how to set up master-slave architecture. A master being the node where you would apply changes made to an application static content layer and the slave node(s) being the one(s) that receive updates (replicated/synchronized) from the master at the predetermined time.

Unlike scaling out static content layer, business logic layer is so sensitive because it maintains state.  Meaning, when node1 handles order from 1 to 5000 customers in an order processing system and node 2 handles orders from 5001 to 10000, and when customer who belongs to order 200 is good to carry out transactions as long as he/she is routed to node1. Problem arises when he is routed to node2 as node2 has no clue about order 200.

To handle the above said problem the solution lies in specialized software such as Terracotta, GigaSpaces, Oracle coherence etc.  These softwares solve the issues through replication and synchronization. In addition to that this problem can also be solved by making business logic tier and permanent storage tier working together. We can choose better database which can horizontally scale well in a distributed cluster environment. There are many such Dbs available in the market, Apache Cassandra, amazon SimpleDB etc.

Scaling permanent storage tier is a huge topic by itself, so I am leaving it as a scope for my future articles. 🙂


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s