March 2, 2016

Shipping Software at Scale

Going into my second internship, my goal was to understand what it takes to develop web applications that scale to millions of users.

No server can fulfill millions of concurrent requests. Thus, a scalable application must run on an arbitrary number of machines. Despite being distributed, we want our application to be addressable and function as if it were a unit. For example, although Facebook is hosted on many thousands of machines, the site feels as if it may be hosted on one mega-machine, being accessible in the same way (www.facebook.com) and having the same data.

The challenge of developing a scalable application is to create the impression of being a cohesive application – being accessible in the same way, having the same data – while running on multiple machines.

Let’s call a set of servers (from this point forward, a host) running an application a cluster. Each host in a cluster is fungible and stateless. A load balancer directs requests within the cluster to idle hosts, which can process that request.

The cluster is the application. However, it is convenient to think of the functionality of the cluster in terms of the functionality of a single host; the cluster then does a lot of what that host does, in parallel.

Remember: each host is fungible and stateless.

Let’s take a brief interlude to examine some salient properties to do with the robustness of such a distributed application.

The lifecycle of a host’s activity begins with its receiving a request. The host makes a series of computations and dependency calls based on that request and computes a response. The response is sent back and the application is ready to handle the next request.

It is important that a hosts is stateless both because it is unlikely a user will access the application through the same host every time (and we want our application to compute a uniform response) and because persistent storage has much stronger guarantees about the availability of data. Losing data in a way that is non-deterministic and difficult to track is bad.

Because distributed applications run on many machines, the likelihood of some failures occurring is very high. Think of it as the limit as n tends to infinity of .999^n.

Per the lifecycle of a host, one can see that a failure of the application would mean the caller does not receive a response in reasonable time. It is important to note that although the aggregate probability of some failure occurring is high, the probability of consecutive failures occurring quickly tends to zero. Thus, recovering from failure is quick and painless. Either the user or the application can simply retry failed requests.

The probability of requests failing after some number of retries can be thought of as the limit as n tends to infinity of .001^n.

The last piece of the puzzle has to do with making the code that lives on each host easy to write. Dependencies, such as a database or another application, must be addressable as a unit despite also running on multiple machines. For example, when you accessing Facebook, the particular host handling your request should be able to call its Authentication dependency just as easily as you can open the page. Having the code running on a host written as if it were an application interacting with other whole applications makes it easy to reason about its logic.

The problem of developing software at scale is then reduced to (aside from, you know, the awesome stuff like Kafka, HBase, and load balancers that have taken millions of man hours to develop) getting your code up and running onto a bunch of hosts.

We need to get our code live to hundreds of machines. How? Build a continuous deployment pipeline.

Fortunately, deploying software to multiple of hosts is not significantly more woesome than a simple deployment. The process is modeled by something called a pipeline. Pipelines specify a sequence of environments (a set of hosts, cluster, typically having some designation e.g. Beta, Production) and conditions must be met for software to go from one environment to the next.

A prototypical pipeline functions as follows: you write and commit some code. The pipeline specifies to put that code onto a few test servers and run tests. Should those tests pass, the pipeline sends the code to another set of test++ servers. These may receive a small portion of production traffic. If there are errors, the update is rejected and the old code is restored. Otherwise, the update is gradually propagated to all production hosts. Along the way, pipelines may pick up additional dependencies, and modify host configurations, but that’s mostly incidental to their purpose.

The process supported by a pipeline is called continuous deployment, and is a cornerstone of any large web application.

The quality of tests is keystone to the integrity of the pipeline process.

The quality of tests is keystone to the integrity of the pipeline process. Much complexity can baked into the testing stage to ensure that updates do not break functionality. For example, requests can be run in parallel with both the old and new code and the pipeline can verify that the output of both is the same or differs as expected. Others may integrate sophisticated A/B testing frameworks into their deployment infrastructure at this stage, allowing developers to adjust traffic and monitor the performance of an update, even against business metrics.

A continous deployment pipeline allows you to reason about your application as if it were on a single host.

When used correctly, a continuous deployment pipeline allows you to reason about your application as if it were running on a single host, easily scale by spinning servers up and down, and have an effortless process for getting updates from your singular development environment to thousands of production hosts.

Isn’t that poetic? The developer gets to write software as if it were running on a single machine and the user gets to experience the application likewise.

Isn’t programming poetic?

There’s plenty more to be said about how to ensure that an application is free of bottlenecks, but that’s a topic for another day. For now, in the words of my mentor, “thread pools are the new if statements.”

Also, try not to have synchronous calls between dependencies, especially when those calls may take more than single digit (or low double digit) milliseconds.

Kudos

Shipping Software at Scale

Now read this

Building A Convolutional Neural Network With TensorFlow