(Contributor article: “SmartStack: Service Discovery in the Cloud” by Igor Serebryany, Site Reliability Engineer at Airbnb. Originally appeared on Airbnb.)
What is SmartStack?
SmartStack is an automated service discovery and registration framework. It makes the lives of engineers easier by transparently handling the creation, deletion, failure, and maintenance of the machines running code within your organization. We believe that our approach to this problem is among the best possible: simpler conceptually, easier to operate, more configurable, and providing more introspection than any other framework of its kind. The SmartStack way has been battle-tested at Airbnb over the past year, and has broad applicability in many organizations, large and small.
SmartStack’s components – Nerve and Synapse – are available on GitHub! Read on to learn more about the magic under the hood.
The problem of services in an SOA
Companies like Airbnb often start out with a monolithic application – a kind of Swiss Army knife which performs all of the functions of the organization. As traffic (and the number of engineers working on the product) grows, this approach doesn’t scale. The code base becomes too complicated, concerns are not cleanly separated, changes from many engineers touching many different parts of the code base go out together, and performance is determined by the worst-performing sections of the application.
The solution to this problem is services: individual, smaller code bases, running on separate machines with separate deployment cycles, that more cleanly address more targeted problem domains. This is called a service-oriented architecture (SOA).
As you build out services in your architecture, you will notice that instead of maintaining a single pool of general-purpose application servers, you are now maintaining many smaller pools. This leads to a number of problems. How do you direct traffic to each machine in the pool? How do you add new machines and remove broken or retired ones? What is the impact of a single broken machine on the rest of the application?
Dealing with these questions, across a collection of several services, can quickly grow to be a full-time job for several engineers, engaging in an arduous, manual, error-prone process fraught with peril and the potential for downtime.
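To make the bookkeeping behind those questions concrete, here is a minimal Python sketch of a hand-maintained pool for a single service. It is an illustration only and not how SmartStack, Nerve, or Synapse work: the `ServicePool` class, its method names, and the `host:port` strings are all hypothetical, and round-robin is just one simple way to direct traffic.

```python
from dataclasses import dataclass, field


@dataclass
class ServicePool:
    """Hypothetical hand-maintained registry of backends for one service."""
    name: str
    backends: list = field(default_factory=list)  # "host:port" strings
    down: set = field(default_factory=set)        # backends known to be broken
    _next: int = 0                                 # round-robin cursor

    def add(self, backend):
        # Someone (or some script) must call this for every new machine.
        if backend not in self.backends:
            self.backends.append(backend)

    def remove(self, backend):
        # The same goes for every retired machine.
        if backend in self.backends:
            self.backends.remove(backend)
        self.down.discard(backend)

    def mark_down(self, backend):
        # Until somebody notices the failure and calls this,
        # the broken machine keeps receiving its share of traffic.
        self.down.add(backend)

    def pick(self):
        # Round-robin over the backends that are not marked down.
        healthy = [b for b in self.backends if b not in self.down]
        if not healthy:
            raise RuntimeError(f"no healthy backends for {self.name}")
        choice = healthy[self._next % len(healthy)]
        self._next += 1
        return choice
```

Every consumer of every service needs an up-to-date copy of a structure like this, and every call to `add`, `remove`, or `mark_down` has to be triggered by a person or a script. Multiplied across many pools, that is the manual work described above.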