Test all the (Network) Things

(Contributor article “Test all the (Network) Things by Dan McCormick” SVP of Technology at Shutterstock. Originally appeared on )

Our engineering team supports many different sites, including the , the , the , BigstockOffset, and Skillfeed

All these sites rely on a core set of REST services for functionality like authentication, payment, and search. Since these core services are so critical, we need to know if they’re functioning properly at all times, and get alerted if they aren’t. There are plenty of solutions for server-level monitoring, but we couldn’t find a good, simple solution for service or API monitoring. So we built one. It’s called , fornetwork testing framework, and it’s part of a large collection of .

Slide & Bio…

 

(Contributor article “Isomorphic JavaScript: The Future of Web Apps by ” Front-end Engineer at Airbnb. Originally appeared on Airbnb)

At Airbnb, we’ve learned a lot over the past few years while building rich web experiences. We dove into the single-page app world in 2011 with our mobile web site, and have since launched Wish Lists and our newly-redesigned search page, among others. Each of these is a large JavaScript app, meaning that the bulk of the code runs in the browser in order to support a more modern, interactive experience.

This approach is commonplace today, and libraries like Backbone.js, Ember.js, and Angular.js have made it easier for developers to build these rich JavaScript apps. We have found, however, that these types of apps have some critical limitations. To explain why, let’s first take a quick detour through the history of web apps.

More…

 

In this talk, “How RethinkDB Works,” , Lead Engineer at RethinkDB will discuss the value of RethinkDB’s flexible schemas, ease of use, and how to scale a RethinkDB cluster from one to many nodes. He will also talk about how RethinkDB fits into the CAP theorem, and its persistence semantics. Finally, Joe will give a live demo, showing how to load and analyze data, how to scale out the cluster to achieve higher performance, and even destroy a node and show how RethinkDB handles failure. This talk was recorded at the meetup at StumbleUpon Offices.

Bio…

 

Data Driven Growth at Airbnb by  – As Airbnb’s VP of Engineering, Mike Curtis is tasked with using big data infrastructure to provide a better UX and drive massive growth. He’s also responsible for delivering simple, elegant ways to find and stay at the most interesting places in the world. He is currently working to build a team of engineers that will have a big impact as Airbnb continues to construct a bridge between the online and offline worlds. Mike’s particular focus is on search and matching, systems infrastructure, payments, trust and safety, and mobile.

Bio…

 

(Contributor article “Tracking Twitter Followers with MongoDB by André Spiegel,” Consulting Engineer at MongoDB. Originally appeared on MongoDB blog

As a recently hired engineer at MongoDB, part of my ramping-up training is to create a number of small projects with our software to get a feel for how it works, how it performs, and how to get the most out of it. I decided to try it on Twitter. It’s the age-old question that plagues every Twitter user: who just unfollowed me? Surprising or not, Twitter won’t tell you that. You can see who’s currently following you, and you get notified when somebody new shows up. But when your follower count drops, it takes some investigation to figure out who you just lost.

I’m aware there’s a number of services that will answer that question for you. Well, I wanted to try this myself.

The Idea and Execution

The basic idea is simple: You have to make calls to Twitter’s REST API to retrieve, periodically, the follower lists of the accounts you want to monitor. Find changes in these lists to figure out who started or stopped following the user in question. There are two challenging parts:

  1. When you talk to Twitter, talk slowly, lest you hit the rate limit.
  2. This can get big. Accounts can have millions of followers. If the service is nicely done, millions of users might want to use it.

The second requirement makes this a nice fit for MongoDB.

The program, which I called “followt” and wrote in Java, can be found on github. For this article, let me just summarize the overall structure:

  • The scribe library proved to be a great way to handle Twitter’s OAuth authentication mechanism.
  • Using , we can retrieve the numeric ids of 5,000 followers of a given account per minute. For large accounts, we need to retrieve the full list in batches, potentially thousands of batches in a row.
  • The numeric ids are fine for determining whether an account started or stopped following another. But if we want to display the actual user names, we need to translate those ids to screen names, using . We can make 180 of these calls per 15 minute window, and up to 100 numeric ids can be translated in each call. In order to make good use of the 180 calls we’re allowed, we have to make sure not to waste them for individual user ids, but to batch as many requests into each of these as we can. The class net.followt.UserDB in the application implements this mechanism, using a BlockingQueue for user ids.

    More…

 

(Contributor article “SmartStack: Service Discovery in the Cloud by ” Site Reliability Engineer at Airbnb. Originally appeared on Airbnb)

What is SmartStack?

SmartStack is an automated service discovery and registration framework. It makes the lives of engineers easier by transparently handling creation, deletion, failure, and maintenance work of the machines running code within your organization. We believe that our approach to this problem is among the best possible: simpler conceptually, easier to operate, more configurable, and providing more introspection than any of its kind. The SmartStack way has been battle-tested at Airbnb over the past year, and has broad applicability in many organizations, large and small.

SmartStack’s components – Nerve and Synapse – are available on GitHub! Read on to learn more about the magic under the hood.

The problem of services in an SOA

Companies like Airbnb often start out as monolithic applications – a kind of swiss army knife which performs all of the functions of the organization. As traffic (and the number of engineers working on the product) grows, this approach doesn’t scale. The code base becomes too complicated, concerns are not cleanly separated, changes from many engineers touching many different parts of the codebase go out together, and performance is determined by the worst-performing sections in the application.

The solution to this problem is services: individual, smaller code bases, running on separate machines with separate deployment cycles, that more cleanly address more targeted problem domains. This is called a services-oriented architecture: SOA.

As you build out services in your architecture, you will notice that instead of maintaining a single pool of general-purpose application servers, you are now maintaining many smaller pools. This leads to a number of problems. How do you direct traffic to each machine in the pool? How do you add new machines and remove broken or retired ones? What is the impact of a single broken machine on the rest of the application?

Dealing with these questions, across a collection of several services, can quickly grow to be a full-time job for several engineers, engaging in an arduous, manual, error-prone process fraught with peril and the potential for downtime.

More…

 

This is an Apache Zookeeper introduction – In this talk, , from Rent The Runway, gives an introduction to ZooKeeper. She talks on why it’s useful and how you should use it once you have it running. Camille goes over the high-level purpose of ZooKeeper and covers some of the basic use cases and operational concerns. One of the requirements for running Storm or a Hadoop cluster is to have a reliable Zookeeper setup. When you’re running a service distributed across a large cluster of machines, even tasks like reading configuration information, which are simple on single-machine systems, can be hard to implement reliably. This talk was recorded at the NYC Storm User Group meetup at WebMD Health.

Interested in the Tech Challenges at Rent the Runway?
If you’re looking for a super smart team working on significant problems in the areas of data science and logistics don’t miss this opportunity to connect directly with an engineer inside Rent the Runway.

 

 

The ZooKeeper framework was originally built at Yahoo! to make it easy for the company’s applications to access configuration information in a robust and easy-to-understand way, but it has since grown to offer a lot of features that help coordinate work across distributed clusters. Apache Zookeeper became a de-facto standard for coordination service and used by Storm, Hadoop, HBase, ElasticSearch and other distributed computing frameworks.

Slides & Bio…

 

(Contributor article “Schema Design for Time Series Data in MongoDB by by ,” Solutions Architect at MongoDB and , Director of Product Marketing at MongoDB. Originally appeared on MongoDB blog)

Data as Ticker Tape

New York is famous for a lot of things, including ticker tape parades.

For decades the most popular way to track the price of stocks on Wall Street was through ticker tape, the earliest digital communication medium. Stocks and their values were transmitted via telegraph to a small device called a “ticker” that printed onto a thin roll of paper called “ticker tape.” While out of use for over 50 years, the idea of the ticker lives on in scrolling electronic tickers at brokerage walls and at the bottom of most news networks, sometimes two, three and four levels deep.

Today there are many sources of data that, like ticker tape, represent observations ordered over time. For example:

  • Financial markets generate prices (we still call them “stock ticks”).
  • Sensors measure temperature, barometric pressure, humidity and other environmental variables.
  • Industrial fleets such as ships, aircraft and trucks produce location, velocity, and operational metrics.
  • Status updates on social networks.
  • Calls, SMS messages and other signals from mobile devices.
  • Systems themselves write information to logs.

This data tends to be immutable, large in volume, ordered by time, and is primarily aggregated for access. It represents a history of what happened, and there are a number of use cases that involve analyzing this history to better predict what may happen in the future or to establish operational thresholds for the system.

Time Series Data and MongoDB

Time series data is a great fit for MongoDB. There are many examples of organizations using MongoDB to store and analyze time series data. Here are just a few:

  • Silver Spring Networks, the leading provider of smart grid infrastructure, analyzes utility meter data in MongoDB.
  • EnerNOC analyzes billions of energy data points per month to help utilities and private companies optimize their systems, ensure availability and reduce costs.
  • Square maintains a MongoDB-based open source tool called Cube for collecting timestamped events and deriving metrics.
  • Server Density uses MongoDB to collect server monitoring statistics.
  • Skyline Innovations, a solar energy company, stores and organizes meteorological data from commercial scale solar projects in MongoDB.
  • One of the world’s largest industrial equipment manufacturers stores sensor data from fleet vehicles to optimize fleet performance and minimize downtime.

In this post, we will take a closer look at how to model time series data in MongoDB by exploring the schema of a tool that has become very popular in the community: . MMS helps users manage their MongoDB systems by providing monitoring, visualization and alerts on over 100 database metrics. Today the system monitors over 25k MongoDB servers across thousands of deployments. Every minute thousands of local MMS agents collect system metrics and ship the data back to MMS. The system processes over 5B events per day, and over 75,000 writes per second, all on less than 10 physical servers for the MongoDB tier.

More…

 

Rails Challenges and Demystifying Rest APIs – First Talk by : This talk is for Rails initiates as well as the managers and mentors who help them gain skills and improve. Rails can be difficult for new developers. In this talk Daniel Kehoe talks on the challenges of learning Rails and how to help all Rails developers overcome the obstacles.

Second Talk by : Struggling with integrating Web Services? Kirsten Jones from 3Scale will give you a whirlwind tour of the entire web services stack, from the HTTP layer to authentication models, and demonstrate methods you can use to debug issues when integrating with APIs.  Common issues and resolutions will be covered as well, to make sure you can get those integrations up and running with a minimum of hair-pulling.

Bios…

 

In this talk, from Tumblr gives an “Introduction to Digital Signal Processing in Hadoop”. Adam introduces the concepts of digital signals, filters, and their interpretation in both the time and frequency domain, and he works through a few simple examples of low-pass filter design and application. It’s much more application focused than theoretical, and there is no assumed prior knowledge of signal processing. This talk was recorded at the NYC Machine Learning Meetup at Pivotal Labs.

Adam also works through how they can be used either in a real-time stream or in batch-mode in Hadoop (with Scalding).  He also has some examples of how to detect trendy meme-ish blogs on Tumblr.

Slides & Bio…

Proudly hosted by WPEngine