Test all the (Network) Things

(Contributor article “Test all the (Network) Things” by Dan McCormick, SVP of Technology at Shutterstock. Originally appeared on )

Our engineering team supports many different sites, including the , the , the , Bigstock, Offset, and Skillfeed.

All these sites rely on a core set of REST services for functionality like authentication, payment, and search. Since these core services are so critical, we need to know if they’re functioning properly at all times, and get alerted if they aren’t. There are plenty of solutions for server-level monitoring, but we couldn’t find a good, simple solution for service or API monitoring. So we built one. It’s called , for network testing framework, and it’s part of a large collection of .

Slide & Bio…

 

(Contributor article “Isomorphic JavaScript: The Future of Web Apps” by , Front-end Engineer at Airbnb. Originally appeared on Airbnb)

At Airbnb, we’ve learned a lot over the past few years while building rich web experiences. We dove into the single-page app world in 2011 with our mobile web site, and have since launched Wish Lists and our newly-redesigned search page, among others. Each of these is a large JavaScript app, meaning that the bulk of the code runs in the browser in order to support a more modern, interactive experience.

This approach is commonplace today, and libraries like Backbone.js, Ember.js, and Angular.js have made it easier for developers to build these rich JavaScript apps. We have found, however, that these types of apps have some critical limitations. To explain why, let’s first take a quick detour through the history of web apps.

More…

 

2012, My Year of Code

2012 was supposed to be the year of code. Remember that? We were all supposed to learn to code en masse, become our own technical cofounders, and build a better digital future.

Yet despite the fact that over 450,000 people, including Mayor Bloomberg, vowed to learn to code, we’ve so far seen a notable lack of success stories.

I was successful.

What follows is a step-by-step guide on how I went from almost a complete beginner to working as a professional software developer in just over a year, as well as my general thoughts on the process.

Background: In late 2011 I shut down my YC startup due to a lack of scalability, among other reasons. I made a list of what went right, what went wrong, and most importantly what weak points I would need to address if I ever wanted to make another serious go at a startup. Not knowing how to code wasn’t the only weakness I had in my skill stack, but it was certainly a big liability for a web entrepreneur, and one of the most straightforward (if not easiest) areas to address. So when I saw the whole year-of-code thing start to take off I knew I had to do this, if only out of fear that everyone else would learn to code and I’d be left behind. At the time I was also having difficulty finding potential technical co-founders for a new startup idea, and I was told that this would become much easier if I were a developer myself.

More…

 

(Contributor article “Tracking Twitter Followers with MongoDB” by André Spiegel, Consulting Engineer at MongoDB. Originally appeared on MongoDB blog)

As a recently hired engineer at MongoDB, part of my ramping-up training is to create a number of small projects with our software to get a feel for how it works, how it performs, and how to get the most out of it. I decided to try it on Twitter. It’s the age-old question that plagues every Twitter user: who just unfollowed me? Surprising or not, Twitter won’t tell you that. You can see who’s currently following you, and you get notified when somebody new shows up. But when your follower count drops, it takes some investigation to figure out who you just lost.

I’m aware there are a number of services that will answer that question for you. Well, I wanted to try this myself.

The Idea and Execution

The basic idea is simple: You have to make calls to Twitter’s REST API to retrieve, periodically, the follower lists of the accounts you want to monitor. Find changes in these lists to figure out who started or stopped following the user in question. There are two challenging parts:

  1. When you talk to Twitter, talk slowly, lest you hit the rate limit.
  2. This can get big. Accounts can have millions of followers. If the service is nicely done, millions of users might want to use it.

The second requirement makes this a nice fit for MongoDB.
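As a rough illustration of the "find changes in these lists" step, here is a minimal sketch that compares two snapshots of follower ids. It is not taken from followt (which is written in Java); the function name and sample ids are assumptions made up for this example, and ids are kept as strings because large numeric ids can lose precision in JavaScript.

```javascript
// Compare two snapshots of follower ids and report who joined and who left.
// diffFollowers and the sample ids below are hypothetical, for illustration only.
function diffFollowers(previousIds, currentIds) {
  const previous = new Set(previousIds);
  const current = new Set(currentIds);
  return {
    // Present now, absent before: new followers.
    followed: currentIds.filter((id) => !previous.has(id)),
    // Present before, absent now: accounts that unfollowed.
    unfollowed: previousIds.filter((id) => !current.has(id)),
  };
}

const yesterday = ["101", "102", "103", "104"];
const today = ["102", "103", "104", "105"];
console.log(diffFollowers(yesterday, today));
// followed: ["105"], unfollowed: ["101"]
```

Using sets keeps each membership test constant-time, which matters once an account has millions of followers.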

The program, which I called “followt” and wrote in Java, can be found on GitHub. For this article, let me just summarize the overall structure:

  • The scribe library proved to be a great way to handle Twitter’s OAuth authentication mechanism.
  • Using , we can retrieve the numeric ids of 5,000 followers of a given account per minute. For large accounts, we need to retrieve the full list in batches, potentially thousands of batches in a row.
  • The numeric ids are fine for determining whether an account started or stopped following another. But if we want to display the actual user names, we need to translate those ids to screen names, using . We can make 180 of these calls per 15 minute window, and up to 100 numeric ids can be translated in each call. In order to make good use of the 180 calls we’re allowed, we have to make sure not to waste them for individual user ids, but to batch as many requests into each of these as we can. The class net.followt.UserDB in the application implements this mechanism, using a BlockingQueue for user ids.

    More…
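The batching arithmetic from the last bullet can be sketched as follows. This is not the net.followt.UserDB implementation (that class is Java and uses a BlockingQueue); it only shows how ids might be grouped to fill each call, using the limits quoted above (100 ids per call, 180 calls per 15-minute window). The function names are assumptions.

```javascript
// Limits quoted in the text: 100 ids per lookup call, 180 calls per 15-minute window.
const IDS_PER_CALL = 100;
const CALLS_PER_WINDOW = 180;

// Split a list of ids into batches no larger than batchSize (hypothetical helper).
function toBatches(ids, batchSize = IDS_PER_CALL) {
  const batches = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    batches.push(ids.slice(i, i + batchSize));
  }
  return batches;
}

// Number of 15-minute windows needed to translate idCount ids when every call is full.
function windowsNeeded(idCount) {
  const calls = Math.ceil(idCount / IDS_PER_CALL);
  return Math.ceil(calls / CALLS_PER_WINDOW);
}

const ids = Array.from({ length: 250 }, (_, i) => String(i + 1));
console.log(toBatches(ids).map((batch) => batch.length)); // [100, 100, 50]
console.log(IDS_PER_CALL * CALLS_PER_WINDOW);             // 18000 ids per window at best
console.log(windowsNeeded(1000000));                      // 56
```

With full batches, translating one million ids takes 10,000 calls, or 56 windows of 15 minutes (about 14 hours), which is why wasting calls on single ids is so costly.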

 

(Contributor article “SmartStack: Service Discovery in the Cloud” by , Site Reliability Engineer at Airbnb. Originally appeared on Airbnb)

What is SmartStack?

SmartStack is an automated service discovery and registration framework. It makes the lives of engineers easier by transparently handling creation, deletion, failure, and maintenance work of the machines running code within your organization. We believe that our approach to this problem is among the best possible: simpler conceptually, easier to operate, more configurable, and providing more introspection than any of its kind. The SmartStack way has been battle-tested at Airbnb over the past year, and has broad applicability in many organizations, large and small.

SmartStack’s components – Nerve and Synapse – are available on GitHub! Read on to learn more about the magic under the hood.

The problem of services in an SOA

Companies like Airbnb often start out as monolithic applications – a kind of Swiss Army knife which performs all of the functions of the organization. As traffic (and the number of engineers working on the product) grows, this approach doesn’t scale. The code base becomes too complicated, concerns are not cleanly separated, changes from many engineers touching many different parts of the codebase go out together, and performance is determined by the worst-performing sections in the application.

The solution to this problem is services: individual, smaller code bases, running on separate machines with separate deployment cycles, that more cleanly address more targeted problem domains. This is called a service-oriented architecture (SOA).

As you build out services in your architecture, you will notice that instead of maintaining a single pool of general-purpose application servers, you are now maintaining many smaller pools. This leads to a number of problems. How do you direct traffic to each machine in the pool? How do you add new machines and remove broken or retired ones? What is the impact of a single broken machine on the rest of the application?

Dealing with these questions, across a collection of several services, can quickly grow to be a full-time job for several engineers, engaging in an arduous, manual, error-prone process fraught with peril and the potential for downtime.
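To make those questions concrete, here is a toy in-memory registry for a single service pool. It is only a sketch of the bookkeeping involved (adding machines, removing them, skipping broken ones when routing); it does not describe how Nerve or Synapse work, and every name in it is hypothetical.

```javascript
// Toy in-memory registry for one service pool.
// NOT how Nerve or Synapse work internally; all names here are hypothetical.
class ServicePool {
  constructor(name) {
    this.name = name;
    this.backends = new Map(); // address -> { healthy: boolean }
    this.cursor = 0;           // counts picks, used for round-robin
  }

  // Add a new machine to the pool; it starts out healthy.
  register(address) {
    this.backends.set(address, { healthy: true });
  }

  // Remove a retired machine from the pool.
  deregister(address) {
    this.backends.delete(address);
  }

  // Record the result of a health check for a known machine.
  markHealth(address, healthy) {
    const backend = this.backends.get(address);
    if (backend) backend.healthy = healthy;
  }

  // Round-robin over healthy backends; returns null when none is available.
  pick() {
    const healthy = [...this.backends.entries()]
      .filter(([, state]) => state.healthy)
      .map(([address]) => address);
    if (healthy.length === 0) return null;
    const address = healthy[this.cursor % healthy.length];
    this.cursor += 1;
    return address;
  }
}

const search = new ServicePool("search");
search.register("10.0.0.1:8080");
search.register("10.0.0.2:8080");
search.markHealth("10.0.0.1:8080", false); // simulate a broken machine
console.log(search.pick()); // 10.0.0.2:8080
```

Doing this by hand for every pool is exactly the manual, error-prone work the paragraph above describes.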

More…

 

(Contributor article “How We Measured America’s Most Hospitable Cities” by Riley Newman, Head of Analytics/Data Science at Airbnb. Originally appeared on Airbnb Blog)

By: Andrey Fradkin, Riley Newman & Rebecca Rosenfelt

Lately, we’ve been thinking about how we can promote and share exceptional hosting practices. We know that some hosts on our site consistently receive exceptional reviews. What are the common characteristics of these hosts?

As a first step in our investigation, we created a “Hospitality Index” that measures host quality across cities. Immediately, we saw stark regional trends in host quality and hospitality.

Methodology

To build the index of America’s most hospitable cities, we looked to reviews, our richest source of data about how a trip went. After each trip, we ask guests to rate a number of specific dimensions:


  • Cleanliness — a foundational aspect of any travel experience.
  • Check In — a crucial moment that affects the entire trip.
  • Communication — the primary factor in resolving queries and forestalling any issues.
  • Value — This one is a bit tricky because in some ways it encompasses all the other measures. But capturing a guest’s sense of the overall value of the experience is an important metric.
  • Accuracy — expectation management is key to a smooth Airbnb experience.

There’s a long history of criticism surrounding 5-star review systems. For example, scores tend to be binary (5 or 1). But we can be confident that a 5-star score is a good experience, at minimum. So for the index we looked at the percentage of trips (not reviews, which would be biased by review rates) where guests give 5-star scores for all of the above criteria.
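The calculation described above can be sketched in a few lines. This is an illustration, not Airbnb's actual code: the field names are hypothetical, and it assumes that a trip with no review still counts in the denominator, which is how we read "percentage of trips (not reviews)".

```javascript
// The five dimensions listed above, as hypothetical field names.
const DIMENSIONS = ["cleanliness", "checkIn", "communication", "value", "accuracy"];

// Share of trips whose guest gave 5 stars on every dimension.
// Assumption: trips with no review (ratings === null) stay in the denominator,
// because the index is computed per trip rather than per review.
function hospitalityIndex(trips) {
  if (trips.length === 0) return 0;
  const perfect = trips.filter(
    (trip) => trip.ratings !== null && DIMENSIONS.every((d) => trip.ratings[d] === 5)
  ).length;
  return perfect / trips.length;
}

const allFives = { cleanliness: 5, checkIn: 5, communication: 5, value: 5, accuracy: 5 };
const trips = [
  { ratings: { ...allFives } },
  { ratings: { ...allFives, checkIn: 4 } }, // one dimension below 5
  { ratings: null },                        // trip without a review
  { ratings: { ...allFives } },
];
console.log(hospitalityIndex(trips)); // 0.5
```

Comparing this value across cities gives the kind of ranking the index is meant to produce.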

Bio…

 

QAing New Code with MMS: Map/Reduce vs. Aggregation Framework

(Contributor article by Alex Giamas, Co-Founder and CTO of CareAcross. Originally appeared on 10gen Blog)

When releasing software, most teams focus on correctness, and rightly so. But great teams also QA their code for performance. MMS can also be used to quantify the effect of code changes on your MongoDB database. Our staging environment is an exact mirror of our production environment, so we can test code in staging to reveal performance issues that are not evident in development. We take code changes to staging, where we pull data from MMS to determine if feature X will impact performance.

As a working example, we can use MMS to calculate views across a day using both Map/Reduce and the aggregation framework, comparing their performance and how they affect overall DB performance.

Our test data consists of 10M entries in a collection named views in the database named CareAcross with entries of the following style:

{
  userId: "userIdName",
  date: ISODate("2013-08-28T00:00:01Z"),
  url: "urlEntry"
}

Using a simple map/reduce operation, we can emit a 1 for each document and sum those values to get the number of views per userId:

 db.views.mapReduce(function () {emit(this.userId, 1)}, function (k,v) {return Array.sum(v)}, {out:"result"})

The equivalent operation using the aggregation framework looks like this:

db.views.aggregate([{$group: {_id: "$userId", total: {$sum: 1}}}])
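To see why the two commands are equivalent, here is a plain JavaScript sketch that mimics both on a small in-memory array. It does not talk to MongoDB and says nothing about performance; the sample documents and function names are made up for illustration.

```javascript
// Sample documents shaped like the entries above (values are made up).
const views = [
  { userId: "alice", url: "/a" },
  { userId: "bob", url: "/b" },
  { userId: "alice", url: "/c" },
];

// Map/Reduce style: emit (userId, 1) for each document, then sum the values per key.
function countWithMapReduce(docs) {
  const emitted = new Map();
  for (const doc of docs) {
    if (!emitted.has(doc.userId)) emitted.set(doc.userId, []);
    emitted.get(doc.userId).push(1);
  }
  const result = {};
  for (const [key, values] of emitted) {
    result[key] = values.reduce((sum, v) => sum + v, 0);
  }
  return result;
}

// $group style: keep one running total per userId, adding 1 per document.
function countWithGroup(docs) {
  const totals = {};
  for (const doc of docs) {
    totals[doc.userId] = (totals[doc.userId] || 0) + 1;
  }
  return totals;
}

console.log(countWithMapReduce(views)); // { alice: 2, bob: 1 }
console.log(countWithGroup(views));     // { alice: 2, bob: 1 }
```

Both produce the same per-user counts; the difference the article goes on to measure is how the database executes them.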

The mapReduce function hits the server at 18:54. The aggregation command hits the server at 19:01.

If we compare these two operations across our data set we will get the following metrics from MMS:

More…

 

This talk is by , Assistant Professor of Operations Management at McGill University.

This talk focuses on how to create and share interactive web-friendly content with the R language. You will learn how to create interactive presentations, visualizations, web pages, blogs, applications and dashboards.

More specifically, Mr Vaidyanathan will explain how the advent of R Markdown and knitr has made it easy to create dynamic documents in different formats, including HTML. However, most content is still static and does not take full advantage of the web as an interactive medium. As a part of this talk, he will present a modular approach to inject interactivity into an R Markdown document using Slidify, rCharts and Shiny.

 

Bio…

 

This talk is by , a Data Scientist at . It is a talk sponsored by .

Josh talks extensively about the rare statistician/software engineer that is the Data Scientist. He posits that there is a continuous need for their craft especially as data sets become larger and computational skills more necessary. Josh puts forth ways to develop and grow data scientists and teams, sharing how to build an appealing and inspiring workplace for them to thrive.

 

 

Continue reading »

 

This talk is by Jan Vitek, a professor in computer science at Purdue University, recorded at Trulia.

In this video, Dr. Vitek discusses the design and implementation of Distributed Random Forest, a big data algorithm for H2O.




 

Bio…
