Open source software engineering developer community and events

Test all the (Network) Things by Dan McCormick

Nov 15, 2013

(Contributor article “Test all the (Network) Things by Dan McCormick,” SVP of Technology at Shutterstock. Originally appeared on )

Our engineering team supports many different sites, including Bigstock, Offset, and Skillfeed.

All these sites rely on a core set of REST services for functionality like authentication, payment, and search. Since these core services are so critical, we need to know if they’re functioning properly at all times, and get alerted if they aren’t. There are plenty of solutions for server-level monitoring, but we couldn’t find a good, simple solution for service or API monitoring. So we built one. It’s called , for network testing framework, and it’s part of a large collection of .
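To make the idea of service-level (rather than server-level) monitoring concrete, here is a minimal sketch in Python. It is not the framework described above: the `ServiceCheck` class, the `run_checks` and `failing` functions, and any URLs passed in are hypothetical names chosen for illustration. The sketch assumes each core service exposes a JSON endpoint and that a check passes when the endpoint answers with HTTP 200 and the expected fields.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A fetcher takes a URL and returns (HTTP status code, parsed JSON body).
# It is injected so the checks can run against a stub or a real HTTP client.
Fetcher = Callable[[str], Tuple[int, dict]]


@dataclass
class ServiceCheck:
    """One API-level check (all field names here are hypothetical)."""
    name: str                       # label for the service, e.g. "auth"
    url: str                        # endpoint to probe
    required_keys: Tuple[str, ...]  # fields the JSON body must contain


def run_checks(checks: List[ServiceCheck], fetch: Fetcher) -> Dict[str, str]:
    """Run every check; map each check name to "ok" or a failure reason."""
    results: Dict[str, str] = {}
    for check in checks:
        try:
            status, body = fetch(check.url)
        except Exception as exc:  # connection errors, timeouts, bad JSON
            results[check.name] = f"error: {exc}"
            continue
        missing = [key for key in check.required_keys if key not in body]
        if status != 200:
            results[check.name] = f"unexpected status {status}"
        elif missing:
            results[check.name] = f"missing fields: {', '.join(missing)}"
        else:
            results[check.name] = "ok"
    return results


def failing(results: Dict[str, str]) -> List[str]:
    """Names of the checks that should trigger an alert."""
    return sorted(name for name, outcome in results.items() if outcome != "ok")
```

The fetcher is passed in so the same checks can run against a stub in tests or a real HTTP client in production; the list returned by `failing` is what an alerting hook would act on.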

Slide & Bio…

November 15, 2013
Shutterstock
API, architecture, monitoring, REST, testing

Core.async a Clojure Library for Asynchronous Programming by David Nolen – Transcript

Nov 14, 2013

Note: This is the transcript from the “Core.async a Clojure Library for Asynchronous Programming” presentation by David Nolen from The New York Times. The video of his presentation can be found here: https://g33ktalk.com/core-async-a-clojure-library/

David Nolen: My name is David Nolen. I’m going to talk a bit about ClojureScript and core.async. How many people here have ever read about Hoare’s communicating sequential processes? Cool. That’s good. Has anybody ever tried using Golang, Rob Pike’s Golang? Only one, okay. Has anybody used a language that actually implements CSP? I mean, Go is one. So not that many. So this is something I think is really funny that something that nearly everybody has heard of, but nobody has tried. There’s been languages in the past. You had Occam-pi for the transputer, you had Concurrent ML, which was a variant of Standard ML that supported CSP. Then Go is actually really making waves. People really like it. I don’t really like it, but I think the CSP aspect of it is actually pretty cool. It very much holds very closely to Tony Hoare’s ideas.

So, Rich Hickey decided more or less to just copy Go’s interpretation of Tony Hoare’s original ideas, so I’m not going to assume that you know too much about CSP, and so we’ll go slow, but we’ll end up going fast later, so it won’t be boring if you think you know this stuff.

More…

Isomorphic JavaScript: The Future of Web Apps by Spike Brehm

Nov 12, 2013

(Contributor article “Isomorphic JavaScript: The Future of Web Apps by Spike Brehm,” Front-end Engineer at Airbnb. Originally appeared on Airbnb)

At Airbnb, we’ve learned a lot over the past few years while building rich web experiences. We dove into the single-page app world in 2011 with our mobile web site, and have since launched Wish Lists and our newly-redesigned search page, among others. Each of these is a large JavaScript app, meaning that the bulk of the code runs in the browser in order to support a more modern, interactive experience.

This approach is commonplace today, and libraries like Backbone.js, Ember.js, and Angular.js have made it easier for developers to build these rich JavaScript apps. We have found, however, that these types of apps have some critical limitations. To explain why, let’s first take a quick detour through the history of web apps.

More…

November 12, 2013
Airbnb
Javascript

Hacking Recruiting by Peter Soderling – Transcript on Engineering PR

Nov 11, 2013

Announcer: Okay, so our next speaker is Pete Soderling. He is the founder of G33kTalk, which is—I don’t know, I guess, I think it’s going to be like a multimedia empire in a little bit, but he has a really good mailing list that sends out some really interesting articles, mostly technology articles, a little bit on technical leadership, as well. I’m not sure how he does it, like he operates both in New York and San Francisco. He actually records meetups, and so he puts some of the videos up on his website, as well. And so, he’s certainly somebody that’s very much embedded in the community. He gave the—well, a longer version of this talk at QCon, because recruiting is also something that he is good at in addition to multimedia and stuff. So, without further ado.

Pete Soderling: Thank you. Hey, guys. You’ve been sitting here for a while, so I’ll try to keep it to the point. I want to introduce you to the concept of Engineering PR. First of all, who am I, and why the hell am I up here? I’m an engineer from the first bubble, a hacker before that. I turned programmer in the mid-90s, and I ended up turned entrepreneur in 2003, so I’ve seen and hired lots of engineers over the last fifteen years, and now, I do consulting with top startups in New York and the Bay Area, and with their CTOs directly, helping them figure out how to build the best engineering teams. I’m also the founder of G33ktalk, as John mentioned, Keith mentioned, and now, I’ll tell you more about that, as well.

So why should you care about hiring? If you’re an engineer who’s already in leadership, you know exactly why because it’s important to build the best team. If you’re an engineer who wants to get into leadership, this is the single most important thing that you can learn that you might not already know. I do a lot of career coaching with engineers, and, being originally a self-taught engineer myself, it’s become apparent to me that some of the softer aspects of leadership management, hiring, recruiting, retention, team building—these things are crucial, and it’s especially hard in the current market because the market dynamics are quite lopsided.

More…

November 11, 2013
Uncategorized
Career, Engineering PR, open source, recruiting

How RethinkDB Works by Joe Doliner

Nov 08, 2013

In this talk, “How RethinkDB Works,” Joe Doliner, Lead Engineer at RethinkDB, will discuss the value of RethinkDB’s flexible schemas, ease of use, and how to scale a RethinkDB cluster from one to many nodes. He will also talk about how RethinkDB fits into the CAP theorem, and its persistence semantics. Finally, Joe will give a live demo, showing how to load and analyze data, how to scale out the cluster to achieve higher performance, and even destroy a node and show how RethinkDB handles failure. This talk was recorded at the meetup at StumbleUpon Offices.

Bio…

November 8, 2013
RethinkDB
big data, Distributed Database, json, MapReduce, RethinkDB

Data Driven Growth at Airbnb by Mike Curtis

Nov 07, 2013

Data Driven Growth at Airbnb by Mike Curtis – As Airbnb’s VP of Engineering, Mike Curtis is tasked with using big data infrastructure to provide a better UX and drive massive growth. He’s also responsible for delivering simple, elegant ways to find and stay at the most interesting places in the world. He is currently working to build a team of engineers that will have a big impact as Airbnb continues to construct a bridge between the online and offline worlds. Mike’s particular focus is on search and matching, systems infrastructure, payments, trust and safety, and mobile.

Bio…

November 7, 2013
Airbnb
big data, Infrastructure

Tracking Twitter Followers with MongoDB by André Spiegel

Nov 07, 2013

(Contributor article “Tracking Twitter Followers with MongoDB by André Spiegel,” Consulting Engineer at MongoDB. Originally appeared on MongoDB blog)

As a recently hired engineer at MongoDB, part of my ramping-up training is to create a number of small projects with our software to get a feel for how it works, how it performs, and how to get the most out of it. I decided to try it on Twitter. It’s the age-old question that plagues every Twitter user: who just unfollowed me? Surprising or not, Twitter won’t tell you that. You can see who’s currently following you, and you get notified when somebody new shows up. But when your follower count drops, it takes some investigation to figure out who you just lost.

I’m aware there’s a number of services that will answer that question for you. Well, I wanted to try this myself.

The Idea and Execution

The basic idea is simple: You have to make calls to Twitter’s REST API to retrieve, periodically, the follower lists of the accounts you want to monitor. Find changes in these lists to figure out who started or stopped following the user in question. There are two challenging parts:

When you talk to Twitter, talk slowly, lest you hit the rate limit.
This can get big. Accounts can have millions of followers. If the service is nicely done, millions of users might want to use it.

The second requirement makes this a nice fit for MongoDB.
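The core of the idea, comparing two snapshots of a follower list, comes down to two set differences. The sketch below is in Python rather than the Java of the actual program, and the names `FollowerChange` and `diff_followers` are hypothetical; it only assumes that each snapshot is a collection of numeric follower ids.

```python
from typing import Iterable, NamedTuple, Set


class FollowerChange(NamedTuple):
    gained: Set[int]  # ids present now but not in the previous snapshot
    lost: Set[int]    # ids present in the previous snapshot but not now


def diff_followers(previous: Iterable[int], current: Iterable[int]) -> FollowerChange:
    """Compare two follower-id snapshots of the same account."""
    prev, curr = set(previous), set(current)
    return FollowerChange(gained=curr - prev, lost=prev - curr)
```

For accounts with millions of followers, both snapshots would live in the database rather than in memory, but the comparison itself stays the same.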

The program, which I called “followt” and wrote in Java, can be found on github. For this article, let me just summarize the overall structure:

The scribe library proved to be a great way to handle Twitter’s OAuth authentication mechanism.
Using Twitter’s REST API, we can retrieve the numeric ids of 5,000 followers of a given account per minute. For large accounts, we need to retrieve the full list in batches, potentially thousands of batches in a row.
The numeric ids are fine for determining whether an account started or stopped following another. But if we want to display the actual user names, we need to translate those ids to screen names with a separate lookup call. We can make 180 of these calls per 15-minute window, and up to 100 numeric ids can be translated in each call. To make good use of the 180 calls we’re allowed, we must not waste them on individual user ids, but batch as many ids into each call as we can. The class net.followt.UserDB in the application implements this mechanism, using a BlockingQueue for user ids.

More…
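The batching arithmetic can be sketched in a few lines. This is a Python illustration, not the BlockingQueue-based mechanism in net.followt.UserDB; the function names `batches` and `windows_needed` are hypothetical, and the constants are the limits quoted above (100 ids per call, 180 calls per 15-minute window).

```python
import math
from typing import Iterator, List, Sequence

IDS_PER_CALL = 100      # up to 100 numeric ids translated per lookup call
CALLS_PER_WINDOW = 180  # lookup calls allowed per rate-limit window
WINDOW_MINUTES = 15     # length of one rate-limit window


def batches(user_ids: Sequence[int], size: int = IDS_PER_CALL) -> Iterator[List[int]]:
    """Split the ids into groups so that each lookup call is as full as possible."""
    for start in range(0, len(user_ids), size):
        yield list(user_ids[start:start + size])


def windows_needed(id_count: int) -> int:
    """Number of rate-limit windows needed to translate id_count ids."""
    calls = math.ceil(id_count / IDS_PER_CALL)
    return math.ceil(calls / CALLS_PER_WINDOW)
```

With full batches, one window covers 180 × 100 = 18,000 ids; looking ids up one at a time would cover only 180 in the same window.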

November 7, 2013
MongoDB
MongoDB

Understanding and Managing Cassandra’s Vnodes + Under the Hood: Acunu Analytics by Tim Moreton and Nicolas Favre-Felix

Nov 06, 2013

“Understanding and Managing Cassandra’s Vnodes + Under the Hood: Acunu Analytics” - In this talk, Tim Moreton, Founder and CTO, and Nicolas Favre-Felix, Software Engineer at Acunu Analytics, share the concept, implementation, and benefits of virtual nodes in Apache Cassandra 1.2 & 2.0. They also go over why virtual nodes are a replacement for token management, and how to use Acunu Analytics to collect event data, build OLAP-style cubes and ask SQL-like queries via a RESTful API, on top of Cassandra. This talk was recorded at the DataStax Cassandra SF users group meetup.
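As a rough illustration of the virtual-node concept, the Python sketch below places several tokens per physical node on a hash ring, so that no one has to assign a single token to each machine by hand. It is a generic consistent-hashing sketch, not Cassandra’s code: the `VnodeRing` class, the `token` function, the MD5-based token derivation, and the default of 8 tokens per node are all assumptions made for the example.

```python
import bisect
import hashlib
from typing import Dict, List


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class VnodeRing:
    """A hash ring in which every physical node owns several tokens (vnodes)."""

    def __init__(self, vnodes_per_node: int = 8) -> None:
        self.vnodes_per_node = vnodes_per_node
        self._tokens: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> physical node

    def add_node(self, node: str) -> None:
        # Derive the node's tokens from its name so the example is repeatable.
        for i in range(self.vnodes_per_node):
            t = token(f"{node}#{i}")
            self._owner[t] = node
            bisect.insort(self._tokens, t)

    def owner_of(self, key: str) -> str:
        """Return the node owning the first token at or after the key's position."""
        if not self._tokens:
            raise LookupError("ring is empty")
        index = bisect.bisect_left(self._tokens, token(key)) % len(self._tokens)
        return self._owner[self._tokens[index]]
```

Because each node holds many small ranges, a newly added node takes over slices from several existing nodes, and any key that changes owner moves to the new node only.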

More Info…

November 6, 2013
Uncategorized
API, big data, cassandra, REST, SQL

SmartStack: Service Discovery in the Cloud by Igor Serebryany

Nov 05, 2013

(Contributor article “SmartStack: Service Discovery in the Cloud by Igor Serebryany,” Site Reliability Engineer at Airbnb. Originally appeared on Airbnb)

What is SmartStack?

SmartStack is an automated service discovery and registration framework. It makes the lives of engineers easier by transparently handling creation, deletion, failure, and maintenance work of the machines running code within your organization. We believe that our approach to this problem is among the best possible: simpler conceptually, easier to operate, more configurable, and providing more introspection than any of its kind. The SmartStack way has been battle-tested at Airbnb over the past year, and has broad applicability in many organizations, large and small.

SmartStack’s components – Nerve and Synapse – are available on GitHub! Read on to learn more about the magic under the hood.

The problem of services in an SOA

Companies like Airbnb often start out as monolithic applications – a kind of swiss army knife which performs all of the functions of the organization. As traffic (and the number of engineers working on the product) grows, this approach doesn’t scale. The code base becomes too complicated, concerns are not cleanly separated, changes from many engineers touching many different parts of the codebase go out together, and performance is determined by the worst-performing sections in the application.

The solution to this problem is services: individual, smaller code bases, running on separate machines with separate deployment cycles, that more cleanly address more targeted problem domains. This is called a services-oriented architecture: SOA.

As you build out services in your architecture, you will notice that instead of maintaining a single pool of general-purpose application servers, you are now maintaining many smaller pools. This leads to a number of problems. How do you direct traffic to each machine in the pool? How do you add new machines and remove broken or retired ones? What is the impact of a single broken machine on the rest of the application?

Dealing with these questions, across a collection of several services, can quickly grow to be a full-time job for several engineers, engaging in an arduous, manual, error-prone process fraught with peril and the potential for downtime.
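To make those questions concrete, here is a toy Python sketch of what registration and discovery have to do for each pool. It does not show how SmartStack, Nerve, or Synapse work; the `ServiceRegistry` class, the `route` function, and the addresses used with them are hypothetical, and the round-robin policy is just one simple assumption about how traffic might be spread.

```python
import itertools
from typing import Dict, List, Set


class ServiceRegistry:
    """Tracks which machines currently serve each service (in memory only)."""

    def __init__(self) -> None:
        self._pools: Dict[str, Set[str]] = {}

    def register(self, service: str, address: str) -> None:
        """Add a new machine to a service's pool."""
        self._pools.setdefault(service, set()).add(address)

    def deregister(self, service: str, address: str) -> None:
        """Remove a broken or retired machine; unknown entries are ignored."""
        self._pools.get(service, set()).discard(address)

    def backends(self, service: str) -> List[str]:
        """Current machines for a service, in a stable order."""
        return sorted(self._pools.get(service, set()))


def route(registry: ServiceRegistry, service: str, request_count: int) -> List[str]:
    """Spread request_count requests over the pool in round-robin order."""
    backends = registry.backends(service)
    if not backends:
        raise LookupError(f"no backends registered for {service}")
    cycle = itertools.cycle(backends)
    return [next(cycle) for _ in range(request_count)]
```

Doing this by hand for every pool is the manual process described above; the point of an automated framework is that registration and removal happen without an engineer in the loop.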

More…

November 5, 2013
Airbnb
automation, HAProxy, Infrastructure, SmartStack, zookeeper

Apache Zookeeper Introduction By Camille Fournier

Nov 04, 2013

This is an Apache ZooKeeper introduction – In this talk, Camille Fournier, from Rent The Runway, gives an introduction to ZooKeeper. She talks about why it’s useful and how you should use it once you have it running. Camille goes over the high-level purpose of ZooKeeper and covers some of the basic use cases and operational concerns. One of the requirements for running Storm or a Hadoop cluster is to have a reliable ZooKeeper setup. When you’re running a service distributed across a large cluster of machines, even tasks like reading configuration information, which are simple on single-machine systems, can be hard to implement reliably. This talk was recorded at the NYC Storm User Group meetup at WebMD Health.

The ZooKeeper framework was originally built at Yahoo! to make it easy for the company’s applications to access configuration information in a robust and easy-to-understand way, but it has since grown to offer a lot of features that help coordinate work across distributed clusters. Apache ZooKeeper has become a de facto standard coordination service and is used by Storm, Hadoop, HBase, ElasticSearch, and other distributed computing frameworks.

Slides & Bio…

November 4, 2013
Rent The Runway
big data, Hadoop, Infrastructure, zookeeper
