Location: New York, NY & San Francisco, CA
Foursquare
In this talk, Joe Crobak, formerly of Foursquare, gives a brief overview of how a workflow engine fits into a standard Hadoop-based analytics stack. He also gives an architectural overview of Azkaban, Luigi, and Oozie, elaborating on features, tools, and practices that can help you build a Hadoop workflow system from scratch or improve an existing one. This talk was recorded at the NYC Data Engineering meetup at eBay.
Building a reliable pipeline of data ingress, batch computation, and data egress with Hadoop can be a major challenge. Most folks start out with cron to manage workflows, but soon discover that doesn’t scale past a handful of jobs. There are a number of open-source workflow engines with support for Hadoop, including Azkaban (from LinkedIn), Luigi (from Spotify), and Apache Oozie. Having deployed all three of these systems in production, Joe talks about what features and qualities are important for a workflow system.
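To make the ingress, batch computation, and egress ordering concrete, here is a minimal sketch of the dependency-ordering idea that a workflow engine adds over independent cron entries. It is not how Azkaban, Luigi, or Oozie are implemented; the job names are hypothetical, and it assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical job names for illustration only.
# Each job maps to the set of jobs that must finish before it starts.
dependencies = {
    "ingest_logs": set(),
    "batch_compute": {"ingest_logs"},
    "export_results": {"batch_compute"},
}


def run_order(deps):
    """Return one execution order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

With cron, each of these jobs would be scheduled by clock time and would have no knowledge of whether its upstream job had finished; here the order is derived from the declared dependencies instead.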
(Original post with audio and slides is here.)
Blake Shaw: Thank you all for coming. As was mentioned, my name is Blake, and today I’m going to be talking about machine learning with large networks of people and places. So, here at Foursquare, we think there’s a great opportunity to leverage massive amounts of location data to help people better understand and connect with places all over the world.
In this talk given at the Prince Building Tech Talks Series, Blake Shaw, a data scientist at Foursquare, discusses the role of data science in Foursquare’s planned location recommendation service.
Foursquare is now aware of over 1.5 billion check-ins from 15 million people at 30 million different places all over the world. Each check-in can be thought of as an edge in a vast network connecting people to each other and to the places that they care about most.
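The idea of a check-in as an edge between a person and a place can be sketched as a small bipartite graph. This is only an illustration under assumed data: the names and check-ins below are made up, and nothing here reflects Foursquare's actual data model or code.

```python
from collections import defaultdict

# Made-up check-ins as (person, place) pairs; each pair is one edge.
checkins = [("alice", "cafe"), ("bob", "cafe"), ("alice", "park")]


def build_network(pairs):
    """Index the edges from both sides: person -> places, place -> people."""
    places_by_person = defaultdict(set)
    people_by_place = defaultdict(set)
    for person, place in pairs:
        places_by_person[person].add(place)
        people_by_place[place].add(person)
    return places_by_person, people_by_place


def co_visitors(person, places_by_person, people_by_place):
    """Return the other people who share at least one place with `person`."""
    others = set()
    for place in places_by_person.get(person, ()):
        others |= people_by_place[place]
    others.discard(person)
    return others


places_by_person, people_by_place = build_network(checkins)
print(co_visitors("alice", places_by_person, people_by_place))
```

Indexing the same edges from both sides is what lets a person be connected to other people through the places they have in common.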