(Original post with video of talk here)
Abe Stanway: Okay. Hi, I’m Abe. I’m a data engineer at Etsy and today we’re going to talk about Skyline, of which I was the primary author. And so we’re going to talk about how we monitor, why we decided to build this, and how it advances the art of monitoring. So let’s start.
So Etsy is the world’s handmade and vintage marketplace. We are based right here in Dumbo, so this wasn’t too much of a pain to get up here. We have a large stack. We’ve got a lot of stuff going on. Specifically, or actually not specifically at all, these are just some of the numbers for some of the servers that we’re dealing with: 41 MySQL shards, 24 API servers, 72 web servers, 42 Gearman boxes, a 150-node Hadoop cluster, 15 memcached boxes, and around 60 search machines, and a lot more than that. Probably on the scale of a hundred to two hundred other various services, for sure, come along with all the things that we have.
And that’s not to mention the app itself, which is running on top of all these machines, and all the services that are actually running on these machines. In addition to that, we practice something called continuous deployment, which is kind of the new hotness that’s developed with DevOps, right. It’s kind of always deploying every single day, so we deploy around thirty to sixty times a day, every day, and we make this really, really easy to do for all our engineers.