Big data

Hadoop Adventures At Spotify

Operating a small Hadoop cluster is a calm walk in a forest, while working with a large Hadoop cluster is a big adventure in a real jungle. The bigger the elephant is, the more love and care it demands, and we discovered this the hard way. In this presentation, I will talk about the real-world Hadoop issues that either broke our cluster or made it very unstable, especially while we were growing very fast from a 60-node to a 690-node Hadoop cluster. Each issue comes from our JIRA dashboard and is based on facts. We will also show real graphs, numbers, and even excerpts from our emails and conversations. We will honestly share the mistakes that we made, describe the lessons that we learned (including an embarrassing one!), and explain the fixes that finally domesticated this love-demanding yellow elephant and its friends.

Adam Kawa
Data Engineer, Analytics & Data Infrastructure
Adam Kawa works as a Data Engineer at Spotify, where his main responsibility is to maintain one of the largest Hadoop-YARN clusters in Europe. Every so often, he implements and troubleshoots Python MapReduce, Hive and Pig jobs. He also works as a Hadoop instructor at Compendium (Authorized Cloudera Training Partner). Adam is a frequent speaker at Hadoop conferences and Hadoop User Group meetups. He co-organizes the Stockholm and Warsaw Hadoop User Groups. He regularly blogs about the Hadoop ecosystem at
