Daniel Molnar and Steve Loughran met up for an interview! Daniel will be speaking at Berlin Buzzwords on Monday, June 6, 2016. He's a generalist in a tight-knit data team, enabling a data-driven company culture and operations as a data janitor, data analyst and occasional data scientist. A perfect interview partner for our program committee member Steve Loughran, who is a member of technical staff at Hortonworks, where he works on leading-edge developments within the Hadoop ecosystem, including service availability, cloud infrastructure integration, and emerging layers in the Hadoop stack.
Hi Daniel, thank you for taking some time for a short interview. We are very pleased to have you as a speaker at Berlin Buzzwords 2016. Your talk takes place on Monday, June 6. What is it about, and why should people attend?
Migrating a full data stack that relies completely on AWS services (and a lot of them) to Microsoft Azure does not seem like an enlightening or fun task. How we managed to do it, open-sourcing tools as roadkill along the way, and what we learnt about the bare-bones necessities that ended up inside the sole body of a Raspberry Pi, may shock the true believers in distributed computing. An adventure in bash, make and SQL, with a detour into Moore's law and falling memory prices.
Why?
This is in production and we're happy with the benchmarks.
Why a Raspberry Pi?
The Commodore 64 of our times. A computing unit. It's good to see a problem solved with a Raspberry Pi instead of a Hadoop cluster.
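For flavour, here is a minimal sketch of the kind of "bash and SQL on one small box" pipeline the talk alludes to; the endpoint, file names and the choice of sqlite3 are illustrative assumptions, not Daniel's actual setup.

```bash
#!/usr/bin/env bash
# A minimal sketch of a single-box batch pipeline in the "bash, make and SQL"
# spirit of the talk. The URL, file names and the use of sqlite3 are
# hypothetical stand-ins, not the production stack described in the talk.
set -euo pipefail

# 1. Pull the raw export (hypothetical endpoint).
curl -sf https://example.org/export/events.csv -o events.csv

# 2. Load it into a local SQLite database; .import creates the table
#    from the CSV header row when the table does not exist yet.
sqlite3 warehouse.db <<'EOF'
DROP TABLE IF EXISTS events;
.mode csv
.import events.csv events
EOF

# 3. Aggregate with plain SQL and write the daily report as CSV.
sqlite3 -header -csv warehouse.db \
  "SELECT event, COUNT(*) AS n FROM events GROUP BY event ORDER BY n DESC;" \
  > daily_report.csv
```

Driven by cron or a Makefile, a script along these lines is the sort of workload that fits comfortably on a single small machine.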
How do you compare Azure to AWS? What do you miss from AWS? How much data do you keep in Azure?
It's a different beast. AWS is more ground-up; its pieces are dumb as an axe, but they work. Azure is everything MS ever did moved to the cloud, plus more. As it arrived later, it offers more sophisticated tools. We're new to open source, so we released a lot of roadkill from along the way as open source tools.
Having looked at the other talks, what do you hope to learn from the conference yourself?
Benchmarks from real life. Simple solutions that work in production.
Probability and statistics are becoming core parts of big system applications. Where would somebody begin to learn (or re-learn) this branch of mathematics?
I am confused why most CS curricula circle around calculus, as if we were engineers building bridges, while probability, which should be taught to everybody, is curiously missing most of the time. I recommend learning it in context: Ben Goldacre's 'Bad Science', Leonard Mlodinow's 'The Drunkard's Walk: How Randomness Rules Our Lives' and Nassim Nicholas Taleb's 'Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets'.
Where do you see the area of technology you work in going in the next few years?
I'm hoping that we skip to the next part of the hype cycle and things become more about solving real and important problems.
What are you most enjoying working on at the moment?
I'm writing a book titled 'The Data Janitor Handbook'. I have to make James Mickens proud of me.
Emacs, vi or something else?
I've been using pico/nano on remote machines since 1994. These days both TextMate and Sublime are in daily use.
Thank you, Daniel!
Don't miss Daniel's talk, "Migrating a data stack from AWS to Azure (via Raspberry Pi)", on Monday, June 6, 2016 at 12:20 at the Frannz Club, Kulturbrauerei.
Photo: "What?" by Véronique Debord-Lazaro, CC BY-SA 2.0