Every week we introduce new speakers who will be on stage at #bbuzz 2015. Thanks to our program committee, we can present part of our new, eclectic program. Presentations range from beginner-friendly introductions to hot data analysis topics to in-depth technical presentations about scalable architectures. The conference features more than 50 talks by international speakers, covering the three tags "search", "store" and "scale".
Using Random Projections to Make Sense of High-Dimensional Big Data
Stefan Savev & Michael Kleen
Search – 40 min
Stefan used to work for Microsoft; at the moment, he works as a Senior Software Engineer at ResearchGate, focussing on developing recommendation systems. He loves to discuss algorithms, search engine implementation and machine learning. Michael works as a Software Engineer at ResearchGate, where his work includes real-time data processing.
Together they will invite us on a journey into the rich area of random projections via many graphical illustrations and intuitive examples. They will make clear that a moderate number of simple one-dimensional projections is enough to answer hard questions about the data via techniques such as visualization, classification and clustering. Their talk will investigate how and why random projections work and where they break, and discuss several interesting properties of high-dimensional data.
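For readers who want a feel for the idea before the talk, here is a minimal sketch of our own, not taken from the speakers' material. It assumes Gaussian random directions and uses only the Python standard library; the function names and the choice of 1000 dimensions and 200 projections are purely illustrative. It projects two high-dimensional points onto a moderate number of random one-dimensional directions and uses the projected values alone to estimate the distance between the points.

```python
import math
import random


def random_directions(dim, k, seed=0):
    """Draw k random directions in `dim` dimensions with Gaussian entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]


def project(point, directions):
    """Return the k one-dimensional projections (dot products) of a point."""
    return [sum(d_i * x_i for d_i, x_i in zip(d, point)) for d in directions]


def estimated_distance(p, q, directions):
    """Estimate the Euclidean distance between p and q from projections only.

    For a direction with independent standard normal entries, the expected
    squared difference of the two projections equals the squared distance,
    so we average over all directions and take the square root.
    """
    proj_p = project(p, directions)
    proj_q = project(q, directions)
    mean_sq = sum((a - b) ** 2 for a, b in zip(proj_p, proj_q)) / len(directions)
    return math.sqrt(mean_sq)


def exact_distance(p, q):
    """Plain Euclidean distance in the original space, for comparison."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Illustrative data: two random points in 1000 dimensions.
data_rng = random.Random(1)
dim = 1000
x = [data_rng.gauss(0.0, 1.0) for _ in range(dim)]
y = [data_rng.gauss(0.0, 1.0) for _ in range(dim)]

directions = random_directions(dim, k=200)
print("exact distance:    ", exact_distance(x, y))
print("estimated distance:", estimated_distance(x, y, directions))
```

The estimate is random, so it only approximates the exact value, and it tends to get closer as the number of projections grows. When and why such estimates can be trusted is the kind of question the talk promises to address.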
Hive on Spark
Szehon Ho
Scale – 40 min
Szehon Ho is a software engineer at Cloudera and an Apache Hive PMC member, based in Palo Alto. Prior to this, he was a principal software engineer at Informatica, gaining experience in enterprise software development and the field of data integration.
Apache Hive is a popular SQL interface for batch processing and ETL using Apache Hadoop. This talk and demo will focus mainly on Spark as a new execution engine option for Hive, touching upon the Hive user experience, streamlining, operational management for Spark shops and a comparison of MapReduce and Spark.