Every week we are introducing new speakers who will be on stage at #bbuzz 2015. Thanks to our program committee, we can now present part of our new, eclectic program. Presentations range from beginner-friendly introductions to hot data analysis topics to in-depth technical presentations about scalable architectures. The conference features more than 50 talks by international speakers, organized around the three tags "search", "store" and "scale".
Fast Decompression Lucene Codec
Ivan Mamontov & Mikhail Khludnev
Search – 40min
Ivan is a search engineer at GridDynamics, interested in low-level systems programming and in software platform design and architecture. For the last few years, he has worked on eCommerce search platforms, extending Lucene and Solr. Mikhail builds search and navigation engines for eCommerce and contributes to Lucene and Solr from time to time.
Sorted lists of integers are used throughout Lucene's implementation of the inverted index. These lists are usually kept compressed in memory, trading memory footprint off against access speed and CPU utilization. As a result, encoding and, more importantly, decoding these lists consume significant CPU time. In this talk, Ivan and Mikhail will present their prototype of a Lucene codec that uses a simple C library to compress lists of integers with binary packing and SIMD instructions, which significantly improves decoding throughput.
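To make "binary packing" concrete, here is a minimal pure-Python sketch of the general technique for a sorted list of doc IDs: store the gaps (deltas) between consecutive values, then pack each block of gaps using the smallest bit width b that fits the largest gap in that block. This is not the speakers' C library and uses no SIMD; the block size of 128, the 32-bit word layout and all function names are assumptions made for illustration.

```python
from typing import List, Tuple

BLOCK_SIZE = 128  # assumed block size for this sketch

def pack_block(values: List[int]) -> Tuple[int, List[int]]:
    """Pack non-negative ints into 32-bit words using one common bit width b."""
    b = max(v.bit_length() for v in values) if values else 0
    words: List[int] = []
    acc, nbits = 0, 0                 # bit accumulator and number of filled bits
    for v in values:
        acc |= v << nbits             # append v above the bits already queued
        nbits += b
        while nbits >= 32:            # flush every full 32-bit word
            words.append(acc & 0xFFFFFFFF)
            acc >>= 32
            nbits -= 32
    if nbits > 0:
        words.append(acc & 0xFFFFFFFF)
    return b, words

def unpack_block(b: int, words: List[int], count: int) -> List[int]:
    """Inverse of pack_block: read `count` values of `b` bits each."""
    mask = (1 << b) - 1
    out: List[int] = []
    acc, nbits = 0, 0
    it = iter(words)
    for _ in range(count):
        while nbits < b:              # refill from the next 32-bit word
            acc |= next(it) << nbits
            nbits += 32
        out.append(acc & mask)
        acc >>= b
        nbits -= b
    return out

def encode(sorted_ints: List[int]) -> List[Tuple[int, List[int], int]]:
    """Delta-encode a sorted list, then bit-pack it block by block."""
    blocks = []
    prev = 0
    for start in range(0, len(sorted_ints), BLOCK_SIZE):
        chunk = sorted_ints[start:start + BLOCK_SIZE]
        deltas = []
        for v in chunk:
            deltas.append(v - prev)   # gaps are small for dense postings
            prev = v
        b, words = pack_block(deltas)
        blocks.append((b, words, len(chunk)))
    return blocks

def decode(blocks: List[Tuple[int, List[int], int]]) -> List[int]:
    """Unpack each block and prefix-sum the gaps back into doc IDs."""
    out: List[int] = []
    prev = 0
    for b, words, count in blocks:
        for d in unpack_block(b, words, count):
            prev += d
            out.append(prev)
    return out
```

For example, `encode([3, 7, 8, 15, 1000, 1003])` stores the gaps 3, 4, 1, 7, 985, 3 at 10 bits each, so six IDs fit in two 32-bit words. Decoding is mostly shifts and masks at a fixed width per block, which is why the approach suits SIMD; vectorized implementations typically interleave values across vector lanes so that several can be unpacked per instruction.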
Approaching Join Index for Lucene
Mikhail Khludnev
Search – 40min
Lucene works great with independent text documents, but real-life problems often require handling relations between documents. Aside from several workarounds, such as term encodings, field collapsing or term positions, there are two mainstream approaches to handling document relations: join and block-join. Both have their downsides. Join performs poorly, while block-join makes index updates really expensive, since updating a single document requires wiping out the whole block of related documents.
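Why block-join updates are expensive can be seen in a toy model. In Lucene's block-join, children are indexed as a contiguous block immediately followed by their parent, and `ToParentBlockJoinQuery` maps a child hit to its parent by finding the next set bit in a bitset of parent documents. The plain-Python sketch below mimics that layout with append-only doc IDs; it uses no Lucene APIs, and every function name is hypothetical.

```python
from bisect import bisect_left
from typing import List, Set, Tuple

def build_blocks(families: List[Tuple[str, List[str]]]) -> Tuple[List[str], List[int]]:
    """Lay out each family as [child, ..., child, parent]; return docs and sorted parent doc IDs."""
    docs: List[str] = []
    parent_ids: List[int] = []
    for parent, children in families:
        docs.extend(children)              # children come first
        docs.append(parent)                # the parent closes the block
        parent_ids.append(len(docs) - 1)
    return docs, parent_ids

def to_parent(child_doc: int, parent_ids: List[int]) -> int:
    """A child's parent is the first parent doc ID >= the child's ID (like nextSetBit)."""
    return parent_ids[bisect_left(parent_ids, child_doc)]

def update_child(docs: List[str], parent_ids: List[int], deleted: Set[int],
                 child_doc: int, new_child: str) -> int:
    """Replace one child. Because a block must stay contiguous, the whole block
    is marked deleted and re-appended. Returns the number of docs rewritten."""
    parent = to_parent(child_doc, parent_ids)
    i = parent_ids.index(parent)
    start = parent_ids[i - 1] + 1 if i > 0 else 0   # block begins after the previous parent
    block = docs[start:parent + 1]
    deleted.update(range(start, parent + 1))        # old copy of the block is dead
    new_block = [new_child if start + k == child_doc else d for k, d in enumerate(block)]
    docs.extend(new_block)                          # re-append the whole block
    parent_ids.append(len(docs) - 1)
    return len(new_block)
```

For instance, with a shirt that has two sizes and jeans that have one, changing the child "red-M" to "red-L" deletes and re-appends all three documents of the shirt's block; in this model the cost of an update grows with the size of the family, not with the size of the change.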
This session presents an attempt to apply a join index, borrowed from the RDBMS world, to address the drawbacks of both join approaches currently available in Lucene. Mikhail will look into the idea itself and possible implementation approaches, and review the benchmarking results.
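In the RDBMS literature, a join index is a precomputed relation holding pairs of row identifiers for the rows that satisfy a join condition, so the join does not have to be recomputed from the join keys at query time. The sketch below shows one possible reading of that idea for documents: a precomputed child-to-parent doc ID mapping. It is not the implementation presented in the talk. The `JoinIndex` class and the `id`/`parent_id` fields are hypothetical, and the sketch ignores the fact that Lucene doc IDs change when segments merge, which a real implementation would have to handle.

```python
from typing import Dict, Iterable, List, Set

class JoinIndex:
    """Hypothetical precomputed mapping: child doc ID -> parent doc ID."""

    def __init__(self) -> None:
        self.child_to_parent: Dict[int, int] = {}

    @classmethod
    def build(cls, docs: List[Dict[str, str]]) -> "JoinIndex":
        ji = cls()
        # Parents are docs without a foreign key; resolve their ids to doc IDs once.
        id_to_doc = {doc["id"]: d for d, doc in enumerate(docs) if "parent_id" not in doc}
        for d, doc in enumerate(docs):
            if "parent_id" in doc:
                ji.child_to_parent[d] = id_to_doc[doc["parent_id"]]
        return ji

    def to_parents(self, child_hits: Iterable[int]) -> Set[int]:
        """Join at query time is now a lookup per hit, with no term matching."""
        return {self.child_to_parent[d] for d in child_hits if d in self.child_to_parent}

    def update_child(self, old_doc: int, new_doc: int, parent_doc: int) -> None:
        """Reindexing one child touches only its own entry; siblings and parent stay."""
        self.child_to_parent.pop(old_doc, None)
        self.child_to_parent[new_doc] = parent_doc
```

Unlike the block layout, children here need not sit next to their parent, so a single child can be reindexed on its own; the price is keeping the mapping consistent as documents are added, deleted and merged.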