Session abstract:
At the beginning of 2017, the Apache Lucene team decided to focus on releasing Apache Lucene 7. The new version will be available for testing around the time of Berlin Buzzwords.
This talk will present the new and changed features of Lucene 7. As TF-IDF is no longer the default, several query special cases such as query normalization and the so-called "coord factor" were removed. Those were workarounds for problems specific to TF-IDF, such as insufficient term frequency saturation, and they can be ignored completely with other ranking functions like BM25. Users should be prepared for scores to differ: the absolute values of scores are meaningless, so applications that depend on them may break. Query normalization and coordination factors made it difficult to rewrite queries correctly. With them gone, many more optimizations are possible for optional, filtered, and mandatory query clauses; for example, Lucene 7 will be faster if it finds duplicate clauses. The talk will also present recent Lucene 6 features like graph token streams and how they are used in Lucene 7.
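To make "term frequency saturation" concrete, here is a minimal sketch comparing the two term-frequency factors. It uses the textbook formulas, not Lucene's actual code: the function names are hypothetical, the values k1 = 1.2 and b = 0.75 are commonly used BM25 settings, and the document is assumed to have exactly average length.

```python
import math


def classic_tf(freq):
    """Term-frequency factor of classic TF-IDF: sqrt(freq), which grows without bound."""
    return math.sqrt(freq)


def bm25_tf(freq, k1=1.2, b=0.75, doc_len=100.0, avg_doc_len=100.0):
    """Textbook BM25 term-frequency factor; it saturates below k1 + 1.

    k1, b, doc_len and avg_doc_len are illustrative assumptions,
    not values taken from the talk.
    """
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return freq * (k1 + 1.0) / (freq + length_norm)


# Print both factors side by side for increasing term frequencies.
for freq in (1, 10, 100, 1000):
    print(freq, round(classic_tf(freq), 2), round(bm25_tf(freq), 2))
```

Under these assumptions the classic factor keeps growing with the square root of the term frequency, while the BM25 factor never exceeds k1 + 1 = 2.2, so a term repeated a thousand times cannot dominate the score.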
The talk will also present the current state of Java 9 support inside Apache Lucene and future plans to support the Java 9 module system. Lucene/Solr and Elasticsearch users are expected to be among the first communities to migrate to Java 9, because recent HotSpot optimizations will execute queries and allow doc values access with much higher performance.