Embracing diversity: searching over multiple languages

06/12/2017 - 12:20 to 13:00
Moon Lounge
long talk (40 min)

Session abstract: 

Although a lot of online content is written in English, there are tons of non-English-speaking users out there who still need to retrieve information. When searching, especially for tech-related topics, it’s common to compose queries in English; however, such users may prefer search results written in their own native language.

We’ll see how statistical machine translation tools can help in this scenario by translating text at query time, improving both recall and precision for search engine queries.

We’ll look at how cross-language information retrieval can be implemented on top of Apache Lucene with the help of the Apache Joshua machine translation toolkit.

The audience will gain a better understanding of how to run search queries against a multilingual corpus indexed into Apache Lucene and retrieve all of the relevant search results, whatever language they are written in.