High quality, low maintenance content tagging @ ZEIT Online

05/27/2014 - 14:50 to 15:10

Kesselhaus

short talk (20 min)

Beginner

Session abstract:

At ApacheCon 2012 we presented a system for content tagging at ZEIT Online. This talk presents recent work on improving the ranking of the tags by building exclusively on the data (the complete ZEIT Online news archive). Tags are either named entities, statistically relevant terms or phrases occurring in the news articles, or topics derived from text classification. The system uses open-source technology such as Lucene and GATE to produce these tags. We will present the requirements and expectations that an online editorial office has of such a system, and ideas on how to meet them. All tags are ranked based on trend analysis, typical contexts and overall TF-IDF scores, all computed on the whole archive. No manual maintenance of thesauri or ontology resources is necessary.
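
To illustrate the TF-IDF part of the ranking mentioned in the abstract, here is a minimal sketch in Python. It is not the system presented in the talk: the function name rank_tags, the smoothed IDF formula and the toy archive are assumptions made for illustration, and the trend analysis and context signals are left out entirely.

```python
import math
from collections import Counter


def rank_tags(doc_tokens, archive):
    """Rank the terms of one article by TF-IDF against an archive.

    Hypothetical helper: `archive` is a list of tokenized articles and
    stands in for the complete news archive mentioned in the abstract.
    """
    n_docs = len(archive)

    # Document frequency: number of archive articles containing each term.
    df = Counter()
    for tokens in archive:
        df.update(set(tokens))

    tf = Counter(doc_tokens)
    scores = {}
    for term, count in tf.items():
        # Smoothed IDF (an assumption) so unseen terms never divide by zero.
        idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
        scores[term] = (count / len(doc_tokens)) * idf

    # Highest-scoring terms first.
    return sorted(scores, key=scores.get, reverse=True)


# Toy data, invented for this example.
archive = [
    ["merkel", "berlin", "election"],
    ["berlin", "football", "election"],
    ["berlin", "weather"],
]
article = ["merkel", "berlin", "merkel", "election"]
ranked = rank_tags(article, archive)
print(ranked)
```

A term that appears in every archive article (here "berlin") ends up below rarer terms, which is the basic effect a corpus-wide score is meant to produce.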

Video:

Breno Faria & Christoph Goller at #bbuzz 2014

Slide:

breno_faria_christoph_gollerbbuzz14.pdf

Berlin Buzzwords
