Charles University at CLEF 2007 Ad-Hoc Track
In this paper, we describe retrieval experiments performed at Charles University in Prague for participation in the CLEF 2007 Ad-Hoc track. We focused on the Czech monolingual task and used the LEMUR toolkit as the retrieval system. Our results demonstrate that for Czech, a highly inflectional language, lemmatization significantly improves retrieval results, and that manually created queries are only slightly better than queries automatically generated from topic specifications.
Keywords: Retrieval Model, Stopword List, Query Construction, Base Language Model, Base Search Engine