Benefit of Proper Language Processing for Czech Speech Retrieval in the CL-SR Task at CLEF 2006

Ircing, Pavel; Müller, Luděk

doi:10.1007/978-3-540-74999-8_95

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4730)

Included in the following conference series:

Workshop of the Cross-Language Evaluation Forum for European Languages

Abstract

This paper describes the system built by the team from the University of West Bohemia for participation in the CLEF 2006 CL-SR track. We decided to concentrate solely on monolingual search in the Czech test collection and to investigate the effect of proper language processing on retrieval performance. We employed a Czech morphological analyser and tagger for this purpose. For the actual search system, we used the classical tf.idf approach with blind relevance feedback as implemented in the Lemur toolkit. The results indicate that suitable linguistic preprocessing is indeed crucial for Czech IR performance.
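The pipeline named in the abstract (lemmatization, tf.idf ranking, blind relevance feedback) can be illustrated with a minimal Python sketch. It is not the Lemur implementation, whose TFIDF model uses its own term weighting, and it is not the authors' system. The `LEMMAS` table is a hypothetical toy lookup standing in for a real morphological analyser and tagger, the four documents are invented, and the function names and the parameters `top_docs`, `new_terms` and `beta` are assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy lemma table; a real system would call a morphological
# analyser and tagger instead (assumption, not the authors' tool).
LEMMAS = {"táboře": "tábor", "tábory": "tábor", "válce": "válka",
          "války": "válka", "armády": "armáda"}

# Invented example documents containing inflected Czech word forms.
DOCS = [
    "války táboře rodina",
    "válce tábory osvobození osvobození",
    "škola rodina dětství",
    "osvobození armády",
]

def lemmatize(tokens):
    """Map each word form to its lemma; unknown forms pass through unchanged."""
    return [LEMMAS.get(t, t) for t in tokens]

def build_index(docs):
    """Return (idf, vectors) with plain tf * log(N / df) weights."""
    n = len(docs)
    df = Counter(t for tokens in docs for t in set(tokens))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = [{t: tf * idf[t] for t, tf in Counter(tokens).items()}
               for tokens in docs]
    return idf, vectors

def query_vector(tokens, idf):
    """Weight query terms the same way; terms unseen in the index get 0."""
    return {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokens).items()}

def search(query_vec, vectors):
    """Rank documents by dot product; documents with score 0 are dropped."""
    scored = [(sum(w * v.get(t, 0.0) for t, w in query_vec.items()), i)
              for i, v in enumerate(vectors)]
    ranked = sorted(scored, key=lambda p: (-p[0], p[1]))
    return [i for s, i in ranked if s > 0]

def expand_query(query_vec, vectors, ranking, top_docs=2, new_terms=2, beta=0.5):
    """Blind feedback: assume the top-ranked documents are relevant and add
    their strongest terms to the query, down-weighted by beta."""
    top = ranking[:top_docs]
    centroid = Counter()
    for i in top:
        for t, w in vectors[i].items():
            centroid[t] += w / len(top)
    candidates = sorted((t for t in centroid if t not in query_vec),
                        key=lambda t: (-centroid[t], t))
    expanded = dict(query_vec)
    for t in candidates[:new_terms]:
        expanded[t] = beta * centroid[t]
    return expanded

if __name__ == "__main__":
    idf, vectors = build_index([lemmatize(d.split()) for d in DOCS])
    q = query_vector(lemmatize(["tábor"]), idf)
    first_pass = search(q, vectors)
    second_pass = search(expand_query(q, vectors, first_pass), vectors)
    print("first pass:", first_pass)
    print("after feedback:", second_pass)
```

In this toy collection the query form "tábor" never occurs literally, so indexing the raw word forms returns nothing, while indexing lemmas matches the two documents that contain inflected forms of the word. The feedback step then adds terms from those two documents, which lets the second pass reach a document that shares no term with the original query.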

This work was supported by the Grant Agency of the Czech Academy of Sciences project No. 1ET101470416 and the Ministry of Education of the Czech Republic project No. LC536.

Author information

Authors and Affiliations

University of West Bohemia, Faculty of Applied Sciences, Dept. of Cybernetics, Univerzitní 8, 306 14 Plzeň, Czech Republic
Pavel Ircing & Luděk Müller

Editor information

Carol Peters, Paul Clough, Fredric C. Gey, Jussi Karlgren, Bernardo Magnini, Douglas W. Oard, Maarten de Rijke, Maximilian Stempfhuber

Cite this paper

Ircing, P., Müller, L. (2007). Benefit of Proper Language Processing for Czech Speech Retrieval in the CL-SR Task at CLEF 2006. In: Peters, C., et al. Evaluation of Multilingual and Multi-modal Information Retrieval. CLEF 2006. Lecture Notes in Computer Science, vol 4730. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74999-8_95

DOI: https://doi.org/10.1007/978-3-540-74999-8_95
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74998-1
Online ISBN: 978-3-540-74999-8
eBook Packages: Computer Science, Computer Science (R0)
