Benefit of Proper Language Processing for Czech Speech Retrieval in the CL-SR Task at CLEF 2006

  • Conference paper
Evaluation of Multilingual and Multi-modal Information Retrieval (CLEF 2006)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 4730))

Abstract

This paper describes the system built by the team from the University of West Bohemia for participation in the CLEF 2006 CL-SR track. We concentrated solely on monolingual search in the Czech test collection and investigated the effect of proper language processing on retrieval performance. For that purpose, we employed a Czech morphological analyser and tagger. The search system itself uses the classical tf.idf approach with blind relevance feedback, as implemented in the Lemur toolkit. The results indicate that suitable linguistic preprocessing is indeed crucial for Czech IR performance.

This work was supported by the Grant Agency of the Czech Academy of Sciences project No. 1ET101470416 and the Ministry of Education of the Czech Republic project No. LC536.
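
The pipeline named in the abstract (lemmatization, tf.idf scoring, blind relevance feedback) can be illustrated with a minimal Python sketch. This is not the authors' system and not Lemur's implementation: the tiny lemma table stands in for a real Czech morphological analyser and tagger, and every name and parameter here (`LEMMAS`, `lemmatize`, `search`, `top_docs`, `new_terms`, `feedback_weight`) is a hypothetical choice made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy lemma table standing in for a real Czech morphological
# analyser and tagger; the entries are illustrative only.
LEMMAS = {
    "válce": "válka", "válku": "válka", "války": "válka",
    "rodiny": "rodina", "rodinu": "rodina", "rodině": "rodina",
    "táboře": "tábor", "tábora": "tábor",
}


def lemmatize(text):
    """Lowercase, split on whitespace, and map each word form to its lemma."""
    return [LEMMAS.get(tok, tok) for tok in text.lower().split()]


def build_index(docs):
    """Return per-document lemma counts and a smoothed idf for each lemma."""
    counts = [Counter(lemmatize(d)) for d in docs]
    df = Counter(term for c in counts for term in c)
    idf = {t: math.log(1.0 + len(docs) / df[t]) for t in df}
    return counts, idf


def score(query_weights, counts, idf):
    """Score each document as a weighted sum of tf * idf over query terms."""
    return [
        sum(w * c[t] * idf.get(t, 0.0) for t, w in query_weights.items())
        for c in counts
    ]


def search(query, docs, top_docs=2, new_terms=2, feedback_weight=0.5):
    """Rank documents with tf.idf, then re-rank after blind feedback."""
    counts, idf = build_index(docs)
    weights = dict(Counter(lemmatize(query)))

    # First pass: plain tf.idf ranking.
    first = score(weights, counts, idf)
    ranked = sorted(range(len(docs)), key=lambda i: -first[i])

    # Blind relevance feedback: treat the top-ranked documents as relevant
    # without any user judgement and pool their strongest unseen terms.
    pooled = Counter()
    for i in ranked[:top_docs]:
        for term, tf in counts[i].items():
            if term not in weights:
                pooled[term] += tf * idf[term]

    # Second pass: add the pooled terms to the query at a reduced weight.
    expanded = dict(weights)
    for term, _ in pooled.most_common(new_terms):
        expanded[term] = feedback_weight
    final = score(expanded, counts, idf)
    return sorted(range(len(docs)), key=lambda i: -final[i])


docs = [
    "vzpomínky na válku a rodinu",
    "život v táboře za války",
    "recept na koláč",
]
ranking = search("válka", docs)
```

In this toy collection the query form "válka" appears in no document verbatim, so exact string matching would retrieve nothing; only after the inflected forms are mapped to a common lemma do the first two documents receive a non-zero score. That is the kind of effect the abstract attributes to linguistic preprocessing, shown here on made-up data rather than on the CLEF collection.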

Editor information

Carol Peters Paul Clough Fredric C. Gey Jussi Karlgren Bernardo Magnini Douglas W. Oard Maarten de Rijke Maximilian Stempfhuber

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ircing, P., Müller, L. (2007). Benefit of Proper Language Processing for Czech Speech Retrieval in the CL-SR Task at CLEF 2006. In: Peters, C., et al. Evaluation of Multilingual and Multi-modal Information Retrieval. CLEF 2006. Lecture Notes in Computer Science, vol 4730. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74999-8_95

  • DOI: https://doi.org/10.1007/978-3-540-74999-8_95

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74998-1

  • Online ISBN: 978-3-540-74999-8

  • eBook Packages: Computer Science, Computer Science (R0)
