Bulgarian and Hungarian Experiments with Hummingbird SearchServer™ at CLEF 2005

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4022)

Abstract

Hummingbird participated in the Bulgarian and Hungarian monolingual information retrieval tasks of the Ad-Hoc Track of the Cross-Language Evaluation Forum (CLEF) 2005. In the ad hoc retrieval tasks, the system was given 50 natural language queries, and the goal was to find all of the relevant documents (with high precision) in a particular document set. We conducted diagnostic experiments with different techniques for matching word variations and handling stopwords. We found that the experimental stemmers significantly increased mean average precision for both languages. Analysis of individual topics found that the algorithmic Bulgarian and Hungarian stemmers encountered some unanticipated stopword collisions. A comparison to an experimental 4-gram technique suggested that Hungarian stemming would further benefit from decompounding.
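
The link between character 4-grams and decompounding can be illustrated with a minimal sketch. The abstract does not describe the exact 4-gram technique used in the experiments, so the code below assumes a generic scheme of overlapping 4-character windows within a single word. The function name `char_ngrams` and the example word are illustrative assumptions, not taken from the paper.

```python
def char_ngrams(word: str, n: int = 4) -> list[str]:
    """Return the overlapping character n-grams of a single word.

    Hypothetical helper: a generic within-word n-gram scheme, not
    necessarily the one used in the paper's experiments.
    """
    word = word.lower()
    if len(word) <= n:
        return [word]  # words no longer than n are kept whole
    return [word[i:i + n] for i in range(len(word) - n + 1)]


# Illustrative Hungarian compound: "vasútállomás" (railway station)
# is built from "vasút" (railway) and "állomás" (station).
compound = char_ngrams("vasútállomás")
component = char_ngrams("állomás")

# 4-grams of the component that also occur in the compound.
shared = [gram for gram in component if gram in compound]
print(shared)
```

Under this assumption, a query for the component word shares 4-grams with the compound, whereas a suffix-stripping stemmer would leave the two words unrelated. That kind of gap is what decompounding would close for a stemming-based approach.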

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tomlinson, S. (2006). Bulgarian and Hungarian Experiments with Hummingbird SearchServer™ at CLEF 2005. In: Peters, C., et al. Accessing Multilingual Information Repositories. CLEF 2005. Lecture Notes in Computer Science, vol 4022. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11878773_22

  • DOI: https://doi.org/10.1007/11878773_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-45697-1

  • Online ISBN: 978-3-540-45700-8

  • eBook Packages: Computer Science, Computer Science (R0)
