Italian Monolingual Information Retrieval with PROSIT

  • Conference paper
Advances in Cross-Language Information Retrieval (CLEF 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2785)

Abstract

PROSIT (PRObabilistic Sifting of Information Terms) is a novel probabilistic information retrieval system that combines a term-weighting model based on deviation from randomness with information-theoretic query expansion. We report on the application of PROSIT to the Italian monolingual task at CLEF. We experimented with both standard PROSIT and enhanced versions of it. In particular, we studied the use of bigrams and coordination level-based retrieval within the PROSIT framework. The main findings of our research are that (i) standard PROSIT was quite effective, with an average precision of 0.5116 on CLEF 2001 queries and 0.5019 on CLEF 2002 queries, (ii) bigrams were useful provided that they were incorporated into the main algorithm, and (iii) the benefits of coordination level-based retrieval were unclear.
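
The abstract reports effectiveness as average precision. As a rough illustration, and assuming these figures are the usual non-interpolated average precision averaged over all queries (the abstract does not spell out the exact definition), the measure could be sketched as follows. The function names and the toy document identifiers are hypothetical and are not part of PROSIT.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision for a single query.

    `ranking` is the retrieved document ids, best first.
    `relevant` is the set of ids judged relevant for the query.
    Relevant documents that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision measured at the rank of each relevant document.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[str], Set[str]]]
) -> float:
    """Mean of the per-query average precision values."""
    scores = [average_precision(ranking, relevant) for ranking, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy data (assumed for illustration only).
    query_a = (["d1", "d2", "d3", "d4"], {"d1", "d3"})
    query_b = (["d5", "d6"], {"d6", "d9"})
    print(average_precision(*query_a))
    print(mean_average_precision([query_a, query_b]))
```

Under this definition a relevant document that is ranked lower, or not retrieved at all, pulls the score down, so a single figure such as 0.5116 summarises both the ranking quality and the recall of the run.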

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Amati, G., Carpineto, C., Romano, G. (2003). Italian Monolingual Information Retrieval with PROSIT. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds) Advances in Cross-Language Information Retrieval. CLEF 2002. Lecture Notes in Computer Science, vol 2785. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45237-9_21

  • DOI: https://doi.org/10.1007/978-3-540-45237-9_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40830-7

  • Online ISBN: 978-3-540-45237-9

  • eBook Packages: Springer Book Archive
