Abstract
PROSIT (PRObabilistic Sifting of Information Terms) is a novel probabilistic information retrieval system that combines a term-weighting model based on deviation from randomness with information-theoretic query expansion. We report on the application of PROSIT to the Italian monolingual task at CLEF. We experimented with both standard PROSIT and enhanced versions; in particular, we studied the use of bigrams and coordination level-based retrieval within the PROSIT framework. The main findings of our research are that (i) standard PROSIT was quite effective, with an average precision of 0.5116 on CLEF 2001 queries and 0.5019 on CLEF 2002 queries, (ii) bigrams were useful provided that they were incorporated into the main algorithm, and (iii) the benefits of coordination level-based retrieval were unclear.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Amati, G., Carpineto, C., Romano, G. (2003). Italian Monolingual Information Retrieval with PROSIT. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds) Advances in Cross-Language Information Retrieval. CLEF 2002. Lecture Notes in Computer Science, vol 2785. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45237-9_21
DOI: https://doi.org/10.1007/978-3-540-45237-9_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40830-7
Online ISBN: 978-3-540-45237-9
eBook Packages: Springer Book Archive