From Uncertain Inference to Probability of Relevance for Advanced IR Applications

Nottelmann, Henrik; Fuhr, Norbert

doi:10.1007/3-540-36618-0_17

Conference paper
First Online: 01 January 2003

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2633)

Abstract

Uncertain inference is a probabilistic generalisation of the logical view on databases: documents are ranked by the probability that they logically imply the query. For tasks other than ad-hoc retrieval, estimates of the actual probability of relevance are required. In this paper, we investigate mapping functions between these two types of probability. For this purpose, we consider linear and logistic functions. The former have been proposed before, whereas we give a new theoretical justification for the latter. In a series of upper-bound experiments, we compare the goodness of fit of the two models. A second series of experiments investigates the effect on retrieval quality in the fusion step of distributed retrieval. These experiments show that good estimates of the actual probability of relevance can be achieved and that the logistic model outperforms the linear one. However, retrieval quality for distributed retrieval (merging only, without resource selection) is only slightly improved by using the logistic function.
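The two kinds of mapping function named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the parameter names (`c0`, `c1`, `b0`, `b1`), the clipping of the linear function to [0, 1], the fitting procedures, and the toy data are all assumptions made for illustration. The sketch fits a linear and a logistic function that map an inference probability to an estimated probability of relevance and reports the mean squared error of each fit.

```python
import math

def linear_map(p_inference, c0, c1):
    """Linear mapping, clipped (an assumption) so the result stays in [0, 1]."""
    return min(1.0, max(0.0, c0 + c1 * p_inference))

def logistic_map(p_inference, b0, b1):
    """Logistic mapping; the result always lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * p_inference)))

def fit_linear(xs, ys):
    """Ordinary least squares for y = c0 + c1 * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c1 = cov_xy / var_x
    return mean_y - c1 * mean_x, c1

def fit_logistic(xs, ys, steps=5000, rate=0.5):
    """Maximum-likelihood fit by plain gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - logistic_map(x, b0, b1)
            g0 += err
            g1 += err * x
        b0 += rate * g0 / n
        b1 += rate * g1 / n
    return b0, b1

def mean_squared_error(mapping, params, xs, ys):
    """Average squared gap between mapped scores and relevance judgements."""
    return sum((mapping(x, *params) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical toy data: inference probabilities and binary relevance judgements.
scores = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.75, 0.90]
relevant = [0, 0, 0, 1, 0, 1, 1, 1]

linear_params = fit_linear(scores, relevant)
logistic_params = fit_logistic(scores, relevant)

print("linear   MSE:", mean_squared_error(linear_map, linear_params, scores, relevant))
print("logistic MSE:", mean_squared_error(logistic_map, logistic_params, scores, relevant))
```

The toy data says nothing about which model fits better in practice; the sketch only shows the shape of the comparison. In a merging setting, scores mapped this way would become comparable across collections, which is the use the abstract refers to.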

Author information

Authors and Affiliations

Institute of Informatics and Interactive Systems, University of Duisburg-Essen, 47048, Duisburg, Germany
Henrik Nottelmann & Norbert Fuhr

Editor information

Editors and Affiliations

Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Via Giuseppe Moruzzi, 1, 56124, Pisa, Italy
Fabrizio Sebastiani

About this paper

Cite this paper

Nottelmann, H., Fuhr, N. (2003). From Uncertain Inference to Probability of Relevance for Advanced IR Applications. In: Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2003. Lecture Notes in Computer Science, vol 2633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36618-0_17

DOI: https://doi.org/10.1007/3-540-36618-0_17
Published: 15 April 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-01274-0
Online ISBN: 978-3-540-36618-8
eBook Packages: Springer Book Archive
