Abstract
Uncertain inference is a probabilistic generalisation of the logical view on databases: documents are ranked by the probability that they logically imply the query. For tasks other than ad-hoc retrieval, however, estimates of the actual probability of relevance are required. In this paper, we investigate functions that map the first type of probability onto the second. We consider linear and logistic functions. The former have been proposed before, whereas we give a new theoretical justification for the latter. In a series of upper-bound experiments, we compare the goodness of fit of the two models. A second series of experiments investigates the effect on retrieval quality in the fusion step of distributed retrieval. These experiments show that good estimates of the actual probability of relevance can be achieved, and that the logistic model outperforms the linear one. However, retrieval quality for distributed retrieval (merging only, without resource selection) improves only slightly when the logistic function is used.
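The two kinds of mapping function can be illustrated with a minimal sketch. The abstract does not state the exact functional forms or the fitting procedure, so everything below is an assumption made for illustration: a two-parameter linear form `c0 + c1 * x`, a two-parameter logistic form, ordinary least squares for the linear fit, and plain gradient descent on log-loss for the logistic fit. All function and parameter names are hypothetical.

```python
import math


def linear_map(p_inf, c0, c1):
    """Assumed linear mapping, clipped to [0, 1] to stay a valid probability."""
    return min(1.0, max(0.0, c0 + c1 * p_inf))


def logistic_map(p_inf, b0, b1):
    """Assumed logistic mapping; the result always lies strictly in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * p_inf)))


def fit_linear(xs, ys):
    """Closed-form ordinary least squares for y ~ c0 + c1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    return c0, c1


def fit_logistic(xs, ys, lr=0.5, steps=5000):
    """Batch gradient descent on mean log-loss with binary relevance labels."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = logistic_map(x, b0, b1) - y  # gradient of log-loss w.r.t. the logit
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Toy data (invented for this sketch): probabilities of inference
# and binary relevance judgements for ten documents.
scores = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
relevant = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

c0, c1 = fit_linear(scores, relevant)
b0, b1 = fit_logistic(scores, relevant)
for s in (0.1, 0.5, 0.9):
    print(s, round(linear_map(s, c0, c1), 3), round(logistic_map(s, b0, b1), 3))
```

One practical difference shows up even in this toy setting: the linear form has to be clipped to remain inside [0, 1], while the logistic form is bounded by construction.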
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nottelmann, H., Fuhr, N. (2003). From Uncertain Inference to Probability of Relevance for Advanced IR Applications. In: Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2003. Lecture Notes in Computer Science, vol 2633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36618-0_17
DOI: https://doi.org/10.1007/3-540-36618-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-01274-0
Online ISBN: 978-3-540-36618-8
eBook Packages: Springer Book Archive