From Uncertain Inference to Probability of Relevance for Advanced IR Applications

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2633))

Abstract

Uncertain inference is a probabilistic generalisation of the logical view on databases: it ranks documents according to the probability that they logically imply the query. For tasks other than ad-hoc retrieval, estimates of the actual probability of relevance are required. In this paper, we investigate mapping functions between these two types of probability. For this purpose, we consider linear and logistic functions. The former have been proposed before, whereas we give a new theoretical justification for the latter. In a series of upper-bound experiments, we compare the goodness of fit of the two models. A second series of experiments investigates the effect on the resulting retrieval quality in the fusion step of distributed retrieval. These experiments show that good estimates of the actual probability of relevance can be achieved, and that the logistic model outperforms the linear one. However, retrieval quality for distributed retrieval (only merging, without resource selection) is only slightly improved by using the logistic function.
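As a rough illustration of the two mapping functions compared in the abstract, the following Python sketch fits a linear and a logistic function that map a probability of inference to an estimated probability of relevance. The parametrisations (`c0 + c1 * x` and `1 / (1 + exp(-(b0 + b1 * x)))`), the fitting procedures (ordinary least squares and plain gradient descent), the function names, and the toy data are all assumptions made for this example. They are not taken from the paper, whose exact formulation and experimental setup are not shown on this page.

```python
import math


def linear_map(p_inference, c0, c1):
    """Linear mapping, clipped to [0, 1] so the result is a valid probability."""
    return min(1.0, max(0.0, c0 + c1 * p_inference))


def logistic_map(p_inference, b0, b1):
    """Logistic mapping: 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * p_inference)))


def fit_linear(xs, ys):
    """Ordinary least squares fit of y ~ c0 + c1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    return c0, c1


def fit_logistic(xs, ys, steps=5000, lr=0.5):
    """Gradient descent on the mean log-loss for binary relevance labels."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = logistic_map(x, b0, b1) - y
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def mean_squared_error(predict, xs, ys):
    """Goodness of fit of a mapping against the observed relevance labels."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data: probabilities of inference and binary relevance judgements.
xs = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

c0, c1 = fit_linear(xs, ys)
b0, b1 = fit_logistic(xs, ys)

print("linear MSE:  ", mean_squared_error(lambda x: linear_map(x, c0, c1), xs, ys))
print("logistic MSE:", mean_squared_error(lambda x: logistic_map(x, b0, b1), xs, ys))
```

The linear mapping has to be clipped to stay inside [0, 1], whereas the logistic mapping yields a value strictly between 0 and 1 for any input. Which model fits better on real data is an empirical question that this toy example does not settle.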

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nottelmann, H., Fuhr, N. (2003). From Uncertain Inference to Probability of Relevance for Advanced IR Applications. In: Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2003. Lecture Notes in Computer Science, vol 2633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36618-0_17

  • DOI: https://doi.org/10.1007/3-540-36618-0_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-01274-0

  • Online ISBN: 978-3-540-36618-8

  • eBook Packages: Springer Book Archive
