
Estimation of the Collection Parameter of Information Models for IR

  • Parantapa Goswami
  • Eric Gaussier
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7814)

Abstract

In this paper we explore various methods to estimate the collection parameter of the information-based models for ad hoc information retrieval. In previous studies, this parameter was set to the average number of documents where the word under consideration appears. We introduce here a fully formalized estimation method for both the log-logistic and the smoothed power law models, which leads to improved versions of these models in IR. Furthermore, we show that the previous setting of the collection parameter of the log-logistic model is a special case of the estimated value proposed here.
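To make the role of the collection parameter concrete, here is a minimal sketch of how it enters the two models named above. It is not the estimation method proposed in the paper, which the abstract does not detail. The formulas are assumptions taken from the usual presentation of Clinchant and Gaussier's information-based models: the previous setting is read as the fraction of documents containing the word, and a word's weight is the negative log of the probability of seeing at least its normalized frequency. All function and parameter names are hypothetical.

```python
import math


def collection_parameter(doc_freq: int, num_docs: int) -> float:
    """Previous setting (assumed form): fraction of documents containing the word."""
    return doc_freq / num_docs


def normalized_tf(tf: float, doc_len: float, avg_doc_len: float, c: float = 1.0) -> float:
    """Length-normalized term frequency (assumed form): tf * log(1 + c * avgdl / dl)."""
    return tf * math.log(1.0 + c * avg_doc_len / doc_len)


def log_logistic_weight(t: float, lam: float) -> float:
    """-log P(X >= t) under the log-logistic model, with P(X >= t) = lam / (lam + t)."""
    return math.log((lam + t) / lam)


def smoothed_power_law_weight(t: float, lam: float) -> float:
    """-log P(X >= t) under the smoothed power law model, for 0 < lam < 1."""
    return -math.log((lam ** (t / (t + 1.0)) - lam) / (1.0 - lam))


if __name__ == "__main__":
    # Hypothetical collection statistics, chosen only for illustration.
    lam = collection_parameter(doc_freq=10, num_docs=1000)
    t = normalized_tf(tf=3, doc_len=120, avg_doc_len=100)
    print(log_logistic_weight(t, lam), smoothed_power_law_weight(t, lam))
```

In both functions a smaller collection parameter, meaning a rarer word, gives a larger weight for the same normalized frequency, which is why the way this parameter is estimated affects retrieval quality.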

Keywords

IR Theory, Information Models, Estimation of Parameters



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Parantapa Goswami (1)
  • Eric Gaussier (1)
  1. Université Joseph Fourier Grenoble 1, LIG, Grenoble, France
