Artificial Intelligence Review, Volume 28, Issue 1, pp 51–68

An axiomatic comparison of learned term-weighting schemes in information retrieval: clarifications and extensions

Article

Abstract

Machine learning approaches to information retrieval are becoming increasingly widespread. In this paper, we present term-weighting functions reported in the literature that were developed by four separate approaches using genetic programming. Recently, a number of axioms (constraints), from which all good term-weighting schemes should be deduced, have been developed and shown to be theoretically and empirically sound. We introduce a new axiom and empirically validate it by modifying the standard BM25 scheme. Furthermore, we analyse the BM25 scheme and the four learned schemes presented to determine if the schemes are consistent with the axioms. We find that one learned term-weighting approach is consistent with more axioms than any of the other schemes. An empirical evaluation of the schemes on various test collections and query lengths shows that the scheme that is consistent with more of the axioms outperforms the other schemes.
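As an illustration of what checking a scheme against an axiom can look like, the sketch below computes the commonly cited Okapi BM25 term weight and numerically tests one simple constraint: with everything else held fixed, the weight should increase as the term frequency increases. This is not the paper's analysis. The function names, the parameter defaults (k1 = 1.2, b = 0.75), the collection statistics, and the choice of this particular constraint are all assumptions made for the example, and the query-term-frequency factor of BM25 is omitted for brevity.

```python
import math


def bm25_term_weight(tf, df, doc_len, avg_doc_len, num_docs, k1=1.2, b=0.75):
    """Okapi BM25 weight of one query term in one document.

    tf: term frequency in the document; df: number of documents containing
    the term; k1 and b are the usual tuning parameters (assumed defaults).
    The query-term-frequency factor is left out of this sketch.
    """
    if tf == 0:
        return 0.0
    idf = math.log((num_docs - df + 0.5) / (df + 0.5))
    length_norm = k1 * ((1.0 - b) + b * doc_len / avg_doc_len)
    return idf * ((k1 + 1.0) * tf) / (length_norm + tf)


def is_increasing_in_tf(weight_fn, max_tf=50, **fixed):
    """Numerically check that weight_fn strictly increases with tf.

    `fixed` holds the remaining arguments of weight_fn, kept constant.
    This is a spot check over tf = 1..max_tf, not a proof.
    """
    weights = [weight_fn(tf=tf, **fixed) for tf in range(1, max_tf + 1)]
    return all(lo < hi for lo, hi in zip(weights, weights[1:]))


# Hypothetical collection statistics, chosen only for illustration.
rare_term = dict(df=100, doc_len=300, avg_doc_len=250, num_docs=10_000)
common_term = dict(df=9_000, doc_len=300, avg_doc_len=250, num_docs=10_000)

print(is_increasing_in_tf(bm25_term_weight, **rare_term))
print(is_increasing_in_tf(bm25_term_weight, **common_term))
```

For the rare term the idf factor is positive and the check passes. For a term that occurs in more than half of the documents, this form of idf becomes negative, so the weight falls as the term frequency grows and the check fails. A numerical spot check like this can only reveal a violation; showing that a scheme satisfies a constraint in general requires an analytical argument.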

Keywords

Information retrieval; Genetic programming; Axiomatic constraints

Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  1. Department of Information Technology, National University of Ireland, Galway, Ireland
