An axiomatic comparison of learned term-weighting schemes in information retrieval: clarifications and extensions

Abstract

Machine learning approaches to information retrieval are becoming increasingly widespread. In this paper, we present term-weighting functions reported in the literature that were developed by four separate approaches using genetic programming. Recently, a number of axioms (constraints), from which all good term-weighting schemes should be deduced, have been developed and shown to be theoretically and empirically sound. We introduce a new axiom and empirically validate it by modifying the standard BM25 scheme. Furthermore, we analyse the BM25 scheme and the four learned schemes presented to determine if the schemes are consistent with the axioms. We find that one learned term-weighting approach is consistent with more axioms than any of the other schemes. An empirical evaluation of the schemes on various test collections and query lengths shows that the scheme that is consistent with more of the axioms outperforms the other schemes.
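
As a rough illustration of the kind of analysis the abstract describes, the following Python sketch implements the widely used Okapi BM25 term weight and numerically checks three heuristic constraints of the sort studied in the axiomatic literature: the weight grows with term frequency, the gain from each extra occurrence shrinks, and a longer document with the same term frequency is penalised. The function names, the parameter defaults (k1 = 1.2, b = 0.75), and the toy collection statistics are assumptions chosen for illustration. The constraints are paraphrased informally and are not the paper's exact axioms, and this is the textbook BM25 form, not the modified variant the paper evaluates.

```python
import math


def bm25_term_weight(tf, df, doc_len, avg_doc_len, num_docs, k1=1.2, b=0.75):
    """Okapi BM25 weight of a single query term in a single document.

    tf          -- occurrences of the term in the document
    df          -- number of documents containing the term
    doc_len     -- length of the document (in terms)
    avg_doc_len -- average document length in the collection
    num_docs    -- number of documents in the collection
    k1, b       -- usual BM25 tuning parameters (defaults are assumptions)
    The query-term-frequency component is omitted for brevity.
    """
    idf = math.log((num_docs - df + 0.5) / (df + 0.5))
    length_norm = k1 * ((1.0 - b) + b * doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1.0)) / (tf + length_norm)


def check_constraints(df=10, doc_len=100, avg_doc_len=100, num_docs=1000):
    """Numerically test three informal constraints on toy statistics.

    Returns a dict mapping a (hypothetical) constraint label to True/False.
    """
    w1 = bm25_term_weight(1, df, doc_len, avg_doc_len, num_docs)
    w2 = bm25_term_weight(2, df, doc_len, avg_doc_len, num_docs)
    w3 = bm25_term_weight(3, df, doc_len, avg_doc_len, num_docs)
    w_long = bm25_term_weight(1, df, 2 * doc_len, avg_doc_len, num_docs)
    return {
        # More occurrences of a query term should raise the weight.
        "tf_increasing": w1 < w2 < w3,
        # Each additional occurrence should add less than the previous one.
        "tf_diminishing_returns": (w2 - w1) > (w3 - w2),
        # Same tf in a longer document should score lower.
        "length_penalty": w_long < w1,
    }


if __name__ == "__main__":
    for name, holds in check_constraints().items():
        print(f"{name}: {holds}")
    # A term occurring in more than half the collection gets a negative
    # idf under this formula, so its weight drops below zero.
    print(bm25_term_weight(1, df=600, doc_len=100, avg_doc_len=100, num_docs=1000))
```

On these toy statistics all three checks hold. The final call shows a well-known quirk of this idf form: a term that appears in more than half of the documents receives a negative weight, which is the sort of behaviour a constraint-based analysis is designed to surface.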

Author information

Correspondence to Ronan Cummins.

About this article

Cite this article

Cummins, R., O’Riordan, C. An axiomatic comparison of learned term-weighting schemes in information retrieval: clarifications and extensions. Artif Intell Rev 28, 51–68 (2007). https://doi.org/10.1007/s10462-008-9074-5

Keywords

  • Information retrieval
  • Genetic programming
  • Axiomatic constraints