Evolved term-weighting schemes in Information Retrieval: an analysis of the solution space

Abstract

Evolutionary computation techniques are increasingly being applied to problems within Information Retrieval (IR). Genetic programming (GP) has previously been used with some success to evolve term-weighting schemes in IR. However, one fundamental problem with the solutions generated by this stochastic process is that they are often difficult to analyse. In this paper, we introduce two distance measures between the phenotypes (ranked lists) of the solutions (term-weighting schemes) returned by a GP process. Using these distance measures, we build trees that show how the solutions cluster in the solution space. Using this framework, we show that our evolved solutions lie in a different part of the solution space from two of the best benchmark term-weighting schemes available.
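As a rough illustration of the idea of comparing solutions by their phenotypes, the sketch below measures the distance between ranked lists and groups the schemes into a tree. The abstract does not say which two distance measures the paper uses, so the Spearman footrule and the Kendall tau distance are stand-ins chosen here as assumptions. The scheme names, document identifiers and the average-linkage clustering are likewise hypothetical and not taken from the paper.

```python
from itertools import combinations


def footrule_distance(a, b):
    """Normalised Spearman footrule between two rankings of the same items."""
    pos_b = {doc: i for i, doc in enumerate(b)}
    n = len(a)
    total = sum(abs(i - pos_b[doc]) for i, doc in enumerate(a))
    max_total = (n * n) // 2  # largest possible footrule value for n items
    return total / max_total if max_total else 0.0


def kendall_distance(a, b):
    """Fraction of item pairs that the two rankings order differently."""
    pos_b = {doc: i for i, doc in enumerate(b)}
    pairs = list(combinations(a, 2))  # (x, y) with x ranked above y in a
    if not pairs:
        return 0.0
    discordant = sum(1 for x, y in pairs if pos_b[x] > pos_b[y])
    return discordant / len(pairs)


def cluster_tree(rankings, distance):
    """Average-linkage agglomerative clustering.

    Returns a nested tuple of scheme names; schemes merged earlier
    (deeper in the tuple) have more similar ranked lists.
    """
    clusters = {name: [name] for name in rankings}  # subtree -> member names

    def average_link(pair):
        left, right = pair
        dists = [distance(rankings[x], rankings[y])
                 for x in clusters[left] for y in clusters[right]]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        left, right = min(combinations(clusters, 2), key=average_link)
        members = clusters.pop(left) + clusters.pop(right)
        clusters[(left, right)] = members
    return next(iter(clusters))


# Hypothetical ranked lists returned by three schemes for one query.
rankings = {
    "evolved_a": ["d1", "d2", "d3", "d4"],
    "evolved_b": ["d2", "d1", "d3", "d4"],
    "benchmark": ["d4", "d3", "d2", "d1"],
}

tree = cluster_tree(rankings, footrule_distance)
print(tree)
```

In this toy data the two hypothetical evolved schemes rank the documents almost identically, so they merge first and the benchmark joins last. A real analysis would presumably aggregate distances over many queries before clustering.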

Author information

Correspondence to Ronan Cummins.

About this article

Cite this article

Cummins, R., O’Riordan, C. Evolved term-weighting schemes in Information Retrieval: an analysis of the solution space. Artif Intell Rev 26, 35–47 (2006). https://doi.org/10.1007/s10462-007-9034-5

Keywords

  • Genetic programming
  • Information Retrieval
  • Term-weighting schemes