Evolutionary computation techniques are increasingly being applied to problems within Information Retrieval (IR). Genetic programming (GP) has previously been used with some success to evolve term-weighting schemes in IR. However, one fundamental problem with the solutions generated by this stochastic process is that they are often difficult to analyse. In this paper, we introduce two different distance measures between the phenotypes (ranked lists) of the solutions (term-weighting schemes) returned by a GP process. Using these distance measures, we develop trees which show how different solutions are clustered in the solution space. We show, using this framework, that our evolved solutions lie in a different part of the solution space from two of the best benchmark term-weighting schemes available.
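The idea of measuring distances between ranked lists and then building a tree from those distances can be illustrated with a minimal sketch. The abstract does not state which two distance measures the paper uses, so the normalised Spearman footrule below is only an assumed stand-in. The average-linkage clustering, the function names, and the scheme labels (`evolved_1`, `benchmark_a`, and so on) are likewise hypothetical and chosen purely for illustration.

```python
from itertools import combinations


def footrule_distance(ranking_a, ranking_b):
    """Normalised Spearman footrule distance between two rankings of the
    same documents: 0.0 for identical orderings, 1.0 for reversed ones.
    This is an assumed stand-in, not one of the paper's two measures."""
    if set(ranking_a) != set(ranking_b):
        raise ValueError("both rankings must contain the same documents")
    n = len(ranking_a)
    if n < 2:
        return 0.0
    position_in_b = {doc: rank for rank, doc in enumerate(ranking_b)}
    displacement = sum(abs(rank - position_in_b[doc])
                       for rank, doc in enumerate(ranking_a))
    # The largest possible total displacement for n items is floor(n^2 / 2).
    return displacement / ((n * n) // 2)


def build_tree(rankings):
    """Average-linkage agglomerative clustering over ranked lists.
    `rankings` maps a scheme label to its ranked list of document ids.
    Returns a nested tuple in which closely related schemes share a subtree."""
    # Each cluster is (subtree, list of member labels).
    clusters = [(name, [name]) for name in sorted(rankings)]

    def linkage(members_x, members_y):
        # Mean pairwise distance between the members of two clusters.
        total = sum(footrule_distance(rankings[x], rankings[y])
                    for x in members_x for y in members_y)
        return total / (len(members_x) * len(members_y))

    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]][1], clusters[p[1]][1]))
        (tree_i, members_i), (tree_j, members_j) = clusters[i], clusters[j]
        merged = ((tree_i, tree_j), members_i + members_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


if __name__ == "__main__":
    # Hypothetical ranked lists returned by four term-weighting schemes
    # for the same query; the document ids are made up.
    example = {
        "evolved_1": ["d1", "d2", "d3", "d4"],
        "evolved_2": ["d1", "d2", "d4", "d3"],
        "benchmark_a": ["d4", "d3", "d2", "d1"],
        "benchmark_b": ["d4", "d3", "d1", "d2"],
    }
    print(build_tree(example))
```

In this toy input the two hypothetical evolved schemes produce nearly identical rankings, as do the two hypothetical benchmarks, so each pair ends up in its own subtree. In practice the distance would be aggregated over many queries rather than computed for a single ranked list per scheme.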
Cummins, R., O’Riordan, C. Evolved term-weighting schemes in Information Retrieval: an analysis of the solution space. Artif Intell Rev 26, 35–47 (2006). https://doi.org/10.1007/s10462-007-9034-5
Keywords
- Genetic programming
- Information Retrieval
- Term-weighting schemes