Abstract
New general-purpose ranking functions are discovered using genetic programming. The TREC WSJ collection was chosen as the training set. The baseline for comparison was the best of inner product, probability, cosine, and Okapi BM25. An elitist genetic algorithm with a population size of 100 was run 13 times for 100 generations, and the best-performing functions were chosen from these runs. The best learned functions, when evaluated against the best baseline function (BM25), demonstrate some significant performance differences, with improvements in mean average precision as high as 32% observed on one TREC collection not used in training. In no test is BM25 shown to significantly outperform the best learned function.
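The comparison above is stated in terms of mean average precision and a relative improvement over the baseline. As a minimal sketch of how such figures are commonly computed, the following assumes non-interpolated average precision, binary relevance judgments, and rankings without duplicate document identifiers. The function names, query identifiers, and toy rankings are hypothetical and made up for illustration; they are not data or code from the article.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision for one ranked list."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank of each relevant document found.
            precision_sum += hits / rank
    # Relevant documents that are never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of per-query average precision over all judged queries."""
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


def relative_improvement(baseline: float, candidate: float) -> float:
    """Relative change of candidate over baseline, as a fraction."""
    return (candidate - baseline) / baseline


# Toy judgments and rankings (assumed for illustration only).
qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}
baseline_run = {"q1": ["d2", "d1", "d3"], "q2": ["d1", "d2"]}
learned_run = {"q1": ["d1", "d3", "d2"], "q2": ["d2", "d1"]}

baseline_map = mean_average_precision(baseline_run, qrels)
learned_map = mean_average_precision(learned_run, qrels)
gain = relative_improvement(baseline_map, learned_map)
```

Under this reading, a reported improvement of 32% would correspond to `relative_improvement` returning 0.32 for the learned function against BM25 on a given collection. Whether a difference is significant is a separate question that this sketch does not address.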
Cite this article
Trotman, A. Learning to Rank. Inf Retrieval 8, 359–381 (2005). https://doi.org/10.1007/s10791-005-6991-7