
Learning to rank: new approach with the layered multi-population genetic programming on click-through features

Genetic Programming and Evolvable Machines

Abstract

Users’ click-through data is a valuable source of information about the performance of Web search engines, but few learning-to-rank datasets include it. Inspired by the click-through data model, this paper proposes a novel approach for extracting implicit user feedback from evidence embedded in benchmark datasets. This process outputs a set of new features, named click-through features. The generated click-through features are then used in a layered multi-population genetic programming framework to search for the best possible ranking functions. This framework is fast and searches more extensively than traditional genetic programming approaches. The performance of the proposed ranking-function generation framework is investigated on benchmark datasets both with and without explicit click-through data. The experimental results show that click-through features can be extracted efficiently in both cases, but that more effective ranking functions result when the click-through features are generated from benchmark datasets that contain explicit click-through data. In either case, the most noticeable ranking improvements occur at the top of the ranked result lists, the positions Web users attend to most.
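The abstract describes the layered multi-population genetic programming (GP) framework only at a high level, so the following is a minimal sketch of the general layered idea under stated assumptions, not the authors' implementation. It assumes that each ranking function is an expression tree over one document's feature vector (click-through features would simply be extra entries in that vector); that fitness is mean NDCG@k over training queries, a common top-weighted ranking measure that the abstract does not name; and that within each layer several populations evolve independently, with the best tree of each population appended as a new input feature for the next layer. All function names (`layered_mpgp`, `evolve_population`, `fitness`, and so on), the operator set, the population sizes and the depth limit are hypothetical.

```python
# Minimal sketch of a layered multi-population GP ranker (assumed design, not the paper's code).
import math
import operator
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical data layout: a query is a list of (feature vector, graded relevance label).
Query = List[Tuple[List[float], int]]
ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul}
OPS = ("+", "-", "*", "/")
MAX_DEPTH = 8  # assumed bloat limit, not a value taken from the paper

def ndcg_at_k(labels: Sequence[int], k: int) -> float:
    """NDCG@k with gain 2^rel - 1 and discount log2(rank + 1)."""
    def dcg(ls: Sequence[int]) -> float:
        return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(ls[:k]))
    ideal = dcg(sorted(labels, reverse=True))
    return dcg(labels) / ideal if ideal > 0 else 0.0

def random_tree(n_inputs: int, max_depth: int, rng: random.Random) -> tuple:
    """Grow a random expression tree whose leaves are feature indices or constants."""
    if max_depth <= 1 or rng.random() < 0.3:
        return ("x", rng.randrange(n_inputs)) if rng.random() < 0.8 else ("c", rng.uniform(-1, 1))
    return (rng.choice(OPS), random_tree(n_inputs, max_depth - 1, rng),
            random_tree(n_inputs, max_depth - 1, rng))

def evaluate(tree: tuple, x: Sequence[float]) -> float:
    """Score one document's feature vector with an expression tree."""
    if tree[0] == "x":
        return x[tree[1]]
    if tree[0] == "c":
        return tree[1]
    a, b = evaluate(tree[1], x), evaluate(tree[2], x)
    if tree[0] == "/":
        return a / b if abs(b) > 1e-9 else 1.0  # protected division
    return ARITH[tree[0]](a, b)

def tree_depth(tree: tuple) -> int:
    return 1 + max(tree_depth(tree[1]), tree_depth(tree[2])) if tree[0] in OPS else 1

def subtrees(tree: tuple):
    yield tree
    if tree[0] in OPS:
        yield from subtrees(tree[1])
        yield from subtrees(tree[2])

def graft(tree: tuple, donor: tuple, rng: random.Random) -> tuple:
    """Return a copy of `tree` with one randomly chosen subtree replaced by `donor`."""
    if tree[0] not in OPS or rng.random() < 0.3:
        return donor
    if rng.random() < 0.5:
        return (tree[0], graft(tree[1], donor, rng), tree[2])
    return (tree[0], tree[1], graft(tree[2], donor, rng))

def fitness(tree: tuple, queries: List[Query], k: int) -> float:
    """Mean NDCG@k of the rankings induced by `tree` (assumed fitness measure)."""
    total = 0.0
    for docs in queries:
        ranked = sorted(docs, key=lambda d: evaluate(tree, d[0]), reverse=True)
        total += ndcg_at_k([label for _, label in ranked], k)
    return total / len(queries)

def evolve_population(queries, n_inputs, k, rng, pop_size=20, generations=10) -> tuple:
    """One independent GP run with tournament selection, crossover, mutation and elitism."""
    pop = [random_tree(n_inputs, 4, rng) for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(t, queries, k), t) for t in pop]

        def pick() -> tuple:  # size-3 tournament selection
            return max(rng.sample(scored, 3), key=lambda p: p[0])[1]
        nxt = [max(scored, key=lambda p: p[0])[1]]  # elitism: keep the best tree
        while len(nxt) < pop_size:
            parent = pick()
            child = graft(parent, rng.choice(list(subtrees(pick()))), rng)  # crossover
            if rng.random() < 0.2:
                child = graft(child, random_tree(n_inputs, 3, rng), rng)  # subtree mutation
            nxt.append(child if tree_depth(child) <= MAX_DEPTH else parent)
        pop = nxt
    return max(pop, key=lambda t: fitness(t, queries, k))

def layered_mpgp(queries: List[Query], n_features: int, k: int = 10, layers: int = 2,
                 pops_per_layer: int = 3, seed: int = 0) -> Callable[[Sequence[float]], float]:
    """Each layer runs several populations; their champions become new features for the next."""
    rng = random.Random(seed)
    data, n_inputs, history = queries, n_features, []
    for layer in range(layers):
        champions = [evolve_population(data, n_inputs, k, rng) for _ in range(pops_per_layer)]
        history.append(champions)
        if layer < layers - 1:  # append champion outputs as extra input features
            data = [[(list(x) + [evaluate(t, x) for t in champions], y) for x, y in docs]
                    for docs in data]
            n_inputs += pops_per_layer
    final = max(history[-1], key=lambda t: fitness(t, data, k))

    def score(x: Sequence[float]) -> float:
        feats = list(x)
        for layer_champions in history[:-1]:  # replay the layer-wise feature construction
            feats = feats + [evaluate(t, feats) for t in layer_champions]
        return evaluate(final, feats)
    return score
```

As a usage example, `score = layered_mpgp(queries, n_features=len(queries[0][0][0]), k=10)` returns a function that maps one document's feature vector to a score, and each query's documents are then sorted by that score in descending order. Here `queries` is assumed to be a list of per-query lists of `(features, label)` pairs; an NDCG@k fitness is used in this sketch because it rewards correct ordering at the top of the list, which is where the abstract reports the largest gains.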


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6



Acknowledgments

This research was financially supported by the University of Tehran (Grant ID: 8101004/1/02). The authors thank the Editor-in-Chief, the Associate Editor and three anonymous reviewers for their helpful comments and suggestions. The authors especially thank Dr. Alireza Tavakoli Targhi and Ms. Maryam Piroozmand for their help and support.

Author information


Corresponding author

Correspondence to Amir Hosein Keyhanipour.

Appendices

Appendix 1

See Table 11.

Table 11 List of features in the LETOR4.0 benchmark dataset

Appendix 2

See Table 12.

Table 12 List of features in the WCL2R benchmark dataset


About this article


Cite this article

Keyhanipour, A.H., Moshiri, B., Oroumchian, F. et al. Learning to rank: new approach with the layered multi-population genetic programming on click-through features. Genet Program Evolvable Mach 17, 203–230 (2016). https://doi.org/10.1007/s10710-016-9263-y


  • DOI: https://doi.org/10.1007/s10710-016-9263-y
