Machine Learning

Volume 75, Issue 1, pp. 129–165

An efficient algorithm for learning to rank from preference graphs

  • Tapio Pahikkala
  • Evgeni Tsivtsivadze
  • Antti Airola
  • Jouni Järvinen
  • Jorma Boberg

Abstract

In this paper, we introduce a framework for regularized least-squares (RLS) type ranking cost functions and propose three such cost functions. Further, we propose a kernel-based preference learning algorithm, which we call RankRLS, for minimizing these functions. It is shown that RankRLS has many computational advantages over ranking algorithms based on minimizing other types of costs, such as the hinge cost. In particular, we present efficient algorithms for training, parameter selection, multiple output learning, cross-validation, and large-scale learning, and we consider the circumstances under which these computational benefits make RankRLS preferable to RankSVM. We evaluate RankRLS on four different types of ranking tasks using RankSVM and standard RLS regression as baselines. RankRLS outperforms standard RLS regression, and its performance is very similar to that of RankSVM, while RankRLS retains several computational benefits over RankSVM.
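The idea behind RankRLS can be illustrated with a minimal sketch. The following is a hedged reconstruction for the special case of a linear model and a complete preference graph, not the authors' implementation: the pairwise squared ranking cost over all example pairs can be expressed with the graph Laplacian L = nI − 11ᵀ of the complete graph, which yields the closed-form solution w = (XᵀLX + λI)⁻¹XᵀLy. The function name `rank_rls_fit` and the toy data are illustrative assumptions.

```python
import numpy as np

def rank_rls_fit(X, y, lam=1.0):
    """Linear RankRLS sketch: minimize the squared error of pairwise
    score differences over a complete preference graph, plus an L2
    penalty. For a complete graph on n examples the pairwise cost is
    (y - Xw)^T L (y - Xw) with Laplacian L = n*I - 1*1^T, giving the
    closed form w = (X^T L X + lam*I)^{-1} X^T L y."""
    n, d = X.shape
    L = n * np.eye(n) - np.ones((n, n))  # Laplacian of the complete graph
    A = X.T @ L @ X + lam * np.eye(d)
    b = X.T @ L @ y
    return np.linalg.solve(A, b)

# Toy data whose true ranking is determined by the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X[:, 0]  # relevance scores
w = rank_rls_fit(X, y, lam=0.1)
scores = X @ w

# Fraction of example pairs ordered consistently with the labels;
# with mild regularization this should be close to 1.
pairs = [(scores[i] - scores[j]) * (y[i] - y[j]) > 0
         for i in range(len(y)) for j in range(i + 1, len(y))]
print(f"concordant pair fraction: {np.mean(pairs):.2f}")
```

Because the objective stays quadratic in w, the same closed form extends to kernels and supports the fast cross-validation and multiple-output shortcuts the paper describes; sparser preference graphs simply replace the complete-graph Laplacian above.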

Keywords

Ranking · Preference learning · Preference graph · Regularized least-squares · Kernel methods


Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Tapio Pahikkala (1)
  • Evgeni Tsivtsivadze (1)
  • Antti Airola (1)
  • Jouni Järvinen (1)
  • Jorma Boberg (1)

  1. Turku Centre for Computer Science (TUCS), Department of Information Technology, University of Turku, Turku, Finland