Abstract
In this paper, we introduce a framework of regularized least-squares (RLS) type ranking cost functions and propose three such cost functions. Further, we propose a kernel-based preference learning algorithm, which we call RankRLS, for minimizing these functions. We show that RankRLS has many computational advantages over ranking algorithms based on minimizing other types of costs, such as the hinge cost. In particular, we present efficient algorithms for training, parameter selection, multiple output learning, cross-validation, and large-scale learning, and we consider the circumstances under which these computational benefits make RankRLS preferable to RankSVM. We evaluate RankRLS on four different types of ranking tasks using RankSVM and standard RLS regression as baselines. RankRLS outperforms standard RLS regression, and its performance is very similar to that of RankSVM, while RankRLS retains several computational benefits over RankSVM.
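The core idea described in the abstract — minimizing a least-squares cost over pairwise score differences encoded by a preference graph — can be sketched in a few lines. The following is a minimal, illustrative NumPy sketch, not the authors' implementation: it assumes a fully connected preference graph with Laplacian L, a linear kernel K, and solves the dual system (LK + λI)a = Ly, which is one standard way an RLS-type pairwise objective admits a closed-form solution. All variable names and the toy data are hypothetical.

```python
import numpy as np

# Hypothetical toy data: 6 instances with real-valued relevance scores.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + 0.1 * rng.normal(size=6)

# Linear kernel matrix; any positive semidefinite kernel would do.
K = X @ X.T

# Preference graph over all pairs -> graph Laplacian L = D - W.
# A fully connected graph means every pair of instances is compared.
n = len(y)
W = np.ones((n, n)) - np.eye(n)
L = np.diag(W.sum(axis=1)) - W

# The pairwise least-squares cost sum_{i,j} ((f_i - f_j) - (y_i - y_j))^2
# equals 2 (f - y)^T L (f - y); with f = K a and regularizer lam * a^T K a,
# setting the gradient to zero reduces to the linear system below.
lam = 1.0
a = np.linalg.solve(L @ K + lam * np.eye(n), L @ y)

def score(x_new):
    """Ranking score of a new instance via the kernel expansion."""
    return (X @ x_new) @ a  # k(x_new, x_i) entries for the linear kernel

scores = K @ a  # scores on the training instances
```

Since only score *differences* enter the cost, the learned scores are meaningful up to an additive constant, which is exactly what a ranking task requires.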
Editors: Thomas Gärtner and Gemma C. Garriga.
Cite this article
Pahikkala, T., Tsivtsivadze, E., Airola, A. et al. An efficient algorithm for learning to rank from preference graphs. Mach Learn 75, 129–165 (2009). https://doi.org/10.1007/s10994-008-5097-z