Abstract
In ordinal regression, a score function and threshold values are sought to classify a set of objects into a set of ranked classes. Classifying an individual in a class with higher (respectively lower) rank than its actual rank is called an upgrading (respectively downgrading) error. Since upgrading and downgrading errors may not have the same importance, they should be treated as two different criteria when measuring the quality of a classifier. In Support Vector Machines, margin maximization is used as an effective and computationally tractable surrogate for the minimization of misclassification errors. As an extension, we consider in this paper the maximization of upgrading and downgrading margins as a surrogate for the minimization of upgrading and downgrading errors, and we address the biobjective problem of finding a classifier that simultaneously maximizes the two margins. The whole set of Pareto-optimal solutions of this biobjective problem is described as translations of the optimal solutions of a scalar optimization problem. For the most popular case, in which the Euclidean norm is considered, the scalar problem has a unique solution, so that all the Pareto-optimal solutions of the biobjective problem are translations of each other. Hence, the Pareto-optimal solutions can easily be provided to the analyst, who, after inspecting the misclassification errors they cause, can choose the most convenient classifier at a later stage. This analysis provides a theoretical foundation for a popular strategy among practitioners, based on the so-called ROC curve, which is shown here to equal the set of Pareto-optimal solutions of maximizing simultaneously the downgrading and upgrading margins.
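As a rough illustration of the trade-off described in the abstract, the following Python sketch fixes a score for each object and translates all thresholds by a common shift, then counts the resulting upgrading and downgrading errors. The data, the threshold values, and the function names `classify` and `count_errors` are hypothetical and chosen only for illustration; the sketch does not reproduce the paper's margin-maximization model.

```python
from bisect import bisect_right


def classify(score, thresholds):
    """Return the 0-based rank of the class whose score interval contains `score`.

    `thresholds` must be sorted in increasing order; a score equal to a
    threshold is assigned to the higher class (an arbitrary convention here).
    """
    return bisect_right(thresholds, score)


def count_errors(scores, ranks, thresholds, shift=0.0):
    """Count (upgrading, downgrading) errors after translating every threshold by `shift`."""
    shifted = [b + shift for b in thresholds]
    upgrading = downgrading = 0
    for score, rank in zip(scores, ranks):
        predicted = classify(score, shifted)
        if predicted > rank:
            upgrading += 1    # object placed in a higher class than its actual one
        elif predicted < rank:
            downgrading += 1  # object placed in a lower class than its actual one
    return upgrading, downgrading


# Hypothetical toy data: six objects, three ranked classes (0 < 1 < 2).
scores = [0.2, 0.9, 1.1, 1.8, 2.1, 2.9]
ranks = [0, 0, 1, 1, 2, 2]
thresholds = [1.0, 2.0]

for shift in (-0.5, 0.0, 0.5):
    print(shift, count_errors(scores, ranks, thresholds, shift))
```

On this toy data, lowering the thresholds produces only upgrading errors and raising them produces only downgrading errors, which is the kind of trade-off an analyst would inspect when choosing among translated classifiers.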
Additional information
This research was partially supported by projects MTM2009-14039 and ECO2008-05080 of Ministerio de Educación y Ciencia (Spain), FQM-329 of Plan Andaluz de Investigación (Andalucía, Spain) and CCG07-UC3M/ESP-3389 of the Comunidad de Madrid (Spain).
Cite this article
Carrizosa, E., Martin-Barragan, B. Maximizing upgrading and downgrading margins for ordinal regression. Math Meth Oper Res 74, 381–407 (2011). https://doi.org/10.1007/s00186-011-0368-z
DOI: https://doi.org/10.1007/s00186-011-0368-z