Maximizing upgrading and downgrading margins for ordinal regression

  • Original Article
Mathematical Methods of Operations Research

Abstract

In ordinal regression, a score function and threshold values are sought to classify a set of objects into a set of ranked classes. Classifying an individual in a class with higher (respectively lower) rank than its actual rank is called an upgrading (respectively downgrading) error. Since upgrading and downgrading errors may not have the same importance, they should be treated as two different criteria when measuring the quality of a classifier. In Support Vector Machines, margin maximization is used as an effective and computationally tractable surrogate for the minimization of misclassification errors. As an extension, we consider in this paper the maximization of upgrading and downgrading margins as a surrogate for the minimization of upgrading and downgrading errors, and we address the biobjective problem of finding a classifier that maximizes the two margins simultaneously. The whole set of Pareto-optimal solutions of this biobjective problem is described as translations of the optimal solutions of a scalar optimization problem. In the most popular case, in which the Euclidean norm is considered, the scalar problem has a unique solution, so all the Pareto-optimal solutions of the biobjective problem are translations of each other. Hence, the Pareto-optimal solutions can easily be provided to the analyst, who, after inspecting the misclassification errors they cause, should choose the most convenient classifier at a later stage. This analysis provides a theoretical foundation for a popular strategy among practitioners, based on the so-called ROC curve, which is shown here to equal the set of Pareto-optimal solutions of maximizing the downgrading and upgrading margins simultaneously.
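The trade-off described in the abstract can be illustrated with a small sketch. It is not the paper's method: it assumes the scores have already been computed, models a "translation" as adding one common offset to every threshold, and uses made-up data. The names `classify`, `count_errors`, `scores`, `labels` and `thresholds` are hypothetical.

```python
from bisect import bisect_right


def classify(score, thresholds):
    """Assign a class index 0..K-1 to a score, given K-1 sorted thresholds.

    The class index is the number of thresholds that are <= score.
    """
    return bisect_right(thresholds, score)


def count_errors(scores, labels, thresholds, shift=0.0):
    """Count (upgrading, downgrading) errors after shifting all thresholds.

    Assumption: a translation of the classifier is modelled here as adding
    the same offset `shift` to every threshold.
    """
    shifted = [t + shift for t in thresholds]
    upgrading = downgrading = 0
    for score, label in zip(scores, labels):
        predicted = classify(score, shifted)
        if predicted > label:
            upgrading += 1      # placed in a higher-ranked class than the true one
        elif predicted < label:
            downgrading += 1    # placed in a lower-ranked class than the true one
    return upgrading, downgrading


# Toy data (illustrative only): six objects, three ranked classes 0 < 1 < 2.
scores = [0.2, 1.1, 0.9, 1.7, 2.1, 1.9]
labels = [0, 0, 1, 1, 2, 2]
thresholds = [1.0, 2.0]

for shift in (-0.25, 0.0, 0.25):
    print(shift, count_errors(scores, labels, thresholds, shift))
```

On this toy data, lowering the thresholds by 0.25 gives 1 upgrading and 0 downgrading errors, the unshifted thresholds give 1 and 2, and raising them by 0.25 gives 0 and 3. Moving along the family of translated classifiers therefore trades one kind of error for the other, which is the choice the abstract leaves to the analyst.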

Author information

Corresponding author

Correspondence to Belen Martin-Barragan.

Additional information

This research was partially supported by projects MTM2009-14039 and ECO2008-05080 of the Ministerio de Educación y Ciencia (Spain), FQM-329 of the Plan Andaluz de Investigación (Andalucía, Spain) and CCG07-UC3M/ESP-3389 of the Comunidad de Madrid (Spain).

About this article

Cite this article

Carrizosa, E., Martin-Barragan, B. Maximizing upgrading and downgrading margins for ordinal regression. Math Meth Oper Res 74, 381–407 (2011). https://doi.org/10.1007/s00186-011-0368-z

  • DOI: https://doi.org/10.1007/s00186-011-0368-z
