Maximizing upgrading and downgrading margins for ordinal regression

Abstract

In ordinal regression, a score function and threshold values are sought to classify a set of objects into a set of ranked classes. Classifying an individual into a class of higher (respectively lower) rank than its actual rank is called an upgrading (respectively downgrading) error. Since upgrading and downgrading errors may not be equally important, they should be treated as two different criteria when measuring the quality of a classifier. In Support Vector Machines, margin maximization is used as an effective and computationally tractable surrogate for the minimization of misclassification errors. As an extension, we consider in this paper the maximization of upgrading and downgrading margins as a surrogate for the minimization of upgrading and downgrading errors, and we address the biobjective problem of finding a classifier that simultaneously maximizes the two margins. The whole set of Pareto-optimal solutions of this biobjective problem is described as translations of the optimal solutions of a scalar optimization problem. For the most popular case, in which the Euclidean norm is considered, the scalar problem has a unique solution, so all the Pareto-optimal solutions of the biobjective problem are translations of each other. Hence, the Pareto-optimal solutions can easily be provided to the analyst, who, after inspecting the misclassification errors they cause, can choose the most convenient classifier at a later stage. This analysis provides a theoretical foundation for a popular strategy among practitioners, based on the so-called ROC curve, which is shown here to equal the set of Pareto-optimal solutions of simultaneously maximizing the downgrading and upgrading margins.
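The trade-off described in the abstract can be illustrated with a small sketch. It is not the authors' method: the scores, labels, thresholds, and the helper names `classify`, `error_counts`, and `translate` are all hypothetical, and "translation" is modeled here simply as adding a common shift to every threshold of a fixed score function. The sketch only shows how such a shift exchanges upgrading errors for downgrading errors.

```python
from typing import List, Sequence, Tuple


def classify(score: float, thresholds: Sequence[float]) -> int:
    """Return the rank of the class: the number of thresholds the score reaches."""
    return sum(score >= t for t in thresholds)


def error_counts(
    scores: Sequence[float], labels: Sequence[int], thresholds: Sequence[float]
) -> Tuple[int, int]:
    """Count (upgrading, downgrading) errors for the given thresholds."""
    upgrading = downgrading = 0
    for score, label in zip(scores, labels):
        predicted = classify(score, thresholds)
        if predicted > label:
            upgrading += 1  # object placed in a higher-ranked class than its own
        elif predicted < label:
            downgrading += 1  # object placed in a lower-ranked class than its own
    return upgrading, downgrading


def translate(thresholds: Sequence[float], shift: float) -> List[float]:
    """Shift every threshold by the same amount (illustrative assumption)."""
    return [t + shift for t in thresholds]


# Hypothetical toy data: six objects, three ranked classes (0 < 1 < 2).
scores = [0.2, 0.9, 1.1, 1.8, 2.2, 2.9]
labels = [0, 0, 1, 1, 2, 2]
thresholds = [1.0, 2.0]

for shift in (-0.3, 0.0, 0.3):
    up, down = error_counts(scores, labels, translate(thresholds, shift))
    print(f"shift={shift:+.1f}: upgrading={up}, downgrading={down}")
```

In this toy setting, lowering the thresholds produces only upgrading errors and raising them produces only downgrading errors, so an analyst inspecting the error pairs for each shift could pick the trade-off that suits the application.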


Author information

Corresponding author

Correspondence to Belen Martin-Barragan.

Additional information

This research was partially supported by projects MTM2009-14039 and ECO2008-05080 of Ministerio de Educación y Ciencia (Spain), FQM-329 of Plan Andaluz de Investigación (Andalucía, Spain) and CCG07-UC3M/ESP-3389 of the Comunidad de Madrid (Spain).

About this article

Cite this article

Carrizosa, E., Martin-Barragan, B. Maximizing upgrading and downgrading margins for ordinal regression. Math Meth Oper Res 74, 381–407 (2011). https://doi.org/10.1007/s00186-011-0368-z

Keywords

  • Multi objective optimization
  • Support Vector Machines
  • Ordinal regression