Soft Computing, Volume 21, Issue 24, pp 7447–7462

Cautious classification with nested dichotomies and imprecise probabilities

  • Gen Yang
  • Sébastien Destercke
  • Marie-Hélène Masson
Methodologies and Application


In some applications of machine learning and information retrieval (e.g. medical diagnosis, image recognition, pre-classification), it can be preferable to provide less informative but more reliable predictions. This can be done by making partial predictions in the form of class subsets when the available information is insufficient to reliably predict a single class. Imprecise probabilistic approaches offer useful tools for learning models from which such cautious predictions can be produced. However, learning and inference with such models are computationally harder than with their precise counterparts. In this paper, we introduce and study a particular binary decomposition strategy, nested dichotomies, that offers computational advantages in both the learning process (due to binarization) and the inference process (due to the decomposition strategy). We show experimentally that these computational advantages do not degrade the performance of the classifiers, and can even improve it when the class space has some structure.
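To make the idea of a set-valued prediction from a nested dichotomy more concrete, here is a minimal Python sketch. It is not the authors' implementation: the class names (`Leaf`, `Split`), the functions `class_bounds` and `cautious_prediction`, the example tree and its interval values are all hypothetical. It assumes that each internal node carries an interval on the probability of its left branch, that class bounds are obtained by multiplying the bounds along the path from root to leaf, and that interval dominance is the decision rule (one of several rules used with imprecise probabilities).

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple, Union


@dataclass
class Leaf:
    label: str  # a single class


@dataclass
class Split:
    left: "Node"
    right: "Node"
    p_left: Tuple[float, float]  # assumed (lower, upper) bounds on P(left | node)


Node = Union[Leaf, Split]


def class_bounds(node: Node, lo: float = 1.0, up: float = 1.0) -> Dict[str, Tuple[float, float]]:
    """Multiply interval bounds along each root-to-leaf path."""
    if isinstance(node, Leaf):
        return {node.label: (lo, up)}
    l, u = node.p_left
    bounds = class_bounds(node.left, lo * l, up * u)
    # The right branch gets the complementary interval [1 - u, 1 - l].
    bounds.update(class_bounds(node.right, lo * (1.0 - u), up * (1.0 - l)))
    return bounds


def cautious_prediction(bounds: Dict[str, Tuple[float, float]]) -> Set[str]:
    """Keep every class that is not interval-dominated by another class."""
    return {
        c
        for c, (_, up_c) in bounds.items()
        if not any(lo_o > up_c for o, (lo_o, _) in bounds.items() if o != c)
    }


# Hypothetical dichotomy ({a, b} vs {c}), then (a vs b), with made-up intervals.
tree = Split(
    left=Split(Leaf("a"), Leaf("b"), p_left=(0.45, 0.6)),
    right=Leaf("c"),
    p_left=(0.8, 0.95),
)
bounds = class_bounds(tree)
prediction = cautious_prediction(bounds)
print(bounds)
print(prediction)
```

In this toy example, class c is discarded because its upper bound (about 0.2) lies below the lower bound of class a (about 0.36), while a and b have overlapping intervals and are both retained, giving the partial prediction {a, b}.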


Multi-class classification · Binary decomposition · Imprecise probabilities · Indeterminate prediction · Ordinal regression



This work was carried out in the framework of the Labex MS2T, which was funded by the French Government, through the program “Investments for the future” managed by the National Agency for Research (Reference ANR-11-IDEX-0004-02).

Compliance with ethical standards

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.



Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Gen Yang (1)
  • Sébastien Destercke (1)
  • Marie-Hélène Masson (2)
  1. UMR CNRS 7253 Heudiasyc, Galileo Galilei, Sorbonne Universités, Université de Technologie de Compiègne, Compiègne cedex, France
  2. UMR CNRS 7253 Heudiasyc, Université de Picardie Jules Verne, Galileo Galilei, Sorbonne Universités, Université de Technologie de Compiègne, Compiègne cedex, France