# Cautious classification with nested dichotomies and imprecise probabilities


## Abstract

In some applications of machine learning and information retrieval (e.g. medical diagnosis, image recognition, pre-classification), it can be preferable to provide less informative but more reliable predictions. This can be done by making partial predictions in the form of class subsets when the available information is insufficient to reliably single out one class. Imprecise probabilistic approaches offer convenient tools to learn models from which such cautious predictions can be produced. However, learning and inference with such models are computationally harder than with their precise counterparts. In this paper, we introduce and study a particular binary decomposition strategy, nested dichotomies, which offers computational advantages in both learning (due to the binarization process) and inference (due to the decomposition strategy). We show with experiments that these computational advantages do not degrade the performance of the classifiers, and can even improve it when the class space has some structure.
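To make the idea concrete, here is a minimal Python sketch of a nested dichotomy whose binary splits carry probability intervals rather than point estimates. It is an illustration under stated assumptions, not the paper's implementation: the names `Leaf`, `Split`, `class_bounds` and `cautious_prediction` are hypothetical, the interval values are made up, and interval dominance is used as one possible decision rule for producing class subsets (the paper may rely on a different criterion). Bounds on each class probability are obtained by multiplying the local bounds along the path from the root to the corresponding leaf.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple, Union


@dataclass
class Leaf:
    label: str


@dataclass
class Split:
    left: "Node"
    right: "Node"
    # Assumed (lower, upper) bounds on P(left branch | this node is reached).
    p_left: Tuple[float, float]


Node = Union[Leaf, Split]


def class_bounds(node: Node, lo: float = 1.0, hi: float = 1.0) -> Dict[str, Tuple[float, float]]:
    """Propagate interval bounds down the tree by multiplying along each path."""
    if isinstance(node, Leaf):
        return {node.label: (lo, hi)}
    low, up = node.p_left
    bounds = class_bounds(node.left, lo * low, hi * up)
    # The right branch gets the complementary interval [1 - up, 1 - low].
    bounds.update(class_bounds(node.right, lo * (1.0 - up), hi * (1.0 - low)))
    return bounds


def cautious_prediction(bounds: Dict[str, Tuple[float, float]]) -> Set[str]:
    """Keep every class that is not interval-dominated by another class."""
    return {
        c
        for c, (_, hi_c) in bounds.items()
        if not any(lo_o > hi_c for o, (lo_o, _) in bounds.items() if o != c)
    }


# Hypothetical ordinal problem: {low} vs {medium, high}, then medium vs high.
tree = Split(
    left=Leaf("low"),
    right=Split(left=Leaf("medium"), right=Leaf("high"), p_left=(0.4, 0.6)),
    p_left=(0.1, 0.2),
)
bounds = class_bounds(tree)
prediction = cautious_prediction(bounds)
```

In this toy tree the first split rules out `low` fairly confidently, while the second split is too imprecise to separate `medium` from `high`, so the prediction is the subset containing both classes rather than a single label.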

## Keywords

Multi-class classification; Binary decomposition; Imprecise probabilities; Indeterminate prediction; Ordinal regression

## Notes

### Acknowledgments

This work was carried out in the framework of the Labex MS2T, which was funded by the French Government, through the program “Investments for the future” managed by the National Agency for Research (Reference ANR-11-IDEX-0004-02).

### Compliance with ethical standards

#### Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.
