Machine Learning, Volume 107, Issue 8–10, pp 1537–1560

On the effectiveness of heuristics for learning nested dichotomies: an empirical analysis

  • Vitalik Melnikov
  • Eyke Hüllermeier
Part of the following topical collections:
  1. Special Issue of the ECML PKDD 2018 Journal Track


In machine learning, so-called nested dichotomies are utilized as a reduction technique, i.e., to decompose a multi-class classification problem into a set of binary problems, which are solved using a simple binary classifier as a base learner. The performance of the (multi-class) classifier thus produced strongly depends on the structure of the decomposition. In this paper, we conduct an empirical study, in which we compare existing heuristics for selecting a suitable structure in the form of a nested dichotomy. Moreover, we propose two additional heuristics as natural completions. One of them is the Best-of-K heuristic, which picks the (presumably) best among K randomly generated nested dichotomies. Surprisingly, and in spite of its simplicity, it turns out to outperform the state of the art.
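The reduction and the Best-of-K heuristic described in the abstract can be illustrated with the following minimal Python sketch. It is not the authors' implementation. Several parts are assumptions made here purely for illustration: the base learner `CentroidLearner` is a hypothetical nearest-centroid scorer standing in for "a simple binary classifier", the random splitting scheme in `NestedDichotomy` is a simple choice of ours, and `best_of_k` assumes that the "presumably best" tree is the one with the highest accuracy on a held-out validation set.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def _sigmoid(z: float) -> float:
    # Numerically safe logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class CentroidLearner:
    """Hypothetical binary base learner: compares distances to two class centroids.

    Assumes both labels (0 and 1) occur in the training data.
    """

    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "CentroidLearner":
        self.centroids: List[Vector] = []
        for label in (0, 1):
            pts = [x for x, t in zip(X, y) if t == label]
            self.centroids.append(tuple(sum(col) / len(pts) for col in zip(*pts)))
        return self

    def prob_right(self, x: Vector) -> float:
        # Probability that x belongs to the right-hand subset (label 1).
        d0, d1 = (sum((a - b) ** 2 for a, b in zip(x, c)) for c in self.centroids)
        return _sigmoid(d0 - d1)


class NestedDichotomy:
    """Binary tree over class labels; every internal node holds one binary learner."""

    def __init__(self, classes: Sequence[int], rng: random.Random):
        self.classes = list(classes)
        self.left = self.right = None
        if len(self.classes) > 1:
            # Random split into two non-empty subsets. This simple scheme does
            # NOT sample uniformly from all possible nested dichotomies.
            shuffled = self.classes[:]
            rng.shuffle(shuffled)
            cut = rng.randint(1, len(shuffled) - 1)
            self.left = NestedDichotomy(shuffled[:cut], rng)
            self.right = NestedDichotomy(shuffled[cut:], rng)

    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "NestedDichotomy":
        if self.left is None:  # leaf: a single class, nothing to train
            return self
        right = set(self.right.classes)
        # Binary problem: examples of this node's classes, relabelled left=0 / right=1.
        data = [(x, int(t in right)) for x, t in zip(X, y) if t in self.classes]
        self.learner = CentroidLearner().fit([d[0] for d in data], [d[1] for d in data])
        self.left.fit(X, y)
        self.right.fit(X, y)
        return self

    def class_probs(self, x: Vector, weight: float = 1.0) -> Dict[int, float]:
        # Multiply binary probabilities along each root-to-leaf path.
        if self.left is None:
            return {self.classes[0]: weight}
        p = self.learner.prob_right(x)
        probs = self.left.class_probs(x, weight * (1.0 - p))
        probs.update(self.right.class_probs(x, weight * p))
        return probs

    def predict(self, x: Vector) -> int:
        probs = self.class_probs(x)
        return max(probs, key=probs.get)


def accuracy(model: NestedDichotomy, X: Sequence[Vector], y: Sequence[int]) -> float:
    return sum(model.predict(x) == t for x, t in zip(X, y)) / len(y)


def best_of_k(X_train, y_train, X_val, y_val, k: int, seed: int = 0):
    """Train k random nested dichotomies; keep the best on a validation set (assumed criterion)."""
    rng = random.Random(seed)
    classes = sorted(set(y_train))
    best, best_score = None, -1.0
    for _ in range(k):
        candidate = NestedDichotomy(classes, rng).fit(X_train, y_train)
        score = accuracy(candidate, X_val, y_val)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

Calling `best_of_k(X_train, y_train, X_val, y_val, k=10)` returns the selected tree together with its validation accuracy. The structure matters because each node's binary problem depends on how the classes are grouped: with four classes at the corners of a square, grouping diagonal corners against each other gives both sides the same centroid, so this base learner outputs 0.5 everywhere at that node. Since there are (2n − 3)!! distinct nested dichotomies for n classes (3 for three classes, 15 for four), sampling a few candidates is far cheaper than enumerating them all.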


Keywords: Nested dichotomies, Multi-class classification, Decomposition method



This work has been conducted as part of the Collaborative Research Center “On-the-Fly Computing” (SFB 901) at Paderborn University, which is supported by the German Research Foundation (DFG).


Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. Department of Computer Science, Paderborn University, Paderborn, Germany
