An empirical analysis of binary transformation strategies and base algorithms for multi-label learning

Abstract

Investigating strategies that can efficiently deal with multi-label classification tasks is a current research topic in machine learning. Many methods have been proposed, making the selection of the most suitable strategy a challenging issue. Motivated by this, this paper presents an extensive empirical analysis of binary transformation strategies and base algorithms for multi-label learning. This subset of strategies uses the one-versus-all approach to transform the original data, generating one binary data set per label, upon which any binary base algorithm can be applied. Considering that the influence of the base algorithm on the predictive performance of the strategies has not been examined in depth by most empirical studies, we investigated the influence of distinct base algorithms on the performance of several strategies. Thus, this study covers a family of multi-label strategies using a diversified range of base algorithms, exploring their relationship from different perspectives. Our main finding is that the predictive performance of the strategies depends strongly on the base algorithm used. This finding has significant implications for the evaluation methodology adopted in multi-label experiments containing binary transformation strategies, given that multiple base algorithms should be considered. Nevertheless, regardless of the strategy and base algorithm used, for many data sets a large number of labels, mainly the less frequent ones, were either never predicted or always misclassified. We conclude the experimental analysis by recommending strategies and base algorithms in accordance with different performance criteria.
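To make the transformation concrete, the following is a minimal sketch of the one-versus-all (binary relevance) decomposition described above. It is illustrative only, not the authors' implementation; the function names and the logistic-regression default are assumptions.

```python
# A minimal sketch of the one-versus-all (binary relevance) transformation,
# assuming a dense feature matrix X and a 0/1 label matrix Y (one column per
# label, each label present in both classes). Illustrative only; not the
# authors' implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression

def br_fit(X, Y, base_algorithm=LogisticRegression):
    """Train one independent binary model per label column of Y."""
    models = []
    for j in range(Y.shape[1]):
        clf = base_algorithm()   # any binary base algorithm can be plugged in
        clf.fit(X, Y[:, j])      # the binary data set generated for label j
        models.append(clf)
    return models

def br_predict(models, X):
    """Stack the per-label binary predictions into a multi-label output."""
    return np.column_stack([m.predict(X) for m in models])
```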



Notes

  1. Although CC and NS also augment the input space, they are not considered stacking, given that only one round is performed.

  2. Also known as 2BR (Tsoumakas et al. 2009), Meta-Stacking (Read et al. 2009), and Stacking (Montañes et al. 2014).

  3. The threshold can either be a predefined value, such as 0.5 (Read et al. 2011), or dynamically defined using the label cardinality of the training data set (Read et al. 2009); see the sketch after these notes.

  4. See https://www.kaggle.com/c/yelp-restaurant-photo-classification.
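Regarding note 3, the following is a minimal sketch of the cardinality-based thresholding idea, assuming the strategy outputs a matrix of per-label scores. The helper name and the calibration-by-search procedure are assumptions for illustration, not the exact method of Read et al. (2009).

```python
import numpy as np

def cardinality_threshold(Y_train, scores):
    """Choose a global score threshold so that the average number of
    predicted labels matches the label cardinality of the training set.
    Hypothetical helper; not the exact procedure of Read et al. (2009)."""
    cardinality = Y_train.sum(axis=1).mean()   # avg labels per instance
    candidates = np.unique(scores)
    # pick the candidate whose predicted cardinality is closest to the target
    return min(candidates,
               key=lambda t: abs((scores >= t).sum(axis=1).mean() - cardinality))
```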

References

  1. Alali, A., & Kubat, M. (2015). PruDent: A pruned and confident stacking approach for multi-label classification. IEEE Transactions on Knowledge and Data Engineering, 27(9), 2480–2493. https://doi.org/10.1109/TKDE.2015.2416731.


  2. Benavoli, A., Corani, G., Demšar, J., & Zaffalon, M. (2017). Time for a change: A tutorial for comparing multiple classifiers through Bayesian analysis. Journal of Machine Learning Research, 18, 77:1–77:36.

  3. Bernardini, F. C., Benito, E., & Meza, M. (2014). Cardinality and density measures and their influence to multi-label learning methods. Journal of the Brazilian Society on Computational Intelligence, 12(1), 53–71.


  4. Boutell, M. R., Luo, J., Shen, X., & Brown, C. M. (2004). Learning multi-label scene classification. Pattern Recognition, 37(9), 1757–1771. https://doi.org/10.1016/j.patcog.2004.03.009.


  5. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324.


  6. Briggs, F., Huang, Y., Raich, R., Eftaxias, K., Lei, Z., Cukierski, W., Hadley, S. F., et al. (2013). The 9th annual MLSP competition: New methods for acoustic classification of multiple simultaneous bird species in a noisy environment. In IEEE International workshop on machine learning for signal processing (pp. 1–8). https://doi.org/10.1109/MLSP.2013.6661934.

  7. Chang, C. C., & Lin, C. J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2, 27:1–27:27. https://doi.org/10.1145/1961189.1961199.


  8. Charte, F., Rivera, A. J., del Jesus, M. J., & Herrera, F. (2015). QUINTA: A question tagging assistant to improve the answering ratio in electronic forums. In IEEE international conference on computer as a tool, IEEE (pp. 1–6). https://doi.org/10.1109/EUROCON.2015.7313677.

  9. Charte, F., & Charte, D. (2015). Working with multilabel datasets in R: The mldr package. The R Journal, 7(2), 149–162.

  10. Charte, F., Rivera, A. J., Charte, D., del Jesús, M. J., & Herrera, F. (2018). Tips, guidelines and tools for managing multi-label datasets: The mldr.datasets R package and the cometa data repository. Neurocomputing, 289, 68–85. https://doi.org/10.1016/j.neucom.2018.02.011.


  11. Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM international conference on knowledge discovery and data mining (pp. 785–794). https://doi.org/10.1145/2939672.2939785.

  12. Cherman, E. A., Metz, J., & Monard, M. C. (2012). Incorporating label dependency into the binary relevance framework for multi-label classification. Expert Systems with Applications, 39(2), 1647–1655. https://doi.org/10.1016/j.eswa.2011.06.056.


  13. Cherman, E. A., Spolaôr, N., Valverde-Rebaza, J., & Monard, M. C. (2014). Lazy multi-label learning algorithms based on mutuality strategies. Journal of Intelligent & Robotic Systems. https://doi.org/10.1007/s10846-014-0144-4.

  14. de Carvalho, A. C. P. L. F., & Freitas, A. A. (2009). A tutorial on multi-label classification techniques. In A. Abraham, A. E. Hassanien, & V. Snášel (Eds.), Foundations of computational intelligence (pp. 177–195). Berlin: Springer. https://doi.org/10.1007/978-3-642-01536-6_8.


  15. de Sá, A. G. C., Freitas, A. A., & Pappa, G. L. (2018). Automated selection and configuration of multi-label classification algorithms with grammar-based genetic programming. In A. Auger, C. M. Fonseca, N. Lourenço, P. Machado, L. Paquete, D. Whitley (Eds.), Parallel Problem Solving from Nature - PPSN XV−15th international conference, Coimbra, Portugal, September 8–12, 2018, Proceedings, Part II, Springer, Lecture Notes in Computer Science (Vol. 11102, pp. 308–320). https://doi.org/10.1007/978-3-319-99259-4_25.

  16. de Sá, A. G. C., Pappa, G. L., & Freitas, A. A. (2017). Towards a method for automatically selecting and configuring multi-label classification algorithms. In Proceedings of the genetic and evolutionary computation conference companion (pp. 1125–1132). https://doi.org/10.1145/3067695.3082053.

  17. Duygulu, P., Barnard, K., de Freitas, J. F. G., & Forsyth, D. A. (2002). Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In A. Heyden, G. Sparr, M. Nielsen, P. Johansen (Eds.), Computer Vision—ECCV 2002, 7th European conference on computer vision, Copenhagen, Denmark, May 28–31, 2002, Proceedings, Part IV, Lecture Notes in Computer Science (Vol. 2353, pp. 97–112). Berlin: Springer. https://doi.org/10.1007/3-540-47979-1_7.

  18. Elisseeff, A., & Weston, J. (2001). A kernel method for multi-labeled classification. In Proceedings of the neural information processing systems (pp. 681–687).

  19. Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Analytical methods for social research. New York: Cambridge University Press.


  20. Gibaja, E., & Ventura, S. (2014). Multi-label learning: A review of the state of the art and ongoing research. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 4(6), 411–444. https://doi.org/10.1002/widm.1139.


  21. Gibaja, E., & Ventura, S. (2015). A tutorial on multilabel learning. ACM Computing Surveys, 47(3), 1–38. https://doi.org/10.1145/2716262.


  22. Godbole, S., & Sarawagi, S. (2004). Discriminative methods for multi-labeled classification. In Proceedings of the 8th Pacific-Asia conference on knowledge discovery and data mining (pp. 22–30). https://doi.org/10.1007/978-3-540-24775-3_5.

  23. Gonçalves, E. C., Plastino, A., & Freitas, A. A. (2013). A genetic algorithm for optimizing the label ordering in multi-label classifier chains. In Proceedings of the international conference on tools with artificial intelligence (pp. 469–476). https://doi.org/10.1109/ICTAI.2013.76.

  24. Jackson, P., & Moulinier, I. (2002). Natural language processing for online applications: Text retrieval, extraction & categorization. Amsterdam: John Benjamins.


  25. Jain, A. K., & Dubes, R. C. (1988). Algorithms for clustering data. Upper Saddle River, NJ: Prentice-Hall Inc.


  26. Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. Proceedings of the 10th European Conference on Machine Learning, 1398, 137–142.


  27. Klimt, B., & Yang, Y. (2004). The Enron corpus: A new dataset for email classification research. In Proceedings of the 15th European conference on machine learning (pp. 217–226). https://doi.org/10.1007/978-3-540-30115-8_22.

  28. Lang, K. (1995). Newsweeder: Learning to filter Netnews. In Proceedings of the twelfth international conference on machine learning, (pp. 331–339).

  29. Li, Y. K., & Zhang, M. L. (2014). Enhancing binary relevance for multi-label learning with controlled label correlations exploitation. In 13th Pacific Rim international conference on artificial intelligence (pp. 91–103). https://doi.org/10.1007/978-3-319-13560-1_8.

  30. Liu, S. M., & Chen, J. (2015). An empirical study of empty prediction of multi-label classification. Expert Systems with Applications, 42(13), 5567–5579. https://doi.org/10.1016/j.eswa.2015.01.024.

  31. Luaces, O., Díez, J., Barranquero, J., del Coz, J. J., & Bahamonde, A. (2012). Binary relevance efficacy for multilabel classification. Progress in Artificial Intelligence, 1(4), 303–313.


  32. Madjarov, G., Kocev, D., Gjorgjevikj, D., & Džeroski, S. (2012). An extensive experimental comparison of methods for multi-label learning. Pattern Recognition, 45(9), 3084–3104. https://doi.org/10.1016/j.patcog.2012.03.004.


  33. Mantovani, R. G., Rossi, A. L. D., Vanschoren, J., Bischl, B., & Carvalho, A. C. P. L. F. (2015). To tune or not to tune: Recommending when to adjust SVM hyper-parameters via meta-learning. In 2015 International Joint Conference on Neural Networks, IEEE, (pp. 1–8). https://doi.org/10.1109/IJCNN.2015.7280644.

  34. Metz, J., de Abreu, L. F., Cherman, E. A., & Monard, M. C. (2012). On the estimation of predictive evaluation measure baselines for multi-label learning. In 13th Ibero-American Conference on Artificial Intelligence (pp. 189–198).

  35. Montañes, E., Senge, R., Barranquero, J., Quevedo, J. R., del Coz, J. J., & Hüllermeier, E. (2014). Dependent binary relevance models for multi-label classification. Pattern Recognition, 47(3), 1494–1508. https://doi.org/10.1016/j.patcog.2013.09.029.

  36. Moyano, J. M., Galindo, E. L. G., Cios, K. J., & Ventura, S. (2018). Review of ensembles of multi-label classifiers: Models, experimental study and prospects. Information Fusion, 44, 33–45. https://doi.org/10.1016/j.inffus.2017.12.001.


  37. Pereira, R. B., Plastino, A., Zadrozny, B., & Merschmann, L. H. (2018). Correlation analysis of performance measures for multi-label classification. Information Processing & Management, 54(3), 359–369. https://doi.org/10.1016/j.ipm.2018.01.002.


  38. Pestian, J. P., Brew, C., Matykiewicz, P., Hovermale, D. J., Johnson, N., Cohen, K. B., & Duch, W. (2007). A shared task involving multi-label classification of clinical free text. In Proceedings of the workshop on biological, translational, and clinical language processing, association for computational linguistics (pp. 97–104).

  39. Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. San Francisco, CA: Morgan Kaufmann Publishers Inc.


  40. Raez, A. M., Lopez, L. A. U., & Steinberger, R. (2004). Adaptive selection of base classifiers in one-against-all learning for large multi-labeled collections. In Advances in natural language processing (pp. 1–12). https://doi.org/10.1007/978-3-540-30228-5_1.

  41. Rauber, T. W., Mello, L. H., Rocha, V. F., Luchi, D., & Varejão, F. M. (2014). Recursive dependent binary relevance model for multi-label classification. In A. L. Bazzan & K. Pichara (Eds.), Advances in artificial intelligence—IBERAMIA 2014 (pp. 206–217). https://doi.org/10.1007/978-3-319-12027-0_17.

  42. Read, J., Pfahringer, B., Holmes, G., & Frank, E. (2009). Classifier chains for multi-label classification. In Proceedings of the European conference on machine learning and knowledge discovery in databases, Bled, Slovenia (Vol. 5782, pp. 254–269).

  43. Read, J., Pfahringer, B., Holmes, G., & Frank, E. (2011). Classifier chains for multi-label classification. Machine Learning, 85(3), 333–359.


  44. Rivolli, A., & de Carvalho, A. C. P. L. F. (2018). The utiml package: Multi-label classification in R. The R Journal. https://journal.r-project.org/archive/2018/RJ-2018-041/index.html.

  45. Rivolli, A., Soares, C., & de Carvalho, A. C. P. L. F. (2018). Enhancing multilabel classification for food truck recommendation. Expert Systems. https://doi.org/10.1111/exsy.12304.

  46. Schapire, R. E., & Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3), 297–336. https://doi.org/10.1023/A:1007614523901.

  47. Sechidis, K., Tsoumakas, G., & Vlahavas, I. (2011). On the stratification of multi-label data. In D. Gunopulos, T. Hofmann, D. Malerba, & M. Vazirgiannis (Eds.), Machine learning and knowledge discovery in databases (pp. 145–158). https://doi.org/10.1007/978-3-642-23808-6_10.

  48. Senge, R., del Coz, J. J., & Hüllermeier, E. (2013). Rectifying classifier chains for multi-label classification. In Proceedings of the Workshop of Lernen, Wissen & Adaptivität, Bamberg, Germany (pp. 162–169).

  49. Snoek, C. G. M., Worring, M., van Gemert, J. C., Geusebroek, J. M., & Smeulders, A. W. M. (2006). The challenge problem for automated detection of 101 semantic concepts in multimedia. In Proceedings of the 14th ACM international conference on multimedia (pp. 421–430). https://doi.org/10.1145/1180639.1180727.

  50. Srivastava, A. N., & Zane-Ulman, B. (2005). Discovering recurring anomalies in text reports regarding complex space systems. In IEEE aerospace conference (pp. 3853–3862). https://doi.org/10.1109/AERO.2005.1559692.

  51. Trohidis, K., Tsoumakas, G., Kalliris, G., & Vlahavas, I. (2011). Multi-label classification of music by emotion. EURASIP Journal on Audio, Speech, and Music Processing, 2011(1), 4. https://doi.org/10.1186/1687-4722-2011-426793.

  52. Tsoumakas, G., Katakis, I., & Vlahavas, I. (2008). Effective and efficient multilabel classification in domains with large number of labels. In Proceedings of European conference on machine learning and principles and practice of knowledge discovery in databases, workshop on mining multidimensional data (pp. 30–44).

  53. Tsoumakas, G., Loza Mencía, E., Katakis, I., Park, S. H., & Fürnkranz, J. (2009). On the combination of two decompositive multi-label classification methods. In Proceedings of the European conference on machine learning and principles and practice of knowledge discovery, workshop on preference learning (pp. 114–129).

  54. Tsoumakas, G., & Katakis, I. (2007). Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 3(3), 1–13.


  55. Tsoumakas, G., Katakis, I., & Vlahavas, I. (2010). Mining multi-label data. In O. Maimon & L. Rokach (Eds.), Data mining and knowledge discovery handbook, Chap 34 (2nd ed., pp. 667–685). Berlin: Springer. https://doi.org/10.1007/978-0-387-09823-4_34.


  56. Tsoumakas, G., Katakis, I., & Vlahavas, I. (2011a). Random k-labelsets for multi-label classification. IEEE Transactions on Knowledge and Data Engineering, 23(7), 1079–1089.


  57. Tsoumakas, G., Katakis, I., & Vlahavas, I. (2011b). Random k-labelsets for multilabel classification. IEEE Transactions on Knowledge and Data Engineering, 23(7), 1079–1089. https://doi.org/10.1109/TKDE.2010.164.


  58. Turnbull, D., Barrington, L., Torres, D., & Lanckriet, G. (2008). Semantic annotation and retrieval of music and sound effects. IEEE Transactions on Audio, Speech, and Language Processing, 16(2), 467–476. https://doi.org/10.1109/TASL.2007.913750.


  59. Wever, M., Mohr, F., & Hüllermeier, E. (2018). Automated multi-label classification based on ML-Plan. arXiv:1811.04060.

  60. Wever, M. D., Mohr, F., Tornede, A., & Hüllermeier, E. (2019). Automating multi-label classification extending ML-Plan. In 6th ICML workshop on automated machine learning.

  61. Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241–259. https://doi.org/10.1016/S0893-6080(05)80023-1.


  62. Yang, Y. (1999). An evaluation of statistical approaches to text categorization. Information Retrieval, 1(1–2), 69–90. https://doi.org/10.1023/A:1009982220290.


  63. Zhang, M. L., & Wu, L. (2015). LIFT: Multi-label learning with label-specific features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(1), 107–120. https://doi.org/10.1109/TPAMI.2014.2339815.

  64. Zhang, M. L., & Zhou, Z. H. (2014). A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 26(8), 1819–1837. https://doi.org/10.1109/TKDE.2013.39.


  65. Zhou, Z., & Zhang, M. (2006). Multi-instance multi-label learning with application to scene classification. In B. Schölkopf, J. C. Platt, & T. Hofmann (Eds.), Advances in neural information processing systems 19, Proceedings of the twentieth annual conference on neural information processing systems, Vancouver, British Columbia, December 4–7, 2006, (pp. 1609–1616). Cambridge: MIT Press.

  66. Zhou, T., Tao, D., & Wu, X. (2012). Compressed labeling on distilled labelsets for multi-label learning. Machine Learning, 88(1–2), 69–126.


  67. Zufferey, D., Hofer, T., Hennebert, J., Schumacher, M., Ingold, R., & Bromuri, S. (2015). Performance comparison of multi-label learning algorithms on clinical data for chronic diseases. Computers in Biology and Medicine, 65, 34–43. https://doi.org/10.1016/j.compbiomed.2015.07.017.



Acknowledgements

This work was financially supported by CNPq (Processes 305291/2017-3 and 152098/2016-0), FAPESP (Processes 2016/18615-0, 2013/07375-0 and 2012/22608-8), CAPES and Intel. The experiments were performed using the computational resources of CeMEAI-FAPESP, Proc. 13/07375-0.

Author information


Corresponding author

Correspondence to Adriano Rivolli.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Editor: Eyke Hüllermeier.

Appendices

Appendix 1: Best strategies/base algorithms

This section presents the rankings of the strategy/base-algorithm pairs over all data sets (Figs. 9–16) and the performance obtained by each strategy when combined with its best base algorithm (Tables 10–17). The median ranking across data sets is used to select the base algorithm for each strategy; a sketch of this selection rule follows.
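The following sketch summarizes the selection rule, assuming a pandas DataFrame of ranks whose rows are data sets and whose columns are "STRATEGY/BASE" pairs (lower rank is better); the layout and names are assumptions for illustration.

```python
import pandas as pd

def best_base_per_strategy(ranks: pd.DataFrame) -> dict:
    """ranks: rows are data sets, columns are 'STRATEGY/BASE' pairs, and
    values are the pair's rank on that data set (lower is better).
    Returns, per strategy, the base algorithm with the lowest median rank."""
    medians = ranks.median(axis=0)            # median rank of each pair
    best = {}
    for pair, med in medians.items():
        strategy, base = pair.split("/")
        if strategy not in best or med < best[strategy][1]:
            best[strategy] = (base, med)
    return {strategy: base for strategy, (base, _) in best.items()}
```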

Table 10 Results of the best strategies for the F1 (↑) measure
Fig. 9 Strategy/base-algorithm rankings for the F1 measure

Table 11 Results of the best strategies for the hamming-loss (↓) measure
Fig. 10 Strategy/base-algorithm rankings for the hamming-loss measure

Table 12 Results of the best strategies for the macro-F1 (↑) measure
Fig. 11 Strategy/base-algorithm rankings for the macro-F1 measure

Table 13 Results of the best strategies for the macro-precision (↑) measure
Fig. 12 Strategy/base-algorithm rankings for the macro-precision measure

Table 14 Results of the best strategies for the macro-recall (↑) measure
Fig. 13 Strategy/base-algorithm rankings for the macro-recall measure

Table 15 Results of the best strategies for the one-error (↓) measure
Fig. 14 Strategy/base-algorithm rankings for the one-error measure

Table 16 Results of the best strategies for the ranking-loss (↓) measure
Fig. 15 Strategy/base-algorithm rankings for the ranking-loss measure

Table 17 Results of the best strategies for the subset-accuracy (↑) measure
Fig. 16 Strategy/base-algorithm rankings for the subset-accuracy measure
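For reference, the measures in these captions follow the standard multi-label definitions (see, e.g., Zhang and Zhou 2014); these are the usual textbook formulas, not expressions reproduced from the appendix itself. For a test set with \(N\) instances, \(L\) labels, true label sets \(Y_i\), and predicted label sets \(Z_i\):

```latex
\begin{aligned}
\text{hamming-loss} &= \frac{1}{N}\sum_{i=1}^{N}\frac{|Y_i \,\triangle\, Z_i|}{L}
  &&\text{(symmetric difference; lower is better, } \downarrow)\\
\text{macro-F1} &= \frac{1}{L}\sum_{j=1}^{L}
  \frac{2\,\mathrm{tp}_j}{2\,\mathrm{tp}_j + \mathrm{fp}_j + \mathrm{fn}_j}
  &&\text{(per-label F1, averaged over labels, } \uparrow)\\
\text{subset-accuracy} &= \frac{1}{N}\sum_{i=1}^{N}\mathbb{1}\!\left[Y_i = Z_i\right]
  &&\text{(exact match of the full label set, } \uparrow)
\end{aligned}
```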

Appendix 2: Statistical results

From the previous results, the best strategy/base-algorithm pairs were statistically compared against the other pairs using a Bayesian statistical test (see Benavoli et al. 2017). Tables 18, 19, 20, 21, 22, 23, 24 and 25 report the pairs that the selected strategy/base-algorithm combinations statistically outperform with a probability greater than or equal to 95%; a sketch of such a test is given after the tables.

Table 18 Bayesian statistical results for the F1 measure such that the strategies in the row improve the strategies in the columns with a probability greater than or equal to 95%
Table 19 Bayesian statistical results for the hamming-loss measure such that the strategies in the row improve the strategies in the columns with a probability greater than or equal to 95%
Table 20 Bayesian statistical results for the macro-F1 measure such that the strategies in the row improve the strategies in the columns with a probability greater than or equal to 95%
Table 21 Bayesian statistical results for the macro-precision measure such that the strategies in the row improve the strategies in the columns with a probability greater than or equal to 95%
Table 22 Bayesian statistical results for the macro-recall measure such that the strategies in the row improve the strategies in the columns with a probability greater than or equal to 95%
Table 23 Bayesian statistical results for the one-error measure such that the strategies in the row improve the strategies in the columns with a probability greater than or equal to 95%
Table 24 Bayesian statistical results for the ranking-loss measure such that the strategies in the row improve the strategies in the columns with a probability greater than or equal to 95%
Table 25 Bayesian statistical results for the subset-accuracy measure such that the strategies in the row improve the strategies in the columns with a probability greater than or equal to 95%
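The following is a self-contained sketch of one test from the Bayesian family surveyed by Benavoli et al. (2017), a Bayesian sign test with a region of practical equivalence (ROPE); the appendix does not state which variant was used, so this is illustrative only.

```python
import numpy as np

def bayesian_sign_test(a, b, rope=0.01, prior=1.0, samples=50_000, seed=0):
    """Bayesian sign test with a region of practical equivalence (ROPE),
    one member of the family surveyed by Benavoli et al. (2017).
    a, b: paired per-data-set scores of two strategy/base-algorithm pairs."""
    diff = np.asarray(a) - np.asarray(b)
    counts = np.array([(diff > rope).sum(),          # a practically better
                       (np.abs(diff) <= rope).sum(), # practically equivalent
                       (diff < -rope).sum()])        # b practically better
    rng = np.random.default_rng(seed)
    # the posterior over the three outcome probabilities is a Dirichlet
    theta = rng.dirichlet(counts + prior, size=samples)
    winner = theta.argmax(axis=1)  # most probable outcome in each draw
    return {"p(a>b)": (winner == 0).mean(),
            "p(rope)": (winner == 1).mean(),
            "p(a<b)": (winner == 2).mean()}
```

In the sense of the tables above, a pair would be reported as statistically outperforming another when p(a>b) is at least 0.95.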

Rights and permissions


About this article


Cite this article

Rivolli, A., Read, J., Soares, C. et al. An empirical analysis of binary transformation strategies and base algorithms for multi-label learning. Mach Learn 109, 1509–1563 (2020). https://doi.org/10.1007/s10994-020-05879-3


Keywords

  • Multi-label learning
  • Binary transformation
  • Comparison of strategies
  • Base algorithms
  • Empirical analysis