Concurrence among Imbalanced Labels and Its Influence on Multilabel Resampling Algorithms

  • Francisco Charte
  • Antonio Rivera
  • María José del Jesus
  • Francisco Herrera
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8480)


In multilabel classification, learning from imbalanced data has recently received considerable attention. Several algorithms addressing this problem have been proposed in the last five years, along with various measures to assess the imbalance level. Some of the proposed methods are based on resampling techniques, a well-known approach whose usefulness in traditional classification has been proven.

This paper describes how a specific characteristic of multilabel datasets (MLDs), the level of concurrence among imbalanced labels, can have a great impact on the behavior of resampling algorithms. To this end, a measure named SCUMBLE, designed to evaluate this concurrence level, is proposed and its usefulness is experimentally tested. As a result, a straightforward guideline on the effectiveness of multilabel resampling algorithms depending on MLD characteristics can be inferred.
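The abstract does not define SCUMBLE, so the following is only a rough sketch of how a concurrence measure of this kind might be computed. It assumes that each label has an imbalance ratio (the count of the most frequent label divided by the label's own count), and that an instance scores one minus the ratio of the geometric mean to the arithmetic mean of the ratios of its active labels. Both the formula and the function names `imbalance_ratios` and `concurrence_score` are assumptions made for illustration, not the authors' stated definition.

```python
import math


def imbalance_ratios(Y):
    """Per-label ratio: count of the most frequent label / count of this label.

    Y is a list of rows, each a list of 0/1 label indicators (assumed layout).
    """
    n_labels = len(Y[0])
    counts = [sum(row[j] for row in Y) for j in range(n_labels)]
    top = max(counts)
    # A label that never appears is never active, so its ratio is never used.
    return [top / c if c > 0 else math.inf for c in counts]


def concurrence_score(Y):
    """Mean over instances of 1 - (geometric mean / arithmetic mean) of the
    imbalance ratios of the labels active in that instance (assumed form)."""
    ratios = imbalance_ratios(Y)
    scores = []
    for row in Y:
        active = [ratios[j] for j, flag in enumerate(row) if flag]
        if not active:
            # Assumption: an instance without labels contributes zero.
            scores.append(0.0)
            continue
        arith = sum(active) / len(active)
        geo = math.exp(sum(math.log(r) for r in active) / len(active))
        scores.append(1.0 - geo / arith)
    return sum(scores) / len(scores)


# Label 0 is frequent, label 1 is rare.
separated = [[1, 0], [1, 0], [1, 0], [0, 1]]  # rare label appears alone
mixed = [[1, 0], [1, 0], [1, 1]]              # rare label co-occurs with frequent one
print(concurrence_score(separated))
print(concurrence_score(mixed))
```

Under these assumptions the score is close to zero when rare and frequent labels never share an instance, and it grows as they co-occur, which is the situation in which resampling an instance to help a rare label also changes the frequency of a common one.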


Keywords

  • Multilabel Classification
  • Imbalanced Learning
  • Resampling
  • Measures





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Francisco Charte (1)
  • Antonio Rivera (2)
  • María José del Jesus (2)
  • Francisco Herrera (1)

  1. Dep. of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
  2. Dep. of Computer Science, University of Jaén, Jaén, Spain
