
Synthetic Oversampling of Multi-label Data Based on Local Label Distribution

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11907)

Abstract

Class-imbalance is an inherent characteristic of multi-label data that affects the prediction accuracy of most multi-label learning methods. One efficient strategy for dealing with this problem is to employ resampling techniques before training the classifier. Existing multi-label sampling methods alleviate the (global) imbalance of multi-label datasets. However, performance degradation is mainly due to rare sub-concepts and overlapping classes, which are better analysed by looking at the local characteristics of the minority examples than at the imbalance of the whole dataset. We propose a new method for synthetic oversampling of multi-label data that focuses on the local label distribution to generate more diverse and better-labeled instances. Experimental results on 13 multi-label datasets demonstrate the effectiveness of the proposed approach on a variety of evaluation measures, particularly when an ensemble of classifiers is trained on repeated samples of the original data.
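
The abstract describes the approach only at a high level. As a rough illustration of the general idea, the Python sketch below (assuming NumPy and scikit-learn) oversamples a multi-label dataset by weighting seed selection with the local label distribution of each instance's k nearest neighbours and by labelling the synthetic points through a neighbourhood vote. The function name local_oversample, the neighbourhood size k, the seed-weighting rule and the majority-vote labelling are assumptions made for illustration only; this is not the algorithm proposed in the paper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_oversample(X, Y, k=5, n_new=100, rng=None):
    """Illustrative sketch: generate synthetic multi-label examples guided by
    the local label distribution of each instance's k nearest neighbours.
    A simplified, hypothetical variant, not the authors' method."""
    rng = np.random.default_rng(rng)
    n, q = Y.shape
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    idx = idx[:, 1:]                        # drop the instance itself

    # Treat labels rarer than the average label frequency as minority labels.
    freq = Y.mean(axis=0)
    minority = freq < freq.mean()

    # Seed weight: how strongly the neighbourhood disagrees with the
    # instance's minority labels (fraction of neighbours lacking them).
    disagree = np.zeros(n)
    for i in range(n):
        for j in np.where(Y[i] == 1)[0]:
            if minority[j]:
                disagree[i] += 1.0 - Y[idx[i], j].mean()
    if disagree.sum() == 0:
        disagree[:] = 1.0                   # fall back to uniform seed choice
    p = disagree / disagree.sum()

    X_new, Y_new = [], []
    for _ in range(n_new):
        s = rng.choice(n, p=p)              # pick a seed instance
        r = idx[s, rng.integers(k)]         # pick one of its neighbours
        lam = rng.random()
        X_new.append(X[s] + lam * (X[r] - X[s]))   # interpolate features
        # Label the synthetic point from the local label distribution:
        # keep a label if most of the seed's neighbourhood (seed included) has it.
        votes = (Y[s] + Y[idx[s]].sum(axis=0)) / (k + 1)
        Y_new.append((votes >= 0.5).astype(int))
    return np.vstack([X, X_new]), np.vstack([Y, Y_new])
```

An ensemble of the kind evaluated in the paper could then be obtained by calling such a sampler repeatedly on the original data, training one classifier per sample, and averaging their predictions.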

Keywords

Multi-label learning, Class-imbalance, Synthetic oversampling, Local label distribution, Ensemble methods

Notes

Acknowledgements

Bin Liu is supported by the China Scholarship Council (CSC) under Grant No. 201708500095.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

1. School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece