MLeNN: A First Approach to Heuristic Multilabel Undersampling

Charte, Francisco; Rivera, Antonio J.; del Jesus, María J.; Herrera, Francisco

doi:10.1007/978-3-319-10840-7_1

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8669)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

Learning from imbalanced multilabel data is a challenging task that has attracted considerable attention lately. Some resampling algorithms used in traditional classification, such as random undersampling and random oversampling, have already been adapted to work with multilabel datasets.

In this paper MLeNN (MultiLabel edited Nearest Neighbor), a heuristic multilabel undersampling algorithm based on the well-known Wilson’s Edited Nearest Neighbor Rule, is proposed. The samples to be removed are selected heuristically rather than picked at random. The ability of MLeNN to improve classification results is tested experimentally, and its performance is compared against multilabel random undersampling. As will be shown, MLeNN is a competitive multilabel undersampling alternative, able to significantly enhance classification results.
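
For readers unfamiliar with the rule MLeNN builds on, the following is a minimal sketch of the classic single-label Edited Nearest Neighbor rule: a sample is dropped when its label disagrees with the majority label of its k nearest neighbors. It is not the MLeNN algorithm, since the abstract does not describe how the rule is adapted to multilabel data. The function name `edited_nearest_neighbor`, the Euclidean distance, the default `k=3`, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
import math


def edited_nearest_neighbor(X, y, k=3):
    """Return the indices of the samples kept by the single-label ENN rule.

    Illustrative sketch only: Euclidean distance and k=3 are assumptions,
    and this is the classic rule, not the multilabel MLeNN variant.
    """
    kept = []
    for i, xi in enumerate(X):
        # Distance from sample i to every other sample, nearest first.
        others = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbor_labels = [y[j] for _, j in others[:k]]
        majority_label, _ = Counter(neighbor_labels).most_common(1)[0]
        # Keep the sample only if its neighborhood agrees with its label.
        if majority_label == y[i]:
            kept.append(i)
    return kept


# Toy data (hypothetical): one "b" sample sits inside the "a" cluster.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
     (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1)]
y = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]

kept = edited_nearest_neighbor(X, y, k=3)
print(kept)  # the misplaced sample at index 4 is edited out
```

Unlike random undersampling, which discards samples without regard to their neighborhood, this rule removes only samples that conflict with their local surroundings, which is the sense in which the selection is heuristic.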

Author information

Authors and Affiliations

Dep. of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
Francisco Charte & Francisco Herrera
Dep. of Computer Science, University of Jaén, Jaén, Spain
Antonio J. Rivera & María J. del Jesus

Editor information

Editors and Affiliations

University of Salamanca, Plaza de la Merced S/N, 37008, Salamanca, Spain
Emilio Corchado & Héctor Quintián
University of the Basque Country, Paseo Manuel de Lardizábal 1, 20018, San Sebastián, Spain
José A. Lozano
The University of Manchester, Sackville Street, M13 9PL, Manchester, UK
Hujun Yin

About this paper

Cite this paper

Charte, F., Rivera, A.J., del Jesus, M.J., Herrera, F. (2014). MLeNN: A First Approach to Heuristic Multilabel Undersampling. In: Corchado, E., Lozano, J.A., Quintián, H., Yin, H. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2014. IDEAL 2014. Lecture Notes in Computer Science, vol 8669. Springer, Cham. https://doi.org/10.1007/978-3-319-10840-7_1

DOI: https://doi.org/10.1007/978-3-319-10840-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10839-1
Online ISBN: 978-3-319-10840-7
eBook Packages: Computer Science, Computer Science (R0)