Skip to main content

MLeNN: A First Approach to Heuristic Multilabel Undersampling

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2014 (IDEAL 2014)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8669))

Abstract

Learning from imbalanced multilabel data is a challenging task that has attracted considerable attention lately. Some resampling algorithms used in traditional classification, such as random undersampling and random oversampling, have been already adapted in order to work with multilabel datasets.

In this paper MLeNN (MultiLabel edited Nearest Neighbor), a heuristic multilabel undersampling algorithm based on the well-known Wilson’s Edited Nearest Neighbor Rule, is proposed. The samples to be removed are heuristically selected, instead of randomly picked. The ability of MLeNN to improve classification results is experimentally tested, and its performance against multilabel random undersampling is analyzed. As will be shown, MLeNN is a competitive multilabel undersampling alternative, able to enhance significantly classification results.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Zhang, M.-L., Zhou, Z.-H.: A review on multi-label learning algorithms. IEEE Trans. Knowl. Data Eng. (2013)

    Google Scholar 

  2. He, J., Gu, H., Liu, W.: Imbalanced multi-modal multi-label learning for subcellular localization prediction of human proteins with both single and multiple sites. PloS One 7(6), 7155 (2012)

    Google Scholar 

  3. Charte, F., Rivera, A., del Jesus, M.J., Herrera, F.: A first approach to deal with imbalance in multi-label datasets. In: Pan, J.-S., Polycarpou, M.M., Woźniak, M., de Carvalho, A.C.P.L.F., Quintián, H., Corchado, E. (eds.) HAIS 2013. LNCS, vol. 8073, pp. 150–160. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  4. Wilson, D.L.: Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans. on SMC-2(3), 408–421 (1972)

    Google Scholar 

  5. Tsoumakas, G., Katakis, I., Vlahavas, I.: Mining Multi-label Data. In: Maimon, O., Rokach, L. (eds.) Data Mining and Knowledge Discovery Handbook, ch. 34, pp. 667–685. Springer US, Boston (2010)

    Google Scholar 

  6. Haibo, H., Yunqian, M.: Imbalanced Learning: Foundations, Algorithms, and Applications. Wiley-IEEE Press (2013)

    Google Scholar 

  7. Tahir, M.A., Kittler, J., Bouridane, A.: Multilabel classification using heterogeneous ensemble of multi-label classifiers. Pattern Recognit. Lett. 33(5), 513–523 (2012)

    Article  Google Scholar 

  8. García, V., Sánchez, J., Mollineda, R.: On the effectiveness of preprocessing methods when dealing with different levels of class imbalance. Knowl. Based Systems 25(1), 13–21 (2012)

    Article  Google Scholar 

  9. Tsoumakas, G., Xioufis, E.S., Vilcek, J., Vlahavas, I.: MULAN: A Java Library for Multi-Label Learning. J. Mach. Learn. Res. 12, 2411–2414 (2011)

    MathSciNet  MATH  Google Scholar 

  10. Godbole, S., Sarawagi, S.: Discriminative Methods for Multi-labeled Classification. In: Dai, H., Srikant, R., Zhang, C. (eds.) PAKDD 2004. LNCS (LNAI), vol. 3056, pp. 22–30. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  11. Fürnkranz, J., Hüllermeier, E., Loza Mencía, E., Brinker, K.: Multilabel classification via calibrated label ranking. Mach. Learn. 73, 133–153 (2008)

    Article  Google Scholar 

  12. Tsoumakas, G., Vlahavas, I.: Random k-labelsets: An ensemble method for multilabel classification. In: Kok, J.N., Koronacki, J., Lopez de Mantaras, R., Matwin, S., Mladenič, D., Skowron, A. (eds.) ECML 2007. LNCS (LNAI), vol. 4701, pp. 406–417. Springer, Heidelberg (2007)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Charte, F., Rivera, A.J., del Jesus, M.J., Herrera, F. (2014). MLeNN: A First Approach to Heuristic Multilabel Undersampling. In: Corchado, E., Lozano, J.A., Quintián, H., Yin, H. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2014. IDEAL 2014. Lecture Notes in Computer Science, vol 8669. Springer, Cham. https://doi.org/10.1007/978-3-319-10840-7_1

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-10840-7_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10839-1

  • Online ISBN: 978-3-319-10840-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics