A Comparison of Multi-label Feature Selection Methods Using the Algorithm Adaptation Approach

  • Conference paper
Advances in Visual Informatics (IVIC 2015)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9429)

Abstract

In a multi-label classification problem, each document is associated with a subset of labels, usually several, and often consists of multiple features. Feature selection is therefore an important task in machine learning: it attempts to remove irrelevant and redundant features that can hinder classification performance. This paper suggests transforming the multi-label documents into single-label documents before applying a standard feature selection algorithm. Under this process, each document is copied to every label to which it belongs, and all of its features are assigned to each of those labels. In this context, we conducted a comparative study of five feature selection methods. These methods are incorporated into traditional Naive Bayes classifiers that are adapted to deal with multi-label documents. Experiments conducted with benchmark datasets showed that the multi-label Naive Bayes (MLNB) classifier coupled with the GSS method delivered better performance than the MLNB classifier using the other methods.
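The copy transformation described in the abstract can be illustrated with a minimal sketch. The function names, the toy documents, and the choice to score terms by document frequency are assumptions made for illustration; they are not taken from the paper. The GSS coefficient is written here in its commonly cited form, P(t,c)P(¬t,¬c) − P(t,¬c)P(¬t,c), and the exact variant used by the authors may differ.

```python
def copy_transform(docs):
    """Turn each (features, labels) pair into one single-label copy per label.

    Every copy keeps all features of the original document.
    """
    return [(features, label) for features, labels in docs for label in labels]


def gss_scores(single_label_docs, label):
    """GSS coefficient of every term for one label (assumed standard form)."""
    n = len(single_label_docs)
    vocab = {t for features, _ in single_label_docs for t in features}
    scores = {}
    for term in vocab:
        # Contingency counts over the single-label copies.
        a = sum(1 for f, l in single_label_docs if term in f and l == label)
        b = sum(1 for f, l in single_label_docs if term in f and l != label)
        c = sum(1 for f, l in single_label_docs if term not in f and l == label)
        d = n - a - b - c  # term absent, other label
        scores[term] = (a * d - b * c) / (n * n)
    return scores


# Hypothetical toy corpus: (set of terms, list of labels).
docs = [
    ({"goal", "match"}, ["sport"]),
    ({"vote", "match"}, ["sport", "politics"]),
    ({"vote", "law"}, ["politics"]),
]

single = copy_transform(docs)       # 4 single-label copies
sport = gss_scores(single, "sport")
top_terms = sorted(sport, key=sport.get, reverse=True)[:2]
```

In this toy corpus the second document yields two copies, one per label, so a term such as "match" contributes evidence to both "sport" and "politics". Keeping the highest-scoring terms per label is one simple way to turn the scores into a selected feature subset.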

Acknowledgments

The research of this paper is financially supported by the Malaysian Ministry of Education (MOE) grant no. ERGS/1/2013/ICT07/UKM/02/5.

Author information

Corresponding author

Correspondence to Roiss Alhutaish.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Alhutaish, R., Omar, N., Abdullah, S. (2015). A Comparison of Multi-label Feature Selection Methods Using the Algorithm Adaptation Approach. In: Badioze Zaman, H., et al. Advances in Visual Informatics. IVIC 2015. Lecture Notes in Computer Science, vol 9429. Springer, Cham. https://doi.org/10.1007/978-3-319-25939-0_18

  • DOI: https://doi.org/10.1007/978-3-319-25939-0_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25938-3

  • Online ISBN: 978-3-319-25939-0

  • eBook Packages: Computer Science, Computer Science (R0)
