Categorizing feature selection methods for multi-label classification

Artificial Intelligence Review

Abstract

In many important application domains such as text categorization, biomolecular analysis, scene classification and medical diagnosis, examples are naturally associated with more than one class label, giving rise to multi-label classification problems. This fact has led, in recent years, to a substantial amount of research on feature selection methods that allow the identification of relevant and informative features for multi-label classification. However, the methods proposed for this task are scattered in the literature, with no common framework to describe them and to allow an objective comparison. Here, we revisit a categorization of existing multi-label classification methods and, as our main contribution, we provide a comprehensive survey and novel categorization of the feature selection techniques that have been created for the multi-label classification setting. We conclude this work with concrete suggestions for future research in multi-label feature selection which have been derived from our categorization and analysis.
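To make the setting concrete, the following minimal sketch shows one simple way a filter-style feature selector can be applied to multi-label data: each feature is scored against every label separately and the per-label scores are averaged. This is only an illustration under stated assumptions, not a method taken from the survey. The function names (`mutual_information`, `rank_features`), the choice of mutual information as the score, the averaging rule, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_features(X, Y):
    """Score each feature by its mean mutual information over all labels.

    X: list of examples, each a list of discrete feature values.
    Y: list of examples, each a list of 0/1 label indicators.
    Returns (feature indices sorted from best to worst, raw scores).
    """
    n_features = len(X[0])
    n_labels = len(Y[0])
    scores = []
    for j in range(n_features):
        feature_column = [row[j] for row in X]
        per_label = [
            mutual_information(feature_column, [row[k] for row in Y])
            for k in range(n_labels)
        ]
        scores.append(sum(per_label) / n_labels)
    ranking = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranking, scores


# Hypothetical toy data: 4 examples, 3 binary features, 2 labels.
# Each example may carry zero, one, or both labels.
X = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
    [0, 1, 1],
]
Y = [
    [1, 1],
    [1, 0],
    [0, 1],
    [0, 0],
]

ranking, scores = rank_features(X, Y)
```

In this toy data, feature 0 tracks the first label, feature 1 tracks the second label, and feature 2 is unrelated to both, so it receives the lowest score. Averaging per-label scores ignores dependencies between labels, which is one of the design choices a multi-label feature selection method has to make.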

Author information

Corresponding author

Correspondence to Rafael B. Pereira.

About this article

Cite this article

Pereira, R.B., Plastino, A., Zadrozny, B. et al. Categorizing feature selection methods for multi-label classification. Artif Intell Rev 49, 57–78 (2018). https://doi.org/10.1007/s10462-016-9516-4

  • DOI: https://doi.org/10.1007/s10462-016-9516-4
