Local and global feature selection for multilabel classification with binary relevance

An empirical comparison on flat and hierarchical problems

Artificial Intelligence Review

Abstract

Multilabel classification has become increasingly important for various use cases. Among the existing multilabel classification methods, problem transformation approaches, such as Binary Relevance, Pruned Problem Transformation, and Classifier Chains, are some of the most popular, since they break a global multilabel classification problem into a set of smaller binary or multiclass classification problems. Transformation methods enable two different feature selection approaches: local, where the selection is performed independently for each of the transformed problems, and global, where the selection is performed on the original dataset, so that all local classifiers work on the same set of features. While global methods have been widely researched, local methods have received little attention so far. In this paper, we compare these two strategies on one of the most straightforward transformation approaches, namely Binary Relevance. We empirically compare their performance on various flat and hierarchical multilabel datasets from different application domains. We show that local feature selection outperforms global feature selection in terms of classification accuracy, without drawbacks in runtime performance.
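To make the distinction between the two strategies concrete, the following is a minimal sketch in Python, not the implementation evaluated in the article. It assumes a simple filter score (the absolute difference between a feature's mean over positive and over negative instances) and assumes that a global selection can be obtained by averaging the per-label scores; the abstract prescribes neither choice. The function names (`score_features`, `local_selection`, `global_selection`, `project`) and the toy data are hypothetical.

```python
def _mean(values):
    values = list(values)
    return sum(values) / len(values)

def score_features(X, y):
    """Illustrative filter score: |mean over positives - mean over negatives|."""
    pos = [row for row, t in zip(X, y) if t == 1]
    neg = [row for row, t in zip(X, y) if t == 0]
    if not pos or not neg:  # degenerate label: no feature is informative
        return [0.0] * len(X[0])
    return [abs(_mean(r[j] for r in pos) - _mean(r[j] for r in neg))
            for j in range(len(X[0]))]

def top_k(scores, k):
    """Indices of the k best features (ties go to the lower index)."""
    ranked = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    return sorted(ranked[:k])

def label_column(Y, l):
    """Binary Relevance view: the binary target vector of label l."""
    return [row[l] for row in Y]

def local_selection(X, Y, k):
    """One feature subset per label, scored on that label's binary problem."""
    return [top_k(score_features(X, label_column(Y, l)), k)
            for l in range(len(Y[0]))]

def global_selection(X, Y, k):
    """One shared subset for all labels (assumption: average per-label scores)."""
    per_label = [score_features(X, label_column(Y, l)) for l in range(len(Y[0]))]
    avg = [_mean(s[j] for s in per_label) for j in range(len(X[0]))]
    shared = top_k(avg, k)
    return [shared for _ in range(len(Y[0]))]

def project(X, cols):
    """Restrict the instances to the selected feature columns."""
    return [[row[j] for j in cols] for row in X]

# Toy data: labels 0 and 1 follow feature 0, label 2 follows feature 3.
X = [[0, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0], [1, 0, 1, 1]]
Y = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]

print("local :", local_selection(X, Y, 1))
print("global:", global_selection(X, Y, 1))
```

In this toy setting, the global selection is dominated by the two labels that depend on feature 0, so the binary problem for label 2 loses the one feature that predicts it, whereas the local selection keeps it. With Binary Relevance, any binary classifier can then be trained for label `l` on `project(X, subsets[l])`, where `subsets` is the output of either selection function.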

Notes

  1. https://www.kaggle.com/c/lshtc.

  2. http://bailando.sims.berkeley.edu/enron_email.html.

  3. http://www.aifb.kit.edu/web/Web_Science_und_Wissensmanagement/Portal.

  4. Note that this approach is only for generating benchmarks for hierarchical classification, and for comparing approaches to each other. However, we cannot transfer the results trivially to make a statement about how well the approaches work for the actual type prediction task in the original datasets.

  5. http://mulan.sourceforge.net/datasets-mlc.html.

  6. http://meka.sourceforge.net/#datasets.

  7. https://dtai.cs.kuleuven.be/clus/hmcdatasets/.

  8. http://dl-learner.org.

  9. The complete set of plots can be found at http://data.dws.informatik.uni-mannheim.de/hmctp/plots/report.pdf.

Acknowledgements

The work presented in this paper has been partly supported by the Ministry of Science, Research and the Arts Baden-Württemberg in the project SyKo²W² (Synthesis of Completion and Correction of Knowledge Graphs on the Web).

Author information

Corresponding author

Correspondence to André Melo.

About this article

Cite this article

Melo, A., Paulheim, H. Local and global feature selection for multilabel classification with binary relevance. Artif Intell Rev 51, 33–60 (2019). https://doi.org/10.1007/s10462-017-9556-4

  • DOI: https://doi.org/10.1007/s10462-017-9556-4
