Multi-label classification by polytree-augmented classifier chains with label-dependent features

Multi-label classification faces several critical challenges, including modeling label correlations, mitigating label imbalance, removing irrelevant and redundant features, and reducing computational complexity on large-scale problems. To address these issues, we propose a novel method, polytree-augmented classifier chains with label-dependent features, which models label correlations through flexible polytree structures built on low-dimensional label-dependent feature spaces learned by a two-stage feature selection approach. First, a feature weighting approach efficiently removes irrelevant features for each label and mitigates the effect of label imbalance. Second, a polytree structure is built over the label space using estimated conditional mutual information. Third, an appropriate label-dependent feature subset is selected by taking into account the label correlations captured in the polytree. Extensive empirical studies on 6 synthetic and 12 real-world datasets demonstrate the superior performance of the proposed method. In addition, when equipped with the proposed two-stage feature selection approach, multi-label classifiers with label-dependent features achieve an average improvement of 9.4% in Exact-Match over the original classifiers.
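The abstract does not spell out how the polytree is built. A common starting point for tree-structured dependency models is the Chow-Liu maximum-weight spanning tree over pairwise mutual information, whose undirected skeleton a polytree-recovery step in the style of Rebane and Pearl then orients. The sketch below shows only that skeleton step, under a simplifying assumption: it uses plain empirical mutual information between label columns, whereas the method described above uses estimated conditional mutual information. The function names `mutual_information` and `chow_liu_skeleton` are hypothetical, and edge orientation is not shown.

```python
import itertools
import math
from collections import Counter


def mutual_information(a, b):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (pa[x] * pb[y]))
    return mi


def chow_liu_skeleton(Y):
    """Y: rows are instances, columns are binary labels.
    Returns the undirected edges of a maximum-weight spanning tree over the
    labels, weighted by pairwise label mutual information (Kruskal's algorithm).
    Simplification: unconditional MI, not MI conditioned on the features."""
    q = len(Y[0])
    cols = [[row[j] for row in Y] for j in range(q)]
    weighted = sorted(
        ((mutual_information(cols[i], cols[j]), i, j)
         for i, j in itertools.combinations(range(q), 2)),
        reverse=True,  # heaviest edges first
    )
    parent = list(range(q))  # union-find forest used to avoid cycles

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    edges = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # joining two components cannot create a cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges
```

Because any orientation of a tree's edges is acyclic, the skeleton can always be turned into a polytree; the orientation step only decides which labels share a common child (v-structures), and therefore which parents each label's classifier conditions on.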
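Once a polytree and a label-dependent feature subset per label are available, prediction follows the classifier-chain idea: each label is predicted after its parents, from its own feature subset plus the parents' predicted values. The sketch below assumes greedy one-pass inference in topological order; the method described here may use a different inference procedure. The names `predict_polytree_chain`, `feature_subsets`, and `LabelClassifier` are hypothetical, and the per-label classifiers are placeholder callables rather than trained models. `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Sequence

# Hypothetical per-label classifier: (selected features, parent-label values) -> 0/1.
LabelClassifier = Callable[[Sequence[float], Sequence[int]], int]


def predict_polytree_chain(
    x: Sequence[float],
    parents: Dict[int, List[int]],
    feature_subsets: Dict[int, List[int]],
    classifiers: Dict[int, LabelClassifier],
) -> Dict[int, int]:
    """Greedy inference along a polytree: each label is predicted after its
    parents, using only its label-dependent features plus parent predictions."""
    # graphlib expects node -> predecessors; list every label explicitly so
    # that parentless labels are scheduled too.
    graph = {j: parents.get(j, []) for j in classifiers}
    y: Dict[int, int] = {}
    for j in TopologicalSorter(graph).static_order():
        x_j = [x[i] for i in feature_subsets.get(j, [])]  # label-dependent features
        pa_values = [y[p] for p in graph[j]]               # already predicted
        y[j] = classifiers[j](x_j, pa_values)
    return y


# Usage: label 2 has two parents (a v-structure 0 -> 2 <- 1).
parents = {0: [], 1: [], 2: [0, 1]}
subsets = {0: [0], 1: [1], 2: []}
clfs = {
    0: lambda f, pa: int(f[0] > 0.5),
    1: lambda f, pa: int(f[0] > 0.5),
    2: lambda f, pa: int(all(pa)),
}
print(predict_polytree_chain([0.9, 0.8], parents, subsets, clfs))
```

Restricting each classifier to its own feature subset is what keeps the per-label input low-dimensional; the parent labels are the only channel through which label correlations enter.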
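Exact-Match (also called subset accuracy) is the strictest common multi-label metric: an instance counts as correct only if its entire predicted label vector equals the true one. A minimal sketch, with `exact_match` as a hypothetical helper name:

```python
from typing import Sequence


def exact_match(Y_true: Sequence[Sequence[int]], Y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of instances whose whole predicted label vector is correct."""
    if len(Y_true) != len(Y_pred) or not Y_true:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for t, p in zip(Y_true, Y_pred) if list(t) == list(p))
    return hits / len(Y_true)


# One of the two instances has a wrong third label, so the score is 0.5.
print(exact_match([[1, 0, 1], [0, 1, 0]], [[1, 0, 1], [0, 1, 1]]))
```

Since a single wrong label zeroes out an instance, Exact-Match rewards methods that model label correlations jointly rather than predicting each label in isolation.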

Figs. 1 to 12 (images not included)

Author information

Correspondence to Lu Sun.

About this article

Cite this article

Sun, L., Kudo, M. Multi-label classification by polytree-augmented classifier chains with label-dependent features. Pattern Anal Applic 22, 1029–1049 (2019).

Keywords

  • Multi-label classification
  • Label correlation
  • Polytree-augmented classifier chain
  • Label-dependent feature
  • Label imbalance