Machine Learning, Volume 108, Issue 8–9, pp 1193–1230

Dynamic principal projection for cost-sensitive online multi-label classification

  • Hong-Min Chu
  • Kuan-Hao Huang
  • Hsuan-Tien Lin
Part of the following topical collections:
  1. Special Issue of the ECML PKDD 2019 Journal Track


We study multi-label classification (MLC) with three important real-world issues: online updating, label space dimension reduction (LSDR), and cost-sensitivity. Current MLC algorithms have not been designed to address these three issues simultaneously. In this paper, we propose a novel algorithm, cost-sensitive dynamic principal projection (CS-DPP), that resolves all three issues. The foundation of CS-DPP is an online LSDR framework derived from a leading LSDR algorithm. In particular, CS-DPP is equipped with an efficient online dimension reducer motivated by matrix stochastic gradient, and establishes its theoretical backbone when coupled with a carefully designed online regression learner. In addition, CS-DPP embeds the cost information into label weights to achieve cost-sensitivity along with theoretical guarantees. Experimental results verify that CS-DPP achieves better practical performance than current MLC algorithms across different evaluation criteria, and demonstrate the importance of resolving the three issues simultaneously.
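The abstract describes an online dimension reducer motivated by matrix stochastic gradient (MSG; Arora, Cotter & Srebro, 2013). As a rough illustration only, the sketch below runs a generic MSG-style online PCA update on a stream of binary label vectors: it keeps a symmetric matrix P whose eigenvalues lie in [0, 1] and sum to k, adds eta·yyᵀ for each arriving label vector y, and projects back onto that set. The top-k eigenvectors of P then act as a linear projection that encodes a label vector into k real numbers and decodes it back. This is not the CS-DPP algorithm from the paper: it omits the online regression learner, any label shifting, the cost-based label weights, and the paper's exact update rule. The function names, the step size `eta`, the toy two-pattern label stream, and the 0.5 decoding threshold are all assumptions made for the example.

```python
import numpy as np


def project_capped_spectrum(sigma: np.ndarray, k: int, iters: int = 100) -> np.ndarray:
    """Project eigenvalues onto {lam : 0 <= lam_i <= 1, sum(lam) = k}.

    The Euclidean projection has the form lam_i = clip(sigma_i - s, 0, 1);
    the clipped sum is non-increasing in the shift s, so bisection finds s.
    Assumes 0 < k <= len(sigma).
    """
    lo = sigma.min() - 1.0  # every entry clips to 1, so the sum is len(sigma) >= k
    hi = sigma.max()        # every entry clips to 0, so the sum is 0 <= k
    for _ in range(iters):
        s = 0.5 * (lo + hi)
        if np.clip(sigma - s, 0.0, 1.0).sum() > k:
            lo = s  # too much mass: increase the shift
        else:
            hi = s
    return np.clip(sigma - 0.5 * (lo + hi), 0.0, 1.0)


def msg_step(P: np.ndarray, y: np.ndarray, k: int, eta: float) -> np.ndarray:
    """One MSG-style step (hypothetical helper): ascend tr(P y y^T), then project."""
    G = P + eta * np.outer(y, y)
    G = 0.5 * (G + G.T)              # guard symmetry against round-off
    sigma, U = np.linalg.eigh(G)
    lam = project_capped_spectrum(sigma, k)
    return (U * lam) @ U.T           # U diag(lam) U^T


def top_k_basis(P: np.ndarray, k: int) -> np.ndarray:
    """Orthonormal d x k basis from the k largest eigenvalues of P."""
    sigma, U = np.linalg.eigh(P)
    return U[:, np.argsort(sigma)[::-1][:k]]


# Toy stream (assumption): two label patterns, each a block of co-occurring labels.
patterns = np.array([[1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 1, 1, 1]], dtype=float)
d, k = patterns.shape[1], 2
eta = 0.1                            # arbitrary step size (assumption)
rng = np.random.default_rng(0)

P = (k / d) * np.eye(d)              # feasible start: eigenvalues k/d, trace k
for _ in range(500):
    y = patterns[rng.integers(len(patterns))]
    P = msg_step(P, y, k, eta)

V = top_k_basis(P, k)
z = V.T @ patterns[0]                    # encode: 6 labels -> 2 real numbers
decoded = (V @ z >= 0.5).astype(int)     # decode: map back, threshold at 0.5 (assumption)
print(decoded)                           # expected: [1 1 1 0 0 0]
```

Bisection is used for the eigenvalue shift because the clipped sum is monotone in s, which keeps the projection short and easy to check. Each step performs a full eigendecomposition costing O(d³) time, so this sketch only suits small label spaces.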


Keywords: Multi-label classification · Cost-sensitive · Label space dimension reduction




Copyright information

© The Author(s) 2019

Authors and Affiliations

  1. CSIE Department, National Taiwan University, Taipei, Taiwan
