
Label prompt for multi-label text classification


Abstract

Multi-label text classification has attracted wide attention from researchers because of its value in practical applications. One of its key challenges is how to extract and exploit the correlations among labels, yet directly modeling these correlations in a complex and unknown label space is difficult. In this paper, we propose a Label Prompt Multi-label Text Classification model (LP-MTC), inspired by prompt learning for pre-trained language models. Specifically, we design a set of templates for multi-label text classification that integrate the labels into the input of the pre-trained language model, and we jointly optimize the model with a masked language modeling (MLM) objective. In this way, self-attention can capture both the correlations among labels and the semantic relations between labels and the text, which effectively improves model performance. Extensive experiments on multiple datasets demonstrate the effectiveness of our method: LP-MTC improves micro-F1 by 3.4% on average compared with BERT across four public datasets.
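The abstract describes the mechanism only at a high level. As a concrete illustration, the following is a minimal sketch in Python (using Hugging Face transformers) of how a label-prompt template of this kind could be wired up: the label names are appended to the text as a cloze-style template with one [MASK] slot per label, and BERT's MLM head scores a verbalizer token ("yes" vs. "no") at each slot. The template wording, the "yes"/"no" verbalizer, and the label set below are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of a label-prompt MLM classifier in the spirit of LP-MTC.
# The template, verbalizer, and labels are illustrative assumptions.
import torch
from transformers import BertForMaskedLM, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

LABELS = ["sports", "politics", "technology"]  # hypothetical label set
YES_ID = tokenizer.convert_tokens_to_ids("yes")
NO_ID = tokenizer.convert_tokens_to_ids("no")

def predict(text: str) -> dict:
    """Score each label by the MLM's preference for "yes" at its mask."""
    # One cloze slot per label, e.g. "sports? [MASK]. politics? [MASK]. ..."
    # Because the labels sit in the same sequence as the text, BERT's
    # self-attention can relate labels to the text and to each other.
    prompt = " ".join(f"{lab}? {tokenizer.mask_token}." for lab in LABELS)
    enc = tokenizer(text, prompt, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits  # shape: (1, seq_len, vocab_size)
    mask_positions = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    scores = {}
    for lab, pos in zip(LABELS, mask_positions.tolist()):
        yes_no = logits[0, pos, [YES_ID, NO_ID]].softmax(dim=-1)
        scores[lab] = yes_no[0].item()  # probability mass on "yes"
    return scores

print(predict("The match ended 2-1 after extra time."))
```

In a training loop, one would minimize cross-entropy at the mask positions against the "yes"/"no" targets, so the encoder is jointly optimized through the same MLM objective described in the abstract.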


Notes

  1. https://github.com/lancopku/SGM

  2. https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge

  3. https://github.com/lancopku/SGM

  4. https://github.com/laddie132/LW-PT


Acknowledgements

This work was supported by the National Natural Science Foundation of China (NSFC) under the project “From Learning Outcome to Proactive Learning: Towards a Human-Centered AI Based Approach to Intervention on Learning Motivation” (No. 62077027).

Author information

Corresponding author

Correspondence to Xiaoguang Wang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Song, R., Liu, Z., Chen, X. et al. Label prompt for multi-label text classification. Appl Intell 53, 8761–8775 (2023). https://doi.org/10.1007/s10489-022-03896-4

