Abstract
Multi-label text classification has attracted considerable attention from researchers because of its value in practical applications. A key challenge is how to extract and exploit the correlations among labels, yet directly modeling these correlations in a complex and unknown label space is difficult. In this paper, we propose the Label Prompt Multi-label Text Classification model (LP-MTC), inspired by prompt learning for pre-trained language models. Specifically, we design a set of templates for multi-label text classification that integrate the labels into the input of the pre-trained language model, and we optimize the model jointly with the Masked Language Model (MLM) objective. In this way, self-attention can capture both the correlations among labels and the semantic relations between labels and text, which effectively improves model performance. Extensive empirical experiments on multiple datasets demonstrate the effectiveness of our method: compared with BERT, LP-MTC improves micro-F1 by 3.4% on average across the four public datasets.
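To make the label-prompt idea concrete, the following is a minimal sketch of how labels can be folded into the input sequence of a masked language model. It assumes a simple hypothetical "label? [MASK]." template with a yes/no verbalizer; the paper's actual template and verbalizer design may differ, and the checkpoint and label names here are purely illustrative.

```python
# Sketch: one [MASK] slot per label, all in a single sequence, so that
# self-attention can relate each label to the text and to the other labels.
# The MLM head scores a verbalizer token ("yes"/"no") at each slot.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

labels = ["sports", "politics", "health"]  # hypothetical label set
text = "The senate passed a new bill on hospital funding."

# Build the prompt: "sports? [MASK]. politics? [MASK]. health? [MASK]."
prompt = " ".join(f"{lab}? {tokenizer.mask_token}." for lab in labels)
inputs = tokenizer(prompt, text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Locate the [MASK] positions; they appear in the same order as the labels.
mask_positions = (
    inputs["input_ids"][0] == tokenizer.mask_token_id
).nonzero(as_tuple=True)[0]

yes_id = tokenizer.convert_tokens_to_ids("yes")
no_id = tokenizer.convert_tokens_to_ids("no")

for lab, pos in zip(labels, mask_positions):
    # A label is predicted present if "yes" outscores "no" at its slot.
    score = logits[0, pos, yes_id] - logits[0, pos, no_id]
    print(lab, "relevant" if score > 0 else "not relevant")
```

During joint optimization, the MLM loss would be computed at exactly these mask positions against "yes"/"no" targets derived from the gold label set, so the same attention layers that read the text also learn label-label and label-text dependencies.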
Acknowledgements
This work was supported by the National Natural Science Foundation of China (NSFC), "From Learning Outcome to Proactive Learning: Towards a Human-centered AI Based Approach to Intervention on Learning Motivation" (No. 62077027).
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Cite this article
Song, R., Liu, Z., Chen, X. et al. Label prompt for multi-label text classification. Appl Intell 53, 8761–8775 (2023). https://doi.org/10.1007/s10489-022-03896-4