
Improving text classification via a soft dynamical label strategy


Labels play a central role in text classification tasks. However, most studies suffer from a lossy label encoding problem, in which each label is represented by a meaningless and mutually independent one-hot vector. This paper proposes a novel strategy that dynamically generates a soft pseudo label from the model's prediction at each training epoch. This history-based soft pseudo label is taken as the target, and parameters are optimized by minimizing the distance between the target and the prediction. In addition, we augment the training data with Mix-up, a widely used method, to prevent overfitting on small datasets. Extensive experimental results demonstrate that the proposed dynamical soft label strategy significantly improves the performance of several widely used deep learning classification models on binary and multi-class text classification tasks. Not only is our simple and efficient strategy much easier to implement and train, it also exhibits substantial improvements (up to 2.54% relative improvement on the FDCNews dataset with an LSTM encoder) over Label Confusion Learning (LCM)—a state-of-the-art label smoothing model—under the same experimental setting. The experimental results also demonstrate that Mix-up improves our method's performance on smaller datasets but introduces excess noise on larger datasets, which diminishes the model's performance.
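The two ingredients described above—a history-based soft pseudo label used as the training target, and Mix-up augmentation—can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the mixing weight `alpha`, and the exact blending rule between the one-hot label and the previous prediction are assumptions; the distance minimized here is the KL divergence, consistent with the paper's citation of Kullback–Leibler.

```python
import numpy as np

def soft_pseudo_label(one_hot, prev_pred, alpha=0.9):
    """Blend the one-hot label with the model's prediction from the previous
    epoch. alpha is an assumed mixing weight; the paper's exact update rule
    may differ."""
    target = alpha * one_hot + (1.0 - alpha) * prev_pred
    return target / target.sum()  # keep the target a valid distribution

def kl_divergence(target, pred, eps=1e-12):
    """KL(target || pred): one choice of 'distance' between the soft target
    and the current prediction to minimize during training."""
    return float(np.sum(target * (np.log(target + eps) - np.log(pred + eps))))

def mixup(x1, y1, x2, y2, lam=0.5):
    """Standard Mix-up (Zhang et al. 2017): convex combination of two
    examples and their labels. lam is normally drawn from a Beta
    distribution; a fixed value is used here for clarity."""
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

# Toy 3-class example
one_hot = np.array([1.0, 0.0, 0.0])
prev_pred = np.array([0.7, 0.2, 0.1])           # prediction from the last epoch
target = soft_pseudo_label(one_hot, prev_pred)  # → [0.97, 0.02, 0.01]
loss = kl_divergence(target, np.array([0.8, 0.1, 0.1]))
```

The soft target keeps most of its mass on the true class but leaks a little probability onto classes the model previously found plausible, which is what distinguishes this history-based scheme from uniform label smoothing.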








  1. Wang X, Zhao Y, Pourpanah F (2020) Recent advances in deep learning. Int J Mach Learn Cybernet 11(4):747–750

  2. Qiao X, Peng C, Liu Z, Hu Y (2019) Word-character attention model for Chinese text classification. Int J Mach Learn Cybernet 10(12):3521–3537

  3. Li Y, Wang J, Wang S, Liang J, Li J (2019) Local dense mixed region cutting + global rebalancing: a method for imbalanced text sentiment classification. Int J Mach Learn Cybernet 10(7):1805–1820

  4. Li X, Xie H, Rao Y, Chen Y, Liu X, Huang H, Wang FL (2016) Weighted multi-label classification model for sentiment analysis of online news. In: 2016 International Conference on Big Data and Smart Computing (BigComp), pp 215–222. IEEE

  5. Huang X, Rao Y, Xie H, Wong T-L, Wang FL (2017) Cross-domain sentiment classification via topic-related TrAdaBoost. In: Thirty-First AAAI Conference on Artificial Intelligence

  6. Feldman R (2013) Techniques and applications for sentiment analysis. Commun ACM 56(4):82–89

  7. Liu J, Dolan P, Pedersen ER (2010) Personalized news recommendation based on click behavior. In: Proceedings of the 15th International Conference on Intelligent User Interfaces, pp 31–40

  8. Yang S, Wang Y, Chu X (2020) A survey of deep learning techniques for neural machine translation. arXiv preprint arXiv:2002.07526

  9. Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3(Jan):993–1022

  10. Medsker LR, Jain L (2001) Recurrent neural networks. Design Appl 5:64–67

  11. Müller R, Kornblith S, Hinton GE (2019) When does label smoothing help? Adv Neural Inf Process Syst 32

  12. Geng X (2016) Label distribution learning. IEEE Trans Knowl Data Eng 28(7):1734–1748

  13. Yang CC, Wang FL (2003) Fractal summarization: summarization based on fractal theory. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp 391–392

  14. Yang CC, Wang FL (2007) An information delivery system with automatic summarization for mobile commerce. Decis Support Syst 43(1):46–61

  15. Liang W, Xie H, Rao Y, Lau RY, Wang FL (2018) Universal affective model for readers' emotion classification over short texts. Expert Syst Appl 114:322–333

  16. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

  17. Al-Smadi M, Talafha B, Al-Ayyoub M, Jararweh Y (2019) Using long short-term memory deep neural networks for aspect-based sentiment analysis of Arabic reviews. Int J Mach Learn Cybernet 10(8):2163–2175

  18. Graves A, Schmidhuber J (2005) Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw 18(5–6):602–610

  19. Zulqarnain M, Ghazali R, Ghouse MG, Mushtaq MF (2019) Efficient processing of GRU based on word embedding for text classification. JOIV 3(4):377–383

  20. Liu B, Zhou Y, Sun W (2020) Character-level text classification via convolutional neural network and gated recurrent unit. Int J Mach Learn Cybernet 11(8):1939–1949

  21. Kalchbrenner N, Grefenstette E, Blunsom P (2014) A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188

  22. Lai S, Xu L, Liu K, Zhao J (2015) Recurrent convolutional neural networks for text classification. In: Twenty-Ninth AAAI Conference on Artificial Intelligence

  23. Dos Santos C, Gatti M (2014) Deep convolutional neural networks for sentiment analysis of short texts. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp 69–78

  24. Huang M, Xie H, Rao Y, Feng J, Wang FL (2020) Sentiment strength detection with a context-dependent lexicon-based convolutional neural network. Inf Sci 520:389–399

  25. Devlin J, Chang M-W, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805

  26. Lee K, Palsetia D, Narayanan R, Patwary MMA, Agrawal A, Choudhary A (2011) Twitter trending topic classification. In: 2011 IEEE 11th International Conference on Data Mining Workshops, pp 251–258. IEEE

  27. Wei J, Zou K (2019) EDA: easy data augmentation techniques for boosting performance on text classification tasks. arXiv preprint arXiv:1901.11196

  28. Zhang H, Cisse M, Dauphin YN, Lopez-Paz D (2017) mixup: beyond empirical risk minimization. arXiv preprint arXiv:1710.09412

  29. Liang D, Yang F, Zhang T, Yang P (2018) Understanding mixup training methods. IEEE Access 6:58774–58783

  30. Guo H, Mao Y, Zhang R (2019) Augmenting data with mixup for sentence classification: an empirical study. arXiv preprint arXiv:1905.08941

  31. Tang J, Qu M, Mei Q (2015) PTE: predictive text embedding through large-scale heterogeneous text networks. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 1165–1174

  32. Zhang H, Xiao L, Chen W, Wang Y, Jin Y (2017) Multi-task label embedding for text classification. arXiv preprint arXiv:1710.07210

  33. Wang G, Li C, Wang W, Zhang Y, Shen D, Zhang X, Henao R, Carin L (2018) Joint embedding of words and labels for text classification. arXiv preprint arXiv:1805.04174

  34. Yang P, Sun X, Li W, Ma S, Wu W, Wang H (2018) SGM: sequence generation model for multi-label classification. arXiv preprint arXiv:1806.04822

  35. Du C, Chen Z, Feng F, Zhu L, Gan T, Nie L (2019) Explicit interaction model towards text classification. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 33, pp 6359–6366

  36. Lienen J, Hüllermeier E (2021) From label smoothing to label relaxation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 35, pp 8583–8591

  37. Li Y, Yang J, Song Y, Cao L, Luo J, Li L-J (2017) Learning from noisy labels with distillation. In: Proceedings of the IEEE International Conference on Computer Vision, pp 1910–1918

  38. Xu Y, Qiu X, Zhou L, Huang X (2020) Improving BERT fine-tuning via self-ensemble and self-distillation. arXiv preprint arXiv:2002.10345

  39. Zhang Z-Y, Sheng X-R, Zhang Y, Jiang B, Han S, Deng H, Zheng B (2022) Towards understanding the overfitting phenomenon of deep click-through rate prediction models. arXiv preprint arXiv:2209.06053

  40. Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86

  41. Guo B, Han S, Han X, Huang H, Lu T (2021) Label confusion learning to enhance text classification models. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 35, pp 12929–12936

  42. Zhang L, Song J, Gao A, Chen J, Bao C, Ma K (2019) Be your own teacher: improve the performance of convolutional neural networks via self distillation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 3713–3722

  43. Zhang X, Zhao J, LeCun Y (2015) Character-level convolutional networks for text classification. Adv Neural Inf Process Syst 28

  44. Socher R, Perelygin A, Wu J, Chuang J, Manning CD, Ng AY, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp 1631–1642

  45. Liu P, Qiu X, Huang X (2016) Recurrent neural network for text classification with multi-task learning. arXiv preprint arXiv:1605.05101

  46. Zhang Y, Wallace B (2015) A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification. arXiv preprint arXiv:1510.03820

  47. Jiao X, Yin Y, Shang L, Jiang X, Chen X, Li L, Wang F, Liu Q (2019) TinyBERT: distilling BERT for natural language understanding. arXiv preprint arXiv:1909.10351

  48. Lan Z, Chen M, Goodman S, Gimpel K, Sharma P, Soricut R (2019) ALBERT: a lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942

  49. Pennington J, Socher R, Manning CD (2014) GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 1532–1543

  50. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. Adv Neural Inf Process Syst 26

  51. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980

  52. Xiao H, Rasul K, Vollgraf R (2017) Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747


Author information

Authors and Affiliations


Corresponding author

Correspondence to Fu Lee Wang.

Ethics declarations


The research described in this article has been supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (UGC/FDS16/E01/19), the Lam Woo Research Fund (LWP20019) and the Faculty Research Grants (DB22A5 and DB22B4) of Lingnan University, Hong Kong. The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, J., Xie, H., Wang, F.L. et al. Improving text classification via a soft dynamical label strategy. Int. J. Mach. Learn. & Cyber. 14, 2395–2405 (2023).
