Abstract
Labels play a central role in text classification tasks. However, most studies suffer from a lossy label encoding problem, in which each label is represented by a meaningless, mutually independent one-hot vector. This paper proposes a novel strategy that dynamically generates a soft pseudo label from the model's predictions during training. This history-based soft pseudo label is taken as the target, and the model parameters are optimized by minimizing the distance between the target and the prediction. In addition, we augment the training data with Mix-up, a widely used method, to prevent overfitting on small datasets. Extensive experimental results demonstrate that the proposed dynamical soft label strategy significantly improves the performance of several widely used deep learning classification models on binary and multi-class text classification tasks. Not only is our simple and efficient strategy much easier to implement and train, it also achieves substantial improvements (up to a 2.54% relative improvement on the FDCNews dataset with an LSTM encoder) over Label Confusion Learning (LCM), a state-of-the-art label smoothing model, under the same experimental setting. The experimental results also demonstrate that Mix-up improves our method's performance on smaller datasets but introduces excess noise on larger datasets, which diminishes the model's performance.
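The abstract describes the strategy only at a high level. The following is a minimal PyTorch sketch of the idea, not the authors' implementation: it assumes the soft target is an exponential moving average (EMA) of the model's past predictions blended back with the one-hot label, and that the distance being minimized is the KL divergence. The names update_soft_target and mixup and the momentum and mix parameters are hypothetical; the paper's exact update rule may differ.

```python
import torch
import torch.nn.functional as F

def update_soft_target(history, probs, onehot, momentum=0.9, mix=0.5):
    # EMA over the model's historical predictions, blended back with the
    # gold one-hot label so the target never drifts away from the true class
    # (hypothetical update rule; the paper's exact formulation may differ)
    history = momentum * history + (1 - momentum) * probs.detach()
    target = mix * onehot + (1 - mix) * history
    return history, target

def mixup(x, y, alpha=0.2):
    # Standard Mix-up (Zhang et al. 2017): convex combinations of
    # example pairs and of their (soft) labels
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[idx], lam * y + (1 - lam) * y[idx]

# Toy usage: a batch of 4 examples over 3 classes with random "predictions"
probs = torch.rand(4, 3).softmax(dim=-1)
onehot = F.one_hot(torch.tensor([0, 2, 1, 0]), num_classes=3).float()
history = onehot.clone()  # initialize each example's history at its label
history, target = update_soft_target(history, probs, onehot)
loss = F.kl_div(probs.log(), target, reduction="batchmean")
```

In a full training loop one would keep one history row per training example, refresh it each epoch, and apply Mix-up to the (embedded) inputs and soft targets before computing the loss; per the abstract, Mix-up may be better skipped on larger datasets, where it can introduce excess noise.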



References
Wang X, Zhao Y, Pourpanah F (2020) Recent advances in deep learning. Int J Mach Learn Cybernet 11(4):747–750
Qiao X, Peng C, Liu Z, Hu Y (2019) Word-character attention model for Chinese text classification. Int J Mach Learn Cybernet 10(12):3521–3537
Li Y, Wang J, Wang S, Liang J, Li J (2019) Local dense mixed region cutting + global rebalancing: a method for imbalanced text sentiment classification. Int J Mach Learn Cybernet 10(7):1805–1820
Li X, Xie H, Rao Y, Chen Y, Liu X, Huang H, Wang FL (2016) Weighted multi-label classification model for sentiment analysis of online news. In: 2016 International Conference on Big Data and Smart Computing (BigComp), pp. 215–222. IEEE
Huang X, Rao Y, Xie H, Wong T-L, Wang FL (2017) Cross-domain sentiment classification via topic-related TrAdaBoost. In: Thirty-First AAAI Conference on Artificial Intelligence
Feldman R (2013) Techniques and applications for sentiment analysis. Commun ACM 56(4):82–89
Liu J, Dolan P, Pedersen ER (2010) Personalized news recommendation based on click behavior. In: Proceedings of the 15th International Conference on Intelligent User Interfaces, pp. 31–40
Yang S, Wang Y, Chu X (2020) A survey of deep learning techniques for neural machine translation. arXiv preprint arXiv:2002.07526
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3(Jan):993–1022
Medsker LR, Jain L (2001) Recurrent neural networks. Design Appl 5:64–67
Müller R, Kornblith S, Hinton GE (2019) When does label smoothing help? Adv Neural Inf Process Syst 32
Geng X (2016) Label distribution learning. IEEE Trans Knowl Data Eng 28(7):1734–1748
Yang CC, Wang FL (2003) Fractal summarization: summarization based on fractal theory. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 391–392
Yang CC, Wang FL (2007) An information delivery system with automatic summarization for mobile commerce. Decision Support Syst 43(1):46–61
Liang W, Xie H, Rao Y, Lau RY, Wang FL (2018) Universal affective model for readers’ emotion classification over short texts. Expert Syst Appl 114:322–333
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Al-Smadi M, Talafha B, Al-Ayyoub M, Jararweh Y (2019) Using long short-term memory deep neural networks for aspect-based sentiment analysis of Arabic reviews. Int J Mach Learn Cybernet 10(8):2163–2175
Graves A, Schmidhuber J (2005) Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw 18(5–6):602–610
Zulqarnain M, Ghazali R, Ghouse MG, Mushtaq MF (2019) Efficient processing of GRU based on word embedding for text classification. JOIV 3(4):377–383
Liu B, Zhou Y, Sun W (2020) Character-level text classification via convolutional neural network and gated recurrent unit. Int J Mach Learn Cybernet 11(8):1939–1949
Kalchbrenner N, Grefenstette E, Blunsom P (2014) A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188
Lai S, Xu L, Liu K, Zhao J (2015) Recurrent convolutional neural networks for text classification. In: Twenty-Ninth AAAI Conference on Artificial Intelligence
Dos Santos C, Gatti M (2014) Deep convolutional neural networks for sentiment analysis of short texts. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 69–78
Huang M, Xie H, Rao Y, Feng J, Wang FL (2020) Sentiment strength detection with a context-dependent lexicon-based convolutional neural network. Inform Sci 520:389–399
Devlin J, Chang M-W, Lee K, Toutanova K (2018) BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
Lee K, Palsetia D, Narayanan R, Patwary MMA, Agrawal A, Choudhary A (2011) Twitter trending topic classification. In: 2011 IEEE 11th International Conference on Data Mining Workshops, pp. 251–258. IEEE
Wei J, Zou K (2019) EDA: Easy data augmentation techniques for boosting performance on text classification tasks. arXiv preprint arXiv:1901.11196
Zhang H, Cisse M, Dauphin YN, Lopez-Paz D (2017) mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412
Liang D, Yang F, Zhang T, Yang P (2018) Understanding mixup training methods. IEEE Access 6:58774–58783
Guo H, Mao Y, Zhang R (2019) Augmenting data with mixup for sentence classification: An empirical study. arXiv preprint arXiv:1905.08941
Tang J, Qu M, Mei Q (2015) PTE: Predictive text embedding through large-scale heterogeneous text networks. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1165–1174
Zhang H, Xiao L, Chen W, Wang Y, Jin Y (2017) Multi-task label embedding for text classification. arXiv preprint arXiv:1710.07210
Wang G, Li C, Wang W, Zhang Y, Shen D, Zhang X, Henao R, Carin L (2018) Joint embedding of words and labels for text classification. arXiv preprint arXiv:1805.04174
Yang P, Sun X, Li W, Ma S, Wu W, Wang H (2018) SGM: Sequence generation model for multi-label classification. arXiv preprint arXiv:1806.04822
Du C, Chen Z, Feng F, Zhu L, Gan T, Nie L (2019) Explicit interaction model towards text classification. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 6359–6366
Lienen J, Hüllermeier E (2021) From label smoothing to label relaxation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 8583–8591
Li Y, Yang J, Song Y, Cao L, Luo J, Li L-J (2017) Learning from noisy labels with distillation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1910–1918
Xu Y, Qiu X, Zhou L, Huang X (2020) Improving BERT fine-tuning via self-ensemble and self-distillation. arXiv preprint arXiv:2002.10345
Zhang Z-Y, Sheng X-R, Zhang Y, Jiang B, Han S, Deng H, Zheng B (2022) Towards understanding the overfitting phenomenon of deep click-through rate prediction models. arXiv preprint arXiv:2209.06053
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86
Guo B, Han S, Han X, Huang H, Lu T (2021) Label confusion learning to enhance text classification models. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 12929–12936
Zhang L, Song J, Gao A, Chen J, Bao C, Ma K (2019) Be your own teacher: Improve the performance of convolutional neural networks via self distillation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3713–3722
Zhang X, Zhao J, LeCun Y (2015) Character-level convolutional networks for text classification. Adv Neural Inf Process Syst 28
Socher R, Perelygin A, Wu J, Chuang J, Manning CD, Ng AY, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631–1642
Liu P, Qiu X, Huang X (2016) Recurrent neural network for text classification with multi-task learning. arXiv preprint arXiv:1605.05101
Zhang Y, Wallace B (2015) A sensitivity analysis of (and practitioners’ guide to) convolutional neural networks for sentence classification. arXiv preprint arXiv:1510.03820
Jiao X, Yin Y, Shang L, Jiang X, Chen X, Li L, Wang F, Liu Q (2019) TinyBERT: Distilling BERT for natural language understanding. arXiv preprint arXiv:1909.10351
Lan Z, Chen M, Goodman S, Gimpel K, Sharma P, Soricut R (2019) ALBERT: A lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942
Pennington J, Socher R, Manning CD (2014) GloVe: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. Adv Neural Inf Process Syst 26
Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980
Xiao H, Rasul K, Vollgraf R (2017) Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747
Ethics declarations
Funding
The research described in this article has been supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (UGC/FDS16/E01/19), the Lam Woo Research Fund (LWP20019), and the Faculty Research Grants (DB22A5 and DB22B4) of Lingnan University, Hong Kong. The authors have no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Wang, J., Xie, H., Wang, F.L. et al. Improving text classification via a soft dynamical label strategy. Int. J. Mach. Learn. & Cyber. 14, 2395–2405 (2023). https://doi.org/10.1007/s13042-022-01770-w