Word-character attention model for Chinese text classification

  • Xue Qiao
  • Chen Peng
  • Zhen Liu
  • Yanfeng Hu
Original Article


Recent progress in applying neural networks to image classification has motivated the exploration of their application to text classification. Unlike most of this research, which is devoted to English corpora, this paper focuses on Chinese text, whose semantic representation is more intricate. As the basic unit of Chinese words, the character plays a vital role in the Chinese language. However, most existing Chinese text classification methods regard word features as the basic unit of text representation and ignore the benefits of character features. Moreover, existing approaches compress all word features into a single semantic representation without an attention mechanism that could capture salient features. To tackle these issues, we propose the word-character attention model (WCAM) for Chinese text classification. WCAM integrates two levels of attention: a word-level attention model captures salient words that are most closely related to the text's meaning, and a character-level attention model selects discriminative characters; the two are jointly employed to learn text representations. Meanwhile, a word-character constraint model and character alignment are introduced to ensure that the selected characters are highly representative and to enhance their discrimination; together they exploit the subtle, local differences that distinguish text classes. Extensive experiments on two benchmark datasets demonstrate that WCAM achieves comparable or better performance than state-of-the-art methods for Chinese text classification.
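The core operation the abstract describes, attention at both the word and the character level feeding one joint text representation, can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes random embeddings and a single shared query (context) vector, and shows only the attention-weighted pooling step applied separately to word and character features before concatenation:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(features, query):
    """One attention level: score each feature vector against a query
    and return the attention-weighted sum of the features."""
    scores = features @ query       # (n,) relevance score per feature
    weights = softmax(scores)       # (n,) attention weights, sum to 1
    return weights @ features       # (d,) pooled representation

rng = np.random.default_rng(0)
d = 8
words = rng.normal(size=(5, d))    # word embeddings of one text (assumed)
chars = rng.normal(size=(12, d))   # character embeddings of the same text
query = rng.normal(size=d)         # learned context vector (random here)

# Word-level and character-level attention are applied separately,
# then the two pooled vectors are concatenated into one text vector.
text_vec = np.concatenate([attention_pool(words, query),
                           attention_pool(chars, query)])
print(text_vec.shape)
```

In the full model the query vectors would be learned jointly with the classifier, and the word-character constraint would tie the character-level weights to the words they compose; this sketch only shows how two attention levels combine into a single representation.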


Keywords: Chinese text classification · Attention mechanism · Word-character attention · Word-character constraint



This work was supported by Gusu Innovation Talent Foundation of Suzhou under Grant ZXT2017002 and National Key R&D Program of China under Grant 2017YFC08219.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Institute of Electronics, Chinese Academy of Sciences, Suzhou, China
