Text Classification Based on Word2vec and Convolutional Neural Network

  • Lin Li
  • Linlong Xiao
  • Wenzhen Jin
  • Hong Zhu
  • Guocai Yang (email author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11305)

Abstract

Text representations used in text classification usually have high dimensionality and lack semantic information, which degrades classification performance. In this paper, TF-IDF is optimized with optimization factors and then used to weight word2vec vectors, which carry semantic information, yielding the single-text representation model CD_STR. Building on CD_STR, latent semantic indexing (LSI) and the TF-IDF weighted vector space model (T_VSM) are merged in to obtain a fusion model, CD_MTR, which is more efficient. A text classification method, MTR_MCNN, is then proposed that combines the fusion model CD_MTR with a convolutional neural network. The method first designs convolution kernels of different sizes and numbers so that they extract text features from different aspects. The text vectors produced by the CD_MTR model are then used as input to the improved convolutional neural network. Tests on two datasets show that CD_STR and CD_MTR both outperform the other text representation models compared. The MTR_MCNN method classifies better than the other methods compared, and its classification accuracy is higher than that of the CD_MTR model alone.
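
The idea of weighting word vectors by TF-IDF can be illustrated with a minimal sketch. It is not the authors' CD_STR model: the abstract does not specify the optimization factors, so plain TF-IDF is used here, and the hypothetical `TOY_EMBEDDINGS` table stands in for vectors that would normally come from a trained word2vec model. The function names are likewise illustrative assumptions.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: tf-idf} dict per tokenized document (plain TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: number of documents each term appears in.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def weighted_doc_vector(doc_weights, embeddings, dim):
    """TF-IDF-weighted average of the word vectors of one document."""
    total = [0.0] * dim
    weight_sum = 0.0
    for term, w in doc_weights.items():
        vec = embeddings.get(term)
        if vec is None or w == 0.0:
            continue  # skip out-of-vocabulary or zero-weight terms
        for i in range(dim):
            total[i] += w * vec[i]
        weight_sum += w
    if weight_sum == 0.0:
        return total  # all-zero vector when no term contributes
    return [x / weight_sum for x in total]


# Assumption: toy 2-dimensional vectors in place of real word2vec output.
TOY_EMBEDDINGS = {
    "cat": [1.0, 0.0],
    "dog": [0.0, 1.0],
    "the": [0.5, 0.5],
}

docs = [["the", "cat"], ["the", "dog"]]
weights = tfidf_weights(docs)
vectors = [weighted_doc_vector(w, TOY_EMBEDDINGS, 2) for w in weights]
```

In this toy corpus "the" occurs in every document, so its IDF is zero and each document vector is determined by its distinguishing word alone. That is the effect the weighting is meant to have: frequent, uninformative terms contribute little to the text representation.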

Keywords

Text classification · Text representation · Word2vec · Convolutional neural network

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Lin Li (1)
  • Linlong Xiao (1)
  • Wenzhen Jin (1)
  • Hong Zhu (1)
  • Guocai Yang (1), email author

  1. School of Computer and Information Science, Southwest University, Chongqing, China