Large Scale Text Classification with Efficient Word Embedding

  • Xiaohan Ma
  • Rize Jin
  • Joon-Young Paik
  • Tae-Sun Chung
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 425)

Abstract

This article offers an empirical exploration of the efficient use of word-level convolutional neural networks (word-CNNs) for large-scale text classification. In general, word-CNNs are difficult to train on large-scale datasets because the size of the word embedding matrix grows dramatically with the vocabulary. To address this issue, this paper presents a de-noising approach to word embedding. We compare our model with several recently proposed CNN models on publicly available datasets. The experimental results show that the proposed method improves the usefulness of word-CNNs and increases the accuracy of text classification.
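The abstract does not detail the de-noising method itself, but the underlying problem — the embedding matrix has one row per vocabulary word, so its size scales linearly with the number of distinct words — can be illustrated with a minimal sketch. The snippet below shows one common way to shrink the matrix, frequency-based vocabulary pruning (rare words are often noisy); the function name and `min_count` threshold are illustrative assumptions, not the authors' actual procedure.

```python
import numpy as np
from collections import Counter

def prune_vocabulary(corpus_tokens, embedding, word_to_id, min_count=5):
    """Shrink a (V, d) embedding matrix by dropping rare words.

    corpus_tokens: list of tokens from the training corpus
    embedding:     NumPy array of shape (V, d), one row per word
    word_to_id:    dict mapping each word to its row index in `embedding`
    Returns the reduced matrix and the new word-to-index mapping.
    """
    counts = Counter(corpus_tokens)
    kept = [w for w in word_to_id if counts[w] >= min_count]
    new_ids = {w: i for i, w in enumerate(kept)}
    # Rows for rare words are discarded: the matrix shrinks from V to len(kept).
    new_embedding = embedding[[word_to_id[w] for w in kept]]
    return new_embedding, new_ids
```

For a corpus with millions of distinct words, keeping only words above a small frequency threshold typically removes a large fraction of the rows while losing little coverage of the actual token stream.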

Keywords

Text Classification · Sentiment Analysis · Convolutional Neural Network · Vocabulary Size · Distinct Word

Notes

Acknowledgments

This research was supported by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the National Program for Excellence in SW, supervised by the IITP (Institute for Information & communications Technology Promotion).

References

  1. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
  2. Lai S, Xu L, Liu K, Zhao J (2015) Recurrent convolutional neural networks for text classification. In: Proceedings of AAAI, vol 333, pp 2267–2273
  3. Yih W, He X, Meek C (2014) Semantic parsing for single-relation question answering. In: Proceedings of the 52nd annual meeting of the association for computational linguistics, pp 643–648
  4. Kim Y (2014) Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882
  5. dos Santos CN, Gatti M (2014) Deep convolutional neural networks for sentiment analysis of short texts. In: Proceedings of 24th international conference on computational linguistics, pp 69–78
  6. Zhang X, Zhao J, LeCun Y (2015) Character-level convolutional networks for text classification. In: Advances in neural information processing systems, pp 649–657
  7. Mikolov T, et al (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119
  8. Kalchbrenner N, Grefenstette E, Blunsom P (2014) A convolutional neural network for modelling sentences. In: Proceedings of the 52nd annual meeting of the association for computational linguistics, pp 655–665
  9. Lehmann J, Isele R, Jakob M, Jentzsch A, Kontokostas D, Mendes PN, Hellmann S et al (2015) DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Semant Web 6(2):167–195

Copyright information

© Springer Science+Business Media Singapore 2018

Authors and Affiliations

  • Xiaohan Ma (1)
  • Rize Jin (1)
  • Joon-Young Paik (1)
  • Tae-Sun Chung (1)
  1. Computer Engineering, Ajou University, Suwon, South Korea
