A novel approach to generate a large scale of supervised data for short text sentiment analysis

Multimedia Tools and Applications

Abstract

Because of the complexity of language and semantic structure, and the relative scarcity of labeled data and context information, sentiment analysis is regarded as a challenging task in natural language processing, especially for short texts. Deep learning models need large amounts of training data to overcome data sparseness and over-fitting. We therefore propose multi-granularity, text-oriented data augmentation techniques that generate large-scale artificial training data, and we compare them with generation by a generative adversarial network (GAN). We also propose a novel hybrid neural network architecture (LSCNN) which, trained with our data augmentation, outperforms many single neural network models. The proposed data augmentation method enhances the generalization ability of the model. Experimental results show that the augmentation method combined with the neural network model achieves strong performance on sentiment analysis and short text classification without any handcrafted features. The approach was validated on a Chinese online comment dataset and a Chinese news headline corpus, where it outperforms many state-of-the-art models. The evidence indicates that the proposed data augmentation yields a more accurate distributional representation of the data for deep learning, which improves the generalization of the extracted features. The combination of data augmentation and the LSCNN fusion model is well suited to short text sentiment analysis, especially on small-scale corpora.
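The abstract does not spell out the individual augmentation operations, so the following is only a minimal sketch of what label-preserving augmentation at two granularities could look like. It assumes word-level synonym replacement and clause-level reordering as the two granularities. The `SYNONYMS` table, the function names, and the English toy sentence are all hypothetical and are not taken from the paper, which works on Chinese corpora.

```python
import random

# Hypothetical toy synonym table (an assumption for illustration only);
# a real system would draw on a lexicon or on word embeddings.
SYNONYMS = {
    "good": ["great", "fine"],
    "bad": ["poor", "awful"],
    "fast": ["quick"],
}


def replace_synonyms(tokens, rng, prob=0.5):
    """Word level: swap a token for a listed synonym with probability `prob`."""
    out = []
    for tok in tokens:
        options = SYNONYMS.get(tok)
        if options and rng.random() < prob:
            out.append(rng.choice(options))
        else:
            out.append(tok)
    return out


def augment_once(text, rng):
    """Apply word-level replacement inside each clause, then reorder the clauses."""
    clauses = [c.strip() for c in text.split(",") if c.strip()]
    clauses = [" ".join(replace_synonyms(c.split(), rng)) for c in clauses]
    rng.shuffle(clauses)  # clause level: comma-separated clauses are permuted
    return ", ".join(clauses)


def augment(text, label, n_copies=3, seed=0):
    """Return the original (text, label) pair plus up to `n_copies` distinct variants.

    Every variant keeps the original label, so the output can be used
    directly as additional supervised training data.
    """
    rng = random.Random(seed)
    samples = [(text, label)]
    seen = {text}
    for _ in range(n_copies):
        variant = augment_once(text, rng)
        if variant not in seen:  # drop duplicates of earlier samples
            seen.add(variant)
            samples.append((variant, label))
    return samples


if __name__ == "__main__":
    for sample in augment("the screen is good, delivery was fast", "positive"):
        print(sample)
```

Both operations leave the sentiment label untouched, which is what lets the generated text serve as supervised data. Whether clause reordering or synonym swaps actually preserve sentiment depends on the language and the lexicon, so any real use would need to check the variants against held-out labeled data.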

Notes

  1. http://nlp.stanford.edu/software/lex-parser.shtml#Download

  2. https://code.google.com/archive/p/word2vec/

  3. http://code.google.com/p/ik-analyzer/downloads/list

  4. http://www.datatang.com/data/11970

  5. http://www.datatang.com/data/11970

  6. https://github.com/JerrikEph/nlpcc

Acknowledgment

This work was supported by the Natural Science Foundation of Anhui Province (1508085QF119) and the State Key Program of the National Natural Science Foundation of China (61432004, 71571058, 61461045). It was partially supported by the China Postdoctoral Science Foundation (No. 2015M580532 and No. 2017T100447) and by the National Natural Science Foundation of China under Grant No. 61472117.

Author information

Corresponding author

Correspondence to Xiao Sun.

About this article

Cite this article

Sun, X., He, J. A novel approach to generate a large scale of supervised data for short text sentiment analysis. Multimed Tools Appl 79, 5439–5459 (2020). https://doi.org/10.1007/s11042-018-5748-4

  • DOI: https://doi.org/10.1007/s11042-018-5748-4
