A novel approach to generate a large scale of supervised data for short text sentiment analysis

Sun, Xiao; He, Jiajin

doi:10.1007/s11042-018-5748-4

Published: 12 February 2018

Volume 79, pages 5439–5459, (2020)

Multimedia Tools and Applications

Abstract

Owing to the complexity of language and semantic structure and the relative scarcity of labeled data and context information, sentiment analysis is regarded as a challenging task in Natural Language Processing, especially for short texts. Deep learning models need large-scale training data to overcome data sparseness and over-fitting. We propose multi-granularity, text-oriented data augmentation techniques to generate large-scale artificial training data, and compare them with a generative adversarial network (GAN). We also propose a novel hybrid neural network architecture (LSCNN) that, combined with our data augmentation technique, outperforms many single neural network models. The proposed data augmentation method enhances the generalization ability of the model. Experimental results show that the data augmentation method combined with the neural network model achieves strong performance on sentiment analysis and short text classification without any handcrafted features. The approach was validated on a Chinese online comment dataset and a Chinese news headline corpus, where it outperforms many state-of-the-art models. The evidence indicates that the proposed data augmentation technique yields a more accurate distributional representation of the data for deep learning, which improves the generalization of the extracted features. The combination of data augmentation and the LSCNN fusion model is well suited to short text sentiment analysis, especially on small-scale corpora.

Acknowledgment

This work is supported by the Natural Science Foundation of Anhui Province (1508085QF119) and the State Key Program of the National Natural Science Foundation of China (61432004, 71571058, 61461045). It was partially supported by the China Postdoctoral Science Foundation funded project (No. 2015M580532 and No. 2017T100447) and by the National Natural Science Foundation of China under Grant No. 61472117.

Author information

Authors and Affiliations

School of Computer and Information, Hefei University of Technology, No. 193 TunXi Road, BaoHe District, Hefei, China
Xiao Sun & Jiajin He

Corresponding author

Correspondence to Xiao Sun.

About this article

Cite this article

Sun, X., He, J. A novel approach to generate a large scale of supervised data for short text sentiment analysis. Multimed Tools Appl 79, 5439–5459 (2020). https://doi.org/10.1007/s11042-018-5748-4

Received: 25 December 2017
Revised: 19 January 2018
Accepted: 01 February 2018
Published: 12 February 2018
Issue Date: March 2020
DOI: https://doi.org/10.1007/s11042-018-5748-4
