Imbalanced Sentiment Classification Enhanced with Discourse Marker

Zhang, Tao; Wu, Xing; Lin, Meng; Han, Jizhong; Hu, Songlin

doi:10.1007/978-3-030-30490-4_11

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11730)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

Imbalanced data is common in the real world, especially in sentiment-related corpora, which makes it difficult to train a classifier to distinguish latent sentiment in text. We observe that humans often express a transition in emotion between two adjacent discourses with discourse markers such as “but”, “though”, and “while”, and that the head discourse and the tail discourse usually indicate opposite emotional tendencies. Based on this observation, we propose a novel plug-and-play method that first samples discourses according to transitional discourse markers and then validates their sentiment polarities with the help of a pre-trained attention-based model. Our method increases sample diversity at the outset, yielding an expanded dataset with a relatively low imbalanced-ratio, and can serve as an upstream preprocessing step for data augmentation. We conduct experiments on three public sentiment datasets with several frequently used algorithms. The results show that our method is consistently effective, even in highly imbalanced scenarios, and can easily be integrated with oversampling methods to boost performance on imbalanced sentiment classification.
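The two-stage idea in the abstract (sample discourses around a transitional marker, then validate their polarity) can be illustrated with a minimal sketch. It is not the authors' implementation. The marker list contains only the three examples named in the abstract, the function names `split_on_marker` and `sample_candidates` are hypothetical, and the `validate` callable is a stand-in for the pre-trained attention-based model. The sketch also assumes binary labels (0/1) and assumes that the tail discourse keeps the label of the full sentence while the head discourse receives the opposite label; the abstract states only that the two usually disagree.

```python
import re

# Illustrative markers only: the three examples named in the abstract.
MARKERS = ("but", "though", "while")
_PATTERN = re.compile(r"\b(" + "|".join(MARKERS) + r")\b", flags=re.IGNORECASE)


def split_on_marker(sentence):
    """Return (head, marker, tail) for the first marker found, or None."""
    match = _PATTERN.search(sentence)
    if match is None:
        return None
    head = sentence[:match.start()].strip(" ,;")
    tail = sentence[match.end():].strip(" ,;.")
    if not head or not tail:
        # A marker at the very start or end leaves nothing to sample.
        return None
    return head, match.group(1).lower(), tail


def sample_candidates(sentence, label, validate):
    """Propose (text, label) pairs from one labelled sentence.

    Assumption (not from the source): the tail keeps the sentence label
    and the head gets the opposite binary label. `validate` is a
    hypothetical stand-in for the pre-trained polarity model and decides
    which candidates are kept.
    """
    parts = split_on_marker(sentence)
    if parts is None:
        return []
    head, _marker, tail = parts
    candidates = [(head, 1 - label), (tail, label)]
    return [(text, y) for text, y in candidates if validate(text, y)]


if __name__ == "__main__":
    accept_all = lambda text, y: True  # placeholder validator
    print(sample_candidates("The food was great, but the service was slow.", 0, accept_all))
```

In practice the validator would reject candidates whose predicted polarity disagrees with the proposed label, which is what keeps noisy splits out of the expanded dataset.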

Notes

1. We define the imbalanced-ratio as the number of samples in the majority class divided by the number of samples in the minority class.
2. In this paper, we use the term “head discourse” to denote the sentence before the discourse marker and “tail discourse” to denote the sentence after the discourse marker.
3. In this paper, we only discuss relatively short sentences. The discourse structure of long sentences may be complex, and we leave it for future work.
4. https://github.com/mmihaltz/word2vec-GoogleNews-vectors.
5. In a binary classification scenario, 50% accuracy corresponds to random guessing.
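
The imbalanced-ratio defined in note 1 can be computed directly from a list of labels. The following is a small sketch; the function name `imbalanced_ratio` is chosen here for illustration and does not come from the paper.

```python
from collections import Counter


def imbalanced_ratio(labels):
    """Majority-class count divided by minority-class count (note 1)."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to compute a ratio")
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    # 8 samples of class 1 and 2 samples of class 0 give a ratio of 4.0.
    print(imbalanced_ratio([1] * 8 + [0] * 2))
```

A perfectly balanced dataset has a ratio of 1.0, so adding validated minority-class samples moves the ratio toward 1.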

Acknowledgments

This work is supported by the National Key Research and Development Program of China (No. 2017YFB1010001). We also appreciate the valuable comments from anonymous reviewers.

Author information

Authors and Affiliations

Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Tao Zhang, Xing Wu, Meng Lin, Jizhong Han & Songlin Hu
School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Tao Zhang, Xing Wu & Songlin Hu

Corresponding author

Correspondence to Meng Lin.

Editor information

Editors and Affiliations

Helmholtz Zentrum München - Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Neuherberg, Germany
Igor V. Tetko
Institute of Computer Science, Czech Academy of Sciences, Prague 8, Czech Republic
Věra Kůrková
Helmholtz Zentrum München - Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Neuherberg, Germany
Pavel Karpov
Helmholtz Zentrum München - Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Neuherberg, Germany
Fabian Theis

About this paper

Cite this paper

Zhang, T., Wu, X., Lin, M., Han, J., Hu, S. (2019). Imbalanced Sentiment Classification Enhanced with Discourse Marker. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Text and Time Series. ICANN 2019. Lecture Notes in Computer Science, vol 11730. Springer, Cham. https://doi.org/10.1007/978-3-030-30490-4_11

DOI: https://doi.org/10.1007/978-3-030-30490-4_11
Published: 09 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30489-8
Online ISBN: 978-3-030-30490-4
eBook Packages: Computer Science, Computer Science (R0)
