Abstract
Imbalanced data commonly exists in real world, especially in sentiment-related corpus, making it difficult to train a classifier to distinguish latent sentiment in text data. We observe that humans often express transitional emotion between two adjacent discourses with discourse markers like “but”, “though”, “while”, etc., and the head discourse and the tail discourse usually indicate opposite emotional tendencies. Based on this observation, we propose a novel plug-and-play method, which first samples discourses according to transitional discourse markers and then validates sentimental polarities with the help of a pre-trained attention-based model. Our method increases sample diversity in the first place, obtaining a expanded dataset with relatively low imbalanced-ratio, can serve as a upstream preprocessing part in data augmentation. We conduct experiments on three public sentiment datasets, with several frequently used algorithms. Results show that our method is found to be consistently effective, even in highly imbalanced scenario, and easily be integrated with oversampling method to boost the performance on imbalanced sentiment classification.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
We define imbalanced-ratio to be: number of samples in majority class/number of samples in minority class.
- 2.
In this paper, we use the term “head discourse” to denote the sentence before the discourse marker and “tail discourse” to denote the sentence after the discourse marker.
- 3.
In this paper, we only discuss relatively short sentences. The discourse structure in long sentences may be complex and we leave it for future work.
- 4.
- 5.
In binary classification scenario, 50% accuracy means random guess.
References
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: Smote: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805 (2018). http://arxiv.org/abs/1810.04805
Dos Santos, C., Gatti, M.: Deep convolutional neural networks for sentiment analysis of short texts. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 69–78 (2014)
Fadaee, M., Bisazza, A., Monz, C.: Data augmentation for low-resource neural machine translation. arXiv preprint arXiv:1705.00440 (2017)
He, R., McAuley, J.: Ups and downs: modeling the visual evolution of fashion trends with one-class collaborative filtering. In: Proceedings of the 25th International Conference on World Wide Web, pp. 507–517. International World Wide Web Conferences Steering Committee (2016)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Hu, M., Liu, B.: Mining and summarizing customer reviews. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 168–177. ACM (2004)
Hu, Z., Yang, Z., Liang, X., Salakhutdinov, R., Xing, E.P.: Toward controlled generation of text. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 1587–1596. JMLR. org (2017)
Jia, R., Liang, P.: Adversarial examples for evaluating reading comprehension systems. arXiv preprint arXiv:1707.07328 (2017)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Li, S., Zhou, G., Wang, Z., Lee, S.Y.M., Wang, R.: Imbalanced sentiment classification. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 2469–2472. ACM (2011)
Liu, B., Zhang, M., Ma, W., Li, X., Liu, Y., Ma, S.: A two-step information accumulation strategy for learning from highly imbalanced data. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1289–1298. ACM (2017)
Ng, W.W., Hu, J., Yeung, D.S., Yin, S., Roli, F.: Diversified sensitivity-based undersampling for imbalance classification problems. IEEE Trans. Cybern. 45(11), 2402–2412 (2015)
Nie, A., Bennett, E.D., Goodman, N.D.: DisSent: sentence representation learning from explicit discourse relations. CoRR abs/1710.04334 (2017). http://arxiv.org/abs/1710.04334
Pang, B., Lee, L.: Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pp. 115–124. Association for Computational Linguistics (2005)
Pedregosa, F., et al.: Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Socher, R., et al.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631–1642 (2013)
Tang, D., Qin, B., Feng, X., Liu, T.: Effective LSTMs for target-dependent sentiment classification. arXiv preprint arXiv:1512.01100 (2015)
Tang, D., Qin, B., Liu, T.: Aspect level sentiment classification with deep memory network. arXiv preprint arXiv:1605.08900 (2016)
Wu, F., Wu, C., Liu, J.: Imbalanced sentiment classification with multi-task learning. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 1631–1634. ACM (2018)
Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text classification. In: Advances in Neural Information Processing Systems, pp. 649–657 (2015)
Acknowledgments
This work is supported by the National Key Research and Development Program of China (No. 2017YFB1010001). We also appreciate the valuable comments from anonymous reviewers.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, T., Wu, X., Lin, M., Han, J., Hu, S. (2019). Imbalanced Sentiment Classification Enhanced with Discourse Marker. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Text and Time Series. ICANN 2019. Lecture Notes in Computer Science(), vol 11730. Springer, Cham. https://doi.org/10.1007/978-3-030-30490-4_11
Download citation
DOI: https://doi.org/10.1007/978-3-030-30490-4_11
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30489-8
Online ISBN: 978-3-030-30490-4
eBook Packages: Computer ScienceComputer Science (R0)