Skip to main content

Imbalanced Sentiment Classification Enhanced with Discourse Marker

  • Conference paper
  • First Online:
Artificial Neural Networks and Machine Learning – ICANN 2019: Text and Time Series (ICANN 2019)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11730))

Included in the following conference series:

Abstract

Imbalanced data commonly exists in real world, especially in sentiment-related corpus, making it difficult to train a classifier to distinguish latent sentiment in text data. We observe that humans often express transitional emotion between two adjacent discourses with discourse markers like “but”, “though”, “while”, etc., and the head discourse and the tail discourse usually indicate opposite emotional tendencies. Based on this observation, we propose a novel plug-and-play method, which first samples discourses according to transitional discourse markers and then validates sentimental polarities with the help of a pre-trained attention-based model. Our method increases sample diversity in the first place, obtaining a expanded dataset with relatively low imbalanced-ratio, can serve as a upstream preprocessing part in data augmentation. We conduct experiments on three public sentiment datasets, with several frequently used algorithms. Results show that our method is found to be consistently effective, even in highly imbalanced scenario, and easily be integrated with oversampling method to boost the performance on imbalanced sentiment classification.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    We define imbalanced-ratio to be: number of samples in majority class/number of samples in minority class.

  2. 2.

    In this paper, we use the term “head discourse” to denote the sentence before the discourse marker and “tail discourse” to denote the sentence after the discourse marker.

  3. 3.

    In this paper, we only discuss relatively short sentences. The discourse structure in long sentences may be complex and we leave it for future work.

  4. 4.

    https://github.com/mmihaltz/word2vec-GoogleNews-vectors.

  5. 5.

    In binary classification scenario, 50% accuracy means random guess.

References

  1. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: Smote: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)

    Article  Google Scholar 

  2. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805 (2018). http://arxiv.org/abs/1810.04805

  3. Dos Santos, C., Gatti, M.: Deep convolutional neural networks for sentiment analysis of short texts. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 69–78 (2014)

    Google Scholar 

  4. Fadaee, M., Bisazza, A., Monz, C.: Data augmentation for low-resource neural machine translation. arXiv preprint arXiv:1705.00440 (2017)

  5. He, R., McAuley, J.: Ups and downs: modeling the visual evolution of fashion trends with one-class collaborative filtering. In: Proceedings of the 25th International Conference on World Wide Web, pp. 507–517. International World Wide Web Conferences Steering Committee (2016)

    Google Scholar 

  6. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

    Article  Google Scholar 

  7. Hu, M., Liu, B.: Mining and summarizing customer reviews. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 168–177. ACM (2004)

    Google Scholar 

  8. Hu, Z., Yang, Z., Liang, X., Salakhutdinov, R., Xing, E.P.: Toward controlled generation of text. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 1587–1596. JMLR. org (2017)

    Google Scholar 

  9. Jia, R., Liang, P.: Adversarial examples for evaluating reading comprehension systems. arXiv preprint arXiv:1707.07328 (2017)

  10. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  11. Li, S., Zhou, G., Wang, Z., Lee, S.Y.M., Wang, R.: Imbalanced sentiment classification. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 2469–2472. ACM (2011)

    Google Scholar 

  12. Liu, B., Zhang, M., Ma, W., Li, X., Liu, Y., Ma, S.: A two-step information accumulation strategy for learning from highly imbalanced data. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1289–1298. ACM (2017)

    Google Scholar 

  13. Ng, W.W., Hu, J., Yeung, D.S., Yin, S., Roli, F.: Diversified sensitivity-based undersampling for imbalance classification problems. IEEE Trans. Cybern. 45(11), 2402–2412 (2015)

    Article  Google Scholar 

  14. Nie, A., Bennett, E.D., Goodman, N.D.: DisSent: sentence representation learning from explicit discourse relations. CoRR abs/1710.04334 (2017). http://arxiv.org/abs/1710.04334

  15. Pang, B., Lee, L.: Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pp. 115–124. Association for Computational Linguistics (2005)

    Google Scholar 

  16. Pedregosa, F., et al.: Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

    MathSciNet  MATH  Google Scholar 

  17. Socher, R., et al.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631–1642 (2013)

    Google Scholar 

  18. Tang, D., Qin, B., Feng, X., Liu, T.: Effective LSTMs for target-dependent sentiment classification. arXiv preprint arXiv:1512.01100 (2015)

  19. Tang, D., Qin, B., Liu, T.: Aspect level sentiment classification with deep memory network. arXiv preprint arXiv:1605.08900 (2016)

  20. Wu, F., Wu, C., Liu, J.: Imbalanced sentiment classification with multi-task learning. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 1631–1634. ACM (2018)

    Google Scholar 

  21. Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text classification. In: Advances in Neural Information Processing Systems, pp. 649–657 (2015)

    Google Scholar 

Download references

Acknowledgments

This work is supported by the National Key Research and Development Program of China (No. 2017YFB1010001). We also appreciate the valuable comments from anonymous reviewers.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Meng Lin .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Zhang, T., Wu, X., Lin, M., Han, J., Hu, S. (2019). Imbalanced Sentiment Classification Enhanced with Discourse Marker. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Text and Time Series. ICANN 2019. Lecture Notes in Computer Science(), vol 11730. Springer, Cham. https://doi.org/10.1007/978-3-030-30490-4_11

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-30490-4_11

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30489-8

  • Online ISBN: 978-3-030-30490-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics