A Sentence Vector Based Over-Sampling Method for Imbalanced Emotion Classification

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8404)

Abstract

Imbalanced training data poses a serious problem for supervised text classification. The problem is more serious in emotion classification with multiple emotion categories, where the training data can be highly skewed. This paper presents a novel over-sampling method that forms additional sum sentence vectors for minority classes in order to improve emotion classification on imbalanced data. First, a large corpus is used to train a continuous skip-gram model that produces word vectors, with the word/POS pair as the unit of each vector. The sentence vectors of the training data are then constructed as the sum of their word/POS vectors. New minority-class training samples are generated by summing two randomly selected sentence vectors of the corresponding class until every class has the same number of training samples, so that the classifiers can be trained on a fully balanced training dataset. Evaluation on the NLP&CC2013 Chinese micro blog emotion classification dataset shows that the resulting classifier achieves 48.4% average precision, an improvement of 11.9 percentage points over the state-of-the-art performance on this dataset (36.5%). This result shows that the proposed over-sampling method can effectively address data imbalance and thus substantially improve emotion classification performance.
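The over-sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `sentence_vector` and `oversample_by_sum`, the toy two-dimensional vectors, and the word/POS keys such as `happy/a` are all hypothetical. In the paper the word vectors come from a skip-gram model trained on a large corpus. The sketch also assumes that the two summed vectors are distinct original sentences of the class and that unknown word/POS units are skipped; the abstract does not specify either point.

```python
import random
import numpy as np

def sentence_vector(tokens, word_vectors, dim):
    """Sum the vectors of a sentence's word/POS units (unknown units are skipped: an assumption)."""
    vec = np.zeros(dim)
    for tok in tokens:
        if tok in word_vectors:
            vec += word_vectors[tok]
    return vec

def oversample_by_sum(vectors_by_class, seed=0):
    """Grow each smaller class with sums of two of its sentence vectors
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    target = max(len(v) for v in vectors_by_class.values())
    balanced = {}
    for label, vectors in vectors_by_class.items():
        samples = list(vectors)
        # Assumption: pairs are drawn only from the original sentences of the class.
        while len(samples) < target and len(vectors) >= 2:
            a, b = rng.sample(range(len(vectors)), 2)
            samples.append(vectors[a] + vectors[b])
        balanced[label] = samples
    return balanced

# Toy stand-ins for skip-gram word/POS vectors (hypothetical values).
toy_vectors = {
    "happy/a": np.array([1.0, 0.0]),
    "day/n": np.array([0.0, 1.0]),
    "sad/a": np.array([-1.0, 0.0]),
}
data = {
    "joy": [
        sentence_vector(["happy/a", "day/n"], toy_vectors, 2),
        sentence_vector(["happy/a"], toy_vectors, 2),
        sentence_vector(["day/n"], toy_vectors, 2),
    ],
    "sadness": [
        sentence_vector(["sad/a", "day/n"], toy_vectors, 2),
        sentence_vector(["sad/a"], toy_vectors, 2),
    ],
}
balanced = oversample_by_sum(data)
print({label: len(samples) for label, samples in balanced.items()})
```

In this toy run the minority class "sadness" gains one synthetic sample, the sum of its two original sentence vectors, so both classes end up with three samples. A classifier would then be trained on the balanced vectors.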


Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, T. et al. (2014). A Sentence Vector Based Over-Sampling Method for Imbalanced Emotion Classification. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_6

  • DOI: https://doi.org/10.1007/978-3-642-54903-8_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54902-1

  • Online ISBN: 978-3-642-54903-8

  • eBook Packages: Computer Science, Computer Science (R0)
