Predicting Emotion Labels for Chinese Microblog Texts

Part of the Studies in Computational Intelligence book series (SCI, volume 602)

Abstract

We describe an experiment in detecting emotions in texts on the Chinese microblog service Sina Weibo (www.weibo.com) using distant supervision via author-supplied emotion labels (emoticons and smilies). Existing word segmentation tools proved unreliable; character-based features gave better accuracy, and higher-order n-grams proved to be useful features. Accuracy varied according to label and emotion: while smilies are used more often, emoticons are more reliable. Happiness is the most accurately predicted emotion, with accuracies around 90% on both distant and gold-standard labels. The approach achieves high accuracies for happiness and anger, but is less effective for sadness, surprise, disgust and fear, which are also difficult for human annotators to detect.
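The two ideas named in the abstract, distant labelling from author-supplied markers and character-based n-gram features, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the marker lexicon `LABEL_MARKERS`, the emotion class names, and the helper names `distant_label` and `char_ngrams` are all hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical label lexicon: these markers and their emotion classes are
# illustrative assumptions, not the label set used in the chapter.
LABEL_MARKERS = {
    "[哈哈]": "happy",
    ":)": "happy",
    "[怒]": "anger",
    ":(": "sad",
}


def distant_label(text):
    """Return (emotion, text_without_markers), or None when the text
    contains no marker or markers for more than one emotion."""
    found = {emotion for marker, emotion in LABEL_MARKERS.items() if marker in text}
    if len(found) != 1:
        return None
    cleaned = text
    for marker in LABEL_MARKERS:
        # Strip the markers so a classifier cannot simply memorise the label.
        cleaned = cleaned.replace(marker, "")
    return found.pop(), cleaned.strip()


def char_ngrams(text, max_n=3):
    """Count character n-grams for n = 1..max_n, with no word segmentation."""
    chars = [c for c in text if not c.isspace()]
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(chars) - n + 1):
            counts["".join(chars[i:i + n])] += 1
    return counts
```

The resulting counts could be passed to any linear classifier. Working on characters avoids a dependence on a word segmenter, which the abstract reports as unreliable for this kind of text.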

Keywords

  • Sentiment Analysis
  • Word Segmentation
  • Social Media Data
  • Lexical Feature
  • Human Annotator

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

  • Chapter length: 21 pages

Notes

  1. http://digg.com.

  2. http://open.weibo.com/wiki/API/en.

  3. http://open.weibo.com/wiki/2/statuses/public_timeline/en.

  4. http://www.mongodb.org/.

  5. Available at: http://www.sojump.com/jq/1935017.aspx?npb=1.

  6. https://code.google.com/p/pymmseg-cpp/.

  7. https://code.google.com/p/smallseg/.

  8. https://nlp.stanford.edu/software/segmenter.shtml.

  9. That is how we constructed our training datasets for previous experiments.

  10. https://www.mturk.com/mturk/welcome.

Author information

Corresponding author

Correspondence to Zheng Yuan.

Appendix

Fifty-six individuals completed our survey; the detailed results are presented in Table 9.

Table 9 Survey results showing the percentage of votes each emotion class received for each label. The best matches for the defined labels used in our work are marked in bold

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Yuan, Z., Purver, M. (2015). Predicting Emotion Labels for Chinese Microblog Texts. In: Gaber, M., Cocea, M., Wiratunga, N., Goker, A. (eds) Advances in Social Media Analysis. Studies in Computational Intelligence, vol 602. Springer, Cham. https://doi.org/10.1007/978-3-319-18458-6_7

  • DOI: https://doi.org/10.1007/978-3-319-18458-6_7

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18457-9

  • Online ISBN: 978-3-319-18458-6

  • eBook Packages: Engineering; Engineering (R0)