Does Optical Character Recognition and Caption Generation Improve Emotion Detection in Microblog Posts?

  • Roman Klinger
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10260)


Emotion recognition in microblogs such as Twitter is the task of assigning an emotion from a predefined set of labels to a post. This is often done using only the tweet text. In this paper, we investigate whether information from attached images contributes to this classification task. We use off-the-shelf tools to extract a signal from an image. First, we employ optical character recognition (OCR) to make embedded text accessible; second, we use automatic caption generation to generalize over the content of the depiction. Our experiments show that using the caption improves performance only slightly, and only for the emotions fear, anger, disgust, and trust. OCR shows a significant impact for joy, love, sadness, fear, and anger.
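
As an illustration of how the three text sources described above could feed one classifier, the following minimal sketch merges tweet text, OCR output, and a generated caption into a single bag-of-words representation. It is not the implementation used in the paper: the function names `tokenize` and `build_features`, the source prefixes, and the example strings are all hypothetical, and the OCR text and caption are assumed to have been produced beforehand by external off-the-shelf tools.

```python
import re
from collections import Counter

# Hypothetical tokenizer: keeps hashtags and @-mentions as single tokens.
TOKEN = re.compile(r"[#@]?\w+")


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return TOKEN.findall(text.lower())


def build_features(tweet_text, ocr_text="", caption=""):
    """Return bag-of-words counts with one prefix per text source.

    Assumption: prefixing tokens by source lets a downstream classifier
    weight tweet, OCR, and caption words independently.
    """
    features = Counter()
    sources = (("tweet", tweet_text), ("ocr", ocr_text), ("caption", caption))
    for source, text in sources:
        for token in tokenize(text):
            features[f"{source}:{token}"] += 1
    return features


# Made-up example post; the OCR text and caption stand in for tool output.
example = build_features(
    tweet_text="Monday again",
    ocr_text="I hate Mondays",
    caption="a cat lying on a couch",
)
```

Keeping the sources apart makes it possible to drop one prefix group at a time and compare a text-only setting against text plus OCR or text plus caption, which mirrors the comparison the abstract describes.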


Emotion classification; Social media; Caption generation; Optical character recognition



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart, Stuttgart, Germany
