Bootstrapping Technique + Embeddings = Emotional Corpus Annotated Automatically

  • Lea Canales
  • Carlo Strapparava
  • Ester Boldrini
  • Patricio Martínez-Barco
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10341)


Detecting depression or personality traits, tutoring and student-behaviour systems, and identifying cases of cyberbullying are a few of the wide range of applications in which the automatic detection of emotion is crucial. This task can benefit business, society, politics and education. The main objective of our research is to improve the supervised emotion detection systems developed so far by defining and implementing a technique to annotate large-scale English emotional corpora automatically and with high standards of reliability. Our proposal is based on a bootstrapping process made up of two main steps: the creation of a seed using the NRC Emotion Lexicon, and its extension using distributional semantic similarity through word embeddings. The results obtained are promising and allow us to confirm the soundness of combining the bootstrapping technique with word embeddings to label emotional corpora automatically.
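The two-step bootstrapping process described above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the toy `LEXICON` is a hypothetical stand-in for the NRC Emotion Lexicon, the hand-made 3-dimensional vectors in `EMBEDDINGS` are hypothetical stand-ins for pre-trained word embeddings, and the helper names (`seed_label`, `sentence_vector`, `cosine`, `bootstrap`) are illustrative. The rules are assumptions too: a sentence enters the seed only if its lexicon words point to a single emotion; a sentence is represented by the average of its word vectors; and an unlabelled sentence is added when its cosine similarity to an emotion's centroid reaches an assumed threshold of 0.95, with centroids recomputed each round until no new sentence is labelled.

```python
import math

# Hypothetical toy lexicon standing in for the NRC Emotion Lexicon:
# word -> set of associated emotions.
LEXICON = {
    "happy": {"joy"},
    "delighted": {"joy"},
    "afraid": {"fear"},
    "terrified": {"fear"},
}

# Hypothetical toy 3-d vectors standing in for pre-trained word embeddings.
# Words without a vector (e.g. "i", "was") are simply ignored.
EMBEDDINGS = {
    "happy": [0.9, 0.1, 0.0],
    "delighted": [0.8, 0.2, 0.0],
    "glad": [0.85, 0.15, 0.05],
    "afraid": [0.0, 0.9, 0.1],
    "terrified": [0.1, 0.8, 0.1],
    "scared": [0.05, 0.85, 0.1],
    "table": [0.1, 0.1, 0.95],
}


def seed_label(tokens):
    """Step 1 (assumed rule): label a sentence only if its lexicon words
    point to exactly one emotion; otherwise leave it unlabelled (None)."""
    emotions = set()
    for tok in tokens:
        emotions |= LEXICON.get(tok, set())
    return next(iter(emotions)) if len(emotions) == 1 else None


def sentence_vector(tokens):
    """Average the vectors of the tokens that have an embedding."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def bootstrap(sentences, threshold=0.95, max_rounds=5):
    """Seed with the lexicon, then extend labels via embedding similarity."""
    tokenized = [s.lower().split() for s in sentences]
    vectors = [sentence_vector(t) for t in tokenized]
    labels = [seed_label(t) for t in tokenized]  # step 1: the seed
    for _ in range(max_rounds):  # step 2: iterative extension
        # One centroid per emotion, built from every sentence labelled so far.
        groups = {}
        for vec, lab in zip(vectors, labels):
            if lab is not None and vec is not None:
                groups.setdefault(lab, []).append(vec)
        if not groups:
            break
        centroids = {e: [sum(col) / len(vs) for col in zip(*vs)]
                     for e, vs in groups.items()}
        new_labels = list(labels)
        for i, vec in enumerate(vectors):
            if labels[i] is not None or vec is None:
                continue
            best = max(centroids, key=lambda e: cosine(vec, centroids[e]))
            if cosine(vec, centroids[best]) >= threshold:
                new_labels[i] = best
        if new_labels == labels:  # converged: no new sentence was labelled
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    corpus = ["I am so happy", "She was delighted", "I am terrified",
              "We were glad", "I was scared", "Happy but afraid",
              "The table was clean"]
    print(bootstrap(corpus))
    # → ['joy', 'joy', 'fear', 'joy', 'fear', None, None]
```

On this toy input, "We were glad" and "I was scared" are picked up by the embedding step even though neither contains a lexicon word, while the mixed "Happy but afraid" and the neutral "The table was clean" stay unlabelled. Lowering the assumed threshold would trade annotation reliability for coverage.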


Keywords: Sentiment analysis, Emotion detection, Emotional corpus, Bootstrapping, Word embedding



This research has been supported by the FPI grant (BES-2013-065950) and the research stay grant (EEBB-I-15-10108) from the Spanish Ministry of Science and Innovation. It has also been funded by the Spanish Government (DIGITY ref. TIN2015-65136-C02-2-R) and the Valencian Government (grant no. PROMETEOII/2014/001).

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Lea Canales (1)
  • Carlo Strapparava (2)
  • Ester Boldrini (1)
  • Patricio Martínez-Barco (1)
  1. University of Alicante, Alicante, Spain
  2. Fondazione Bruno Kessler, Trento, Italy