Bootstrapping Technique + Embeddings = Emotional Corpus Annotated Automatically

Canales, Lea; Strapparava, Carlo; Boldrini, Ester; Martínez-Barco, Patricio

doi:10.1007/978-3-319-69365-1_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10341)

Included in the following conference series:

International Workshop on Future and Emerging Trends in Language Technology

Abstract

Detecting depression or personality traits, building tutoring and student behaviour systems, and identifying cases of cyberbullying are a few of the many applications in which the automatic detection of emotion is crucial. This task can benefit business, society, politics, and education. The main objective of our research is to improve the supervised emotion detection systems developed so far by defining and implementing a technique to annotate large-scale English emotional corpora automatically and with high standards of reliability. Our proposal is based on a bootstrapping process made up of two main steps: the creation of the seed using the NRC Emotion Lexicon, and its extension employing distributional semantic similarity through word embeddings. The results obtained are promising and allow us to confirm the soundness of combining the bootstrapping technique with word embeddings to label emotional corpora automatically.
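
The two-step process named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the toy lexicon and three-dimensional vectors (stand-ins for the NRC Emotion Lexicon and real word embeddings), the rule that a sentence joins the seed only when all of its lexicon words agree on one emotion, the use of averaged word vectors with per-emotion centroids, and the 0.8 similarity threshold. It should not be read as the authors' actual method.

```python
import math

# Hypothetical toy lexicon; NOT real NRC Emotion Lexicon entries.
TOY_LEXICON = {
    "happy": "joy", "delighted": "joy",
    "afraid": "fear", "terrified": "fear",
}

# Hypothetical 3-d vectors standing in for pretrained word embeddings.
TOY_VECTORS = {
    "happy": (0.9, 0.1, 0.0),
    "delighted": (0.8, 0.2, 0.0),
    "glad": (0.85, 0.15, 0.0),
    "afraid": (0.1, 0.9, 0.0),
    "terrified": (0.0, 1.0, 0.1),
    "scared": (0.1, 0.95, 0.0),
    "table": (0.0, 0.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Component-wise mean of a collection of vectors, or None if empty."""
    vectors = list(vectors)
    if not vectors:
        return None
    return tuple(sum(component) / len(vectors) for component in zip(*vectors))


def sentence_vector(sentence, word_vectors):
    """Average the vectors of the words that have an embedding."""
    tokens = sentence.lower().split()
    return mean_vector(word_vectors[t] for t in tokens if t in word_vectors)


def create_seed(sentences, lexicon):
    """Step 1 (assumed rule): label a sentence only if its lexicon words
    all point to a single emotion; everything else stays unlabeled."""
    seed, unlabeled = {}, []
    for sentence in sentences:
        emotions = {lexicon[t] for t in sentence.lower().split() if t in lexicon}
        if len(emotions) == 1:
            seed[sentence] = emotions.pop()
        else:
            unlabeled.append(sentence)
    return seed, unlabeled


def extend_seed(seed, unlabeled, word_vectors, threshold=0.8):
    """Step 2 (assumed rule): give an unlabeled sentence the emotion whose
    seed centroid is most similar, if the similarity reaches the threshold."""
    grouped = {}
    for sentence, emotion in seed.items():
        vec = sentence_vector(sentence, word_vectors)
        if vec is not None:
            grouped.setdefault(emotion, []).append(vec)
    centroids = {emotion: mean_vector(vecs) for emotion, vecs in grouped.items()}

    extended = dict(seed)
    for sentence in unlabeled:
        vec = sentence_vector(sentence, word_vectors)
        if vec is None or not centroids:
            continue
        best_emotion, best_sim = max(
            ((emotion, cosine(vec, centroid)) for emotion, centroid in centroids.items()),
            key=lambda pair: pair[1],
        )
        if best_sim >= threshold:
            extended[sentence] = best_emotion
    return extended


sentences = [
    "i am happy", "she was terrified",
    "we are glad", "they seem scared", "the table",
]
seed, rest = create_seed(sentences, TOY_LEXICON)
labels = extend_seed(seed, rest, TOY_VECTORS)
```

In this toy run the lexicon alone labels only the first two sentences; the similarity step then extends the labels to "we are glad" and "they seem scared", while "the table" stays unlabeled because it is not close enough to either emotion centroid. A real setup would presumably swap in the actual lexicon and pretrained embeddings, and tune the threshold against manually annotated data.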

Acknowledgment

This research has been supported by the FPI grant (BES-2013-065950) and the research stay grant (EEBB-I-15-10108) from the Spanish Ministry of Science and Innovation. It has also been funded by the Spanish Government (DIGITY ref. TIN2015-65136-C02-2-R) and the Valencian Government (grant no. PROMETEOII/2014/001).

Author information

Authors and Affiliations

University of Alicante, Alicante, Spain
Lea Canales, Ester Boldrini & Patricio Martínez-Barco
Fondazione Bruno Kessler, Trento, Italy
Carlo Strapparava

Corresponding author

Correspondence to Lea Canales.

Editor information

Editors and Affiliations

University of Seville, Seville, Spain
José F Quesada
University of Seville, Seville, Spain
Francisco-Jesús Martín Mateos
University of Seville, Seville, Spain
Teresa López Soto

Cite this paper

Canales, L., Strapparava, C., Boldrini, E., Martínez-Barco, P. (2017). Bootstrapping Technique + Embeddings = Emotional Corpus Annotated Automatically. In: Quesada, J., Martín Mateos, FJ., López Soto, T. (eds) Future and Emerging Trends in Language Technology. Machine Learning and Big Data. FETLT 2016. Lecture Notes in Computer Science, vol 10341. Springer, Cham. https://doi.org/10.1007/978-3-319-69365-1_9

DOI: https://doi.org/10.1007/978-3-319-69365-1_9
Published: 29 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69364-4
Online ISBN: 978-3-319-69365-1
eBook Packages: Computer Science, Computer Science (R0)
