Detection of Sarcasm and Nastiness: New Resources for Spanish Language

  • Raquel Justo
  • José M. Alcaide
  • M. Inés Torres
  • Marilyn Walker
Article

Abstract

The main goal of this work is to provide the cognitive computing community with valuable resources to analyze and simulate the intentionality and/or emotions embedded in the language employed in social media. Specifically, it focuses on the Spanish language and online dialogues, leading to the creation of Sofoco (Spanish Online Forums Corpus). It is the first Spanish corpus consisting of dialogic debates extracted from social media, and it is annotated by means of crowdsourcing to support automatic analysis of subjective language forms such as sarcasm or nastiness. The annotators were also asked whether they needed context to make each decision. In this way, users’ intentions and their behavior inside social networks can be better understood and more accurate text analysis is possible. The annotation results are analyzed and the reliability of the annotations is explored. Sarcasm and nastiness detection results (around 0.76 F-Measure in both cases) are also reported. The results show that the presented corpus is a valuable resource that might be used in very diverse future work.

Keywords

Online dialogues, Figurative language, Spanish resources, Sarcasm, Nastiness

Notes

Funding

This study was partially funded by the Spanish Government (TIN2014-54288-C4-4-R and TIN2017-85854-C4-3-R), by the European Union’s H2020 program under grant 769872, and by the National Science Foundation of the USA (NSF CISE R1 #1202668).

Compliance with Ethical Standards

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Raquel Justo (1)
  • José M. Alcaide (1)
  • M. Inés Torres (1)
  • Marilyn Walker (2)

  1. Universidad del País Vasco (UPV/EHU), Leioa, Spain
  2. University of California, Santa Cruz, Santa Cruz, USA
