Detection of Sarcasm and Nastiness: New Resources for Spanish Language

Cognitive Computation

Abstract

The main goal of this work is to provide the cognitive computing community with resources for analyzing and simulating the intentionality and emotions embedded in the language used in social media. The work focuses on Spanish and on online dialogues, and has led to the creation of Sofoco (Spanish Online Forums Corpus). Sofoco is the first Spanish corpus of dialogic debates extracted from social media, and it was annotated through crowdsourcing to support the automatic analysis of subjective language forms such as sarcasm and nastiness. The annotators were also asked whether they needed context to reach a decision. In this way, users' intentions and their behavior inside social networks can be better understood, and more accurate text analysis becomes possible. The annotation results are analyzed and the reliability of the annotations is explored. Sarcasm and nastiness detection results (around 0.76 F-Measure in both cases) are also reported. The results indicate that the corpus is a valuable resource that could support a wide range of future work.
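For readers unfamiliar with the metric, the F-Measure mentioned above is the harmonic mean of precision and recall. The following is only a minimal sketch: it assumes binary labels (1 for sarcastic, 0 for not sarcastic) and the balanced F1 variant, which the abstract does not specify. The function name `f_measure` and the toy labels are illustrative and are not taken from the paper.

```python
def f_measure(gold, predicted, positive=1):
    """Balanced F-measure (F1) for a single positive class."""
    pairs = list(zip(gold, predicted))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: F1 is taken as 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, not data from the corpus.
gold = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 1, 0, 1, 0]
score = f_measure(gold, predicted)
```

In this toy case precision and recall are both 3/4, so the F-Measure is also 0.75.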

Notes

  1. https://www.reddit.com

  2. http://dtminredis.housing.salle.url.edu:8088/EmoLib/en/about.html

  3. https://www.sfu.ca/~mtaboada/SFU_Review_Corpus.html

  4. http://clic.ub.edu/corpus/hopinion

  5. https://developer.twitter.com/en/docs/tweets/search/overview

  6. www.meneame.net

  7. http://sphinxsearch.com/docs/latest/extended-syntax.html

  8. http://cz.efaber.net

  9. https://www.crowdflower.com/crowdflower-now-offering-twelve-language-skill-groups/

  10. raquel.justo@ehu.eus

Funding

This study was partially funded by the Spanish Government (TIN2014-54288-C4-4-R and TIN2017-85854-C4-3-R), by the European Union’s H2020 program under grant 769872, and by the National Science Foundation of the USA (NSF CISE R1 #1202668).

Author information

Corresponding author

Correspondence to Raquel Justo.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Justo, R., Alcaide, J.M., Torres, M.I. et al. Detection of Sarcasm and Nastiness: New Resources for Spanish Language. Cogn Comput 10, 1135–1151 (2018). https://doi.org/10.1007/s12559-018-9578-5

  • DOI: https://doi.org/10.1007/s12559-018-9578-5
