SFU ReviewSP-NEG: a Spanish corpus annotated with negation for sentiment analysis. A typology of negation patterns

  • Salud María Jiménez-Zafra
  • Mariona Taulé
  • M. Teresa Martín-Valdivia
  • L. Alfonso Ureña-López
  • M. Antónia Martí
Original Paper

Abstract

In this paper, we present SFU ReviewSP-NEG, the first freely available, wide-coverage Spanish corpus annotated with negation. We describe the methodology applied in annotating the corpus, including the tagset, the linguistic criteria and the inter-annotator agreement tests. We also include a complete typology of negation patterns in Spanish. This typology has the advantage of being easy to express as a tagset for corpus annotation: the types are clearly defined, which avoids ambiguity in the annotation process, and they provide wide coverage (i.e. they accounted for all the cases occurring in the corpus). The annotations are built on the SFU ReviewSP corpus. The corpus consists of 400 reviews, 221,866 words and 9,455 sentences, of which 3,022 contain at least one negation structure.
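The corpus figures quoted in the abstract can be turned into a few derived ratios, such as the share of sentences containing negation. The following is a minimal sketch rather than anything provided by the authors: the function name `corpus_stats` and its dictionary keys are hypothetical, and only the four input counts come from the abstract.

```python
def corpus_stats(reviews: int, words: int, sentences: int, negated_sentences: int) -> dict:
    """Derive simple ratios from raw corpus counts (illustrative helper, not part of the corpus release)."""
    return {
        # Fraction of sentences that contain at least one negation structure
        "negation_sentence_share": negated_sentences / sentences,
        # Average sentence length in words
        "words_per_sentence": words / sentences,
        # Average review length in sentences
        "sentences_per_review": sentences / reviews,
        # Average review length in words
        "words_per_review": words / reviews,
    }


# Counts as reported in the abstract
stats = corpus_stats(reviews=400, words=221_866, sentences=9_455, negated_sentences=3_022)

print(f"Sentences with negation: {stats['negation_sentence_share']:.2%}")
print(f"Words per sentence:      {stats['words_per_sentence']:.2f}")
print(f"Sentences per review:    {stats['sentences_per_review']:.2f}")
print(f"Words per review:        {stats['words_per_review']:.2f}")
```

Under these counts, roughly a third of the sentences contain at least one negation structure.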

Keywords

  • Annotation of negation
  • Scope of negation
  • Polarity annotation
  • Sentiment analysis

Copyright information

© Springer Science+Business Media Dordrecht 2017

Authors and Affiliations

  1. Department of Computer Science, Universidad de Jaén, Jaén, Spain
  2. CLiC, Centre de Llenguatge i Computació, Department of Linguistics, University of Barcelona, Barcelona, Spain
