How important is syntactic parsing accuracy? An empirical evaluation on rule-based sentiment analysis

  • Carlos Gómez-Rodríguez
  • Iago Alonso-Alonso
  • David Vilares

Abstract

Syntactic parsing, the process of obtaining the internal structure of sentences in natural languages, is a crucial task for artificial intelligence applications that need to extract meaning from natural language text or speech. Sentiment analysis is one example of an application for which parsing has recently proven useful. In recent years, there have been significant advances in the accuracy of parsing algorithms. In this article, we perform an empirical, task-oriented evaluation to determine how parsing accuracy influences the performance of a state-of-the-art rule-based sentiment analysis system that determines the polarity of sentences from their parse trees. In particular, we evaluate the system using four well-known dependency parsers, including both current models with state-of-the-art accuracy and less accurate models that require fewer computational resources. The experiments show that all of the parsers produce similarly good results in the sentiment analysis task, with their accuracy having no relevant influence on the results. Since parsing is currently a task with a relatively high computational cost that varies strongly between algorithms, this suggests that sentiment analysis researchers and users should prioritize speed over accuracy when choosing a parser, and that parsing researchers should investigate models that improve speed further, even at some cost to accuracy.

Keywords

  • Syntactic parsing
  • Sentiment analysis
  • Natural language processing
  • Artificial intelligence


Copyright information

© Springer Science+Business Media B.V. 2017

Authors and Affiliations

  1. FASTPARSE Lab, Grupo LyS, Departamento de Computación, Universidade da Coruña, A Coruña, Spain
