SyntaxNet Errors from the Linguistic Point of View

  • Oleg Durandin
  • Alexey Malafeev
  • Nikolai Zolotykh
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10716)

Abstract

The paper deals with Google’s universal parser SyntaxNet. The system was used to analyze the Universal Dependencies linguistic corpora. We conducted an error analysis of the output of the parser to reveal to what extent the error types are connected with or preconditioned by the language types. In particular, we carried out several experiments, clustering the languages based on the frequency of different errors made by SyntaxNet, and studied the similarity of the resulting clustering with the traditional typology of languages. Three types of errors were separately considered: part-of-speech tagging, dependency labeling, and attachment errors. We show that there is indeed a correlation between error frequencies and language types, which might indicate that to further improve the performance of a universal parser, one needs to take into account language-specific morphological and syntactic structures.
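The clustering experiment described in the abstract can be sketched roughly as follows. This is a minimal illustration only: the language names and error frequencies are made up (they are not results from the paper), and the choice of Euclidean distance with average-linkage agglomerative clustering is an assumption, since the abstract does not state which distance or linkage was used.

```python
from itertools import combinations
from math import sqrt

# Hypothetical, made-up error profiles (NOT results from the paper).
# Each vector holds the relative frequency of three error types:
# [part-of-speech tagging, dependency labeling, attachment].
ERROR_PROFILES = {
    "lang_A": [0.10, 0.30, 0.60],
    "lang_B": [0.12, 0.28, 0.60],
    "lang_C": [0.40, 0.35, 0.25],
    "lang_D": [0.42, 0.33, 0.25],
}


def euclidean(u, v):
    """Euclidean distance between two error-frequency vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    dists = [euclidean(profiles[a], profiles[b]) for a in c1 for b in c2]
    return sum(dists) / len(dists)


def cluster(profiles, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [frozenset([name]) for name in profiles]
    while len(clusters) > n_clusters:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], profiles),
        )
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return set(clusters)


result = cluster(ERROR_PROFILES, n_clusters=2)
for group in result:
    print(sorted(group))
```

With real data, the resulting groups would then be compared against a traditional typological classification of the same languages to see how closely the two agree.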

Keywords

Natural language processing · Syntax parsing · SyntaxNet · Error analysis · Linguistic typology

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. National Research University Higher School of Economics, Nizhny Novgorod, Russia
  2. Lobachevsky State University of Nizhny Novgorod, Nizhny Novgorod, Russia
