Towards a Framework for Evaluating Syntactic Parsers

  • Tuomo Kakkonen
  • Erkki Sutinen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4139)


Despite its importance for the development of parsing systems, the task of evaluating the performance of a syntactic parser of natural language is poorly defined. This paper provides a survey of parser evaluation methods and outlines a framework for experimental parser evaluation. The research community clearly lacks both a comprehensive evaluation framework and a generic evaluation tool for parsers. Several evaluation methods exist and some practical evaluations have been carried out, but they usually concentrate on a single level of parser performance. The proposed framework focuses on intrinsic evaluation and provides useful information for parser developers. It gives a fuller picture of a parser's performance than the standard precision and recall measures do. In addition, we consider ways of using the framework for comparative evaluations. The main purpose of this work is to serve as a requirements analysis for a parser evaluation tool that is yet to be implemented.
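
For reference, the standard precision and recall measures mentioned above are commonly computed PARSEVAL-style over labeled constituent brackets. The following is a minimal Python sketch of that baseline, assuming each constituent is represented as a hypothetical (label, start, end) token span. It illustrates the conventional measures only. It is not the framework or the tool proposed in this paper.

```python
from collections import Counter

# Assumed representation (illustration only): a constituent is a labeled
# span (label, start, end) over token positions.
Span = tuple[str, int, int]


def bracket_scores(gold: list[Span], predicted: list[Span]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for labeled brackets."""
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a gold bracket can be matched at most once,
    # even if the parser proposes the same bracket several times.
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
    pred = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
    print(bracket_scores(gold, pred))
```

In this example the parser mislabels one of four gold brackets, so precision, recall and F1 are all 0.75. A single number of this kind says nothing about which constructions failed, which is the limitation the abstract points to.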

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Tuomo Kakkonen (1)
  • Erkki Sutinen (1)
  1. University of Joensuu, Finland