Abstract
Parser Evaluation using Textual Entailments (PETE) is a shared task in the SemEval-2010 Evaluation Exercises on Semantic Evaluation. The task involves recognizing textual entailments based on syntactic information alone. PETE introduces a new parser evaluation scheme that is formalism-independent, less prone to annotation error, and focused on semantically relevant distinctions. This paper describes the PETE task, gives an error analysis of the top-performing Cambridge system, and introduces a standard entailment module that can be used with any parser that outputs Stanford typed dependencies.
Notes
The collapsed and propagated version of Stanford dependencies somewhat mitigates this problem, and it is the parser output representation we chose to use as input to the example entailment module of Sect. 7.
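An entailment module over Stanford typed dependencies can be sketched as a containment check: the hypothesis H is judged entailed by the text T when the dependencies of H's parse also appear in T's parse. The sketch below is illustrative only, not the authors' exact implementation; the dependency triples, the simple lowercasing normalization, and the example sentences are assumptions for the sake of the example. Collapsed dependencies fold prepositions into relation names (e.g. `prep_of`), and propagation adds dependencies such as the object relation inside a relative clause, which is what makes a hypothesis like "The dog barked." matchable against a relative-clause text.

```python
# Illustrative sketch (not the authors' exact implementation): a minimal
# entailment decision over Stanford typed dependencies, where H is entailed
# by T if every dependency triple of H's parse occurs in T's parse.

def normalize(dep):
    """Reduce a (relation, head, dependent) triple to a comparable form.
    Words and relation names are lowercased; collapsed relations such as
    prep_of are kept as-is."""
    rel, head, dependent = dep
    return (rel.lower(), head.lower(), dependent.lower())

def entails(text_deps, hyp_deps):
    """Return True if every dependency of the hypothesis parse is found
    among the dependencies of the text parse."""
    text_set = {normalize(d) for d in text_deps}
    return all(normalize(d) in text_set for d in hyp_deps)

# Hypothetical example: T = "The dog that I saw barked.", H = "The dog barked."
text_deps = [
    ("det", "dog", "The"),
    ("nsubj", "barked", "dog"),
    ("nsubj", "saw", "I"),
    ("dobj", "saw", "dog"),   # propagated relative-clause dependency
]
hyp_deps = [
    ("det", "dog", "The"),
    ("nsubj", "barked", "dog"),
]
print(entails(text_deps, hyp_deps))  # True
```

In practice such a module would restrict the check to the dependencies relevant to the entailment (e.g. those involving the content words that differ between T and H) rather than requiring all of H's dependencies to match.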
Note that some of the difficult constructions, together with noise in the laypeople's responses, meant that a large percentage of potential entailments did not pass the filter. Nevertheless, at nominal cost we were able to create a dataset in which every entailment was unanimously agreed on by three people, which is not the case for most other commonly used treebanks.
There were eight POS changes in the development set, most of which did not result in errors in evaluation. Note also that this particular H is ungrammatical English. Recall that the negative H sentences were derived from genuine parser errors; it was not always possible to construct grammatical sentences corresponding to such errors, though we will consider constraining all H sentences to be grammatical in future work.
Acknowledgments
We would like to thank Stephan Oepen and Anna Mac for their careful analysis and valuable suggestions. Önder Eker and Zehra Turgut contributed to the development of the PETE task. Stephen Clark collaborated on the development of the Cambridge system. We would also like to thank Matthew Honnibal for discussion of the SCHWA system and contribution to the entailment system analysis.
Yuret, D., Rimell, L. & Han, A. Parser evaluation using textual entailments. Lang Resources & Evaluation 47, 639–659 (2013). https://doi.org/10.1007/s10579-012-9200-5