
Parser evaluation using textual entailments


Abstract

Parser Evaluation using Textual Entailments (PETE) is a shared task in the SemEval-2010 Evaluation Exercises on Semantic Evaluation. The task involves recognizing textual entailments based on syntactic information alone. PETE introduces a new parser evaluation scheme that is formalism independent, less prone to annotation error, and focused on semantically relevant distinctions. This paper describes the PETE task, gives an error analysis of the top-performing Cambridge system, and introduces a standard entailment module that can be used with any parser that outputs Stanford typed dependencies.
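To make the idea of deciding entailments from parser output concrete, the following is a minimal sketch in Python. It assumes a simple containment heuristic over (relation, governor, dependent) triples; this heuristic and the hand-written triples are illustrative assumptions, not the actual decision procedure of the entailment module described in Sect. 7.

# A toy entailment check over Stanford typed dependencies (a sketch,
# not the PETE module itself): the hypothesis H is accepted when its
# content-word dependencies are a subset of the text T's dependencies.

def normalize(dep):
    """Lower-case a (relation, governor, dependent) triple for comparison."""
    rel, gov, dpt = dep
    return (rel.lower(), gov.lower(), dpt.lower())

def entails(text_deps, hyp_deps, ignored=("det", "aux", "cop")):
    """Return True if every non-ignored dependency of H also occurs in T."""
    t = {normalize(d) for d in text_deps}
    h = {normalize(d) for d in hyp_deps if d[0].lower() not in ignored}
    return h <= t

# Hand-written triples standing in for parser output on
# T = "John saw Mary with a telescope." and H = "John saw Mary."
text_deps = [("nsubj", "saw", "John"), ("dobj", "saw", "Mary"),
             ("prep_with", "saw", "telescope"), ("det", "telescope", "a")]
hyp_deps = [("nsubj", "saw", "John"), ("dobj", "saw", "Mary")]
print(entails(text_deps, hyp_deps))  # True: H's dependencies all appear in T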


Notes

  1. The collapsed and propagated version of Stanford dependencies somewhat mitigates this problem, and this is the parser output representation we chose as input to the example entailment module of Sect. 7 (a brief illustration of dependency collapsing follows these notes).

  2. http://www.cs.brown.edu/ec/papers/badPars.txt.gz.

  3. Note that some of the difficult constructions, together with noise in the laypeople’s responses, meant that a large percentage of potential entailments did not pass the filter. Nevertheless, at a nominal cost we were able to create a dataset in which every entailment was unanimously agreed upon by three people, which is not the case for most other commonly used treebanks (a small sketch of such an agreement filter also follows these notes).

  4. http://svn.ask.it.usyd.edu.au/trac/candc.

  5. http://www.cis.upenn.edu/~treebank/tokenizer.sed.

  6. There were eight POS changes in the development set, most of which did not result in errors on evaluation. Note also that this particular H is ungrammatical English. Recall that the negative H sentences were derived from genuine parser errors; it was not always possible to construct grammatical sentences corresponding to such errors, though we will consider constraining all H sentences to be grammatical in future work.
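As referenced in note 1, the sketch below is a hypothetical illustration of what "collapsing" does to Stanford dependencies: a prepositional chain such as prep(lives, in) plus pobj(in, Paris) is merged into the single relation prep_in(lives, Paris), and propagation additionally copies dependencies such as a shared subject onto each conjunct of a coordination. The toy collapser here only shows the prepositional case and is not the Stanford converter itself.

# Toy illustration of collapsing prepositional dependencies into
# prep_<preposition> relations; a sketch, not the Stanford converter.

def collapse_preps(deps):
    """Merge prep/pobj pairs into a single prep_<prep> dependency."""
    pobj = {gov: dpt for rel, gov, dpt in deps if rel == "pobj"}
    collapsed = []
    for rel, gov, dpt in deps:
        if rel == "prep" and dpt in pobj:
            collapsed.append(("prep_" + dpt.lower(), gov, pobj[dpt]))
        elif rel != "pobj":
            collapsed.append((rel, gov, dpt))
    return collapsed

# Basic dependencies for "John lives in Paris." (written by hand):
basic = [("nsubj", "lives", "John"), ("prep", "lives", "in"),
         ("pobj", "in", "Paris")]
print(collapse_preps(basic))
# [('nsubj', 'lives', 'John'), ('prep_in', 'lives', 'Paris')]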
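And as referenced in note 3, the following is a minimal, hypothetical sketch of the kind of unanimity filter described there: a candidate entailment is kept only when all annotators gave the same judgment. The data structures are illustrative assumptions, not the actual PETE annotation format.

# Hypothetical agreement filter: keep a candidate entailment only if all
# annotators judged it the same way (all "YES" or all "NO").

def unanimous(candidates):
    """candidates: list of (hypothesis, judgments) pairs, where judgments
    is a list of per-annotator labels. Returns the unanimous subset."""
    return [(hyp, labels[0]) for hyp, labels in candidates
            if len(set(labels)) == 1]

candidates = [("John saw Mary.", ["YES", "YES", "YES"]),
              ("Mary was seen by a telescope.", ["NO", "YES", "NO"])]
print(unanimous(candidates))  # [('John saw Mary.', 'YES')]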


Acknowledgments

We would like to thank Stephan Oepen and Anna Mac for their careful analysis and valuable suggestions. Önder Eker and Zehra Turgut contributed to the development of the PETE task. Stephen Clark collaborated on the development of the Cambridge system. We would also like to thank Matthew Honnibal for discussion of the SCHWA system and contribution to the entailment system analysis.

Author information

Correspondence to Deniz Yuret.


Cite this article

Yuret, D., Rimell, L. & Han, A. Parser evaluation using textual entailments. Lang Resources & Evaluation 47, 639–659 (2013). https://doi.org/10.1007/s10579-012-9200-5
