
Stochastic HPSG Parse Disambiguation using the Redwoods Corpus

Research on Language and Computation

Abstract

This article details our experiments on HPSG parse disambiguation, based on the Redwoods treebank. Using existing and novel stochastic models, we evaluate the usefulness of different information sources for disambiguation – lexical, syntactic, and semantic. We perform careful comparisons of generative and discriminative models using equivalent features and show the consistent advantage of discriminatively trained models. Our best system performs at over 76% sentence exact match accuracy.



Author information


Correspondence to Kristina Toutanova.

About this article

Cite this article

Toutanova, K., Manning, C.D., Flickinger, D. et al. Stochastic HPSG Parse Disambiguation using the Redwoods Corpus. Res Lang Comput 3, 83–105 (2005). https://doi.org/10.1007/s11168-005-1288-y


  • DOI: https://doi.org/10.1007/s11168-005-1288-y
