Abstract
This article details our experiments on HPSG parse disambiguation, based on the Redwoods treebank. Using existing and novel stochastic models, we evaluate the usefulness of different information sources for disambiguation – lexical, syntactic, and semantic. We perform careful comparisons of generative and discriminative models using equivalent features and show the consistent advantage of discriminatively trained models. Our best system performs at over 76% sentence exact match accuracy.
Cite this article
Toutanova, K., Manning, C.D., Flickinger, D. et al. Stochastic HPSG Parse Disambiguation using the Redwoods Corpus. Res Lang Comput 3, 83–105 (2005). https://doi.org/10.1007/s11168-005-1288-y
DOI: https://doi.org/10.1007/s11168-005-1288-y