Research on Language and Computation, Volume 2, Issue 4, pp 523–547

Evaluating Automatic LFG F-Structure Annotation for the Penn-II Treebank

  • Michael Burke
  • Aoife Cahill
  • Mairéad McCarthy
  • Ruth O’Donovan
  • Josef van Genabith
  • Andy Way

Abstract

Lexical-Functional Grammar (LFG: Kaplan and Bresnan, 1982; Bresnan, 2001; Dalrymple, 2001) f-structures represent abstract syntactic information approximating to basic predicate-argument-modifier (dependency) structure or simple logical form (van Genabith and Crouch, 1996; Cahill et al., 2003a). A number of methods have been developed (van Genabith et al., 1999a,b, 2001; Frank, 2000; Sadler et al., 2000; Frank et al., 2003) for automatically annotating treebank resources with LFG f-structure information. Until recently, however, most of this work on automatic f-structure annotation has been applied only to limited data sets, so while it may have shown ‘proof of concept’, it has not yet demonstrated that the techniques developed scale up to much larger data sets. More recent work (Cahill et al., 2002a,b) has presented efforts in evolving and scaling techniques established in these previous papers to the full Penn-II Treebank (Marcus et al., 1994). In this paper, we present a number of quantitative and qualitative evaluation experiments which provide insights into the effectiveness of the techniques developed to automatically derive a set of f-structures for the more than 1,000,000 words and 49,000 sentences of Penn-II. Currently we obtain 94.85% Precision, 95.4% Recall and 95.09% F-Score for preds-only f-structures against a manually encoded gold standard.
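
As a rough illustration of how precision, recall and F-score can be computed when automatically derived f-structures are compared with a gold standard, the following minimal sketch scores a set of predicted dependency triples against a set of gold triples. The triple encoding `(relation, head, dependent)`, the function name `prf` and the toy data are assumptions made for this example; the abstract does not describe the evaluation software actually used.

```python
from typing import Iterable, Tuple

# Assumed encoding of a flattened preds-only f-structure:
# one (relation, head predicate, dependent predicate) triple per dependency.
Triple = Tuple[str, str, str]


def prf(gold: Iterable[Triple], test: Iterable[Triple]) -> Tuple[float, float, float]:
    """Return (precision, recall, f_score) of test triples against gold triples."""
    gold_set, test_set = set(gold), set(test)
    matched = len(gold_set & test_set)  # triples present in both sets

    precision = matched / len(test_set) if test_set else 0.0
    recall = matched / len(gold_set) if gold_set else 0.0
    # F-score is the harmonic mean of precision and recall.
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical toy data: four gold triples, three predicted, two in common.
gold_triples = {
    ("subj", "see", "john"),
    ("obj", "see", "mary"),
    ("adjunct", "see", "yesterday"),
    ("adjunct", "mary", "tall"),
}
test_triples = {
    ("subj", "see", "john"),
    ("obj", "see", "mary"),
    ("adjunct", "see", "today"),
}

p, r, f = prf(gold_triples, test_triples)
print(f"precision={p:.4f} recall={r:.4f} f_score={f:.4f}")
```

Scores pooled over all triples in a corpus and scores averaged sentence by sentence can differ, so the sketch should not be expected to reproduce the figures reported above.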

Keywords

automatic annotation, corpora, evaluation, LFG, treebanks, unification grammar

References

  1. Brants S., Dipper S., Hansen S., Lezius W., Smith G. (2002) The TIGER Treebank. In Proceedings of the Workshop on Treebanks and Linguistic Theories, Sozopol, Bulgaria, pp. 24–41.
  2. Bresnan J. (2001) Lexical-Functional Syntax. Blackwell, Oxford.
  3. Cahill A., McCarthy M., van Genabith J., Way A. (2002a) Parsing Text with PCFGs and Automatic F-Structure Annotation. In Proceedings of the Seventh International Conference on Lexical-Functional Grammar, Athens, Greece, pp. 76–95.
  4. Cahill A., McCarthy M., van Genabith J., Way A. (2002b) Automatic Annotation of the Penn Treebank with LFG F-Structure Information. In Proceedings of the LREC Workshop on Linguistic Knowledge Acquisition and Representation: Bootstrapping Annotated Language Data, Las Palmas, Canary Islands, Spain, pp. 8–15.
  5. Cahill A., McCarthy M., Burke M., van Genabith J., Way A. (2003a) Deriving Quasi-Logical Forms for the Penn Treebank; Computing Meaning. In Bunt H., Muskens R., Thijsse E. (eds.), Studies in Linguistics and Philosophy. Kluwer Academic Publishers, Dordrecht/Boston/London, in press.
  6. Cahill A., Forst M., McCarthy M., O’Donovan R., Rohrer C., van Genabith J., Way A. (2003b) Treebank-Based Multilingual Unification Grammar Development. In Proceedings of the Workshop on Ideas and Strategies for Multilingual Grammar Development, at the 15th European Summer School in Logic Language and Information, Vienna, Austria, 18th–29th August 2003.
  7. Carroll J., Minnen G., Briscoe T. (1999) Corpus Annotation for Parser Evaluation. In Proceedings of the EACL Workshop on Linguistically Interpreted Corpora (LINC-99), Bergen, Norway, pp. 35–41.
  8. Clement L., Kinyon A. (2003) Generating Parallel Multilingual LFG-TAG Grammars using a MetaGrammar. In Proceedings of the 41st Annual Conference of the Association for Computational Linguistics, Sapporo, Japan, pp. 184–191.
  9. Collins M. (1996) A New Statistical Parser Based on Bigram Lexical Dependencies. In Proceedings of 34th Conference of the Association of Computational Linguistics, Santa Cruz, CA, pp. 184–192.
  10. Crouch R., Kaplan R., King T., Riezler S. (2002) A Comparison of Evaluation Metrics for a Broad Coverage Parser. Beyond PARSEVAL Workshop at 3rd International Conference on Language Resources and Evaluation (LREC’02), Las Palmas, pp. 67–74.
  11. Dalrymple M. (2001) Lexical-Functional Grammar. Academic Press, San Diego, CA, London.
  12. Frank A. (2000) Automatic F-Structure Annotation of Treebank Trees. In Proceedings of the Fifth International Conference on Lexical-Functional Grammar, CSLI Publications, Stanford, CA, pp. 139–160.
  13. Frank A., Sadler L., van Genabith J., Way A. (2003) From Treebank Resources To LFG F-Structures. In Abeillé A. (ed.), Treebanks: Building and Using Syntactically Annotated Corpora. Kluwer Academic Publishers, Dordrecht/Boston/London, pp. 367–389.
  14. Hockenmaier J., Steedman M. (2002) Generative Models for Statistical Parsing with Combinatory Categorial Grammar. In Proceedings of 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, pp. 335–342.
  15. Kaplan R., Bresnan J. (1982) Lexical-Functional Grammar: a Formal System for Grammatical Representation. In Bresnan J. (ed.), The Mental Representation of Grammatical Relations. MIT Press, Cambridge, MA, pp. 173–281.
  16. Krotov A., Hepple M., Gaizauskas R., Wilks Y. (1998) Compacting the Penn Treebank Grammar. In Proceedings of the 17th International Conference on Computational Linguistics and 36th Conference of the Association for Computational Linguistics, Montreal, Canada, pp. 699–703.
  17. Leech G., Garside R. (1991) Running a Grammar Factory: On the Compilation of Parsed Corpora, or ‘Treebanks’. In Johansson S., Stenström A. (eds.), English Computer Corpora. Mouton de Gruyter, Berlin, pp. 15–32.
  18. Liakata M., Pulman S. (2002) From trees to Predicate-Argument Structures. In COLING’02, Proceedings of the Conference, Taipei, Taiwan, pp. 563–569.
  19. Magerman D. (1994) Natural Language Parsing as Statistical Pattern Recognition. PhD Thesis, Stanford University, CA.
  20. Marcus M., Kim G., Marcinkiewicz M. A., MacIntyre R., Ferguson M., Katz K., Schasberger B. (1994) The Penn Treebank: Annotating Predicate Argument Structure. In Proceedings of the ARPA Human Language Technology Workshop, Princeton, NJ.
  21. Riezler S., Kaplan R., King T., Johnson M., Crouch R., Maxwell J. III (2002) Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques. In Proceedings of 40th Conference of the Association for Computational Linguistics, Philadelphia, PA, pp. 271–278.
  22. Sadler L., van Genabith J., Way A. (2000) Automatic F-Structure Annotation from the AP Treebank. In Proceedings of the Fifth International Conference on Lexical-Functional Grammar, CSLI Publications, Stanford, CA, pp. 226–243.
  23. Sampson G. (1995) English for the Computer: The Susanne Corpus and Analytic Scheme. Clarendon Press, Oxford, UK.
  24. van Genabith J., Crouch D. (1996) Direct and Underspecified Interpretations of LFG f-Structures. In COLING 96, Copenhagen, Denmark, Proceedings of the Conference, pp. 262–267.
  25. van Genabith J., Frank A., Way A. (2001) Treebank versus X-Bar based Automatic F-Structure Annotation. In Proceedings of the Sixth International Conference on Lexical-Functional Grammar, CSLI Publications, Stanford, CA, pp. 127–146.
  26. van Genabith J., Sadler L., Way A. (1999a) Data-Driven Compilation of LFG Semantic Forms. In Proceedings of the EACL Workshop on Linguistically Interpreted Corpora (LINC-99), Bergen, Norway, pp. 69–76.
  27. van Genabith J., Way A., Sadler L. (1999b) Semi-Automatic Generation of f-structures from Treebanks. In Proceedings of the Fourth International Conference on Lexical-Functional Grammar, CSLI Publications, Stanford, CA.

Copyright information

© Springer 2005

Authors and Affiliations

  • Michael Burke (1)
  • Aoife Cahill (1)
  • Mairéad McCarthy (1)
  • Ruth O’Donovan (1)
  • Josef van Genabith (1)
  • Andy Way (1)

  1. School of Computing, Dublin City University, Dublin, Ireland
