
Dependency Syntax Analysis Using Grammar Induction and a Lexical Categories Precedence System

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6608)


Abstract

The unsupervised approach to syntactic analysis attempts to discover the structure of a text using only raw text. In this paper, we explore this approach using grammar inference algorithms. Although there is still room for improvement, our approach tries to minimize the effect of the current limitations of some grammar inductors in two ways: by adding morphological information before the grammar induction process, and by using a novel system for converting a shallow parse into dependencies, which reconstructs the heads left undiscovered by the inductor by means of a lexical categories precedence system. Our parser needs no syntactically tagged resources or rules and is trained on a small corpus; its performance is 10% below that of commercial semi-supervised dependency analyzers for Spanish and comparable to the state of the art for English.
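The abstract names, but does not detail, the step that turns the inductor's head-less shallow parse into dependencies. The following is a minimal Python sketch of how a lexical-category precedence system could do this, not the authors' implementation. It assumes the shallow parse is a nested bracketing of `(position, word, category)` tokens; the category labels, their ranking in the hypothetical `PRECEDENCE` table, and the leftmost tie-break are all illustrative placeholders. Within each bracket, the child head whose category ranks highest becomes the head, and the remaining children attach to it.

```python
from typing import Dict, List, Tuple, Union

# A token is (position, word, lexical category); positions start at 1.
Token = Tuple[int, str, str]
# A shallow-parse node is a token or a non-empty bracketed list of nodes, with no head marked.
Node = Union[Token, List["Node"]]

# HYPOTHETICAL precedence ranking (lower = stronger head candidate).
# The paper's actual categories and order are not given in the abstract.
PRECEDENCE: Dict[str, int] = {"V": 0, "N": 1, "ADJ": 2, "ADV": 3, "PREP": 4, "DET": 5}
UNKNOWN_RANK = len(PRECEDENCE)  # categories missing from the table rank last


def rank(tok: Token) -> Tuple[int, int]:
    """Sort key: category precedence first, then leftmost position (assumed tie-break)."""
    return (PRECEDENCE.get(tok[2], UNKNOWN_RANK), tok[0])


def head_of(node: Node, deps: Dict[int, int]) -> Token:
    """Return the head token of `node`, recording dependent -> head links in `deps`."""
    if isinstance(node, tuple):
        return node  # a bare token is its own head
    child_heads = [head_of(child, deps) for child in node]
    head = min(child_heads, key=rank)
    for tok in child_heads:
        if tok[0] != head[0]:
            deps[tok[0]] = head[0]  # attach each non-head child to the chosen head
    return head


def to_dependencies(tree: Node) -> Dict[int, int]:
    """Map every token position to its head position; 0 denotes the root."""
    deps: Dict[int, int] = {}
    root = head_of(tree, deps)
    deps[root[0]] = 0
    return deps


# "the dog barked loudly", bracketed by a hypothetical inductor as [[the dog] barked loudly]
tree = [[(1, "the", "DET"), (2, "dog", "N")], (3, "barked", "V"), (4, "loudly", "ADV")]
print(to_dependencies(tree))  # {1: 2, 2: 3, 4: 3, 3: 0}
```

Because the ranking is a plain lookup table, trying a different category order only means editing `PRECEDENCE`; the recursive attachment logic stays the same.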

We thank SNI, SIP-IPN, COFAA-IPN, PIFI-IPN, CONACYT, and the Japanese Government for their support; the first author is a JSPS fellow. The third author is a Visiting Scholar at Waseda University and specifically acknowledges the support of grant SIP-20100773, grant CONACYT 50206-H, and a CONACYT scholarship for a sabbatical stay in 2010.




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Calvo, H., Gambino, O.J., Gelbukh, A., Inui, K. (2011). Dependency Syntax Analysis Using Grammar Induction and a Lexical Categories Precedence System. In: Gelbukh, A.F. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19400-9_9


  • DOI: https://doi.org/10.1007/978-3-642-19400-9_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19399-6

  • Online ISBN: 978-3-642-19400-9

  • eBook Packages: Computer Science, Computer Science (R0)
