Favor Short Dependencies: Parsing with Soft and Hard Constraints on Dependency Length

Chapter
Part of the Text, Speech and Language Technology book series (TLTB, volume 43)

Abstract

Many modern parsers identify the head word of each constituent they find. This makes it possible to identify the word-to-word dependencies implicit in a parse. Some parsers, known as dependency parsers, even return these dependencies as their primary output. Why bother to identify dependencies? The typical reason is to model the fact that some word pairs are more likely than others to engage in a dependency relationship.
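The abstract notes that head-annotated constituents implicitly encode word-to-word dependencies. As a minimal illustrative sketch (the tree encoding and function names here are assumptions for exposition, not the chapter's own data structures), one can recursively propagate head words up a constituency tree and emit a dependency from each constituent's head to the heads of its non-head children:

```python
# Sketch: extracting word-to-word dependencies from a head-annotated
# constituency tree. A tree node is (label, head_child_index, children);
# a leaf is (label, (word_index, word), None). This toy representation
# is a hypothetical illustration, not the chapter's formalism.

def head_word(tree):
    """Return the (index, word) pair of the head word of a constituent."""
    label, head, children = tree
    if children is None:          # leaf: head already holds (index, word)
        return head
    return head_word(children[head])

def dependencies(tree, deps=None):
    """Collect (head_index, dependent_index) pairs implicit in the tree."""
    if deps is None:
        deps = []
    label, head, children = tree
    if children is None:
        return deps
    h = head_word(children[head])
    for i, child in enumerate(children):
        if i != head:             # each non-head child depends on the head
            deps.append((h[0], head_word(child)[0]))
        dependencies(child, deps)
    return deps

# "the dog barked": NP headed by "dog", S headed by "barked"
leaf = lambda i, w: ("W", (i, w), None)
np = ("NP", 1, [leaf(0, "the"), leaf(1, "dog")])
s  = ("S", 1, [np, leaf(2, "barked")])
print(sorted(dependencies(s)))    # [(1, 0), (2, 1)]
```

In this toy sentence, "barked" governs "dog" (distance 1) and "dog" governs "the" (distance 1); measuring such head-to-dependent distances is what the chapter's length constraints operate on.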

Keywords

Regular Language · Soft Constraint · Hard Constraint · Tree Dependency · Dependency Length

Notes

Acknowledgements

This work was supported by NSF ITR grant IIS-0313193 to the first author and a fellowship from the Fannie and John Hertz Foundation to the second author. The views expressed are not necessarily endorsed by the sponsors. The authors thank Mark Johnson, Eugene Charniak, Charles Schafer, Keith Hall, and John Hale for helpful discussion and Elliott Drébek and Markus Dreyer for insights on (respectively) Chinese and German parsing. They also thank an anonymous reviewer for suggesting the German experiments.


Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  1. Johns Hopkins University, Baltimore, USA
  2. Carnegie Mellon University, Pittsburgh, USA
