
A Latent Variable Model for Generative Dependency Parsing

Trends in Parsing Technology

Part of the book series: Text, Speech and Language Technology ((TLTB,volume 43))

Abstract

Dependency parsing has been a topic of active research in natural language processing during the last several years. The CoNLL-2006 shared task (Buchholz and Marsi, 2006) made a wide selection of standardized treebanks for different languages available to the research community and allowed for easy comparison between various statistical methods on a standardized benchmark. One of the surprising findings of this evaluation is that the best results are achieved by methods which are quite different from state-of-the-art models for constituent parsing, e.g. the deterministic parsing method of Nivre et al. (2006) and the minimum spanning tree parser of McDonald et al. (2006).


Notes

  1.

    In preliminary experiments, we also considered look-ahead, where the word is predicted earlier than it appears at the head of the queue I, and “anti-look-ahead”, where the word is predicted only when it is shifted to the stack S. Early prediction allows conditioning decision probabilities on the words in the look-ahead and, thus, speeds up the search for an optimal decision sequence. However, the loss of accuracy with look-ahead was quite significant. The described method, where a new word is predicted when it appears at the head of the queue, led to the most accurate model and quite efficient search. The anti-look-ahead model was both less accurate and slower.

  2.

    We refer to the head of the queue as the front, to avoid unnecessary ambiguity of the word head in the context of dependency parsing.

  3.

    The tuned feature sets were obtained from http://w3.msi.vxu.se/~nivre/research/MaltParser.html. We removed look-ahead features for the ISBN experiments but preserved them for the experiments with the MALT parser. Analogously, we extended the simple feature set with three words of look-ahead for the MALT parser experiments.

  4.

    Part-of-speech tags for multi-word units in the Dutch treebank were formed as the concatenation of the tags of the component words, which led to quite a sparse set of part-of-speech tags.

  5.

    Note that the development set accuracy correctly predicted the test set ranking of the ISBN TF, LF and TF-NA models on each of the datasets, so it is fair to compare the best ISBN result among the three with other parsers.

  6.

    The MALT parser is trained to keep the word as long as possible: if both Shift and Reduce decisions are possible during training, it always prefers to shift. Though this strategy should generally reduce the described problem, it is evident from the low precision score for attachment to root that it cannot completely eliminate it.
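    The tie-breaking rule described in this note can be sketched as follows. This is an illustrative reconstruction, not MaltParser's actual code: the action names, the `choose_action` helper, and the priority order among the remaining actions are all assumptions.

    ```python
    # Sketch of the training-time tie-breaking rule described above: when
    # the oracle permits both Shift and Reduce, Shift is chosen, keeping
    # the word on the stack as long as possible. Action names and the
    # fall-back priority order are assumptions, not MaltParser's API.

    def choose_action(permissible):
        """Pick one action from the oracle-permissible set, preferring
        Shift over Reduce whenever both are allowed."""
        if "SHIFT" in permissible and "REDUCE" in permissible:
            return "SHIFT"
        # Otherwise fall back to a fixed (assumed) priority order.
        for action in ("LEFT-ARC", "RIGHT-ARC", "SHIFT", "REDUCE"):
            if action in permissible:
                return action
        raise ValueError("no permissible action")
    ```

    Under this rule a word is reduced from the stack only when shifting is no longer an option, which is why attachment-to-root errors can still occur but are made less frequent.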

  7.

    Use of cross-validation with our model is relatively time-consuming and, thus, not quite feasible for the shared task.

  8.

    A piecewise-linear approximation for each individual language was used to compute the average. Experiments were run on a standard 2.4 GHz desktop PC.
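    The averaging the note describes can be sketched as below: each language's accuracy-vs-parsing-time measurements are treated as a piecewise-linear curve, resampled on a shared grid of parsing times, and averaged pointwise. All data values, language names, and the time grid here are invented for illustration.

    ```python
    # Hedged sketch of averaging accuracy-vs-time curves across languages
    # via piecewise-linear interpolation. The measurements are illustrative.
    import numpy as np

    curves = {  # language -> (seconds per word, labeled accuracy %)
        "lang1": ([0.001, 0.01, 0.1], [70.0, 78.0, 80.0]),
        "lang2": ([0.002, 0.02, 0.2], [65.0, 74.0, 77.0]),
    }

    grid = np.logspace(-2.5, -1.0, 5)  # common grid of parsing times (s)
    # np.interp joins each language's points piecewise-linearly and
    # resamples them on the shared grid (clamping outside the data range).
    resampled = [np.interp(grid, times, acc) for times, acc in curves.values()]
    average = np.mean(resampled, axis=0)  # pointwise average curve
    ```

    Plotting `average` against `grid` would give a single speed/accuracy trade-off curve of the kind summarized in the text.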

  9.

    For Basque, Chinese, and Turkish this time is below 7 ms, but for English it is 38 ms. English, along with Catalan, required the largest beam across all ten languages. Note that accuracy in the lowest part of the curve can probably be improved by varying latent vector size and frequency cut-offs. Also, efficiency was not the main goal during the implementation of the parser, and it is likely that a much faster implementation is possible.

  10.

    The ISBN dependency parser is downloadable from http://flake.cs.uiuc.edu/titov/idp/

References

  • Abeillé, A. (Ed.) (2003). Treebanks: Building and Using Parsed Corpora. Dordrecht: Kluwer.


  • Aduriz, I., M. J. Aranzabe, J. M. Arriola, A. Atutxa, A. D. de Ilarraza, A. Garmendia, and M. Oronoz (2003). Construction of a Basque dependency treebank. In Proceedings of the 2nd Workshop on Treebanks and Linguistic Theories (TLT), Växjö, pp. 201–204.


  • Aho, A. V., R. Sethi, and J. D. Ullman (1986). Compilers: Principles, Techniques and Tools. Reading, MA: Addison Wesley.


  • Böhmová, A., J. Hajič, E. Hajičová, and B. Hladká (2003). The PDT: a 3-level annotation scenario. See Abeillé (2003), Chapter 7, pp. 103–127.

  • Bottou, L. (1991). Une approche théorique de l’apprentissage connexionniste: Applications à la reconnaissance de la parole. Ph. D. thesis, Université de Paris XI, Paris.


  • Buchholz, S. and E. Marsi (2006). CoNLL-X shared task on multilingual dependency parsing. In Proceedings of the 10th Conference on Computational Natural Language Learning, New York, NY, pp. 149–164.


  • Charniak, E. (2000). A maximum-entropy-inspired parser. In Proceedings of the 1st Meeting of North American Chapter of Association for Computational Linguistics, Seattle, WA, pp. 132–139.


  • Charniak, E. and M. Johnson (2005). Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of the 43rd Meeting of Association for Computational Linguistics, Ann Arbor, MI, pp. 173–180.


  • Chen, K., C. Luo, M. Chang, F. Chen, C. Chen, C. Huang, and Z. Gao (2003). Sinica treebank: design criteria, representational issues and implementation. See Abeillé (2003), Chapter 13, pp. 231–248.

  • Collins, M. (1999). Head-Driven Statistical Models for Natural Language Parsing. Ph. D. thesis, University of Pennsylvania, Philadelphia, PA.


  • Collins, M. (2000). Discriminative reranking for natural language parsing. In Proceedings of the 17th International Conference on Machine Learning, Stanford, CA, pp. 175–182.


  • Csendes, D., J. Csirik, T. Gyimóthy, and A. Kocsor (2005). The Szeged Treebank. Berlin/Heidelberg: Springer.


  • Dzeroski, S., T. Erjavec, N. Ledinek, P. Pajas, Z. Zabokrtsky, and A. Zele (2006). Towards a Slovene dependency treebank. In Proceedings of the International Conference on Language Resources and Evaluation (LREC), Genoa, pp. 1388–1391.


  • Hajič, J., O. Smrž, P. Zemánek, J. Šnaidauf, and E. Beška (2004). Prague Arabic dependency treebank: development in data and tools. In Proceedings of the NEMLAR International Conference on Arabic Language Resources and Tools, Cairo, pp. 110–117.


  • Hall, J., J. Nilsson, J. Nivre, G. Eryigit, B. Megyesi, M. Nilsson, and M. Saers (2007). Single malt or blended? a study in multilingual parser optimization. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, Prague, pp. 933–939.


  • Henderson, J. (2003). Inducing history representations for broad coverage statistical parsing. In Proceedings of the Joint Meeting of North American Chapter of the Association for Computational Linguistics and the Human Language Technology Conference, Edmonton, AB, pp. 103–110.


  • Henderson, J. (2004). Discriminative training of a neural network statistical parser. In Proceedings of the 42nd Meeting of Association for Computational Linguistics, Barcelona, pp. 95–102.


  • Henderson, J., P. Merlo, G. Musillo, and I. Titov (2008). A latent variable model of synchronous parsing for syntactic and semantic dependencies. In Proceedings of the CoNLL-2008 Shared Task, Manchester, pp. 178–182.


  • Henderson, J. and I. Titov (2005). Data-defined kernels for parse reranking derived from probabilistic models. In Proceedings of the 43rd Meeting of Association for Computational Linguistics, Ann Arbor, MI, pp. 181–188.


  • Johansson, R. and P. Nugues (2007). Extended constituent-to-dependency conversion for English. In Proceedings of the 16th Nordic Conference on Computational Linguistics (NODALIDA), Tartu, pp. 105–112.


  • Jordan, M. I., Z. Ghahramani, T. S. Jaakkola, and L. K. Saul (1999). An introduction to variational methods for graphical models. In M. I. Jordan (Ed.), Learning in Graphical Models, pp. 183–233. Cambridge, MA: MIT Press.


  • Koo, T. and M. Collins (2005). Hidden-variable models for discriminative reranking. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Vancouver, BC, pp. 507–514.


  • Kromann, M. T. (2003). The Danish dependency treebank and the underlying linguistic theory. In Proceedings of the 2nd Workshop on Treebanks and Linguistic Theories (TLT), Växjö.


  • Liang, P., S. Petrov, M. Jordan, and D. Klein (2007). The infinite PCFG using hierarchical Dirichlet processes. In Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, pp. 688–697.


  • Marcus, M., B. Santorini, and M. Marcinkiewicz (1993). Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics 19(2), 313–330.


  • Martí, M. A., M. Taulé, L. Màrquez, and M. Bertran (2007). CESS-ECE: a multilingual and multilevel annotated corpus. Available for download from: http://www.lsi.upc.edu/ mbertran/cess-ece/

  • Matsuzaki, T., Y. Miyao, and J. Tsujii (2005). Probabilistic CFG with latent annotations. In Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, MI, pp. 75–82.


  • McDonald, R., K. Lerman, and F. Pereira (2006). Multilingual dependency analysis with a two-stage discriminative parser. In Proceedings of the 10th Conference on Computational Natural Language Learning, New York, NY, pp. 216–220.


  • Montemagni, S., F. Barsotti, M. Battista, N. Calzolari, O. Corazzari, A. Lenci, A. Zampolli, F. Fanciulli, M. Massetani, R. Raffaelli, R. Basili, M. T. Pazienza, D. Saracino, F. Zanzotto, N. Nana, F. Pianesi, and R. Delmonte (2003). Building the Italian Syntactic-Semantic Treebank. See Abeillé (2003), Chapter 11, pp. 189–210.

  • Murphy, K. P. (2002). Dynamic Bayesian Networks: Representation, Inference and Learning. Ph. D. thesis, University of California, Berkeley, CA.


  • Musillo, G. and P. Merlo (2008). Unlexicalised hidden variable models of split dependency grammars. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Columbus, OH, pp. 213–216.


  • Nakagawa, T. (2007). Multilingual dependency parsing using global features. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pp. 952–956.


  • Neal, R. (1992). Connectionist learning of belief networks. Artificial Intelligence 56, 71–113.


  • Nivre, J., J. Hall, and J. Nilsson (2004). Memory-based dependency parsing. In Proceedings of the 8th Conference on Computational Natural Language Learning, Boston, MA, pp. 49–56.


  • Nivre, J., J. Hall, J. Nilsson, G. Eryigit, and S. Marinov (2006). Pseudo-projective dependency parsing with support vector machines. In Proceedings of the 10th Conference on Computational Natural Language Learning, New York, NY, pp. 221–225.


  • Oflazer, K., B. Say, D. Z. Hakkani-Tür, and G. Tür (2003). Building a Turkish treebank. See Abeillé (2003), Chapter 15, pp. 261–277.

  • Peshkin, L. and V. Savova (2005). Dependency parsing with dynamic Bayesian network. In AAAI, 20th National Conference on Artificial Intelligence, Pittsburgh, PA, pp. 1112–1117.


  • Petrov, S., L. Barrett, R. Thibaux, and D. Klein (2006). Learning accurate, compact, and interpretable tree annotation. In Proceedings of the Annual Meeting of the ACL and the International Conference on Computational Linguistics, Sydney, pp. 433–44.


  • Petrov, S. and D. Klein (2007). Improved inference for unlexicalized parsing. In Proceedings of the Conference on Human Language Technology and North American chapter of the Association for Computational Linguistics (HLT-NAACL 2007), Rochester, NY, pp. 404–411.


  • Prescher, D. (2005). Head-driven PCFGs with latent-head statistics. In Proceedings of the 9th International Workshop on Parsing Technologies, Vancouver, BC, pp. 115–124.


  • Prokopidis, P., E. Desypri, M. Koutsombogera, H. Papageorgiou, and S. Piperidis (2005). Theoretical and practical issues in the construction of a Greek dependency treebank. In Proceedings of the 4th Workshop on Treebanks and Linguistic Theories (TLT), Barcelona, pp. 149–160.


  • Riezler, S., T. H. King, R. M. Kaplan, R. Crouch, J. T. Maxwell, and M. Johnson (2002). Parsing the Wall Street Journal using a Lexical-Functional Grammar and discriminative estimation techniques. In Proceedings of the 40th Meeting of Association for Computational Linguistics, Philadelphia, PA, pp. 271–278.


  • Sallans, B. (2002). Reinforcement Learning for Factored Markov Decision Processes. Ph. D. thesis, University of Toronto, Toronto, ON.


  • Sha, F. and F. Pereira (2003). Shallow parsing with conditional random fields. In Proceedings of the Joint Meeting of North American Chapter of the Association for Computational Linguistics and the Human Language Technology Conference, Edmonton, AB, pp. 213–220.


  • Titov, I. and J. Henderson (2007). Constituent parsing with Incremental Sigmoid Belief Networks. In Proceedings of the 45th Meeting of Association for Computational Linguistics, Prague, pp. 632–639.


  • van der Beek, L., G. Bouma, J. Daciuk, T. Gaustad, R. Malouf, G. van Noord, R. Prins, and B. Villada (2002). The Alpino dependency treebank. In Computational Linguistics in the Netherlands (CLIN), Enschede, pp. 8–22.



Acknowledgements

This work was funded by Swiss NSF grant 200020-109685, Swiss NSF Fellowship PBGE22-119276, UK EPSRC grant EP/E019501/1, EU FP6 grant 507802 (TALK project), and EU FP7 grant 216594 (CLASSiC project).

Author information

Corresponding author

Correspondence to Ivan Titov.


Copyright information

© 2010 Springer Science+Business Media B.V.

Cite this chapter

Titov, I., Henderson, J. (2010). A Latent Variable Model for Generative Dependency Parsing. In: Bunt, H., Merlo, P., Nivre, J. (eds) Trends in Parsing Technology. Text, Speech and Language Technology, vol 43. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9352-3_3
