Abstract
Dependency parsing has been a topic of active research in natural language processing during the last several years. The CoNLL-2006 shared task (Buchholz and Marsi, 2006) made a wide selection of standardized treebanks for different languages available to the research community and allowed easy comparison between various statistical methods on a standardized benchmark. One surprising outcome of this evaluation is that the best results are achieved by methods quite different from state-of-the-art models for constituent parsing, e.g. the deterministic parsing method of Nivre et al. (2006) and the minimum spanning tree parser of McDonald et al. (2006).
Notes
- 1. In preliminary experiments, we also considered look-ahead, where the word is predicted earlier than when it appears at the front of the queue I, and “anti-look-ahead”, where the word is predicted only when it is shifted to the stack S. Early prediction allows decision probabilities to be conditioned on the words in the look-ahead and thus speeds up the search for an optimal decision sequence. However, the loss of accuracy with look-ahead was quite significant. The described method, where a new word is predicted when it appears at the front of the queue, led to the most accurate model and quite efficient search. The anti-look-ahead model was both less accurate and slower.
- 2. We refer to the head of the queue as the front, to avoid unnecessary ambiguity with the word head in the context of dependency parsing.
- 3. The tuned feature sets were obtained from http://w3.msi.vxu.se/~nivre/research/MaltParser.html. We removed look-ahead features for the ISBN experiments but preserved them for the experiments with the MALT parser. Similarly, we extended the simple feature set with a three-word look-ahead for the MALT parser experiments.
- 4. Part-of-speech tags for multi-word units in the Dutch treebank were formed by concatenating the tags of the component words, which led to quite a sparse set of part-of-speech tags.
- 5. Note that the development set accuracy correctly predicted the test set ranking of the ISBN TF, LF and TF-NA models on each of the datasets, so it is fair to compare the best ISBN result among the three with other parsers.
- 6. The MALT parser is trained to keep the word as long as possible: if both Shift and Reduce decisions are possible during training, it always prefers to shift. Though this strategy should generally reduce the described problem, it is evident from the low precision score for attachment to root that it cannot eliminate it completely.
- 7. Use of cross-validation with our model is relatively time-consuming and thus not feasible for the shared task.
- 8. A piecewise-linear approximation for each individual language was used to compute the average. Experiments were run on a standard 2.4 GHz desktop PC.
- 9. For Basque, Chinese, and Turkish this time is below 7 ms, but for English it is 38 ms. English, along with Catalan, required the largest beam across all ten languages. Note that accuracy in the lowest part of the curve could probably be improved by varying the latent vector size and frequency cut-offs. Also, efficiency was not the main goal during the implementation of the parser, and a much faster implementation is likely possible.
- 10. The ISBN dependency parser can be downloaded from http://flake.cs.uiuc.edu/titov/idp/
References
Abeillé, A. (Ed.) (2003). Treebanks: Building and Using Parsed Corpora. Dordrecht: Kluwer.
Aduriz, I., M. J. Aranzabe, J. M. Arriola, A. Atutxa, A. D. de Ilarraza, A. Garmendia, and M. Oronoz (2003). Construction of a Basque dependency treebank. In Proceedings of the 2nd Workshop on Treebanks and Linguistic Theories (TLT), Växjö, pp. 201–204.
Aho, A. V., R. Sethi, and J. D. Ullman (1986). Compilers: Principles, Techniques and Tools. Reading, MA: Addison Wesley.
Böhmová, A., J. Hajič, E. Hajičová, and B. Hladká (2003). The PDT: a 3-level annotation scenario. See Abeillé (2003), Chapter 7, pp. 103–127.
Bottou, L. (1991). Une approche théorique de l’apprentissage connexionniste: Applications à la reconnaissance de la parole. Ph. D. thesis, Université de Paris XI, Paris.
Buchholz, S. and E. Marsi (2006). CoNLL-X shared task on multilingual dependency parsing. In Proceedings of the 10th Conference on Computational Natural Language Learning, New York, NY, pp. 149–164.
Charniak, E. (2000). A maximum-entropy-inspired parser. In Proceedings of the 1st Meeting of North American Chapter of Association for Computational Linguistics, Seattle, WA, pp. 132–139.
Charniak, E. and M. Johnson (2005). Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of the 43rd Meeting of Association for Computational Linguistics, Ann Arbor, MI, pp. 173–180.
Chen, K., C. Luo, M. Chang, F. Chen, C. Chen, C. Huang, and Z. Gao (2003). Sinica treebank: design criteria, representational issues and implementation. See Abeillé (2003), Chapter 13, pp. 231–248.
Collins, M. (1999). Head-Driven Statistical Models for Natural Language Parsing. Ph. D. thesis, University of Pennsylvania, Philadelphia, PA.
Collins, M. (2000). Discriminative reranking for natural language parsing. In Proceedings of the 17th International Conference on Machine Learning, Stanford, CA, pp. 175–182.
Csendes, D., J. Csirik, T. Gyimóthy, and A. Kocsor (2005). The Szeged Treebank. Berlin/Heidelberg: Springer.
Dzeroski, S., T. Erjavec, N. Ledinek, P. Pajas, Z. Zabokrtsky, and A. Zele (2006). Towards a Slovene dependency treebank. In Proceedings of the International Conference on Language Resources and Evaluation (LREC), Genoa, pp. 1388–1391.
Hajič, J., O. Smrž, P. Zemánek, J. Šnaidauf, and E. Beška (2004). Prague Arabic dependency treebank: development in data and tools. In Proceedings of the NEMLAR International Conference on Arabic Language Resources and Tools, Cairo, pp. 110–117.
Hall, J., J. Nilsson, J. Nivre, G. Eryigit, B. Megyesi, M. Nilsson, and M. Saers (2007). Single malt or blended? A study in multilingual parser optimization. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, Prague, pp. 933–939.
Henderson, J. (2003). Inducing history representations for broad coverage statistical parsing. In Proceedings of the Joint Meeting of North American Chapter of the Association for Computational Linguistics and the Human Language Technology Conference, Edmonton, AB, pp. 103–110.
Henderson, J. (2004). Discriminative training of a neural network statistical parser. In Proceedings of the 42nd Meeting of Association for Computational Linguistics, Barcelona, pp. 95–102.
Henderson, J., P. Merlo, G. Musillo, and I. Titov (2008). A latent variable model of synchronous parsing for syntactic and semantic dependencies. In Proceedings of the CoNLL-2008 Shared Task, Manchester, pp. 178–182.
Henderson, J. and I. Titov (2005). Data-defined kernels for parse reranking derived from probabilistic models. In Proceedings of the 43rd Meeting of Association for Computational Linguistics, Ann Arbor, MI, pp. 181–188.
Johansson, R. and P. Nugues (2007). Extended constituent-to-dependency conversion for English. In Proceedings of the 16th Nordic Conference on Computational Linguistics (NODALIDA), Tartu, pp. 105–112.
Jordan, M. I., Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. (1999). An introduction to variational methods for graphical models. In M. I. Jordan (Ed.), Learning in Graphical Models, pp. 183–233. Cambridge, MA: MIT Press.
Koo, T. and M. Collins (2005). Hidden-variable models for discriminative reranking. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Vancouver, BC, pp. 507–514.
Kromann, M. T. (2003). The Danish dependency treebank and the underlying linguistic theory. In Proceedings of the 2nd Workshop on Treebanks and Linguistic Theories (TLT), Växjö.
Liang, P., S. Petrov, M. Jordan, and D. Klein (2007). The infinite PCFG using hierarchical Dirichlet processes. In Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, pp. 688–697.
Marcus, M., B. Santorini, and M. Marcinkiewicz (1993). Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics 19(2), 313–330.
Martí, M. A., M. Taulé, L. Màrquez, and M. Bertran (2007). CESS-ECE: a multilingual and multilevel annotated corpus. Available for download from: http://www.lsi.upc.edu/mbertran/cess-ece/
Matsuzaki, T., Y. Miyao, and J. Tsujii (2005). Probabilistic CFG with latent annotations. In Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, MI, pp. 75–82.
McDonald, R., K. Lerman, and F. Pereira (2006). Multilingual dependency analysis with a two-stage discriminative parser. In Proceedings of the 10th Conference on Computational Natural Language Learning, New York, NY, pp. 216–220.
Montemagni, S., F. Barsotti, M. Battista, N. Calzolari, O. Corazzari, A. Lenci, A. Zampolli, F. Fanciulli, M. Massetani, R. Raffaelli, R. Basili, M. T. Pazienza, D. Saracino, F. Zanzotto, N. Nana, F. Pianesi, and R. Delmonte (2003). Building the Italian Syntactic-Semantic Treebank. See Abeillé (2003), Chapter 11, pp. 189–210.
Murphy, K. P. (2002). Dynamic Belief Networks: Representation, Inference and Learning. Ph. D. thesis, University of California, Berkeley, CA.
Musillo, G. and P. Merlo (2008). Unlexicalised hidden variable models of split dependency grammars. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Columbus, OH, pp. 213–216.
Nakagawa, T. (2007). Multilingual dependency parsing using global features. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pp. 952–956.
Neal, R. (1992). Connectionist learning of belief networks. Artificial Intelligence 56, 71–113.
Nivre, J., J. Hall, and J. Nilsson (2004). Memory-based dependency parsing. In Proceedings of the 8th Conference on Computational Natural Language Learning, Boston, MA, pp. 49–56.
Nivre, J., J. Hall, J. Nilsson, G. Eryigit, and S. Marinov (2006). Pseudo-projective dependency parsing with support vector machines. In Proceedings of the 10th Conference on Computational Natural Language Learning, New York, NY, pp. 221–225.
Oflazer, K., B. Say, D. Z. Hakkani-Tür, and G. Tür (2003). Building a Turkish treebank. See Abeillé (2003), Chapter 15, pp. 261–277.
Peshkin, L. and V. Savova (2005). Dependency parsing with dynamic Bayesian network. In AAAI, 20th National Conference on Artificial Intelligence, Pittsburgh, PA, pp. 1112–1117.
Petrov, S., L. Barrett, R. Thibaux, and D. Klein (2006). Learning accurate, compact, and interpretable tree annotation. In Proceedings of the Annual Meeting of the ACL and the International Conference on Computational Linguistics, Sydney, pp. 433–440.
Petrov, S. and D. Klein (2007). Improved inference for unlexicalized parsing. In Proceedings of the Conference on Human Language Technology and North American chapter of the Association for Computational Linguistics (HLT-NAACL 2007), Rochester, NY, pp. 404–411.
Prescher, D. (2005). Head-driven PCFGs with latent-head statistics. In Proceedings of the 9th International Workshop on Parsing Technologies, Vancouver, BC, pp. 115–124.
Prokopidis, P., E. Desypri, M. Koutsombogera, H. Papageorgiou, and S. Piperidis (2005). Theoretical and practical issues in the construction of a Greek dependency treebank. In Proceedings of the 4th Workshop on Treebanks and Linguistic Theories (TLT), Barcelona, pp. 149–160.
Riezler, S., T. H. King, R. M. Kaplan, R. Crouch, J. T. Maxwell, and M. Johnson (2002). Parsing the Wall Street Journal using a Lexical-Functional Grammar and discriminative estimation techniques. In Proceedings of the 40th Meeting of Association for Computational Linguistics, Philadelphia, PA, pp. 271–278.
Sallans, B. (2002). Reinforcement Learning for Factored Markov Decision Processes. Ph. D. thesis, University of Toronto, Toronto, ON.
Sha, F. and F. Pereira (2003). Shallow parsing with conditional random fields. In Proceedings of the Joint Meeting of North American Chapter of the Association for Computational Linguistics and the Human Language Technology Conference, Edmonton, AB, pp. 213–220.
Titov, I. and J. Henderson (2007). Constituent parsing with Incremental Sigmoid Belief Networks. In Proc. 45th Meeting of Association for Computational Linguistics (ACL), Prague, pp. 632–639.
van der Beek, L., G. Bouma, J. Daciuk, T. Gaustad, R. Malouf, G. van Noord, R. Prins, and B. Villada (2002). The Alpino dependency treebank. In Computational Linguistics in the Netherlands (CLIN), Enschede, pp. 8–22.
Acknowledgements
This work was funded by Swiss NSF grant 200020-109685, Swiss NSF Fellowship PBGE22-119276, UK EPSRC grant EP/E019501/1, EU FP6 grant 507802 (TALK project), and EU FP7 grant 216594 (CLASSiC project).
© 2010 Springer Science+Business Media B.V.
Titov, I., Henderson, J. (2010). A Latent Variable Model for Generative Dependency Parsing. In: Bunt, H., Merlo, P., Nivre, J. (eds) Trends in Parsing Technology. Text, Speech and Language Technology, vol 43. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9352-3_3
Print ISBN: 978-90-481-9351-6
Online ISBN: 978-90-481-9352-3