Dependency Parsing Using Global Features

Nakagawa, Tetsuji

doi:10.1007/978-90-481-9352-3_5

Part of the book series: Text, Speech and Language Technology (TLTB, volume 43)

Abstract

Many methods for statistical dependency parsing have been studied. For example, McDonald et al. (2005a) proposed a method for projective dependency parsing that uses an online large-margin training algorithm, and later extended it to non-projective dependency parsing (McDonald et al., 2005b). However, these studies assumed that the heads of the tokens in a sentence are independent of one another, and the features available to them were limited.

Notes

1. The term token is used here to represent the basic unit of dependency parsing. Words are used as the basic unit in many languages, but other kinds of units are used in some languages (e.g. a chunk called bunsetsu is often used in Japanese). A part-of-speech (POS) tag is assumed to be attached to each token.
2. \({\cal H}(\mathbf{w})\) is a superset of the set of non-projective trees; it is larger than necessary and contains ill-formed dependency structures such as graphs with cycles. This may reduce parsing performance, but the approximation is adopted for computational efficiency.
3. Although more accurate parameter estimation methods may give better performance, a static Monte Carlo method with fixed samples is adopted here to estimate the parameters approximately at a reasonable computational cost.
4. http://sourceforge.net/projects/mstparser/
5. Although 13 languages were handled in the shared task, only four languages were used here because their corpora are freely available. http://nextens.uvt.nl/~conll/
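
Note 2 can be made concrete with a small sketch. It rests on an assumption not stated on this page: that \({\cal H}(\mathbf{w})\) is the set of all head assignments in which every token independently picks either an artificial root (index 0) or any other token as its head. The function names are hypothetical and serve only as illustration. The sketch enumerates every such assignment for a short sentence and counts how many are well-formed trees.

```python
from itertools import product


def is_tree(heads):
    """Return True if the head assignment has no cycle.

    heads[i - 1] is the head of token i (tokens are 1-based);
    0 denotes the artificial root. This encoding is an assumption
    made for illustration, not taken from the chapter.
    """
    n = len(heads)
    for start in range(1, n + 1):
        seen = set()
        node = start
        # Follow head links upward until the root is reached.
        while node != 0:
            if node in seen:
                return False  # revisited a token: the graph has a cycle
            seen.add(node)
            node = heads[node - 1]
    return True


def head_assignments(n):
    """All assignments where each token picks the root or another token."""
    choices = [[h for h in range(n + 1) if h != t] for t in range(1, n + 1)]
    return list(product(*choices))


def count_structures(n):
    """Return (number of head assignments, number that are trees)."""
    assignments = head_assignments(n)
    trees = sum(1 for h in assignments if is_tree(h))
    return len(assignments), trees


if __name__ == "__main__":
    total, trees = count_structures(3)
    print(total, trees, total - trees)
```

Under this assumption, a three-token sentence has 27 head assignments, of which 16 are trees and 11 contain a cycle, so the larger set does include ill-formed structures of the kind the note describes.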

References

Abeillé, A. (Ed.) (2003). Treebanks: Building and Using Parsed Corpora. Dordrecht: Kluwer.
Aduriz, I., M.J. Aranzabe, J.M. Arriola, A. Atutxa, A.D. de Ilarraza, A. Garmendia, and M. Oronoz (2003). Construction of a Basque dependency treebank. In Proceedings of the 2nd Workshop on Treebanks and Linguistic Theories (TLT), pp. 201–204.
Afonso, S., E. Bick, R. Haber, and D. Santos (2002). “Floresta sintá(c)tica”: a treebank for Portuguese. In Proceedings of LREC 2002, pp. 1698–1703.
Andrieu, C., N. de Freitas, A. Doucet, and M.I. Jordan (2003). An introduction to MCMC for machine learning. Machine Learning 50, 5–43.
Böhmová, A., J. Hajič, E. Hajičová, and B. Hladká (2003). The PDT: a 3-level annotation scenario. See Abeillé (2003), Chapter 7, pp. 103–127.
Buchholz, S. and E. Marsi (2006). CoNLL-X shared task on multilingual dependency parsing. In Proceedings of CoNLL 2006, New York, NY, pp. 149–164.
Chen, K., C. Luo, M. Chang, F. Chen, C. Chen, C. Huang, and Z. Gao (2003). Sinica treebank: Design criteria, representational issues and implementation. See Abeillé (2003), Chapter 13, pp. 231–248.
Collins, M. and T. Koo (2005). Discriminative reranking for natural language parsing. Computational Linguistics 31(1), 25–69.
Covington, M.A. (2001). A fundamental algorithm for dependency parsing. In Proceedings of ACM Southeast Conference 2001, pp. 95–102.
Csendes, D., J. Csirik, T. Gyimóthy, and A. Kocsor (2005). The Szeged Treebank. Springer.
Eisner, J. (1996). Three new probabilistic models for dependency parsing: an exploration. In Proceedings of COLING ’96, pp. 340–345.
Hajič, J., O. Smrž, P. Zemánek, J. Šnaidauf, and E. Beška (2004). Prague Arabic dependency treebank: Development in data and tools. In Proceedings of the NEMLAR International Conference on Arabic Language Resources and Tools, pp. 110–117.
Hall, K. (2007). K-best spanning tree parsing. In Proceedings of ACL 2007, pp. 392–399.
Johansson, R. and P. Nugues (2007). Extended constituent-to-dependency conversion for English. In Proceedings of the 16th Nordic Conference on Computational Linguistics (NODALIDA).
Johnson, M., S. Geman, S. Canon, Z. Chi, and S. Riezler (1999). Estimators for stochastic “unification-based” grammars. In Proceedings of ACL’99, pp. 535–541.
Kromann, M.T. (2003). The Danish dependency treebank and the underlying linguistic theory. In Proceedings of the 2nd Workshop on Treebanks and Linguistic Theories (TLT).
Kudo, T. and Y. Matsumoto (2002). Japanese dependency analysis using cascaded chunking. In Proceedings of CoNLL 2002, pp. 63–69.
Liu, D.C. and J. Nocedal (1989). On the limited memory BFGS method for large scale optimization. Mathematical Programming 45(3), 503–528.
Marcus, M., B. Santorini, and M. Marcinkiewicz (1993). Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics 19(2), 313–330.
Martí, M.A., M. Taulé, L. Màrquez, and M. Bertran (2007). CESS-ECE: A multilingual and multilevel annotated corpus. Available for download from: http://www.lsi.upc.edu/~mbertran/cess-ece/.
McDonald, R., K. Crammer, and F. Pereira (2005a). Online large-margin training of dependency parsers. In Proceedings of ACL 2005, pp. 91–98.
McDonald, R. and F. Pereira (2006). Online learning of approximate dependency parsing algorithms. In Proceedings of EACL 2006, pp. 81–88.
McDonald, R., F. Pereira, K. Ribarov, and J. Hajič (2005b). Non-projective dependency parsing using spanning tree algorithms. In Proceedings of HLT/EMNLP 2005, pp. 523–530.
Montemagni, S., F. Barsotti, M. Battista, N. Calzolari, O. Corazzari, A. Lenci, A. Zampolli, F. Fanciulli, M. Massetani, R. Raffaelli, R. Basili, M.T. Pazienza, D. Saracino, F. Zanzotto, N. Nana, F. Pianesi, and R. Delmonte (2003). Building the Italian syntactic-semantic treebank. See Abeillé (2003), Chapter 11, pp. 189–210.
Nakagawa, T., T. Kudo, and Y. Matsumoto (2002). Revision learning and its application to part-of-speech tagging. In Proceedings of ACL 2002, pp. 497–504.
Nilsson, J., J. Hall, and J. Nivre (2005). MAMBA meets TIGER: reconstructing a Swedish treebank from antiquity. In Proceedings of the NODALIDA Special Session on Treebanks.
Nivre, J. (2003). An efficient algorithm for projective dependency parsing. In Proceedings of IWPT 2003, pp. 149–160.
Nivre, J., J. Hall, S. Kübler, R. McDonald, J. Nilsson, S. Riedel, and D. Yuret (2007). The CoNLL 2007 shared task on dependency parsing. In Proceedings of the CoNLL 2007 Shared Task. Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).
Oflazer, K., B. Say, D.Z. Hakkani-Tür, and G. Tür (2003). Building a Turkish treebank. See Abeillé (2003), Chapter 15, pp. 261–277.
Prokopidis, P., E. Desypri, M. Koutsombogera, H. Papageorgiou, and S. Piperidis (2005). Theoretical and practical issues in the construction of a Greek dependency treebank. In Proceedings of the 4th Workshop on Treebanks and Linguistic Theories (TLT), pp. 149–160.
Riedel, S. and J. Clarke (2006). Incremental integer linear programming for non-projective dependency parsing. In Proceedings of EMNLP 2006, pp. 129–137.
Rosenfeld, R., S.F. Chen, and X. Zhu (2001). Whole-sentence exponential language models: a vehicle for linguistic-statistical integration. Computer Speech and Language 15(1), 55–73.
Roth, D. and W. Yih (2004). A linear programming formulation for global inference in natural language tasks. In Proceedings of CoNLL 2004, pp. 1–8.
Tamura, A., H. Takamura, and M. Okumura (2007). Japanese dependency analysis using ancestor-descendant relations. In Proceedings of EMNLP-CoNLL 2007, pp. 600–609.
van der Beek, L., G. Bouma, R. Malouf, and G. van Noord (2002). The Alpino dependency treebank. In M. Theune, A. Nijholt, and H. Hondorp (Eds.), Computational Linguistics in the Netherlands 2001 (http://www.rodopi.nl/senj.asp?BookId=LC+45).
Yamada, H. and Y. Matsumoto (2003). Statistical dependency analysis with support vector machines. In Proceedings of IWPT 2003, pp. 195–206.

Acknowledgements

The author would like to thank Professor Yuji Matsumoto at Nara Institute of Science and Technology and Dr. Hiroyasu Yamada at JustSystems Corporation for their help.

Author information

Authors and Affiliations

Knowledge Creating Communication Research Center, National Institute of Information and Communications Technology, Kyoto, 619-0289, Japan
Tetsuji Nakagawa

Authors

Tetsuji Nakagawa

Corresponding author

Correspondence to Tetsuji Nakagawa.

Editor information

Editors and Affiliations

Tilburg University, Warandelaan 2, Tilburg, 5000 LE, Netherlands
Harry Bunt
Dépt. Linguistique, Université de Genève, rue de Candolle 2, Genève, 1211, Switzerland
Paola Merlo
Pimpstensvägen 16, Uppsala, 752 67, Sweden
Joakim Nivre

About this chapter

Cite this chapter

Nakagawa, T. (2010). Dependency Parsing Using Global Features. In: Bunt, H., Merlo, P., Nivre, J. (eds) Trends in Parsing Technology. Text, Speech and Language Technology, vol 43. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9352-3_5

DOI: https://doi.org/10.1007/978-90-481-9352-3_5
Published: 29 September 2010
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-9351-6
Online ISBN: 978-90-481-9352-3
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)