Dependency Parsing Using Global Features

Trends in Parsing Technology

Part of the book series: Text, Speech and Language Technology (TLTB, volume 43)

Abstract

Many methods for statistical dependency parsing have been studied. For example, McDonald et al. (2005a) proposed a method for projective dependency parsing using an online large-margin training algorithm, and later extended it to non-projective dependency parsing (McDonald et al., 2005b). However, these studies assumed that the heads of the tokens in a sentence are independent of each other, and the features available to them were limited.
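
The independence assumption described above is what an arc-factored (first-order) model encodes: the score of a dependency tree is a sum of scores of individual head-dependent arcs, so each feature can look at only one arc at a time. The Python sketch below illustrates that decomposition. The feature templates, the weight dictionary, and all function names are hypothetical, and decoding (searching for the best-scoring tree) is omitted, so this is not McDonald et al.'s implementation.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, POS tag)


def pos_of(sentence: List[Token], i: int) -> str:
    """POS tag of token i (1-based); index 0 is an artificial ROOT."""
    return "ROOT" if i == 0 else sentence[i - 1][1]


def arc_features(sentence: List[Token], head: int, dep: int) -> List[str]:
    """Hypothetical feature templates that see a single head-dependent arc only."""
    # dir=L: the dependent is to the left of its head; dir=R: to the right.
    direction = "L" if head > dep else "R"
    hp, dp = pos_of(sentence, head), pos_of(sentence, dep)
    return [f"hpos={hp}|dpos={dp}", f"hpos={hp}|dpos={dp}|dir={direction}"]


def arc_score(weights: Dict[str, float], sentence: List[Token],
              head: int, dep: int) -> float:
    """Linear score of one arc: sum of the weights of its features."""
    return sum(weights.get(f, 0.0) for f in arc_features(sentence, head, dep))


def tree_score(weights: Dict[str, float], sentence: List[Token],
               heads: List[int]) -> float:
    """Arc-factored tree score; heads[i - 1] is the head of token i.

    The tree score is a plain sum of independent arc scores, which is
    exactly the independence assumption on heads."""
    return sum(arc_score(weights, sentence, h, d)
               for d, h in enumerate(heads, start=1))
```

For "John saw Mary" tagged NNP VBD NNP, with toy weights `{"hpos=VBD|dpos=NNP": 1.0, "hpos=ROOT|dpos=VBD": 2.0, "hpos=VBD|dpos=NNP|dir=L": 0.5}`, the heads `[2, 0, 2]` score 1.5 + 2.0 + 1.0 = 4.5. Because `tree_score` is a plain sum, changing the head of one token changes only that token's term; a feature over two sibling arcs, or over the whole tree, cannot be expressed in this form.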

Notes

  1.

    The term "token" is used here to denote the basic unit of dependency parsing. In many languages the basic unit is the word, but other units are used in some languages (e.g., a chunk called bunsetsu is commonly used in Japanese). Each token is assumed to carry a part-of-speech (POS) tag.

  2.

    \({\cal H}(\mathbf{w})\) is a superset of the set of non-projective trees: it is unnecessarily large and contains ill-formed dependency structures, such as graphs with cycles. This may reduce parsing performance, but the approximation is adopted for computational efficiency.

  3.

    Although more accurate parameter estimation methods may give better performance, the static Monte Carlo method, which uses a fixed set of samples, is adopted here to estimate the parameters approximately at a reasonable computational cost.

  4.

    http://sourceforge.net/projects/mstparser/

  5.

    Although 13 languages were covered in the shared task, only four of them were used here, because the corpora for these four languages are freely available: http://nextens.uvt.nl/~conll/
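
The gap between \({\cal H}(\mathbf{w})\) and the set of well-formed trees in note 2 can be checked by brute force on short sentences. This minimal sketch assumes, since the note does not define the set in full here, that \({\cal H}(\mathbf{w})\) contains every assignment in which each of the n tokens independently picks any other token or an artificial ROOT (index 0) as its head, and that a well-formed tree may attach several tokens to ROOT. The function names are hypothetical.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple


def head_assignments(n: int) -> Iterator[Tuple[int, ...]]:
    """Enumerate the assumed H(w): token d (1..n) may take any head in
    {0, ..., n} except itself, chosen independently of the other tokens."""
    choices = [[h for h in range(n + 1) if h != d] for d in range(1, n + 1)]
    return product(*choices)


def is_tree(heads: Sequence[int]) -> bool:
    """True if following head links from every token reaches ROOT (no cycle)."""
    n = len(heads)
    for start in range(1, n + 1):
        node, steps = start, 0
        while node != 0:
            node = heads[node - 1]
            steps += 1
            if steps > n:  # more than n hops means we are trapped in a cycle
                return False
    return True


def count(n: int) -> Tuple[int, int]:
    """Return (|H(w)|, number of well-formed trees) for a sentence of n tokens."""
    total = trees = 0
    for heads in head_assignments(n):
        total += 1
        if is_tree(heads):
            trees += 1
    return total, trees
```

`count(3)` returns `(27, 16)`: 27 head assignments, of which 16 are trees and the remaining 11 contain a cycle. Under this assumption the two totals are n^n and (n+1)^(n-1) (Cayley's formula), so the share of ill-formed structures in the set grows with sentence length.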
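
Note 3 does not spell out the estimation procedure, so the following is only a sketch of one common form of static Monte Carlo estimation, in a toy unconditional setting rather than the chapter's actual model. It assumes a log-linear model p_theta(y) proportional to q(y) exp(theta . f(y)), where a sample set is drawn from the base distribution q once and then reused at every training step, with self-normalized importance weights, to approximate the model expectation E_theta[f]. Plain gradient ascent stands in for whatever optimizer is actually used, and all names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def expected_features(theta: Sequence[float],
                      sample_feats: Sequence[Sequence[float]]) -> Vector:
    """Estimate E_theta[f] from a FIXED sample set drawn once from q.

    Because the samples come from q, each one is reweighted by
    exp(theta . f(y_s)) and the weights are normalized to sum to one."""
    scores = [dot(theta, f) for f in sample_feats]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return [sum(w * f[k] for w, f in zip(weights, sample_feats)) / z
            for k in range(len(theta))]


def gradient(theta: Sequence[float], target_feats: Sequence[float],
             sample_feats: Sequence[Sequence[float]]) -> Vector:
    """Gradient of the approximate average log-likelihood:
    observed (averaged) features minus the estimated model expectation."""
    model = expected_features(theta, sample_feats)
    return [t - m for t, m in zip(target_feats, model)]


def fit(target_feats: Sequence[float],
        sample_feats: Sequence[Sequence[float]],
        lr: float = 1.0, iters: int = 200) -> Vector:
    """Plain gradient ascent; the same samples are reused at every iteration."""
    theta = [0.0] * len(target_feats)
    for _ in range(iters):
        step = gradient(theta, target_feats, sample_feats)
        theta = [th + lr * g for th, g in zip(theta, step)]
    return theta
```

With the fixed samples `[[0.0], [1.0]]` (one draw of each outcome, f(y) = y) and an observed feature mean of 0.75, `fit([0.75], [[0.0], [1.0]])` converges to theta close to log 3 ≈ 1.0986, the value at which the weighted sample average of f equals 0.75. Since the samples never change, no resampling cost is paid during training, but the estimate is only as good as the fixed sample set.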

References

  • Abeillé, A. (Ed.) (2003). Treebanks: Building and Using Parsed Corpora. Dordrecht: Kluwer.

  • Aduriz, I., M.J. Aranzabe, J.M. Arriola, A. Atutxa, A.D. de Ilarraza, A. Garmendia, and M. Oronoz (2003). Construction of a Basque dependency treebank. In Proceedings of the 2nd Workshop on Treebanks and Linguistic Theories (TLT), pp. 201–204.

  • Afonso, S., E. Bick, R. Haber, and D. Santos (2002). “Floresta sintá(c)tica”: a treebank for Portuguese. In Proceedings of LREC 2002, pp. 1698–1703.

  • Andrieu, C., N. de Freitas, A. Doucet, and M.I. Jordan (2003). An introduction to MCMC for machine learning. Machine Learning 50, 5–43.

  • Böhmová, A., J. Hajič, E. Hajičová, and B. Hladká (2003). The PDT: a 3-level annotation scenario. See Abeillé (2003), Chapter 7, pp. 103–127.

  • Buchholz, S. and E. Marsi (2006). CoNLL-X shared task on multilingual dependency parsing. In Proceedings of CoNLL 2006, New York, NY, pp. 149–164.

  • Chen, K., C. Luo, M. Chang, F. Chen, C. Chen, C. Huang, and Z. Gao (2003). Sinica treebank: Design criteria, representational issues and implementation. See Abeillé (2003), Chapter 13, pp. 231–248.

  • Collins, M. and T. Koo (2005). Discriminative reranking for natural language parsing. Computational Linguistics 31(1), 25–69.

  • Covington, M.A. (2001). A fundamental algorithm for dependency parsing. In Proceedings of ACM Southeast Conference 2001, pp. 95–102.

  • Csendes, D., J. Csirik, T. Gyimóthy, and A. Kocsor (2005). The Szeged Treebank. Springer.

  • Eisner, J. (1996). Three new probabilistic models for dependency parsing: an exploration. In Proceedings of COLING ’96, pp. 340–345.

  • Hajič, J., O. Smrž, P. Zemánek, J. Šnaidauf, and E. Beška (2004). Prague Arabic dependency treebank: Development in data and tools. In Proceedings of the NEMLAR International Conference on Arabic Language Resources and Tools, pp. 110–117.

  • Hall, K. (2007). K-best spanning tree parsing. In Proceedings of ACL 2007, pp. 392–399.

  • Johansson, R. and P. Nugues (2007). Extended constituent-to-dependency conversion for English. In Proceedings of the 16th Nordic Conference on Computational Linguistics (NODALIDA).

  • Johnson, M., S. Geman, S. Canon, Z. Chi, and S. Riezler (1999). Estimators for stochastic “unification-based” grammars. In Proceedings of ACL’99, pp. 535–541.

  • Kromann, M.T. (2003). The Danish dependency treebank and the underlying linguistic theory. In Proceedings of the 2nd Workshop on Treebanks and Linguistic Theories (TLT).

  • Kudo, T. and Y. Matsumoto (2002). Japanese dependency analysis using cascaded chunking. In Proceedings of CoNLL 2002, pp. 63–69.

  • Liu, D.C. and J. Nocedal (1989). On the limited memory BFGS method for large scale optimization. Mathematical Programming 45(3), 503–528.

  • Marcus, M., B. Santorini, and M. Marcinkiewicz (1993). Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics 19(2), 313–330.

  • Martí, M.A., M. Taulé, L. Màrquez, and M. Bertran (2007). CESS-ECE: A multilingual and multilevel annotated corpus. Available for download from: http://www.lsi.upc.edu/~mbertran/cess-ece/.

  • McDonald, R., K. Crammer, and F. Pereira (2005a). Online large-margin training of dependency parsers. In Proceedings of ACL 2005, pp. 91–98.

  • McDonald, R. and F. Pereira (2006). Online learning of approximate dependency parsing algorithms. In Proceedings of EACL 2006, pp. 81–88.

  • McDonald, R., F. Pereira, K. Ribarov, and J. Hajič (2005b). Non-projective dependency parsing using spanning tree algorithms. In Proceedings of HLT/EMNLP 2005, pp. 523–530.

  • Montemagni, S., F. Barsotti, M. Battista, N. Calzolari, O. Corazzari, A. Lenci, A. Zampolli, F. Fanciulli, M. Massetani, R. Raffaelli, R. Basili, M.T. Pazienza, D. Saracino, F. Zanzotto, N. Nana, F. Pianesi, and R. Delmonte (2003). Building the Italian syntactic-semantic treebank. See Abeillé (2003), Chapter 11, pp. 189–210.

  • Nakagawa, T., T. Kudo, and Y. Matsumoto (2002). Revision learning and its application to part-of-speech tagging. In Proceedings of ACL 2002, pp. 497–504.

  • Nilsson, J., J. Hall, and J. Nivre (2005). MAMBA meets TIGER: reconstructing a Swedish treebank from antiquity. In Proceedings of the NODALIDA Special Session on Treebanks.

  • Nivre, J. (2003). An efficient algorithm for projective dependency parsing. In Proceedings of IWPT 2003, pp. 149–160.

  • Nivre, J., J. Hall, S. Kübler, R. McDonald, J. Nilsson, S. Riedel, and D. Yuret (2007). The CoNLL 2007 shared task on dependency parsing. In Proceedings of the CoNLL 2007 Shared Task. Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).

  • Oflazer, K., B. Say, D.Z. Hakkani-Tür, and G. Tür (2003). Building a Turkish treebank. See Abeillé (2003), Chapter 15, pp. 261–277.

  • Prokopidis, P., E. Desypri, M. Koutsombogera, H. Papageorgiou, and S. Piperidis (2005). Theoretical and practical issues in the construction of a Greek dependency treebank. In Proceedings of the 4th Workshop on Treebanks and Linguistic Theories (TLT), pp. 149–160.

  • Riedel, S. and J. Clarke (2006). Incremental integer linear programming for non-projective dependency parsing. In Proceedings of EMNLP 2006, pp. 129–137.

  • Rosenfeld, R., S.F. Chen, and X. Zhu (2001). Whole-sentence exponential language models: a vehicle for linguistic-statistical integration. Computer Speech and Language 15(1), 55–73.

  • Roth, D. and W. Yih (2004). A linear programming formulation for global inference in natural language tasks. In Proceedings of CoNLL 2004, pp. 1–8.

  • Tamura, A., H. Takamura, and M. Okumura (2007). Japanese dependency analysis using ancestor-descendant relations. In Proceedings of EMNLP-CoNLL 2007, pp. 600–609.

  • van der Beek, L., G. Bouma, R. Malouf, and G. van Noord (2002). The Alpino dependency treebank. In M. Theune, A. Nijholt, H. Hondorp (eds.), Computational Linguistics in the Netherlands 2001. (http://www.rodopi.nl/senj.asp?BookId=LC+45).

  • Yamada, H. and Y. Matsumoto (2003). Statistical dependency analysis with support vector machines. In Proceedings of IWPT 2003, pp. 195–206.

Acknowledgements

The author would like to thank Professor Yuji Matsumoto at Nara Institute of Science and Technology and Dr. Hiroyasu Yamada at JustSystems Corporation for their help.

Author information

Correspondence to Tetsuji Nakagawa.

Copyright information

© 2010 Springer Science+Business Media B.V.

About this chapter

Cite this chapter

Nakagawa, T. (2010). Dependency Parsing Using Global Features. In: Bunt, H., Merlo, P., Nivre, J. (eds) Trends in Parsing Technology. Text, Speech and Language Technology, vol 43. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9352-3_5
