From Treebank Conversion to Automatic Dependency Parsing for Vietnamese

  • Dat Quoc Nguyen
  • Dai Quoc Nguyen
  • Son Bao Pham
  • Phuong-Thai Nguyen
  • Minh Le Nguyen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8455)

Abstract

This paper presents a new conversion method that automatically transforms a constituent-based Vietnamese treebank into dependency trees. On a dependency treebank created with this method, we evaluate two state-of-the-art dependency parsers, MSTParser and MaltParser. Experiments show that MSTParser outperforms MaltParser. To the best of our knowledge, these are the highest results published to date for Vietnamese dependency parsing. In particular, with gold-standard POS tags, we obtain an unlabeled attachment score of 79.08% and a labeled attachment score of 71.66%.
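
For readers unfamiliar with the two metrics, the following is a minimal sketch of how unlabeled and labeled attachment scores are commonly computed: the unlabeled score is the percentage of tokens assigned the correct head, and the labeled score additionally requires the correct dependency label. The function name, the data layout, and the toy labels are illustrative assumptions, not taken from the paper, and the paper's exact evaluation settings (for example, how punctuation is treated) are not stated in the abstract.

```python
from typing import List, Tuple

# Assumed layout: each token is (head_index, dependency_label);
# head index 0 denotes the artificial root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[List[Arc]],
                      pred: List[List[Arc]]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over all tokens in the corpus."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted corpora differ in sentence count")
    total = correct_head = correct_both = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences differ in length")
        for (g_head, g_label), (p_head, p_label) in zip(gold_sent, pred_sent):
            total += 1
            if g_head == p_head:
                correct_head += 1          # counts toward UAS
                if g_label == p_label:
                    correct_both += 1      # counts toward LAS
    if total == 0:
        raise ValueError("no tokens to score")
    return 100.0 * correct_head / total, 100.0 * correct_both / total


# Toy three-token sentence with hypothetical labels: every head is
# correct, but the label of the last token is wrong.
gold = [[(2, "sub"), (0, "root"), (2, "dob")]]
pred = [[(2, "sub"), (0, "root"), (2, "vmod")]]
uas, las = attachment_scores(gold, pred)
```

Because a token can only count toward the labeled score if its head is already correct, the labeled score can never exceed the unlabeled one, which is consistent with the 79.08% and 71.66% figures reported above.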

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Dat Quoc Nguyen (1)
  • Dai Quoc Nguyen (1)
  • Son Bao Pham (1)
  • Phuong-Thai Nguyen (1)
  • Minh Le Nguyen (2)

  1. Faculty of Information Technology, University of Engineering and Technology, Vietnam National University, Hanoi, Vietnam
  2. School of Information Science, Japan Advanced Institute of Science and Technology, Japan
