Fast Dependency Parsing Using Distributed Word Representations

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9441)


In this work, we propose to use distributed word representations in a greedy, transition-based dependency parsing framework. Instead of relying on a very large number of sparse indicator features, the multinomial logistic regression classifier employed by the parser learns and uses a small number of dense features, and can therefore run very fast. The distributed word representations are produced by a continuous skip-gram model using a neural network architecture. Experiments on a Vietnamese dependency treebank show that the parser is not only faster but also more accurate than a conventional transition-based dependency parser.
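The greedy parsing loop described in the abstract can be illustrated with a minimal sketch. Several things here are assumptions made for illustration and are not taken from the paper: the arc-standard transition system (the paper's transition system may differ), the three-slot feature template, unlabeled arcs, and the helper names `dense_features`, `legal_actions`, and `greedy_parse`. The embeddings and classifier weights are random stand-ins for skip-gram vectors and trained parameters, so the resulting tree is structurally valid but linguistically meaningless.

```python
import math
import random

ACTIONS = ("SHIFT", "LEFT-ARC", "RIGHT-ARC")  # unlabeled arc-standard (assumption)

def softmax(scores):
    """Multinomial logistic regression output: normalized exponentials."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dense_features(stack, buffer, words, embeddings, dim):
    """Concatenate embeddings of the top two stack words and the first
    buffer word (a hypothetical, deliberately tiny feature template)."""
    zero = [0.0] * dim
    slots = [
        stack[-1] if stack else None,
        stack[-2] if len(stack) >= 2 else None,
        buffer[0] if buffer else None,
    ]
    feats = []
    for idx in slots:
        feats.extend(embeddings.get(words[idx], zero) if idx is not None else zero)
    return feats

def legal_actions(stack, buffer):
    """Indices into ACTIONS that are allowed in the current configuration."""
    legal = []
    if buffer:
        legal.append(0)              # SHIFT needs a non-empty buffer
    if len(stack) >= 2:
        if stack[-2] != 0:
            legal.append(1)          # LEFT-ARC: ROOT (index 0) cannot be a dependent
        legal.append(2)              # RIGHT-ARC: top of stack attaches to the item below
    return legal

def greedy_parse(words, embeddings, weights, dim):
    """Return {dependent_index: head_index}; words[0] must be the ROOT token."""
    stack, buffer = [0], list(range(1, len(words)))
    heads = {}
    while buffer or len(stack) > 1:
        x = dense_features(stack, buffer, words, embeddings, dim)
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
        probs = softmax(scores)
        # Greedy decision: take the most probable legal action, no search.
        best = max(legal_actions(stack, buffer), key=lambda a: probs[a])
        if best == 0:
            stack.append(buffer.pop(0))
        elif best == 1:
            dep = stack.pop(-2)      # second item becomes dependent of the top
            heads[dep] = stack[-1]
        else:
            dep = stack.pop()        # top becomes dependent of the item below
            heads[dep] = stack[-1]
    return heads

# Toy usage with random stand-ins for embeddings and weights (assumptions).
random.seed(0)
DIM = 4
words = ["<ROOT>", "tôi", "đọc", "sách"]
embeddings = {w: [random.uniform(-1, 1) for _ in range(DIM)] for w in words}
weights = [[random.uniform(-1, 1) for _ in range(3 * DIM)] for _ in ACTIONS]
heads = greedy_parse(words, embeddings, weights, DIM)
print(heads)
```

Because each configuration is scored with a single matrix-vector product over a short dense vector, the cost per transition is independent of vocabulary size, which is the source of the speed advantage the abstract claims over sparse indicator features.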


Natural Language Processing; Dependency Graph; Dependency Parser; Syntactic Dependency; Parsing Model



This research is partly funded by Vietnam National University, Hanoi (VNU) under project number QG.15.04. The last author is funded by Hanoi University of Science (HUS) under project number TN.15.04. The authors would like to thank Dr. Dang Hoang Vu of FPT Research for providing the distributed representations of Vietnamese words. We are grateful to the anonymous reviewers, whose comments helped us improve both the presentation and the content of the article.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. VNU University of Science, Hanoi, Vietnam
  2. Dalat University, Lamdong, Vietnam
  3. FPT Research, Hanoi, Vietnam
