The Effect of Semi-supervised Learning on Parsing Long Distance Dependencies in German and Swedish

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6233)


This paper shows how the best data-driven dependency parsers available today [1] can be improved by learning from unlabeled data. We focus on German and Swedish and show that labeled attachment scores improve by 1.5%-2.5%. Error analysis shows that improvements are primarily due to better recovery of long distance dependencies.
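The labeled attachment score mentioned in the abstract is commonly defined as the percentage of tokens that receive both the correct head and the correct dependency label. The sketch below illustrates that definition only; it is not the authors' evaluation script. The function name, the `(head, label)` tuple representation, and the toy labels are assumptions made for illustration, and the sketch counts every token, whereas evaluation conventions differ (for example, on whether punctuation is scored).

```python
from typing import List, Tuple

# One token's analysis: (head index, dependency label); head 0 denotes the root.
# This representation is an illustrative assumption, not taken from the paper.
Arc = Tuple[int, str]


def labeled_attachment_score(gold: List[List[Arc]], pred: List[List[Arc]]) -> float:
    """Return the percentage of tokens whose head AND label both match gold."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentence length mismatch between gold and pred")
        for gold_arc, pred_arc in zip(gold_sent, pred_sent):
            total += 1
            if gold_arc == pred_arc:  # head and label must both agree
                correct += 1
    return 100.0 * correct / total if total else 0.0


# Toy four-token sentence with made-up labels; the last token gets the wrong head.
gold = [[(2, "SB"), (0, "ROOT"), (2, "OA"), (3, "MNR")]]
pred = [[(2, "SB"), (0, "ROOT"), (2, "OA"), (2, "MNR")]]
print(labeled_attachment_score(gold, pred))  # 3 of 4 tokens match: prints 75.0
```

Under this definition, a long distance dependency attached to the wrong head counts as one token error, even when its label is correct.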


Random Forest, Unlabeled Data, Development Data, Dependency Tree, Distance Dependency
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Martins, A., Das, D., Smith, N., Xing, E.: Stacking dependency parsers. In: EMNLP, Honolulu, Hawaii (2008)
  2. Rimell, L., Clark, S., Steedman, M.: Unbounded dependency recovery for parser evaluation. In: EMNLP, Singapore (2009)
  3. Abney, S.: Semi-supervised learning for computational linguistics. Chapman and Hall, Boca Raton (2008)
  4. Wolpert, D.: Stacked generalization. Neural Networks 5, 241–259 (1992)
  5. Sagae, K., Lavie, A.: Parser combination by reparsing. In: HLT-NAACL, New York City, NY (2006)
  6. Hall, J., et al.: Single malt or blended? In: CONLL, Prague, Czech Republic (2007)
  7. Nivre, J., McDonald, R.: Integrating graph-based and transition-based dependency parsers. In: ACL-HLT, Columbus, Ohio (2008)
  8. Fishel, M., Nivre, J.: Voting and stacking in data-driven dependency parsing. In: NODALIDA, Odense, Denmark (2009)
  9. Surdeanu, M., Manning, C.: Ensemble models for dependency parsing: cheap and good? In: NAACL, Los Angeles, CA (2010)
  10. Li, M., Zhou, Z.H.: Tri-training: exploiting unlabeled data using three classifiers. IEEE Transactions on Knowledge and Data Engineering 17(11), 1529–1541 (2005)
  11. Koo, T., Carreras, X., Collins, M.: Simple semi-supervised dependency parsing. In: ACL, Columbus, Ohio (2008)
  12. Wang, Q., Lin, D., Schuurmans, D.: Semi-supervised convex training for dependency parsing. In: ACL, Columbus, Ohio (2008)
  13. Suzuki, J., Isozaki, H., Carreras, X., Collins, M.: Semi-supervised convex training for dependency parsing. In: EMNLP, Singapore (2009)
  14. Sagae, K., Tsujii, J.: Dependency parsing and domain adaptation with LR models and parser ensembles. In: EMNLP-CONLL, Prague, Czech Republic (2007)
  15. Chen, W., Zhang, Y., Isahara, H.: Chinese chunking with tri-training learning. In: Computer processing of oriental languages, pp. 466–473. Springer, Berlin (2006)
  16. Nguyen, T., Nguyen, L., Shimazu, A.: Using semi-supervised learning for question classification. Journal of Natural Language Processing 15, 3–21 (2008)
  17. Sindhwani, V., Keerthi, S.: Large scale semi-supervised linear SVMs. In: ACM SIGIR, Seattle, WA (2006)
  18. McDonald, R., Pereira, F., Ribarov, K., Hajič, J.: Non-projective dependency parsing using spanning tree algorithms. In: HLT-EMNLP 2005, Vancouver, British Columbia (2005)
  19. Nivre, J., et al.: MaltParser. Natural Language Engineering 13(2), 95–135 (2007)
  20. Breiman, L.: Random forests. Machine Learning 45, 5–32 (2001)
  21. Brants, S., Hansen, S., Lezius, W., Smith, G.: The TIGER treebank. In: TLT, Sozopol, Bulgaria (2002)
  22. Nilsson, J., Hall, J., Nivre, J.: MAMBA meets TIGER: Reconstructing a Swedish treebank from antiquity. In: NODALIDA, Joensuu, Finland (2005)
  23. Gimenez, J., Marquez, L.: SVMTool: a general POS tagger generator based on support vector machines. In: LREC, Lisbon, Portugal (2004)
  24. Eisner, J.: Three new probabilistic models for dependency parsing. In: COLING, Copenhagen, Denmark (1996)
  25. Zeman, D., Žabokrtský, Z.: Improving parsing accuracy by combining diverse dependency parsers. In: IWPT, Vancouver, Canada (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  1. Center for Language Technology, University of Copenhagen, Copenhagen S
