Deep Learning in Lexical Analysis and Parsing

  • Wanxiang Che
  • Yue Zhang
Chapter

Abstract

Lexical analysis and parsing tasks model the deeper properties of words and their relationships to each other. Commonly used techniques include word segmentation, part-of-speech tagging, and parsing. A typical characteristic of such tasks is that the outputs are structured. Two families of methods are usually used to solve these structured prediction tasks: graph-based methods and transition-based methods. Graph-based methods differentiate output structures directly, based on their characteristics, while transition-based methods transform the construction of an output into a state-transition process and differentiate sequences of transition actions. Neural network models have been successfully applied to both graph-based and transition-based structured prediction. In this chapter, we review the application of deep learning to lexical analysis and parsing, and compare it with traditional statistical methods.
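The transition-based view described above can be made concrete with a minimal sketch, assuming the standard arc-standard transition system for dependency parsing (the chapter itself covers several systems; the function and sentence below are illustrative, not taken from it). A parser state is a (stack, buffer, arcs) triple, each action maps one state to the next, and a full parse corresponds to one sequence of actions:

```python
# Minimal arc-standard transition-system sketch for dependency parsing.
# Hypothetical illustration: a parser state is (stack, buffer, arcs); each
# action transforms the state, so scoring parses reduces to scoring
# sequences of transition actions.

def apply(action, stack, buffer, arcs):
    """Apply one arc-standard transition and return the new state."""
    if action == "SHIFT":            # move the next input word onto the stack
        return stack + [buffer[0]], buffer[1:], arcs
    if action == "LEFT-ARC":         # second-top word depends on the top word
        head, dep = stack[-1], stack[-2]
        return stack[:-2] + [head], buffer, arcs + [(head, dep)]
    if action == "RIGHT-ARC":        # top word depends on the second-top word
        head, dep = stack[-2], stack[-1]
        return stack[:-1], buffer, arcs + [(head, dep)]
    raise ValueError(f"unknown action: {action}")

# Parse "He eats fish": this action sequence yields the dependency arcs.
state = ([], ["He", "eats", "fish"], [])
for a in ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC"]:
    state = apply(a, *state)

stack, buffer, arcs = state
print(arcs)   # [('eats', 'He'), ('eats', 'fish')]
```

In a neural transition-based parser, a learned model scores which action to take at each state; in a graph-based parser, the arcs themselves would be scored directly.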


Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Harbin Institute of Technology, Harbin, China
  2. Singapore University of Technology and Design, Singapore, Singapore
