Deep Learning for Natural Language Processing

  • Jiajun Zhang
  • Chengqing Zong
Chapter
Part of the Cognitive Computation Trends book series (COCT, volume 2)

Abstract

Natural language processing is a field of artificial intelligence that aims to design computer algorithms to understand and process natural language as humans do. It has become a necessity in the Internet age and the era of big data. From fundamental research to sophisticated applications, natural language processing includes many tasks, such as lexical analysis, syntactic and semantic parsing, discourse analysis, text classification, sentiment analysis, summarization, machine translation and question answering. For a long time, statistical models such as Naive Bayes (McCallum and Nigam, A comparison of event models for Naive Bayes text classification. In: AAAI-98 workshop on learning for text categorization, Madison, vol 752, pp 41–48, 1998), Support Vector Machine (Cortes and Vapnik, Mach Learn 20(3):273–297, 1995), Maximum Entropy (Berger et al., Comput Linguist 22(1):39–71, 1996) and Conditional Random Fields (Lafferty et al., Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of ICML, 2001) were the dominant methods for natural language processing (Manning and Schütze, Foundations of statistical natural language processing. MIT Press, Cambridge/London, 1999; Zong, Statistical natural language processing. Tsinghua University Press, Beijing, 2008). Recent years have witnessed the great success of deep learning in natural language processing, from Chinese word segmentation (Pei et al., Max-margin tensor neural network for Chinese word segmentation. In: Proceedings of ACL, pp 293–303, 2014; Chen et al., Long short-term memory neural networks for Chinese word segmentation. In: Proceedings of EMNLP, pp 1197–1206, 2015; Cai et al., Fast and accurate neural word segmentation for Chinese. In: Proceedings of ACL, pp 608–615, 2017), named entity recognition (Collobert et al., J Mach Learn Res 12:2493–2537, 2011; Lample et al., Neural architectures for named entity recognition. In: Proceedings of NAACL-HLT, 2016; Dong et al., Character-based LSTM-CRF with radical-level features for Chinese named entity recognition. In: International conference on computer processing of oriental languages. Springer, pp 239–250, 2016; Dong et al., Multichannel LSTM-CRF for named entity recognition in Chinese social media. In: Chinese computational linguistics and natural language processing based on naturally annotated big data. Springer, pp 197–208, 2017), sequential tagging (Vaswani et al., Supertagging with LSTMs. In: Proceedings of NAACL-HLT, pp 232–237, 2016; Wu et al., An empirical exploration of skip connections for sequential tagging. In: Proceedings of COLING, 2016a), syntactic parsing (Socher et al., Parsing with compositional vector grammars. In: Proceedings of ACL, pp 455–465, 2013; Chen and Manning, A fast and accurate dependency parser using neural networks. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 740–750, 2014; Liu and Zhang, TACL 5:45–58, 2017), text summarization (Rush et al., A neural attention model for abstractive sentence summarization. In: Proceedings of EMNLP, 2015; See et al., Get to the point: summarization with pointer-generator networks. In: Proceedings of ACL, 2017) and machine translation (Bahdanau et al., Neural machine translation by jointly learning to align and translate. In: Proceedings of ICLR, 2015; Sutskever et al., Sequence to sequence learning with neural networks. In: Proceedings of NIPS, 2014; Vaswani et al., Attention is all you need. arXiv preprint arXiv:1706.03762, 2017) to question answering (Andreas et al., Learning to compose neural networks for question answering. In: Proceedings of NAACL-HLT, pp 232–237, 2016; Bordes et al., Question answering with subgraph embeddings. arXiv preprint arXiv:1406.3676, 2014; Bordes et al., Large-scale simple question answering with memory networks. arXiv preprint arXiv:1506.02075, 2015; Yu et al., Deep learning for answer sentence selection. arXiv preprint arXiv:1412.1632, 2014). This chapter employs named entity recognition, supertagging, machine translation and text summarization as case studies to introduce the application of deep learning in natural language processing.
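
To make the case studies concrete, below is a minimal illustrative sketch (in PyTorch; not code from the chapter) of the model family behind several of the cited tagging results: a bidirectional LSTM that scores a label for every token, the backbone of neural named entity recognition and supertagging. The vocabulary size, dimensions, tag count and toy input are arbitrary assumptions, and a linear softmax output stands in for the CRF layer used in the cited LSTM-CRF models.

import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Illustrative bidirectional LSTM sequence tagger (assumed sizes)."""
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # Read each sentence both left-to-right and right-to-left.
        self.lstm = nn.LSTM(emb_dim, hidden_dim,
                            bidirectional=True, batch_first=True)
        # Map concatenated forward/backward states to per-token tag scores;
        # the cited LSTM-CRF models would place a CRF layer here instead.
        self.out = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):          # token_ids: (batch, seq_len)
        hidden, _ = self.lstm(self.embedding(token_ids))
        return self.out(hidden)            # (batch, seq_len, num_tags)

# Toy usage: score 9 hypothetical BIO entity tags for one 6-token sentence.
model = BiLSTMTagger(vocab_size=10000, num_tags=9)
tokens = torch.randint(0, 10000, (1, 6))
print(model(tokens).shape)                 # torch.Size([1, 6, 9])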

Keywords

Named entity recognition · Supertagging · Machine translation · Text summarization · Deep learning · Natural language processing

References

  1. Andreas J, Rohrbach M, Darrell T, Klein D (2016) Learning to compose neural networks for question answering. In: Proceedings of NAACL-HLT, pp 232–237
  2. Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: Proceedings of ICLR
  3. Barzilay R, McKeown KR (2005) Sentence fusion for multidocument news summarization. Comput Linguist 31(3):297–328
  4. Bengio Y, Simard P, Frasconi P (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 5(2):157–166
  5. Berger AL, Pietra VJD, Pietra SAD (1996) A maximum entropy approach to natural language processing. Comput Linguist 22(1):39–71
  6. Boitet C, Guillaume P, Quezel-Ambrunaz M (1982) Implementation and conversational environment of Ariane 78.4, an integrated system for automated translation and human revision. In: Proceedings of the 9th conference on computational linguistics-volume 1. Academia Praha, pp 19–27
  7. Bordes A, Chopra S, Weston J (2014) Question answering with subgraph embeddings. arXiv preprint arXiv:1406.3676
  8. Bordes A, Usunier N, Chopra S, Weston J (2015) Large-scale simple question answering with memory networks. arXiv preprint arXiv:1506.02075
  9. Brown PF, Della Pietra SA, Della Pietra VJ, Mercer RL (1993) The mathematics of statistical machine translation: parameter estimation. Comput Linguist 19(2):263–311
  10. Cai D, Zhao H, Zhang Z, Xin Y, Wu Y, Huang F (2017) Fast and accurate neural word segmentation for Chinese. In: Proceedings of ACL, pp 608–615
  11. Chen D, Manning C (2014) A fast and accurate dependency parser using neural networks. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 740–750
  12. Chen X, Qiu X, Zhu C, Liu P, Huang X (2015) Long short-term memory neural networks for Chinese word segmentation. In: Proceedings of EMNLP, pp 1197–1206
  13. Cheng Y, Xu W, He Z, He W, Wu H, Sun M, Liu Y (2016) Semi-supervised learning for neural machine translation. In: Proceedings of ACL 2016
  14. Chiu JP, Nichols E (2016) Named entity recognition with bidirectional LSTM-CNNs. Trans ACL 4:357–370
  15. Chopra S, Auli M, Rush AM (2016) Abstractive sentence summarization with attentive recurrent neural networks. In: Proceedings of NAACL-HLT, pp 93–98
  16. Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P (2011) Natural language processing (almost) from scratch. J Mach Learn Res 12(Aug):2493–2537
  17. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
  18. Devlin J, Zbib R, Huang Z, Lamar T, Schwartz RM, Makhoul J (2014) Fast and robust neural network joint models for statistical machine translation. In: Proceedings of ACL, pp 1370–1380
  19. Dong C, Zhang J, Zong C, Hattori M, Di H (2016) Character-based LSTM-CRF with radical-level features for Chinese named entity recognition. In: International conference on computer processing of oriental languages. Springer, pp 239–250
  20. Dong C, Wu H, Zhang J, Zong C (2017) Multichannel LSTM-CRF for named entity recognition in Chinese social media. In: Chinese computational linguistics and natural language processing based on naturally annotated big data. Springer, pp 197–208
  21. Erkan G, Radev DR (2004) LexRank: graph-based lexical centrality as salience in text summarization. J Artif Intell Res 22:457–479
  22. Filippova K, Strube M (2008) Sentence fusion via dependency graph compression. In: Proceedings of EMNLP, pp 177–185
  23. Fonseca ER, Rosa JLG, Aluísio SM (2015) Evaluating word embeddings and a revised corpus for part-of-speech tagging in Portuguese. J Braz Comput Soc 21(1):1–14
  24. Gehring J, Auli M, Grangier D, Yarats D, Dauphin YN (2017) Convolutional sequence to sequence learning. arXiv preprint arXiv:1705.03122
  25. Gulcehre C, Ahn S, Nallapati R, Zhou B, Bengio Y (2016) Pointing the unknown words. In: Proceedings of ACL
  26. He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. In: Proceedings of CVPR
  27. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
  28. Koehn P, Och FJ, Marcu D (2003) Statistical phrase-based translation. In: Proceedings of NAACL
  29. Lafferty J, McCallum A, Pereira FC (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of ICML
  30. Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C (2016) Neural architectures for named entity recognition. In: Proceedings of NAACL-HLT
  31. Li P, Liu Y, Sun M (2013) Recursive autoencoders for ITG-based translation. In: Proceedings of EMNLP
  32. Li X, Zhang J, Zong C (2016) Towards zero unknown word in neural machine translation. In: Proceedings of IJCAI 2016
  33. Liu J, Zhang Y (2017) Shift-reduce constituent parsing with neural lookahead features. TACL 5(Jan):45–58
  34. Luhn HP (1958) The automatic creation of literature abstracts. IBM J Res Dev 2(2):159–165
  35. Luong M-T, Pham H, Manning CD (2015) Effective approaches to attention-based neural machine translation. In: Proceedings of EMNLP 2015
  36. Ma X, Hovy E (2016) End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In: Proceedings of ACL
  37. Manning CD, Schütze H (1999) Foundations of statistical natural language processing. MIT Press, Cambridge/London
  38. McCallum A, Nigam K (1998) A comparison of event models for Naive Bayes text classification. In: AAAI-98 workshop on learning for text categorization, Madison, vol 752, pp 41–48
  39. Mi H, Sankaran B, Wang Z, Ittycheriah A (2016) A coverage embedding model for neural machine translation. In: Proceedings of EMNLP 2016
  40. Nagao M (1984) A framework of a mechanical translation between Japanese and English by analogy principle. In: Elithorn A, Banerji R (eds) Artificial and human intelligence. Elsevier Science Publishers B.V., pp 173–180
  41. Nallapati R, Zhou B, Gulcehre C, Xiang B et al (2016) Abstractive text summarization using sequence-to-sequence RNNs and beyond. In: Proceedings of CoNLL
  42. Nallapati R, Zhai F, Zhou B (2017) SummaRuNNer: a recurrent neural network based sequence model for extractive summarization of documents. In: AAAI, pp 3075–3081
  43. Och FJ, Ney H (2002) Discriminative training and maximum entropy models for statistical machine translation. In: Proceedings of ACL
  44. Page L, Brin S, Motwani R, Winograd T (1999) The PageRank citation ranking: bringing order to the web. Technical report, Stanford InfoLab
  45. Pei W, Ge T, Chang B (2014) Max-margin tensor neural network for Chinese word segmentation. In: Proceedings of ACL, pp 293–303
  46. Rush AM, Chopra S, Weston J (2015) A neural attention model for abstractive sentence summarization. In: Proceedings of EMNLP
  47. See A, Liu PJ, Manning CD (2017) Get to the point: summarization with pointer-generator networks. In: Proceedings of ACL
  48. Sennrich R, Haddow B, Birch A (2016a) Improving neural machine translation models with monolingual data. In: Proceedings of ACL 2016
  49. Sennrich R, Haddow B, Birch A (2016b) Neural machine translation of rare words with subword units. In: Proceedings of ACL 2016
  50. Socher R, Bauer J, Manning CD et al (2013) Parsing with compositional vector grammars. In: Proceedings of ACL, pp 455–465
  51. Steedman M (2000) The syntactic process, vol 24. MIT Press, Cambridge
  52. Steedman M, Baldridge J (2011) Combinatory categorial grammar. In: Borsley RD, Börjars K (eds) Non-transformational syntax: formal and explicit models of grammar. Wiley-Blackwell, Chichester/Malden
  53. Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Proceedings of NIPS
  54. Tu Z, Lu Z, Liu Y, Liu X, Li H (2016) Coverage-based neural machine translation. In: Proceedings of ACL 2016
  55. Vaswani A, Zhao Y, Fossum V, Chiang D (2013) Decoding with large-scale neural language models improves translation. In: Proceedings of EMNLP, pp 1387–1392
  56. Vaswani A, Bisk Y, Sagae K, Musa R (2016) Supertagging with LSTMs. In: Proceedings of NAACL-HLT, pp 232–237
  57. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. arXiv preprint arXiv:1706.03762
  58. Wu H, Zhang J, Zong C (2016a) An empirical exploration of skip connections for sequential tagging. In: Proceedings of COLING, pp 232–237
  59. Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, Krikun M, Cao Y, Gao Q, Macherey K et al (2016b) Google’s neural machine translation system: bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144
  60. Wu H, Zhang J, Zong C (2017) A dynamic window neural network for CCG supertagging. In: Proceedings of AAAI
  61. Yu L, Hermann KM, Blunsom P, Pulman S (2014) Deep learning for answer sentence selection. arXiv preprint arXiv:1412.1632
  62. Zhang J, Zong C (2015) Deep neural networks in machine translation: an overview. IEEE Intell Syst 30(5):16–25
  63. Zhang J, Zong C (2016) Exploring source-side monolingual data in neural machine translation. In: Proceedings of EMNLP 2016
  64. Zhang J, Liu S, Li M, Zhou M, Zong C (2014a) Bilingually-constrained phrase embeddings for machine translation. In: Proceedings of ACL, pp 111–121
  65. Zhang J, Liu S, Li M, Zhou M, Zong C (2014b) Mind the gap: machine translation by minimizing the semantic gap in embedding space. In: AAAI, pp 1657–1664
  66. Zhou Q, Yang N, Wei F, Zhou M (2017) Selective encoding for abstractive sentence summarization. In: Proceedings of ACL
  67. Zong C (2008) Statistical natural language processing. Tsinghua University Press, Beijing

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  2. University of Chinese Academy of Sciences, Beijing, China
  3. CAS Center for Excellence in Brain Science and Intelligence Technology, University of Chinese Academy of Sciences, Beijing, People’s Republic of China
