Deep Learning for Natural Language Processing

Chapter in: Deep Learning: Fundamentals, Theory and Applications

Part of the book series: Cognitive Computation Trends (COCT, volume 2)

Abstract

Natural language processing is a field of artificial intelligence that aims to design computer algorithms to understand and process natural language as humans do. It has become a necessity in the Internet age and the era of big data. From fundamental research to sophisticated applications, natural language processing covers many tasks, such as lexical analysis, syntactic and semantic parsing, discourse analysis, text classification, sentiment analysis, summarization, machine translation and question answering. For a long time, statistical models such as Naive Bayes (McCallum and Nigam et al., A comparison of event models for Naive Bayes text classification. In: AAAI-98 workshop on learning for text categorization, Madison, vol 752, pp 41–48, 1998), Support Vector Machines (Cortes and Vapnik, Mach Learn 20(3):273–297, 1995), Maximum Entropy (Berger et al., Comput Linguist 22(1):39–71, 1996) and Conditional Random Fields (Lafferty et al., Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of ICML, 2001) were the dominant methods for natural language processing (Manning and Schütze, Foundations of statistical natural language processing. MIT Press, Cambridge/London, 1999; Zong, Statistical natural language processing. Tsinghua University Press, Beijing, 2008). Recent years have witnessed the great success of deep learning in natural language processing, from Chinese word segmentation (Pei et al., Max-margin tensor neural network for Chinese word segmentation. In: Proceedings of ACL, pp 293–303, 2014; Chen et al., Long short-term memory neural networks for Chinese word segmentation. In: Proceedings of EMNLP, pp 1197–1206, 2015; Cai et al., Fast and accurate neural word segmentation for Chinese. In: Proceedings of ACL, pp 608–615, 2017), named entity recognition (Collobert et al., J Mach Learn Res 12:2493–2537, 2011; Lample et al., Neural architectures for named entity recognition. In: Proceedings of NAACL-HLT, 2016; Dong et al., Character-based LSTM-CRF with radical-level features for Chinese named entity recognition. In: International conference on computer processing of oriental languages. Springer, pp 239–250, 2016; Dong et al., Multichannel LSTM-CRF for named entity recognition in Chinese social media. In: Chinese computational linguistics and natural language processing based on naturally annotated big data. Springer, pp 197–208, 2017), sequential tagging (Vaswani et al., Supertagging with LSTMs. In: Proceedings of NAACL-HLT, pp 232–237, 2016; Wu et al., An empirical exploration of skip connections for sequential tagging. In: Proceedings of COLING, 2016a), syntactic parsing (Socher et al., Parsing with compositional vector grammars. In: Proceedings of ACL, pp 455–465, 2013; Chen and Manning, A fast and accurate dependency parser using neural networks. In: Proceedings of EMNLP, pp 740–750, 2014; Liu and Zhang, TACL 5:45–58, 2017), text summarization (Rush et al., A neural attention model for abstractive sentence summarization. In: Proceedings of EMNLP, 2015; See et al., Get to the point: summarization with pointer-generator networks. In: Proceedings of ACL, 2017) and machine translation (Bahdanau et al., Neural machine translation by jointly learning to align and translate. In: Proceedings of ICLR, 2015; Sutskever et al., Sequence to sequence learning with neural networks. In: Proceedings of NIPS, 2014; Vaswani et al., Attention is all you need. arXiv preprint arXiv:1706.03762, 2017) to question answering (Andreas et al., Learning to compose neural networks for question answering. In: Proceedings of NAACL-HLT, pp 232–237, 2016; Bordes et al., Question answering with subgraph embeddings. arXiv preprint arXiv:1406.3676, 2014; Bordes et al., Large-scale simple question answering with memory networks. arXiv preprint arXiv:1506.02075, 2015; Yu et al., Deep learning for answer sentence selection. arXiv preprint arXiv:1412.1632, 2014). This chapter employs named entity recognition, supertagging, machine translation and text summarization as case studies to introduce the application of deep learning to natural language processing.
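Several of these case studies share a common architectural core: tokens are mapped to learned embedding vectors, a bidirectional LSTM (Hochreiter and Schmidhuber, Neural Comput 9(8):1735–1780, 1997) encodes each token in its sentence context, and a task-specific output layer scores candidate labels, as in the LSTM-CRF taggers cited above for named entity recognition and supertagging. The following minimal PyTorch sketch illustrates that skeleton only; it is not taken from the chapter, and all class names and hyperparameters are illustrative assumptions. A full system such as that of Lample et al. (2016) would add character-level features and a CRF layer that decodes the whole tag sequence jointly.

```python
# Minimal sketch of a bidirectional LSTM sequence tagger (illustrative only;
# names and hyperparameters are assumptions, not the chapter's exact model).
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # Forward and backward states are concatenated, hence 2 * hidden_dim.
        self.proj = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer indices into the vocabulary
        embedded = self.embed(token_ids)   # (batch, seq_len, embed_dim)
        states, _ = self.lstm(embedded)    # (batch, seq_len, 2 * hidden_dim)
        return self.proj(states)           # per-token tag scores (emissions)

# Usage sketch: greedy per-token decoding. An LSTM-CRF would instead run
# Viterbi over these emission scores plus learned tag-transition scores.
model = BiLSTMTagger(vocab_size=10000, num_tags=9)  # e.g. 9 BIO-style NER tags
tokens = torch.randint(0, 10000, (1, 6))            # one sentence of 6 token ids
tags = model(tokens).argmax(dim=-1)                 # (1, 6) predicted tag ids
```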


References

  • Andreas J, Rohrbach M, Darrell T, Klein D (2016) Learning to compose neural networks for question answering. In: Proceedings of NAACL-HLT, pp 232–237

  • Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: Proceedings of ICLR

  • Barzilay R, McKeown KR (2005) Sentence fusion for multidocument news summarization. Comput Linguist 31(3):297–328

  • Bengio Y, Simard P, Frasconi P (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 5(2):157–166

  • Berger AL, Pietra VJD, Pietra SAD (1996) A maximum entropy approach to natural language processing. Comput Linguist 22(1):39–71

  • Boitet C, Guillaume P, Quezel-Ambrunaz M (1982) Implementation and conversational environment of ARIANE 78.4, an integrated system for automated translation and human revision. In: Proceedings of the 9th conference on computational linguistics, volume 1. Academia Praha, pp 19–27

  • Bordes A, Chopra S, Weston J (2014) Question answering with subgraph embeddings. arXiv preprint arXiv:1406.3676

  • Bordes A, Usunier N, Chopra S, Weston J (2015) Large-scale simple question answering with memory networks. arXiv preprint arXiv:1506.02075

  • Brown PF, Della Pietra SA, Della Pietra VJ, Mercer RL (1993) The mathematics of statistical machine translation: parameter estimation. Comput Linguist 19(2):263–311

  • Cai D, Zhao H, Zhang Z, Xin Y, Wu Y, Huang F (2017) Fast and accurate neural word segmentation for Chinese. In: Proceedings of ACL, pp 608–615

  • Chen D, Manning C (2014) A fast and accurate dependency parser using neural networks. In: Proceedings of EMNLP, pp 740–750

  • Chen X, Qiu X, Zhu C, Liu P, Huang X (2015) Long short-term memory neural networks for Chinese word segmentation. In: Proceedings of EMNLP, pp 1197–1206

  • Cheng Y, Xu W, He Z, He W, Wu H, Sun M, Liu Y (2016) Semi-supervised learning for neural machine translation. In: Proceedings of ACL

  • Chiu JP, Nichols E (2016) Named entity recognition with bidirectional LSTM-CNNs. TACL 4:357–370

  • Chopra S, Auli M, Rush AM (2016) Abstractive sentence summarization with attentive recurrent neural networks. In: Proceedings of NAACL-HLT, pp 93–98

  • Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P (2011) Natural language processing (almost) from scratch. J Mach Learn Res 12:2493–2537

  • Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297

  • Devlin J, Zbib R, Huang Z, Lamar T, Schwartz RM, Makhoul J (2014) Fast and robust neural network joint models for statistical machine translation. In: Proceedings of ACL, pp 1370–1380

  • Dong C, Zhang J, Zong C, Hattori M, Di H (2016) Character-based LSTM-CRF with radical-level features for Chinese named entity recognition. In: International conference on computer processing of oriental languages. Springer, pp 239–250

  • Dong C, Wu H, Zhang J, Zong C (2017) Multichannel LSTM-CRF for named entity recognition in Chinese social media. In: Chinese computational linguistics and natural language processing based on naturally annotated big data. Springer, pp 197–208

  • Erkan G, Radev DR (2004) LexRank: graph-based lexical centrality as salience in text summarization. J Artif Intell Res 22:457–479

  • Filippova K, Strube M (2008) Sentence fusion via dependency graph compression. In: Proceedings of EMNLP, pp 177–185

  • Fonseca ER, Rosa JLG, Aluísio SM (2015) Evaluating word embeddings and a revised corpus for part-of-speech tagging in Portuguese. J Braz Comput Soc 21(1):1–14

  • Gehring J, Auli M, Grangier D, Yarats D, Dauphin YN (2017) Convolutional sequence to sequence learning. arXiv preprint arXiv:1705.03122

  • Gulcehre C, Ahn S, Nallapati R, Zhou B, Bengio Y (2016) Pointing the unknown words. In: Proceedings of ACL

  • He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. In: Proceedings of CVPR

  • Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

  • Koehn P, Och FJ, Marcu D (2003) Statistical phrase-based translation. In: Proceedings of NAACL

  • Lafferty J, McCallum A, Pereira FC (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of ICML

  • Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C (2016) Neural architectures for named entity recognition. In: Proceedings of NAACL-HLT

  • Li P, Liu Y, Sun M (2013) Recursive autoencoders for ITG-based translation. In: Proceedings of EMNLP

  • Li X, Zhang J, Zong C (2016) Towards zero unknown word in neural machine translation. In: Proceedings of IJCAI

  • Liu J, Zhang Y (2017) Shift-reduce constituent parsing with neural lookahead features. TACL 5:45–58

  • Luhn HP (1958) The automatic creation of literature abstracts. IBM J Res Dev 2(2):159–165

  • Luong M-T, Pham H, Manning CD (2015) Effective approaches to attention-based neural machine translation. In: Proceedings of EMNLP

  • Ma X, Hovy E (2016) End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In: Proceedings of ACL

  • Manning CD, Schütze H (1999) Foundations of statistical natural language processing. MIT Press, Cambridge/London

  • McCallum A, Nigam K et al (1998) A comparison of event models for Naive Bayes text classification. In: AAAI-98 workshop on learning for text categorization, Madison, vol 752, pp 41–48

  • Mi H, Sankaran B, Wang Z, Ittycheriah A (2016) A coverage embedding model for neural machine translation. In: Proceedings of EMNLP

  • Nagao M (1984) A framework of a mechanical translation between Japanese and English by analogy principle. In: Elithorn A, Banerji R (eds) Artificial and human intelligence. Elsevier Science Publishers B.V., pp 173–180

  • Nallapati R, Zhou B, Gulcehre C, Xiang B et al (2016) Abstractive text summarization using sequence-to-sequence RNNs and beyond. In: Proceedings of CoNLL

  • Nallapati R, Zhai F, Zhou B (2017) SummaRuNNer: a recurrent neural network based sequence model for extractive summarization of documents. In: Proceedings of AAAI, pp 3075–3081

  • Och FJ, Ney H (2002) Discriminative training and maximum entropy models for statistical machine translation. In: Proceedings of ACL

  • Page L, Brin S, Motwani R, Winograd T (1999) The PageRank citation ranking: bringing order to the web. Technical report, Stanford InfoLab

  • Pei W, Ge T, Chang B (2014) Max-margin tensor neural network for Chinese word segmentation. In: Proceedings of ACL, pp 293–303

  • Rush AM, Chopra S, Weston J (2015) A neural attention model for abstractive sentence summarization. In: Proceedings of EMNLP

  • See A, Liu PJ, Manning CD (2017) Get to the point: summarization with pointer-generator networks. In: Proceedings of ACL

  • Sennrich R, Haddow B, Birch A (2016a) Improving neural machine translation models with monolingual data. In: Proceedings of ACL

  • Sennrich R, Haddow B, Birch A (2016b) Neural machine translation of rare words with subword units. In: Proceedings of ACL

  • Socher R, Bauer J, Manning CD et al (2013) Parsing with compositional vector grammars. In: Proceedings of ACL, pp 455–465

  • Steedman M (2000) The syntactic process, vol 24. MIT Press, Cambridge

  • Steedman M, Baldridge J (2011) Combinatory categorial grammar. In: Borsley RD, Börjars K (eds) Non-transformational syntax: formal and explicit models of grammar. Wiley-Blackwell, Chichester/Malden

  • Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Proceedings of NIPS

  • Tu Z, Lu Z, Liu Y, Liu X, Li H (2016) Coverage-based neural machine translation. In: Proceedings of ACL

  • Vaswani A, Zhao Y, Fossum V, Chiang D (2013) Decoding with large-scale neural language models improves translation. In: Proceedings of EMNLP, pp 1387–1392

  • Vaswani A, Bisk Y, Sagae K, Musa R (2016) Supertagging with LSTMs. In: Proceedings of NAACL-HLT, pp 232–237

  • Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. arXiv preprint arXiv:1706.03762

  • Wu H, Zhang J, Zong C (2016a) An empirical exploration of skip connections for sequential tagging. In: Proceedings of COLING, pp 232–237

  • Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, Krikun M, Cao Y, Gao Q, Macherey K et al (2016b) Google’s neural machine translation system: bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144

  • Wu H, Zhang J, Zong C (2017) A dynamic window neural network for CCG supertagging. In: Proceedings of AAAI

  • Yu L, Hermann KM, Blunsom P, Pulman S (2014) Deep learning for answer sentence selection. arXiv preprint arXiv:1412.1632

  • Zhang J, Zong C (2015) Deep neural networks in machine translation: an overview. IEEE Intell Syst 30(5):16–25

  • Zhang J, Zong C (2016) Exploring source-side monolingual data in neural machine translation. In: Proceedings of EMNLP

  • Zhang J, Liu S, Li M, Zhou M, Zong C (2014a) Bilingually-constrained phrase embeddings for machine translation. In: Proceedings of ACL, pp 111–121

  • Zhang J, Liu S, Li M, Zhou M, Zong C (2014b) Mind the gap: machine translation by minimizing the semantic gap in embedding space. In: Proceedings of AAAI, pp 1657–1664

  • Zhou Q, Yang N, Wei F, Zhou M (2017) Selective encoding for abstractive sentence summarization. In: Proceedings of ACL

  • Zong C (2008) Statistical natural language processing. Tsinghua University Press, Beijing

Author information

Correspondence to Jiajun Zhang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Zhang, J., Zong, C. (2019). Deep Learning for Natural Language Processing. In: Huang, K., Hussain, A., Wang, QF., Zhang, R. (eds) Deep Learning: Fundamentals, Theory and Applications. Cognitive Computation Trends, vol 2. Springer, Cham. https://doi.org/10.1007/978-3-030-06073-2_5
