Deep Learning for Natural Language Processing

Chapter in: Deep Learning: Fundamentals, Theory and Applications

Part of the book series: Cognitive Computation Trends (COCT, volume 2)

Abstract

Natural language processing is a field of artificial intelligence that aims to design computer algorithms to understand and process natural language as humans do. It has become a necessity in the Internet age and the era of big data. From fundamental research to sophisticated applications, natural language processing covers many tasks, such as lexical analysis, syntactic and semantic parsing, discourse analysis, text classification, sentiment analysis, summarization, machine translation and question answering. For a long time, statistical models such as Naive Bayes (McCallum and Nigam et al., A comparison of event models for Naive Bayes text classification. In: AAAI-98 workshop on learning for text categorization, Madison, vol 752, pp 41–48, 1998), Support Vector Machines (Cortes and Vapnik, Mach Learn 20(3):273–297, 1995), Maximum Entropy (Berger et al., Comput Linguist 22(1):39–71, 1996) and Conditional Random Fields (Lafferty et al., Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of ICML, 2001) were the dominant methods for natural language processing (Manning and Schütze, Foundations of statistical natural language processing. MIT Press, Cambridge/London, 1999; Zong, Statistical natural language processing. Tsinghua University Press, Beijing, 2008). Recent years have witnessed the great success of deep learning in natural language processing, from Chinese word segmentation (Pei et al., Max-margin tensor neural network for Chinese word segmentation. In: Proceedings of ACL, pp 293–303, 2014; Chen et al., Long short-term memory neural networks for Chinese word segmentation. In: Proceedings of EMNLP, pp 1197–1206, 2015; Cai et al., Fast and accurate neural word segmentation for Chinese. In: Proceedings of ACL, pp 608–615, 2017), named entity recognition (Collobert et al., J Mach Learn Res 12:2493–2537, 2011; Lample et al., Neural architectures for named entity recognition. In: Proceedings of NAACL-HLT, 2016; Dong et al., Character-based LSTM-CRF with radical-level features for Chinese named entity recognition. In: International conference on computer processing of oriental languages. Springer, pp 239–250, 2016; Dong et al., Multichannel LSTM-CRF for named entity recognition in Chinese social media. In: Chinese computational linguistics and natural language processing based on naturally annotated big data. Springer, pp 197–208, 2017), sequential tagging (Vaswani et al., Supertagging with LSTMs. In: Proceedings of NAACL-HLT, pp 232–237, 2016; Wu et al., An empirical exploration of skip connections for sequential tagging. In: Proceedings of COLING, 2016a), syntactic parsing (Socher et al., Parsing with compositional vector grammars. In: Proceedings of ACL, pp 455–465, 2013; Chen and Manning, A fast and accurate dependency parser using neural networks. In: Proceedings of EMNLP, pp 740–750, 2014; Liu and Zhang, TACL 5:45–58, 2017), text summarization (Rush et al., A neural attention model for abstractive sentence summarization. In: Proceedings of EMNLP, 2015; See et al., Get to the point: summarization with pointer-generator networks. In: Proceedings of ACL, 2017) and machine translation (Bahdanau et al., Neural machine translation by jointly learning to align and translate. In: Proceedings of ICLR, 2015; Sutskever et al., Sequence to sequence learning with neural networks. In: Proceedings of NIPS, 2014; Vaswani et al., Attention is all you need. arXiv preprint arXiv:1706.03762, 2017) to question answering (Andreas et al., Learning to compose neural networks for question answering. In: Proceedings of NAACL-HLT, pp 232–237, 2016; Bordes et al., Question answering with subgraph embeddings. arXiv preprint arXiv:1406.3676, 2014; Bordes et al., Large-scale simple question answering with memory networks. arXiv preprint arXiv:1506.02075, 2015; Yu et al., Deep learning for answer sentence selection. arXiv preprint arXiv:1412.1632, 2014). This chapter employs named entity recognition, supertagging, machine translation and text summarization as case studies to introduce the application of deep learning to natural language processing.
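Several of these case studies share a common architectural core: tokens are mapped to learned embedding vectors, a bidirectional LSTM (Hochreiter and Schmidhuber, Neural Comput 9(8):1735–1780, 1997) encodes each token in its sentence context, and a task-specific output layer scores candidate labels, as in the LSTM-CRF taggers cited above for named entity recognition and supertagging. The following minimal PyTorch sketch illustrates that skeleton only; it is not taken from the chapter, and all class names and hyperparameters are illustrative assumptions. A full system such as that of Lample et al. (2016) would add character-level features and a CRF layer that decodes the whole tag sequence jointly.

```python
# Minimal sketch of a bidirectional LSTM sequence tagger (illustrative only;
# names and hyperparameters are assumptions, not the chapter's exact model).
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # Forward and backward states are concatenated, hence 2 * hidden_dim.
        self.proj = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer indices into the vocabulary
        embedded = self.embed(token_ids)   # (batch, seq_len, embed_dim)
        states, _ = self.lstm(embedded)    # (batch, seq_len, 2 * hidden_dim)
        return self.proj(states)           # per-token tag scores (emissions)

# Usage sketch: greedy per-token decoding. An LSTM-CRF would instead run
# Viterbi over these emission scores plus learned tag-transition scores.
model = BiLSTMTagger(vocab_size=10000, num_tags=9)  # e.g. 9 BIO-style NER tags
tokens = torch.randint(0, 10000, (1, 6))            # one sentence of 6 token ids
tags = model(tokens).argmax(dim=-1)                 # (1, 6) predicted tag ids
```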


References

  • Andreas J, Rohrbach M, Darrell T, Klein D (2016) Learning to compose neural networks for question answering. In: Proceedings of NAACL-HLT, pp 232–237

  • Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: Proceedings of ICLR

  • Barzilay R, McKeown KR (2005) Sentence fusion for multidocument news summarization. Comput Linguist 31(3):297–328

  • Bengio Y, Simard P, Frasconi P (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 5(2):157–166

  • Berger AL, Pietra VJD, Pietra SAD (1996) A maximum entropy approach to natural language processing. Comput Linguist 22(1):39–71

  • Boitet C, Guillaume P, Quezel-Ambrunaz M (1982) Implementation and conversational environment of ARIANE 78.4, an integrated system for automated translation and human revision. In: Proceedings of the 9th conference on computational linguistics, volume 1. Academia Praha, pp 19–27

  • Bordes A, Chopra S, Weston J (2014) Question answering with subgraph embeddings. arXiv preprint arXiv:1406.3676

  • Bordes A, Usunier N, Chopra S, Weston J (2015) Large-scale simple question answering with memory networks. arXiv preprint arXiv:1506.02075

  • Brown PF, Della Pietra SA, Della Pietra VJ, Mercer RL (1993) The mathematics of statistical machine translation: parameter estimation. Comput Linguist 19(2):263–311

  • Cai D, Zhao H, Zhang Z, Xin Y, Wu Y, Huang F (2017) Fast and accurate neural word segmentation for Chinese. In: Proceedings of ACL, pp 608–615

  • Chen D, Manning C (2014) A fast and accurate dependency parser using neural networks. In: Proceedings of EMNLP, pp 740–750

  • Chen X, Qiu X, Zhu C, Liu P, Huang X (2015) Long short-term memory neural networks for Chinese word segmentation. In: Proceedings of EMNLP, pp 1197–1206

  • Cheng Y, Xu W, He Z, He W, Wu H, Sun M, Liu Y (2016) Semi-supervised learning for neural machine translation. In: Proceedings of ACL

  • Chiu JP, Nichols E (2016) Named entity recognition with bidirectional LSTM-CNNs. TACL 4:357–370

  • Chopra S, Auli M, Rush AM (2016) Abstractive sentence summarization with attentive recurrent neural networks. In: Proceedings of NAACL-HLT, pp 93–98

  • Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P (2011) Natural language processing (almost) from scratch. J Mach Learn Res 12:2493–2537

  • Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297

  • Devlin J, Zbib R, Huang Z, Lamar T, Schwartz RM, Makhoul J (2014) Fast and robust neural network joint models for statistical machine translation. In: Proceedings of ACL, pp 1370–1380

  • Dong C, Zhang J, Zong C, Hattori M, Di H (2016) Character-based LSTM-CRF with radical-level features for Chinese named entity recognition. In: International conference on computer processing of oriental languages. Springer, pp 239–250

  • Dong C, Wu H, Zhang J, Zong C (2017) Multichannel LSTM-CRF for named entity recognition in Chinese social media. In: Chinese computational linguistics and natural language processing based on naturally annotated big data. Springer, pp 197–208

  • Erkan G, Radev DR (2004) LexRank: graph-based lexical centrality as salience in text summarization. J Artif Intell Res 22:457–479

  • Filippova K, Strube M (2008) Sentence fusion via dependency graph compression. In: Proceedings of EMNLP, pp 177–185

  • Fonseca ER, Rosa JLG, Aluísio SM (2015) Evaluating word embeddings and a revised corpus for part-of-speech tagging in Portuguese. J Braz Comput Soc 21(1):1–14

  • Gehring J, Auli M, Grangier D, Yarats D, Dauphin YN (2017) Convolutional sequence to sequence learning. arXiv preprint arXiv:1705.03122

  • Gulcehre C, Ahn S, Nallapati R, Zhou B, Bengio Y (2016) Pointing the unknown words. In: Proceedings of ACL

  • He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. In: Proceedings of CVPR

  • Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

  • Koehn P, Och FJ, Marcu D (2003) Statistical phrase-based translation. In: Proceedings of NAACL

  • Lafferty J, McCallum A, Pereira FC (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of ICML

  • Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C (2016) Neural architectures for named entity recognition. In: Proceedings of NAACL-HLT

  • Li P, Liu Y, Sun M (2013) Recursive autoencoders for ITG-based translation. In: Proceedings of EMNLP

  • Li X, Zhang J, Zong C (2016) Towards zero unknown word in neural machine translation. In: Proceedings of IJCAI

  • Liu J, Zhang Y (2017) Shift-reduce constituent parsing with neural lookahead features. TACL 5:45–58

  • Luhn HP (1958) The automatic creation of literature abstracts. IBM J Res Dev 2(2):159–165

  • Luong M-T, Pham H, Manning CD (2015) Effective approaches to attention-based neural machine translation. In: Proceedings of EMNLP

  • Ma X, Hovy E (2016) End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In: Proceedings of ACL

  • Manning CD, Schütze H (1999) Foundations of statistical natural language processing. MIT Press, Cambridge/London

  • McCallum A, Nigam K et al (1998) A comparison of event models for Naive Bayes text classification. In: AAAI-98 workshop on learning for text categorization, Madison, vol 752, pp 41–48

  • Mi H, Sankaran B, Wang Z, Ittycheriah A (2016) A coverage embedding model for neural machine translation. In: Proceedings of EMNLP

  • Nagao M (1984) A framework of a mechanical translation between Japanese and English by analogy principle. In: Elithorn A, Banerji R (eds) Artificial and human intelligence. Elsevier Science Publishers B.V., pp 173–180

  • Nallapati R, Zhou B, Gulcehre C, Xiang B et al (2016) Abstractive text summarization using sequence-to-sequence RNNs and beyond. In: Proceedings of CoNLL

  • Nallapati R, Zhai F, Zhou B (2017) SummaRuNNer: a recurrent neural network based sequence model for extractive summarization of documents. In: Proceedings of AAAI, pp 3075–3081

  • Och FJ, Ney H (2002) Discriminative training and maximum entropy models for statistical machine translation. In: Proceedings of ACL

  • Page L, Brin S, Motwani R, Winograd T (1999) The PageRank citation ranking: bringing order to the web. Technical report, Stanford InfoLab

  • Pei W, Ge T, Chang B (2014) Max-margin tensor neural network for Chinese word segmentation. In: Proceedings of ACL, pp 293–303

  • Rush AM, Chopra S, Weston J (2015) A neural attention model for abstractive sentence summarization. In: Proceedings of EMNLP

  • See A, Liu PJ, Manning CD (2017) Get to the point: summarization with pointer-generator networks. In: Proceedings of ACL

  • Sennrich R, Haddow B, Birch A (2016a) Improving neural machine translation models with monolingual data. In: Proceedings of ACL

  • Sennrich R, Haddow B, Birch A (2016b) Neural machine translation of rare words with subword units. In: Proceedings of ACL

  • Socher R, Bauer J, Manning CD et al (2013) Parsing with compositional vector grammars. In: Proceedings of ACL, pp 455–465

  • Steedman M (2000) The syntactic process, vol 24. MIT Press, Cambridge

  • Steedman M, Baldridge J (2011) Combinatory categorial grammar. In: Borsley RD, Börjars K (eds) Non-transformational syntax: formal and explicit models of grammar. Wiley-Blackwell, Chichester/Malden

  • Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Proceedings of NIPS

  • Tu Z, Lu Z, Liu Y, Liu X, Li H (2016) Coverage-based neural machine translation. In: Proceedings of ACL

  • Vaswani A, Zhao Y, Fossum V, Chiang D (2013) Decoding with large-scale neural language models improves translation. In: Proceedings of EMNLP, pp 1387–1392

  • Vaswani A, Bisk Y, Sagae K, Musa R (2016) Supertagging with LSTMs. In: Proceedings of NAACL-HLT, pp 232–237

  • Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. arXiv preprint arXiv:1706.03762

  • Wu H, Zhang J, Zong C (2016a) An empirical exploration of skip connections for sequential tagging. In: Proceedings of COLING, pp 232–237

  • Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, Krikun M, Cao Y, Gao Q, Macherey K et al (2016b) Google’s neural machine translation system: bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144

  • Wu H, Zhang J, Zong C (2017) A dynamic window neural network for CCG supertagging. In: Proceedings of AAAI

  • Yu L, Hermann KM, Blunsom P, Pulman S (2014) Deep learning for answer sentence selection. arXiv preprint arXiv:1412.1632

  • Zhang J, Zong C (2015) Deep neural networks in machine translation: an overview. IEEE Intell Syst 30(5):16–25

  • Zhang J, Zong C (2016) Exploring source-side monolingual data in neural machine translation. In: Proceedings of EMNLP

  • Zhang J, Liu S, Li M, Zhou M, Zong C (2014a) Bilingually-constrained phrase embeddings for machine translation. In: Proceedings of ACL, pp 111–121

  • Zhang J, Liu S, Li M, Zhou M, Zong C (2014b) Mind the gap: machine translation by minimizing the semantic gap in embedding space. In: Proceedings of AAAI, pp 1657–1664

  • Zhou Q, Yang N, Wei F, Zhou M (2017) Selective encoding for abstractive sentence summarization. In: Proceedings of ACL

  • Zong C (2008) Statistical natural language processing. Tsinghua University Press, Beijing

Author information

Correspondence to Jiajun Zhang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Zhang, J., Zong, C. (2019). Deep Learning for Natural Language Processing. In: Huang, K., Hussain, A., Wang, QF., Zhang, R. (eds) Deep Learning: Fundamentals, Theory and Applications. Cognitive Computation Trends, vol 2. Springer, Cham. https://doi.org/10.1007/978-3-030-06073-2_5
