Abstract
Natural language processing (NLP), utilizing computer programs to process large amounts of language data, is a key research area in artificial intelligence and computer science. Deep learning technologies have been well developed and applied in this area. However, the literature still lacks a succinct survey, which would allow readers to get a quick understanding of (1) how the deep learning technologies apply to NLP and (2) what the promising applications are. In this survey, we try to investigate the recent developments of NLP, centered around natural language understanding, to answer these two questions. First, we explore the newly developed word embedding or word representation methods. Then, we describe two powerful learning models, Recurrent Neural Networks and Convolutional Neural Networks. Next, we outline five key NLP applications, including (1) part-of-speech tagging and named entity recognition, two fundamental NLP applications; (2) machine translation and automatic English grammatical error correction, two applications with prominent commercial value; and (3) image description, an application requiring technologies of both computer vision and NLP. Moreover, we present a series of benchmark datasets which would be useful for researchers to evaluate the performance of models in the related applications.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
The source code can be found at https://github.com/rsennrich/nematus.
- 10.
- 11.
- 12.
- 13.
Ginger
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
References
Artetxe M, Labaka G, Agirre E, Cho K (2017) Unsupervised neural machine translation. CoRR, abs/1710.11041
Ba JL, Kiros R, Hinton EG (2016) Layer normalization. CoRR, abs/1607.06450
Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473
Bernardi R, Çakici R, Elliott D, Erdem A, Erdem E, Ikizler-Cinbis N, Keller F, Muscat A, Plank B (2016) Automatic description generation from images: a survey of models, datasets, and evaluation measures. J Artif Intell Res 55:409–442
Bhirud SN, Bhavsar R, Pawar B (2017) Grammar checkers for natural languages:a review. Int J Natural Lang Comput 6(4):1
Brants T (2000) Tnt: a statistical part-of-speech tagger. In: ANLC’00, Stroudsburg. Association for Computational Linguistics, pp 224–231
Brill E (1992) A simple rule-based part of speech tagger. In: ANLC, Stroudsburg, pp 152–155
Britz D, Goldie A, Luong M, Le VQ (2017) Massive exploration of neural machine translation architectures. CoRR, abs/1703.03906
Chieu LH, Ng TH (2002) Named entity recognition: a maximum entropy approach using global information. In: COLING, Taipei
Chiu JPC, Nichols E (2016) Named entity recognition with bidirectional LSTM-CNNs. TACL 4:357–370
Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. CoRR, abs/1412.3555
Chung J, Cho K, Bengio Y (2016) A character-level decoder without explicit segmentation for neural machine translation. In: ACL, Berlin
Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa PP (2011) Natural language processing (almost) from scratch. J Mach Learn Res 12:2493–2537
Costa-Jussà MR, Fonollosa JAR (2016) Character-based neural machine translation. In: ACL, Berlin
Dale R, Kilgarriff A (2011) Helping our own: the HOO 2011 pilot shared task. In: ENLG, Nancy, pp 242–249
Daniel N (2003) A rule-based style and grammar checker. Master’s thesis, Bielefeld University, Bielefeld
Daudaravicius V, Banchs ER, Volodina E, Napoles C (2016) A report on the automatic evaluation of scientific writing shared task. In: Proceedings of the 11th workshop on innovative use of NLP for building educational applications, BEA@NAACL-HLT 2016, San Diego, 16 June 2016, pp 53–62
dos Santos CN, Gatti M (2014) Deep convolutional neural networks for sentiment analysis of short texts. In: COLING, Dublin, pp 69–78
Elman LJ (1990) Finding structure in time. Cogn Sci 14(2):179–211
Firat O, Cho K, Sankaran B, Yarman-Vural FT, Bengio Y (2017) Multi-way, multilingual neural machine translation. Comput Speech Lang 45:236–252
Firth RJ (1957) A synopsis of linguistic theory 1930–1955. Studies in linguistic analysis. Blackwell, Oxford, pp 1–32
Florian R, Ittycheriah A, Jing H, Zhang T (2003) Named entity recognition through classifier combination. In: Proceedings of the seventh conference on natural language learning, CoNLL 2003, Held in cooperation with HLT-NAACL 2003, Edmonton, 31 May–1 June 2003, pp 168–171
Gehring J, Auli M, Grangier D, Dauphin Y (2017) A convolutional encoder model for neural machine translation. In: ACL, Vancouver, pp 123–135
Gehring J, Auli M, Grangier D, Yarats D, Dauphin NY (2017) Convolutional sequence to sequence learning. In: ICML, Sydney, pp 1243–1252
Gers AF, Schmidhuber J (2000) Recurrent nets that time and count. In: IJCNN (3), Como, pp 189–194
Goodfellow JI, Bengio Y, Courville CA (2016) Deep learning. Adaptive computation and machine learning. MIT Press, Cambridge
Graves A, Mohamed A, Hinton EG (2013) Speech recognition with deep recurrent neural networks. In: IEEE ICASSP, British Columbia, pp 6645–6649
Greff K, Srivastava KR, KoutnÃk J, Steunebrink RB, Schmidhuber J (2017) LSTM: a search space odyssey. IEEE Trans Neural Netw Learn Syst 28(10):2222–2232
Gucehre C, Firat O, Xu K, Cho K, Barrault L, Lin H, Bougares F, Schwenk H, Bengio Y (2015) On using monolingual corpora in neural machine translation. CoRR, abs/1503.03535
Harris Z (1954) Distributional structure. Word 10(23):146–162
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: CVPR, Las Vegas, pp 770–778
Hoang TD, Chollampatt S, Ng TH (2016) Exploiting n-best hypotheses to improve an SMT approach to grammatical error correction. In: IJCAI, pp 2803–2809
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Hodosh M, Young P, Hockenmaier J (2013) Framing image description as a ranking task: data, models and evaluation metrics. J Artif Intell Res 47:853–899
Huang Z, Xu W, Yu K (2015) Bidirectional LSTM-CRF models for sequence tagging. CoRR, abs/1508.01991
Hu Z, Zhang Z, Yang H, Chen Q, Zuo D (2017) A deep learning approach for predicting the quality of online health expert question-answering services. J Biomed Inform 71:241–253
Hu Z, Zhang Z, Yang H, Chen Q, Zhu R, Zuo D (2018) Predicting the quality of online health expert question-answering services with temporal features in a deep learning framework. Neurocomputing 275:2769–2782
Jean S, Cho K, Memisevic R, Bengio Y (2015) On using very large target vocabulary for neural machine translation. In: ACL, Beijing, pp 1–10
Ji S, Vishwanathan SVN, Satish N, Anderson JM, Dubey P (2015) Blackout: speeding up recurrent neural network language models with very large vocabularies. CoRR, abs/1511.06909
Johnson M, Schuster M, Le VQ, Krikun M, Wu Y, Chen Z, Thorat N, Viégas FB, Wattenberg M, Corrado G, Hughes M, Dean J (2017) Google’s multilingual neural machine translation system: enabling zero-shot translation. TACL 5:339–351
Józefowicz R, Zaremba W, Sutskever I (2015) An empirical exploration of recurrent network architectures. In: ICML, Lille, pp 2342–2350
Junczys-Dowmunt M, Grundkiewicz R (2016) Phrase-based machine translation is state-of-the-art for automatic grammatical error correction. In: EMNLP, Austin, pp 1546–1556
Jurafsky D, Martin HJ (2017) Speech and language processing – an introduction to natural language processing. Computational linguistics, and speech recognition. 3rd edn. Prentice Hall, p 1032
Karpathy A, Fei-Fei L (2017) Deep visual-semantic alignments for generating image descriptions. IEEE Trans Pattern Anal Mach Intell 39(4):664–676
Kim Y (2014) Convolutional neural networks for sentence classification. In: EMNLP, Doha, pp 1746–1751
Kingma PD, Ba J (2014) Adam: a method for stochastic optimization. CoRR, abs/1412.6980
Koehn P (2005) Europarl: a parallel corpus for statistical machine translation. In: MT summit, vol 5, pp 79–86
KoutnÃk J, Greff K, Gomez JF, Schmidhuber J (2014) A clockwork RNN. In: ICML, Beijing, pp 1863–1871
Krizhevsky A, Sutskever I, Hinton EG (2017) Imagenet classification with deep convolutional neural networks. Commun ACM 60(6):84–90
Lafferty DJ, McCallum A, Pereira FCN (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: ICML, Williams College, pp 282–289
Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C (2016) Neural architectures for named entity recognition. In: NAACL HLT, San Diego, pp 260–270
Lample G, Denoyer L, Ranzato M (2017) Unsupervised machine translation using monolingual corpora only. CoRR, abs/1711.00043
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
LeCun Y, Bengio Y, Hinton EG (2015) Deep learning. Nature 521(7553):436–444
Lewis DD, Yang Y, Rose GT, Li F (2004) RCV1: a new benchmark collection for text categorization research. J Mach Learn Res 5:361–397
Lin T, Maire M, Belongie JS, Hays J, Perona P, Ramanan D, Dollár P, Zitnick LC (2014) Microsoft COCO: common objects in context. In: ECCV, Zurich, pp 740–755
Luong M, Manning DC (2016) Achieving open vocabulary neural machine translation with hybrid word-character models. In: ACL, Berlin
Luong M, Le VQ, Sutskever I, Vinyals O, Kaiser L (2015a) Multi-task sequence to sequence learning. CoRR, abs/1511.06114
Luong T, Pham H, Manning DC (2015b) Effective approaches to attention-based neural machine translation. In: EMNLP, Lisbon, pp 1412–1421
Luong T, Sutskever I, Le VQ, Vinyals O, Zaremba W (2015c) Addressing the rare word problem in neural machine translation. In: ACL, Beijing, pp 11–19
Ma X, Hovy HE (2016) End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In: ACL, Berlin
Maas LA, Daly ER, Pham TP, Huang D, Ng YA, Potts C (2011) Learning word vectors for sentiment analysis. In: The 49th annual meeting of the Association for Computational Linguistics: human language technologies, proceedings of the conference, 19–24 June 2011, Portland, pp 142–150
Manchanda B, Athavale AV, Kumar Sharma S (2016) Various techniques used for grammar checking. Int J Comput Appl Inf Technol 9(1):177
Manning DC (2011) Part-of-speech tagging from 97% to 100%: is it time for some linguistics? In: CICLing, Tokyo, pp 171–189
Marcus PM, Santorini B, Marcinkiewicz AM (1993) Building a large annotated corpus of English: the penn treebank. Comput Linguist 19(2):313–330
Marcus M, Santorini B, Marcinkiewicz M, Taylor A (1999) Treebank-3 LDC99T42. Web Download. Linguistic Data Consortium, Philadelphia. https://catalog.ldc.upenn.edu/LDC99T42
McCallum A, Freitag D, Pereira FCN (2000) Maximum entropy Markov models for information extraction and segmentation. In: ICML’00. Morgan Kaufmann Publishers Inc., San Francisco, pp 591–598
Melamud O, McClosky D, Patwardhan S, Bansal M (2016) The role of context types and dimensionality in learning word embeddings. In: NAACL HLT, San Diego, pp 1030–1040
Mi H, Wang Z, Ittycheriah A (2016) Vocabulary manipulation for neural machine translation. In: ACL, Berlin
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. CoRR, abs/1301.3781
Mikolov T, Sutskever I, Chen K, Corrado SG, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: NIPS, Lake Tahoe, pp 3111–3119
Nazar R, Renau I (2012) Google books n-gram corpus used as a grammar checker. In: Proceedings of the second workshop on computational linguistics and writing (CLW 2012): linguistic and cognitive aspects of document creation and document engineering, EACL 2012, Stroudsburg. Association for Computational Linguistics, pp 27–34
Ng TH, Wu MS, Wu Y, Hadiwinoto C, Tetreault RJ (2013) The conll-2013 shared task on grammatical error correction. In: Proceedings of the seventeenth conference on computational natural language learning: shared task, CoNLL 2013, Sofia, 8–9 Aug 2013, pp 1–12
Ng TH, Wu MS, Briscoe T, Hadiwinoto C, Susanto HR, Bryant C (2014) The conll-2014 shared task on grammatical error correction. In: CoNLL, Baltimore, pp 1–14
Nivre J et al (2017) Universal dependencies 2.1. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics ( ’UFAL), Faculty of Mathematics and Physics, Charles University
Pang B, Lee L (2004) A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the 42nd annual meeting of the Association for Computational Linguistics, Barcelona, 21–26 July 2004, pp 271–278
Pang B, Lee L (2005) Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. In: ACL 2005, 43rd annual meeting of the Association for Computational Linguistics, proceedings of the conference, 25–30 June 2005, University of Michigan, USA, pp 115–124
Pang B, Lee L, Vaithyanathan S (2002) Thumbs up? Sentiment classification using machine learning techniques. CoRR, cs.CL/0205070
Pascanu R, Mikolov T, Bengio Y (2013) On the difficulty of training recurrent neural networks. In: ICML, Atlanta, pp 1310–1318
Pennington J, Socher R, Manning DC (2014) Glove: global vectors for word representation. In: EMNLP, Doha, pp 1532–1543
Plank B, Søgaard A, Goldberg Y (2016) Multilingual part-of-speech tagging with bidirectional long short-term memory models and auxiliary loss. In: ACL, Berlin
Plummer AB, Wang L, Cervantes MC, Caicedo CJ, Hockenmaier J, Lazebnik S (2017) Flickr30k entities: collecting region-to-phrase correspondences for richer image-to-sentence models. Int J Comput Vis 123(1):74–93
Rong X (2014) word2vec parameter learning explained. CoRR, abs/1411.2738
Rozovskaya A, Roth D (2010) Training paradigms for correcting errors in grammar and usage. In: HLT’10, Stroudsburg. Association for Computational Linguistics, pp 154–162
Rozovskaya A, Roth D (2016) Grammatical error correction: machine translation and classifiers. In: ACL, Berlin
Ruder S (2016) An overview of gradient descent optimization algorithms. CoRR, abs/1609.04747
Ruder S, Ghaffari P, Breslin GJ (2016) A hierarchical model of reviews for aspect-based sentiment analysis. In: EMNLP, Austin, pp 999–1005
Schmaltz A, Kim Y, Rush MA, Shieber MS (2016) Sentence-level grammatical error identification as sequence-to-sequence correction. In: Proceedings of the 11th workshop on innovative use of NLP for building educational applications, BEA@NAACL-HLT 2016, 16 June 2016, San Diego, pp 242–251
Sennrich R, Haddow B, Birch A (2016a) Improving neural machine translation models with monolingual data. In: ACL, Berlin
Sennrich R, Haddow B, Birch A (2016b) Neural machine translation of rare words with subword units. In: ACL, Berlin
Srivastava N, Hinton EG, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
Sutskever I, Vinyals O, Le VQ (2014) Sequence to sequence learning with neural networks. In: NIPS, Montreal, pp 3104–3112
Toutanova K, Klein D, Manning DC, Singer Y (2003) Feature-rich part-of-speech tagging with a cyclic dependency network. In: HLT-NAACL, Edmonton
Ueffing N, Ney H (2003) Using POS information for statistical machine translation into morphologically rich languages. In: EACL’03, Stroudsburg. Association for Computational Linguistics, pp 347–354
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez NA, Kaiser L, Polosukhin I (2017) Attention is all you need. In: NIPS, Long Beach, pp 6000–6010
Vinyals O, Toshev A, Bengio S, Erhan D (2017) Show and tell: lessons learned from the 2015 MSCOCO image captioning challenge. IEEE Trans Pattern Anal Mach Intell 39(4):652–663
Wang P, Qian Y, Soong KF, He L, Zhao H (2015) A unified tagging solution: bidirectional LSTM recurrent neural network with word embedding. CoRR, abs/1511.00215
Wiseman S, Rush MA (2016) Sequence-to-sequence learning as beam-search optimization. In: EMNLP, Austin, pp 1296–1306
Wu J, Chang J, Chang SJ (2013) Correcting serial grammatical errors based on n-grams and syntax. IJCLCLP 18(4)
Wu Y, Schuster M, Chen Z, Le VQ, Norouzi M, Macherey W, Krikun M, Cao Y, Gao Q, Macherey K, Klingner J, Shah A, Johnson M, Liu X, Kaiser L, Gouws S, Kato Y, Kudo T, Kazawa H, Stevens K, Kurian G, Patil N, Wang W, Young C, Smith J, Riesa J, Rudnick A, Vinyals O, Corrado G, Hughes M, Dean J (2016) Google’s neural machine translation system: bridging the gap between human and machine translation. CoRR, abs/1609.08144
Xu K, Ba J, Kiros R, Cho K, Courville CA, Salakhutdinov R, Zemel SR, Bengio Y (2015) Show, attend and tell: Neural image caption generation with visual attention. In: ICML, Lille, pp 2048–2057
Yang Z, Salakhutdinov R, Cohen WW (2016) Multi-task cross-lingual sequence tagging from scratch. CoRR, abs/1603.06270
Yin W, Yu M, Xiang B, Zhou B, Schütze H (2016) Simple question answering by attentive convolutional neural network. In: COLING, Osaka, pp 1746–1756
Zhou J, Cao Y, Wang X, Li P, Xu W (2016) Deep recurrent models with fast-forward connections for neural machine translation. TACL 4:371–383
Ziemski M, Junczys-Dowmunt M, Pouliquen B (2016) The united nations parallel corpus v1.0. In: Proceedings of the tenth international conference on language resources and evaluation LREC 2016, Portorovz, 23–28 May 2016
Acknowledgements
The work described in this paper was partially supported by the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. UGC/IDS14/16).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Yang, H., Luo, L., Chueng, L.P., Ling, D., Chin, F. (2019). Deep Learning and Its Applications to Natural Language Processing. In: Huang, K., Hussain, A., Wang, QF., Zhang, R. (eds) Deep Learning: Fundamentals, Theory and Applications. Cognitive Computation Trends, vol 2. Springer, Cham. https://doi.org/10.1007/978-3-030-06073-2_4
Download citation
DOI: https://doi.org/10.1007/978-3-030-06073-2_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-06072-5
Online ISBN: 978-3-030-06073-2
eBook Packages: Biomedical and Life SciencesBiomedical and Life Sciences (R0)