Deep Learning and Its Applications to Natural Language Processing

Yang, Haiqin; Luo, Linkai; Chueng, Lap Pong; Ling, David; Chin, Francis

doi:10.1007/978-3-030-06073-2_4

Part of the book series: Cognitive Computation Trends (COCT, volume 2)

Abstract

Natural language processing (NLP), which uses computer programs to process large amounts of language data, is a key research area in artificial intelligence and computer science. Deep learning technologies have been well developed and applied in this area. However, the literature still lacks a succinct survey that would give readers a quick understanding of (1) how deep learning technologies apply to NLP and (2) what the promising applications are. In this survey, we investigate recent developments in NLP, centered on natural language understanding, to answer these two questions. First, we explore newly developed word embedding, or word representation, methods. Then, we describe two powerful learning models, Recurrent Neural Networks and Convolutional Neural Networks. Next, we outline five key NLP applications: (1) part-of-speech tagging and named entity recognition, two fundamental NLP applications; (2) machine translation and automatic English grammatical error correction, two applications with prominent commercial value; and (3) image description, an application requiring technologies from both computer vision and NLP. Finally, we present a series of benchmark datasets that researchers can use to evaluate model performance in these applications.

Acknowledgements

The work described in this paper was partially supported by the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. UGC/IDS14/16).

Author information

Authors and Affiliations

Department of Computing, Deep Learning Research and Application Centre, Hang Seng Management College, Sha Tin, Hong Kong
Haiqin Yang, Linkai Luo, Lap Pong Chueng, David Ling & Francis Chin

Corresponding author

Correspondence to Haiqin Yang.

Editor information

Editors and Affiliations

Xi’an Jiaotong-Liverpool University, Suzhou, China
Kaizhu Huang
School of Computing, Edinburgh Napier University, Edinburgh, UK
Amir Hussain
Xi’an Jiaotong-Liverpool University, Suzhou, China
Qiu-Feng Wang
Xi’an Jiaotong-Liverpool University, Suzhou, China
Rui Zhang

About this chapter

Cite this chapter

Yang, H., Luo, L., Chueng, L.P., Ling, D., Chin, F. (2019). Deep Learning and Its Applications to Natural Language Processing. In: Huang, K., Hussain, A., Wang, QF., Zhang, R. (eds) Deep Learning: Fundamentals, Theory and Applications. Cognitive Computation Trends, vol 2. Springer, Cham. https://doi.org/10.1007/978-3-030-06073-2_4

DOI: https://doi.org/10.1007/978-3-030-06073-2_4
Published: 16 February 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-06072-5
Online ISBN: 978-3-030-06073-2
eBook Packages: Biomedical and Life Sciences (R0)
