Abstract
In this chapter, we introduce the background of neural networks and review related literature. Section 2.1 introduces the general neural network and its learning algorithm, backpropagation. Section 2.2 addresses the particularities of natural language processing, and introduces neural language models and word embedding learning. Section 2.3 introduces existing structure-sensitive neural networks, including the convolutional neural network, the recurrent neural network, and the recursive neural network.
Notes
- 1. The orthodox perceptron, introduced by Rosenblatt [40], uses only thresholding as the activation function; that is, if the weighted sum of the input is less than a threshold, the perceptron outputs 0, and 1 otherwise (a small sketch follows these notes). In this sense, the perceptron is a special type of neuron. However, we do not distinguish these two terminologies as they are very similar.
- 2. The font unambiguously indicates whether the target label is represented by the index or the one-hot vector. Therefore, it is common to omit the superscripts "id" and "onehot." (A small conversion example follows these notes.)
- 3. The assumption is in fact trivial because every finite, discrete distribution is a multinomial distribution (spelled out briefly after these notes).
- 4. The backpropagation equations are useful only when we implement backpropagation manually. Nowadays, mature auto-differentiation tools such as TensorFlow and PyTorch are available, where backpropagation is handled automatically. However, it is still interesting to understand backpropagation from a mathematical perspective, and manual implementation is also a fun exercise (a minimal example follows these notes).
- 5. An interesting terminological abuse is that textbook stochastic gradient descent (SGD) usually refers to updating with a single data point, i.e., a batch size of 1, whereas in research papers it may refer to mini-batch gradient descent with a batch size greater than 1. In this book, we follow the convention of the literature and abuse the two terminologies when needed (a sketch of the mini-batch loop follows these notes).
- 6. We denote \(w_i, w_{i+1}, \ldots , w_j\) by \(\boldsymbol{w}_i^j\) for short.
- 7. Subscripts I and O represent input and output, respectively.
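To make note 1 concrete, here is a minimal NumPy sketch of the thresholding rule; the input, weights, and threshold below are illustrative assumptions, not values from the text.

```python
import numpy as np

def perceptron(x, w, threshold):
    """Rosenblatt-style perceptron: output 0 if the weighted sum of the
    input is below the threshold, and 1 otherwise."""
    return 1 if np.dot(w, x) >= threshold else 0

# Illustrative values only.
x = np.array([1.0, 0.5, -0.2])
w = np.array([0.4, 0.3, 0.8])
print(perceptron(x, w, threshold=0.5))  # 0, since the weighted sum 0.39 < 0.5
```

Replacing the hard threshold with a smooth activation (e.g., the sigmoid) turns this special case into a neuron in the general sense.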
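As a small companion to note 2, the sketch below converts an index label into its one-hot counterpart; the class count and index are arbitrary assumptions.

```python
import numpy as np

def to_onehot(label_id, num_classes):
    """Map an index label to the equivalent one-hot vector."""
    onehot = np.zeros(num_classes)
    onehot[label_id] = 1.0
    return onehot

# The index 2 and the vector (0, 0, 1, 0) carry exactly the same information.
print(to_onehot(2, num_classes=4))  # [0. 0. 1. 0.]
```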
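To spell out note 3: a distribution over \(K\) discrete outcomes is fully specified by probabilities \(p_1, \ldots , p_K\) with \(p_k \ge 0\) and \(\sum_{k=1}^{K} p_k = 1\), which is exactly a categorical (single-draw multinomial) distribution, so assuming a multinomial output imposes no real restriction.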
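To complement note 4, here is a minimal NumPy sketch of manual backpropagation for a single sigmoid neuron with squared error; the data, initialization, and learning rate are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0])   # input (illustrative)
t = 1.0                     # target
w = np.array([0.1, 0.2])    # weights
b = 0.0                     # bias
lr = 0.1                    # learning rate

for _ in range(100):
    z = w @ x + b
    y = sigmoid(z)
    # Chain rule from the squared-error loss 0.5 * (y - t)**2 back to the parameters.
    delta = (y - t) * y * (1.0 - y)   # dL/dz = dL/dy * dy/dz
    w -= lr * delta * x               # dL/dw = dL/dz * dz/dw
    b -= lr * delta                   # dL/db = dL/dz * dz/db
```

With an auto-differentiation framework such as PyTorch, the same gradients would be obtained by building the loss from differentiable operations and calling loss.backward() instead of deriving them by hand.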
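Finally, as an illustration of note 5, the sketch below implements a generic mini-batch gradient descent loop; grad_fn and all hyperparameters are hypothetical placeholders, and setting batch_size=1 recovers textbook SGD.

```python
import numpy as np

def minibatch_sgd(X, y, grad_fn, theta, lr=0.01, batch_size=32, epochs=10):
    """Mini-batch gradient descent; grad_fn(theta, X_batch, y_batch) is
    assumed to return the gradient of the loss with respect to theta."""
    n = X.shape[0]
    for _ in range(epochs):
        perm = np.random.permutation(n)            # reshuffle the data each epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            theta = theta - lr * grad_fn(theta, X[idx], y[idx])
    return theta
```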
References
Bengio, Y., Ducharme, R., Vincent, P., Janvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003)
Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H.: Greedy layer-wise training of deep networks. In: Advances in Neural Information Processing Systems, pp. 153–160 (2007)
Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994)
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer (2006)
Bordes, A., Weston, J., Collobert, R., Bengio, Y.: Learning structured embeddings of knowledge bases. In: Proceedings of the 25th AAAI Conference on Artificial Intelligence, pp. 301–306 (2011)
Cho, K., van Merriënboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches (2014). arXiv preprint arXiv:1409.1259
Collobert, R., Weston, J.: A unified architecture for natural language processing: Deep neural networks with multitask learning. In: Proceedings of the 25th International Conference on Machine Learning, pp. 160–167 (2008)
Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signal Syst. 2(4), 303–314 (1989)
Fu, R., Guo, J., Qin, B., Che, W., Wang, H., Liu, T.: Learning semantic hierarchies via word embeddings. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 1199–1209 (2014)
Graves, A., Mohamed, A., Hinton, G.: Speech recognition with deep recurrent neural networks. In: Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6645–6649 (2013)
Grbovic, M., Radosavljevic, V., Djuric, N., Bhamidipati, N., Savla, J., Bhagwan, V., Sharp, D.: E-commerce in your inbox: Product recommendations at scale. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1809–1818 (2015)
Guo, J., Che, W., Wang, H., Liu, T.: Revisiting embedding features for simple semi-supervised learning. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 110–120 (2014)
Gutmann, M., Hyvärinen, A.: Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. J. Mach. Learn. Res. 13(1), 307–361 (2012)
Hastie, T., Tibshirani, R., Friedman, J., Franklin, J.: The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer (2009)
Haykin, S.S.: Neural Networks and Learning Machines. Pearson Education (2009)
He, H., Gimpel, K., Lin, J.: Multi-perspective sentence similarity modeling with convolutional neural networks. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1576–1586 (2015)
Hermann, K., Blunsom, P.: The role of syntax in vector space models of compositional semantics. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pp. 894–904 (2013)
Hinton, G., Osindero, S., Teh, Y.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Hu, B., Lu, Z., Li, H., Chen, Q.: Convolutional neural network architectures for matching natural language sentences. In: Advances in Neural Information Processing Systems, pp. 2042–2050 (2014)
Proakis, J.G., Manolakis, D.G.: Digital Signal Processing: Principles, Algorithms, and Applications. Prentice Hall (1996)
Ji, Y., Eisenstein, J.: One vector is not enough: Entity-augmented distributed semantics for discourse relations. Trans. Assoc. Comput. Linguist. 3, 329–344 (2015)
Jurafsky, D., Martin, J.: Speech and Language Processing. Pearson Education (2000)
Kalchbrenner, N., Grefenstette, E., Blunsom, P.: A convolutional neural network for modelling sentences. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 655–665 (2014)
Keskar, N.S., Mudigere, D., Nocedal, J., Smelyanskiy, M., Tang, P.T.P.: On large-batch training for deep learning: Generalization gap and sharp minima. In: Proceedings of the International Conference on Learning Representations (2017)
Kim, Y.: Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1746–1751 (2014)
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Proceedings of the International Conference on Learning Representations (2015)
Le, P., Zuidema, W.: Compositional distributional semantics with long short term memory (2015). arXiv preprint arXiv:1503.02510
Le, Q.V.: Building high-level features using large scale unsupervised learning. In: Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 8595–8598 (2013)
LeCun, Y., Jackel, L., Bottou, L., Brunot, A., Cortes, C., Denker, J., Drucker, H., Guyon, I., Muller, U., Sackinger, E., et al.: Comparison of learning algorithms for handwritten digit recognition. In: Proceedings of the International Conference on Artificial Neural Networks, pp. 53–60 (1995)
Lei, T., Barzilay, R., Jaakkola, T.: Molding CNNs for text: Non-linear, non-consecutive convolutions. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1565–1575 (2015)
Li, W.: Random texts exhibit Zipf’s-law-like word frequency distribution. IEEE Trans. Inf. Theory 38(6), 1842–1845 (1992)
Mikolov, T., Karafiát, M., Burget, L., Černocký, J., Khudanpur, S.: Recurrent neural network based language model. In: Proceedings of the 11th Annual Conference of the International Speech Communication Association, pp. 1045–1048 (2010)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems (2013)
Mou, L., Peng, H., Li, G., Xu, Y., Zhang, L., Jin, Z.: Discriminative neural sentence modeling by tree-based convolution. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 2315–2325 (2015)
Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning, pp. 807–814 (2010)
Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks (2012). arXiv preprint arXiv:1211.5063
Peng, H., Mou, L., Li, G., Chen, Y., Lu, Y., Jin, Z.: A comparative study on regularization strategies for embedding-based neural networks. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 2106–2111 (2015)
Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: Online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710 (2014)
Rosenblatt, F.: The perceptron: A probabilistic model for information storage and organization in the brain. Psychol. Rev. 65(6), 386 (1958)
Socher, R., Huval, B., Manning, C., Ng, A.: Semantic compositionality through recursive matrix-vector spaces. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 1201–1211 (2012)
Socher, R., Karpathy, A., Le, Q., Manning, C., Ng, A.: Grounded compositional semantics for finding and describing images with sentences. Trans. Assoc. Comput. Linguist. 2, 207–218 (2014)
Socher, R., Pennington, J., Huang, E., Ng, A., Manning, C.: Semi-supervised recursive autoencoders for predicting sentiment distributions. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 151–161 (2011)
Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C., Ng, A., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1631–1642 (2013)
Song, Y., Mou, L., Yan, R., Yi, L., Zhu, Z., Hu, X., Zhang, M.: Dialogue session segmentation by embedding-enhanced TextTiling. In: Proceedings of Interspeech, pp. 2706–2710 (2016)
Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Proceedings of the 30th International Conference on Machine Learning, pp. 1139–1147 (2013)
Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems, pp. 3104–3112 (2014)
Szeliski, R.: Computer Vision: Algorithms and Applications. Springer Science & Business Media (2010)
Tai, K., Socher, R., Manning, C.D.: Improved semantic representations from tree-structured long short-term memory networks. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics, pp. 1556–1566 (2015)
Tan, J., Wan, X., Xiao, J.: Abstractive document summarization with a graph-based attentional neural model. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pp. 1171–1181 (2017)
Tang, D., Qin, B., Liu, T.: Document modeling with gated recurrent neural network for sentiment classification. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1422–1432 (2015)
Laurent, T., von Brecht, J.: A recurrent neural network without chaos. In: Proceedings of the International Conference on Learning Representations (2017). https://openreview.net/forum?id=S1dIzvclg
Vincent, P.: A connection between score matching and denoising autoencoders. Neural Comput. 23(7), 1661–1674 (2011)
Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, pp. 1096–1103 (2008)
Webb, A.: Statistical Pattern Recognition. Wiley (2003)
Xu, Y., Mou, L., Li, G., Chen, Y., Peng, H., Jin, Z.: Classifying relations via long short term memory networks along shortest dependency paths. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1785–1794 (2015)
Zaremba, W., Sutskever, I.: Learning to execute (2014). arXiv preprint arXiv:1410.4615
Zeiler, M.D.: AdaDelta: An adaptive learning rate method (2012). arXiv preprint arXiv:1212.5701
Zhu, X., Sobhani, P., Guo, Y.: Long short-term memory over tree structures. In: Proceedings of the 32nd International Conference on Machine Learning, pp. 1604–1612 (2015)