Background and Related Work

Chapter in: Tree-Based Convolutional Neural Networks

Part of the book series: SpringerBriefs in Computer Science

Abstract

In this chapter, we introduce the background of neural networks and review the related literature. Section 2.1 introduces the general neural network and its learning algorithm, backpropagation. Section 2.2 addresses what is special about natural language processing and introduces neural language models and word embedding learning. Section 2.3 introduces existing structure-sensitive neural networks, including the convolutional neural network, the recurrent neural network, and the recursive neural network.


Notes

1. The orthodox perceptron, introduced by Rosenblatt [40], uses only thresholding as its activation function: if the weighted sum of the input is less than a threshold, the perceptron outputs 0; otherwise, it outputs 1 (a small code sketch of this rule follows these notes). In this sense, the perceptron is a special type of neuron, but we do not distinguish the two terms because they are so similar.

2. Whether the target label is represented by an index or by a one-hot vector can be told unambiguously from the font, so it is common to omit the superscripts “id” and “onehot” (a conversion example follows these notes).

3. The assumption is in fact trivial, because every finite, discrete distribution is a multinomial distribution (see the identity following these notes).

4. The backpropagation equations are useful only when we implement backpropagation manually. Nowadays, mature auto-differentiation tools such as TensorFlow and PyTorch handle backpropagation automatically (a minimal example follows these notes). Still, it is interesting to understand backpropagation from a mathematical perspective, and implementing it by hand is a fun exercise.

5. An interesting terminological abuse is that, in textbooks, stochastic gradient descent (SGD) usually refers to updating with a single data point, i.e., a batch size of 1, whereas in research papers it often refers to mini-batch gradient descent with a batch size greater than 1 (a short sketch follows these notes). In this book, we follow the convention of the literature and conflate the two terms where needed.

6. We denote \(w_i, w_{i+1}, \ldots , w_j\) by \(\varvec{w}_i^j\) for short.

7. Subscripts I and O represent input and output, respectively.
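
The thresholding rule in Note 1 can be written in a few lines. Below is a minimal NumPy sketch (our own illustration, not code from the book); the input, weights, and threshold are arbitrary values chosen for the example.

    import numpy as np

    def perceptron(x, w, threshold):
        """Rosenblatt-style perceptron (Note 1): output 1 if the weighted
        sum of the input reaches the threshold, and 0 otherwise."""
        return 1 if np.dot(w, x) >= threshold else 0

    # Arbitrary illustrative values: a 3-dimensional input.
    x = np.array([0.5, -1.0, 2.0])
    w = np.array([0.3, 0.8, 0.1])
    print(perceptron(x, w, threshold=0.0))  # 0, since 0.15 - 0.8 + 0.2 = -0.45 < 0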
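
Note 2 refers to two interchangeable representations of the target label. The sketch below (with a hypothetical class count and label value) shows the conversion in both directions.

    import numpy as np

    def to_onehot(label_id, num_classes):
        """Convert an index label into its one-hot counterpart."""
        onehot = np.zeros(num_classes)
        onehot[label_id] = 1.0
        return onehot

    t_id = 2                                   # index representation of the target
    t_onehot = to_onehot(t_id, num_classes=4)  # one-hot representation: [0. 0. 1. 0.]
    recovered = int(np.argmax(t_onehot))       # back to the index: 2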
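
To spell out Note 3: a finite, discrete distribution over \(K\) outcomes with parameters \(\theta_1, \ldots, \theta_K\) (summing to 1) is a multinomial distribution with a single trial. Writing the outcome as a one-hot vector \(\boldsymbol{t}\),

    p(\boldsymbol{t}) = \frac{1!}{t_1! \cdots t_K!} \prod_{k=1}^{K} \theta_k^{t_k}
                      = \prod_{k=1}^{K} \theta_k^{t_k},

which equals \(\theta_k\) for the single class \(k\) with \(t_k = 1\), i.e., exactly the original distribution.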
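
For Note 4, the following PyTorch sketch (our own illustration; the layer size and target values are arbitrary) shows an auto-differentiation tool computing the backpropagation gradients without any manually derived equations.

    import torch

    x = torch.tensor([1.0, 2.0, 3.0])
    W = torch.randn(2, 3, requires_grad=True)   # parameters to be learned
    b = torch.zeros(2, requires_grad=True)

    y = torch.sigmoid(W @ x + b)                # forward pass of one layer
    loss = ((y - torch.tensor([0.0, 1.0])) ** 2).sum()
    loss.backward()                             # reverse-mode autodiff = backpropagation

    print(W.grad, b.grad)                       # gradients ready for a parameter update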
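
For Note 5, here is a short NumPy sketch of the update loop on a hypothetical least-squares problem (not from the book): with batch_size=1 it is textbook SGD, and with a larger batch size it is the mini-batch variant that papers often also call SGD.

    import numpy as np

    def sgd_epoch(w, X, Y, grad_fn, lr=0.1, batch_size=1):
        """One epoch of (mini-batch) gradient descent; grad_fn returns the
        gradient averaged over the given batch."""
        indices = np.random.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = indices[start:start + batch_size]
            w = w - lr * grad_fn(w, X[batch], Y[batch])
        return w

    # Hypothetical example: mean squared error for linear regression.
    grad_mse = lambda w, Xb, Yb: 2 * Xb.T @ (Xb @ w - Yb) / len(Xb)
    X, Y = np.random.randn(100, 5), np.random.randn(100)
    w = sgd_epoch(np.zeros(5), X, Y, grad_mse, batch_size=16)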

References

1. Bengio, Y., Ducharme, R., Vincent, P., Janvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003)
2. Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H.: Greedy layer-wise training of deep networks. In: Advances in Neural Information Processing Systems, pp. 153–160
3. Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994)
4. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer (2006)
5. Bordes, A., Weston, J., Collobert, R., Bengio, Y.: Learning structured embeddings of knowledge bases. In: Proceedings of the 25th AAAI Conference on Artificial Intelligence, pp. 301–306 (2011)
6. Cho, K., van Merriënboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches (2014). arXiv preprint arXiv:1409.1259
7. Collobert, R., Weston, J.: A unified architecture for natural language processing: Deep neural networks with multitask learning. In: Proceedings of the 25th International Conference on Machine Learning, pp. 160–167 (2008)
8. Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signal Syst. 2(4), 303–314 (1989)
9. Fu, R., Guo, J., Qin, B., Che, W., Wang, H., Liu, T.: Learning semantic hierarchies via word embeddings. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 1199–1209 (2014)
10. Graves, A., Mohamed, A., Hinton, G.: Speech recognition with deep recurrent neural networks. In: Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6645–6649 (2013)
11. Grbovic, M., Radosavljevic, V., Djuric, N., Bhamidipati, N., Savla, J., Bhagwan, V., Sharp, D.: E-commerce in your inbox: Product recommendations at scale. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1809–1818 (2015)
12. Guo, J., Che, W., Wang, H., Liu, T.: Revisiting embedding features for simple semi-supervised learning. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 110–120 (2014)
13. Gutmann, M., Hyvärinen, A.: Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. J. Mach. Learn. Res. 13(1), 307–361 (2012)
14. Hastie, T., Tibshirani, R., Friedman, J., Franklin, J.: The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer (2009)
15. Haykin, S.S.: Neural Networks and Learning Machines. Pearson Education (2009)

16. He, H., Gimpel, K., Lin, J.: Multi-perspective sentence similarity modeling with convolutional neural networks. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1576–1586 (2015)
17. Hermann, K., Blunsom, P.: The role of syntax in vector space models of compositional semantics. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pp. 894–904 (2013)
18. Hinton, G., Osindero, S., Teh, Y.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)
19. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
20. Hu, B., Lu, Z., Li, H., Chen, Q.: Convolutional neural network architectures for matching natural language sentences. In: Advances in Neural Information Processing Systems, pp. 2042–2050 (2014)
21. Proakis, J.G., Manolakis, D.G.: Digital Signal Processing: Principles, Algorithms, and Applications. Prentice Hall (1996)
22. Ji, Y., Eisenstein, J.: One vector is not enough: Entity-augmented distributed semantics for discourse relations. Trans. Assoc. Comput. Linguist. 3, 329–344 (2015)
23. Jurafsky, D., Martin, J.: Speech and Language Processing. Pearson Education (2000)
24. Kalchbrenner, N., Grefenstette, E., Blunsom, P.: A convolutional neural network for modelling sentences. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 655–665 (2014)
25. Keskar, N.S., Mudigere, D., Nocedal, J., Smelyanskiy, M., Tang, P.T.P.: On large-batch training for deep learning: Generalization gap and sharp minima. In: Proceedings of the International Conference on Learning Representations (2017)
26. Kim, Y.: Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1746–1751 (2014)
27. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Proceedings of the International Conference on Learning Representations (2015)
28. Le, P., Zuidema, W.: Compositional distributional semantics with long short term memory (2015). arXiv preprint arXiv:1503.02510
29. Le, Q.V.: Building high-level features using large scale unsupervised learning. In: Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 8595–8598 (2013)
30. LeCun, Y., Jackel, L., Bottou, L., Brunot, A., Cortes, C., Denker, J., Drucker, H., Guyon, I., Muller, U., Sackinger, E., et al.: Comparison of learning algorithms for handwritten digit recognition. In: Proceedings of the International Conference on Artificial Neural Networks, pp. 53–60 (1995)
31. Lei, T., Barzilay, R., Jaakkola, T.: Molding CNNs for text: Non-linear, non-consecutive convolutions. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1565–1575 (2015)
32. Li, W.: Random texts exhibit Zipf’s-law-like word frequency distribution. IEEE Trans. Inf. Theory 38(6), 1842–1845 (1992)
33. Mikolov, T., Karafiát, M., Burget, L., Cernockỳ, J., Khudanpur, S.: Recurrent neural network based language model. In: Proceedings of the 11th Annual Conference of the International Speech Communication Association, pp. 1045–1048 (2010)
34. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems (2013)
35. Mou, L., Peng, H., Li, G., Xu, Y., Zhang, L., Jin, Z.: Discriminative neural sentence modeling by tree-based convolution. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 2315–2325 (2015)
36. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning, pp. 807–814 (2010)
37. Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks (2012). arXiv preprint arXiv:1211.5063
38. Peng, H., Mou, L., Li, G., Chen, Y., Lu, Y., Jin, Z.: A comparative study on regularization strategies for embedding-based neural networks. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 2106–2111 (2015)
39. Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: Online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710 (2014)
40. Rosenblatt, F.: The perceptron: A probabilistic model for information storage and organization in the brain. Psychol. Rev. 65(6), 386 (1958)

41. Socher, R., Huval, B., Manning, C., Ng, A.: Semantic compositionality through recursive matrix-vector spaces. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 1201–1211 (2012)
42. Socher, R., Karpathy, A., Le, Q., Manning, C., Ng, A.: Grounded compositional semantics for finding and describing images with sentences. Trans. Assoc. Comput. Linguist. 2, 207–218 (2014)
43. Socher, R., Pennington, J., Huang, E., Ng, A., Manning, C.: Semi-supervised recursive autoencoders for predicting sentiment distributions. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 151–161 (2011)
44. Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C., Ng, A., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1631–1642 (2013)
45. Song, Y., Mou, L., Yan, R., Yi, L., Zhu, Z., Hu, X., Zhang, M.: Dialogue session segmentation by embedding-enhanced TextTiling. In: Proceedings of Interspeech, pp. 2706–2710 (2016)
46. Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Proceedings of the 30th International Conference on Machine Learning, pp. 1139–1147 (2013)
47. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems, pp. 3104–3112 (2014)
48. Szeliski, R.: Computer Vision: Algorithms and Applications. Springer Science & Business Media (2010)
49. Tai, K., Socher, R., Manning, C.: Improved semantic representations from tree-structured long short-term memory networks. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics, pp. 1556–1566 (2015)
50. Tan, J., Wan, X., Xiao, J.: Abstractive document summarization with a graph-based attentional neural model. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pp. 1171–1181 (2017)
51. Tang, D., Qin, B., Liu, T.: Document modeling with gated recurrent neural network for sentiment classification. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1422–1432 (2015)
52. Laurent, T., von Brecht, J.: A recurrent neural network without chaos. In: Proceedings of the International Conference on Learning Representations (2017). https://openreview.net/forum?id=S1dIzvclg
53. Vincent, P.: A connection between score matching and denoising autoencoders. Neural Comput. 23(7), 1661–1674 (2011)
54. Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, pp. 1096–1103 (2008)
55. Webb, A.: Statistical Pattern Recognition. Wiley (2003)
56. Xu, Y., Mou, L., Li, G., Chen, Y., Peng, H., Jin, Z.: Classifying relations via long short term memory networks along shortest dependency paths. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1785–1794 (2015)
57. Zaremba, W., Sutskever, I.: Learning to execute (2014). arXiv preprint arXiv:1410.4615
58. Zeiler, M.D.: AdaDelta: An adaptive learning rate method (2012). arXiv preprint arXiv:1212.5701
59. Zhu, X., Sobhani, P., Guo, Y.: Long short-term memory over tree structures. In: Proceedings of the 32nd International Conference on Machine Learning, pp. 1604–1612 (2015)

Author information

Correspondence to Lili Mou.

Copyright information

© 2018 The Author(s)

About this chapter

Cite this chapter

Mou, L., Jin, Z. (2018). Background and Related Work. In: Tree-Based Convolutional Neural Networks. SpringerBriefs in Computer Science. Springer, Singapore. https://doi.org/10.1007/978-981-13-1870-2_2

  • DOI: https://doi.org/10.1007/978-981-13-1870-2_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-1869-6

  • Online ISBN: 978-981-13-1870-2
