
Background and Related Work

  • Lili Mou
  • Zhi Jin

Part of the SpringerBriefs in Computer Science book series

Abstract

In this chapter, we introduce the background of neural networks and review the related literature. Section 2.1 introduces general neural networks and their learning algorithm, backpropagation. Section 2.2 addresses what is special about natural language processing, and introduces neural language models and word embedding learning. Section 2.3 introduces existing structure-sensitive neural networks, including convolutional, recurrent, and recursive neural networks.
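Since Sect. 2.1 centers on backpropagation as the training algorithm, a minimal sketch may help fix ideas. The snippet below trains a one-hidden-layer network on the XOR toy task with NumPy, deriving the gradients by the chain rule; the architecture, activations, and hyperparameters are illustrative assumptions, not the chapter's own implementation.

```python
# Minimal backpropagation sketch (illustrative only; not the book's code).
# A one-hidden-layer tanh/sigmoid network is trained on XOR by gradient
# descent, with gradients computed manually via the chain rule.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

W1 = rng.normal(scale=0.5, size=(2, 4))   # input -> hidden weights
b1 = np.zeros(4)
W2 = rng.normal(scale=0.5, size=(4, 1))   # hidden -> output weights
b2 = np.zeros(1)
lr = 0.5                                   # learning rate (assumed)

for step in range(5000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)                   # hidden activations
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # sigmoid output

    # Backward pass: cross-entropy + sigmoid gives (p - y) at the logits.
    dlogits = (p - y) / len(X)
    dW2 = h.T @ dlogits
    db2 = dlogits.sum(axis=0)
    dh = dlogits @ W2.T * (1 - h ** 2)         # chain rule through tanh
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)

    # Gradient-descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(np.round(p, 3))  # should approach [0, 1, 1, 0]
```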

Keywords

Neural network · Neural language modeling · Word embeddings · Convolutional neural network · Recurrent neural network · Recursive neural network


Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. AdeptMind Research, Toronto, Canada
  2. Institute of Software, Peking University, Beijing, China