Hierarchical Parameter Sharing in Recursive Neural Networks with Long Short-Term Memory

  • Fengyu Li
  • Mingmin Chi
  • Dong Wu
  • Junyu Niu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10635)


Parameter sharing (or weight sharing) is widely used in neural networks, such as Recursive Neural Networks (RvNNs) and their variants, to control model complexity and encode prior knowledge. Parameter sharing in RvNNs for language modeling assumes that all non-leaf nodes in treebanks are generated by similar semantic compositionality, so the hidden units of all non-leaf nodes share a single set of model parameters. However, treebanks contain several semantic levels with significantly different semantic compositionality, and classification performance suffers when nodes at high semantic levels share parameters with nodes at low levels. In this paper, a novel hierarchical parameter sharing strategy is proposed over Long Short-Term Memory (LSTM) cells in Recursive Neural Networks, denoted shLSTM-RvNN, in which the weight connections of hidden units are clustered according to the hierarchical semantic levels defined in the Penn Treebank tagsets. Parameters within the same semantic level are shared, while different semantic levels use different sets of connection weights. The proposed shLSTM-RvNN model is evaluated on benchmark datasets involving semantic compositionality. Empirical results show that shLSTM-RvNN improves classification accuracy while significantly reducing time complexity.
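The core idea can be sketched in code: instead of one global weight set for every non-leaf node, a node's semantic level selects which weight bank composes its children. The following is an illustrative numpy toy, not the authors' implementation; the gate structure follows a standard binary tree-LSTM, and the tag-to-level mapping `TAG_LEVEL` is a made-up example rather than the paper's actual clustering of Penn Treebank tags.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LevelSharedTreeLSTM:
    """Toy binary tree-LSTM with one weight bank per semantic level,
    sketching the shLSTM-RvNN sharing scheme (illustrative only)."""

    def __init__(self, dim, num_levels, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        # One parameter bank per semantic level. Each gate maps the
        # concatenated children hidden states (2*dim) to dim.
        self.banks = [
            {g: 0.1 * rng.standard_normal((dim, 2 * dim))
             for g in ("i", "fl", "fr", "o", "u")}
            for _ in range(num_levels)
        ]

    def compose(self, level, left, right):
        """Combine two children (h, c) pairs with the bank of `level`;
        nodes at the same level reuse the same matrices."""
        W = self.banks[level]
        (hl, cl), (hr, cr) = left, right
        x = np.concatenate([hl, hr])
        i = sigmoid(W["i"] @ x)        # input gate
        fl = sigmoid(W["fl"] @ x)      # forget gate, left child
        fr = sigmoid(W["fr"] @ x)      # forget gate, right child
        o = sigmoid(W["o"] @ x)        # output gate
        u = np.tanh(W["u"] @ x)        # candidate update
        c = i * u + fl * cl + fr * cr
        h = o * np.tanh(c)
        return h, c

# Hypothetical mapping from treebank tags to semantic levels:
# phrase-level tags share one bank, clause-level tags another.
TAG_LEVEL = {"NP": 0, "VP": 0, "S": 1, "SBAR": 1}
```

Under this scheme the parameter count grows linearly with the number of semantic levels rather than with the number of distinct tags, which is how the model can keep complexity low while still separating compositions that behave differently.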


Keywords: Recursive neural networks · Long short-term memory networks · Sentiment analysis · Parameter sharing



This work was supported in part by the Natural Science Foundation of China under Contract 71331005 and in part by the State Key Research and Development Program of China under Contract 2016YFE0100300.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Software School, Fudan University, Shanghai, China
  2. School of Computer Science, Fudan University, Shanghai, China
