Densely Connected Bidirectional LSTM with Applications to Sentence Classification

  • Zixiang Ding
  • Rui Xia
  • Jianfei Yu
  • Xiang Li
  • Jian Yang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11109)

Abstract

Deep neural networks have recently been shown to achieve highly competitive performance in many computer vision tasks, owing to their ability to explore a much larger hypothesis space. However, since most deep architectures, such as stacked RNNs, tend to suffer from the vanishing-gradient and overfitting problems, their effects are still understudied in many NLP tasks. Inspired by this, we propose a novel multi-layer RNN model called densely connected bidirectional long short-term memory (DC-Bi-LSTM) in this paper, which essentially represents each layer by the concatenation of its hidden state and the hidden states of all preceding layers, and then recursively passes each layer's representation to all subsequent layers. We evaluate the proposed model on five benchmark datasets for sentence classification. DC-Bi-LSTM with depth up to 20 can be successfully trained and obtains significant improvements over the traditional Bi-LSTM with the same or even fewer parameters. Moreover, our model achieves promising performance compared with state-of-the-art approaches.
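
As a rough illustration of the densely connected design described in the abstract (a minimal sketch in PyTorch, not the authors' released code; the hidden size, depth, average pooling, and classifier head are assumptions for demonstration), each Bi-LSTM layer reads the concatenation of the input sequence with the hidden-state sequences of all preceding layers:

```python
import torch
import torch.nn as nn

class DCBiLSTM(nn.Module):
    """Sketch of a densely connected bidirectional LSTM stack.

    Layer l receives [x; h_1; ...; h_{l-1}]: the input sequence
    concatenated with the hidden states of all preceding layers.
    """
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super().__init__()
        self.layers = nn.ModuleList()
        in_size = input_size
        for _ in range(num_layers):
            self.layers.append(nn.LSTM(in_size, hidden_size,
                                       batch_first=True, bidirectional=True))
            in_size += 2 * hidden_size  # each dense connection widens the input
        self.classifier = nn.Linear(in_size, num_classes)

    def forward(self, x):  # x: (batch, seq_len, input_size)
        features = x
        for lstm in self.layers:
            h, _ = lstm(features)                  # (batch, seq_len, 2*hidden)
            features = torch.cat([features, h], dim=-1)
        return self.classifier(features.mean(dim=1))  # average over time steps

# Toy usage (shapes are illustrative): 2 sentences of 40 tokens with
# 300-d embeddings, classified into 5 classes.
model = DCBiLSTM(input_size=300, hidden_size=13, num_layers=15, num_classes=5)
logits = model(torch.randn(2, 40, 300))
```

Because later layers receive every earlier hidden-state sequence directly, gradient paths to early layers stay short, which helps explain why depths up to 20 remain trainable even with small per-layer hidden sizes.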

Keywords

Sentence classification · Densely connected · Stacked RNNs

Acknowledgments

The work was supported by the Natural Science Foundation of China (No. 61672288), and the Natural Science Foundation of Jiangsu Province for Excellent Young Scholars (No. BK20160085).

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Zixiang Ding (1)
  • Rui Xia (1), corresponding author
  • Jianfei Yu (2)
  • Xiang Li (1)
  • Jian Yang (1)

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
  2. School of Information Systems, Singapore Management University, Singapore