Question Classification with Untrained Recurrent Embeddings

  • Daniele Di Sarli
  • Claudio Gallicchio
  • Alessio Micheli
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11946)

Abstract

Recurrent Neural Networks (RNNs) are at the foundation of many state-of-the-art results in text classification. However, to be effective in practical applications, they often require the use of sophisticated architectures and training techniques, such as gating mechanisms and pre-training by autoencoders or language modeling, typically at a high computational cost. In this work, we show that such techniques may not always be necessary. Our experimental results on a Question Classification task indicate that, by using state-of-the-art Reservoir Computing approaches for RNN design, it is possible to achieve competitive accuracy with a considerable advantage in terms of required training time.
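The sketch below illustrates the general Reservoir Computing idea referenced in the abstract: an Echo State Network whose input and recurrent weights are randomly initialized and left untrained, used to turn a sequence of word embeddings into a fixed-size sentence representation, with a simple linear readout as the only trained component. This is not the authors' exact model; the hyperparameter values, the leaky-integrator update, the use of the final reservoir state as the sentence embedding, and the ridge-regression readout are all illustrative assumptions.

```python
# Minimal sketch (assumptions noted above) of an ESN-style untrained
# recurrent sentence encoder with a trained linear readout.
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 50          # dimensionality of the (pretrained) word embeddings
RES_DIM = 500         # reservoir size (illustrative)
SPECTRAL_RADIUS = 0.9 # controls reservoir dynamics (illustrative)
INPUT_SCALING = 0.1
LEAK_RATE = 0.5
RIDGE_LAMBDA = 1e-3

# Fixed (untrained) input and recurrent weight matrices.
W_in = rng.uniform(-INPUT_SCALING, INPUT_SCALING, size=(RES_DIM, EMB_DIM))
W_hat = rng.uniform(-1.0, 1.0, size=(RES_DIM, RES_DIM))
rho = np.max(np.abs(np.linalg.eigvals(W_hat)))
W_res = W_hat * (SPECTRAL_RADIUS / rho)   # rescale to the desired spectral radius

def encode(word_embeddings: np.ndarray) -> np.ndarray:
    """Run the untrained reservoir over a sentence (seq_len x EMB_DIM)
    and return a fixed-size embedding (here: the final state)."""
    state = np.zeros(RES_DIM)
    for x in word_embeddings:
        pre = np.tanh(W_in @ x + W_res @ state)
        state = (1.0 - LEAK_RATE) * state + LEAK_RATE * pre  # leaky integration
    return state

def train_readout(states: np.ndarray, labels: np.ndarray, n_classes: int) -> np.ndarray:
    """Train the only adaptive part: a ridge-regression readout mapping
    reservoir states (N x RES_DIM) to one-hot class targets."""
    Y = np.eye(n_classes)[labels]                          # one-hot targets
    A = states.T @ states + RIDGE_LAMBDA * np.eye(RES_DIM)
    return np.linalg.solve(A, states.T @ Y)                # (RES_DIM x n_classes)

def predict(W_out: np.ndarray, state: np.ndarray) -> int:
    return int(np.argmax(state @ W_out))
```

Because only the readout involves learning (a closed-form linear fit in this sketch), training cost is far lower than back-propagating through a gated RNN, which is the source of the training-time advantage discussed in the paper.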

Keywords

Text classification · Recurrent Neural Networks · Echo State Networks

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science, University of Pisa, Pisa, Italy