Bidirectional internal memory gate recurrent neural networks for spoken language understanding

Abstract

Recurrent neural networks (RNNs) have achieved wide success in different domains thanks to their ability to encode short- and long-term dependencies between the basic features of a sequence. Different RNN units have been proposed to manage these dependencies with efficient algorithms that require few basic operations, reducing the processing time needed to learn the model. Among these units, the internal memory gate (IMG) has produced competitive accuracies faster than LSTM and GRU units on a spoken language understanding (SLU) task. This paper presents the bidirectional internal memory gate recurrent neural network (BIMG), which encodes short- and long-term dependencies in both forward and backward directions. The BIMG is composed of IMG cells, each built around a single gate that manages short- and long-term dependencies by combining the advantages of the LSTM and GRU (short- and long-term dependencies) with those of the leaky unit (LU) (fast learning). The effectiveness and robustness of the proposed BIMG-RNN are evaluated on a theme identification task of telephone conversations. The experiments show that BIMG reaches better accuracies than BGRU and BLSTM, with a gain of 1.1 points, and than the IMG model, with a gain of 2.1 points. Moreover, BIMG requires less processing time than BGRU and BLSTM, with gains of 12% and 35% respectively.
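
The abstract does not reproduce the IMG cell equations, so the sketch below is only illustrative: it assumes a single-gate, GRU-like update (the class SingleGateCell and the function bidirectional_encode are hypothetical names, not the paper's implementation) and shows the bidirectional scheme the BIMG relies on, namely running one cell left-to-right and another right-to-left and concatenating their hidden states. The exact IMG equations are given in Morchid (2017).

    # Illustrative sketch only: a bidirectional single-gate recurrent layer.
    # NOTE: the update equations below are an assumption modelled on a minimal
    # GRU-like cell with one gate; they are NOT the IMG equations from the paper.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    class SingleGateCell:
        """Hypothetical recurrent cell driven by a single gate."""
        def __init__(self, input_size, hidden_size, seed=0):
            rng = np.random.default_rng(seed)
            s = 1.0 / np.sqrt(hidden_size)
            self.W_g = rng.uniform(-s, s, (hidden_size, input_size))   # gate, input weights
            self.U_g = rng.uniform(-s, s, (hidden_size, hidden_size))  # gate, recurrent weights
            self.W_h = rng.uniform(-s, s, (hidden_size, input_size))   # candidate, input weights
            self.U_h = rng.uniform(-s, s, (hidden_size, hidden_size))  # candidate, recurrent weights
            self.hidden_size = hidden_size

        def step(self, x_t, h_prev):
            g_t = sigmoid(self.W_g @ x_t + self.U_g @ h_prev)             # the single gate
            h_cand = np.tanh(self.W_h @ x_t + self.U_h @ (g_t * h_prev))  # candidate state
            return g_t * h_prev + (1.0 - g_t) * h_cand                    # interpolated hidden state

    def bidirectional_encode(cell_fwd, cell_bwd, xs):
        """Run one cell left-to-right, another right-to-left, and concatenate the states."""
        h_f, h_b = np.zeros(cell_fwd.hidden_size), np.zeros(cell_bwd.hidden_size)
        fwd, bwd = [], []
        for x_t in xs:                        # forward pass over the sequence
            h_f = cell_fwd.step(x_t, h_f)
            fwd.append(h_f)
        for x_t in reversed(xs):              # backward pass over the sequence
            h_b = cell_bwd.step(x_t, h_b)
            bwd.append(h_b)
        bwd.reverse()
        return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

    # Usage: encode a toy sequence of 5 word embeddings of dimension 8.
    xs = [np.random.randn(8) for _ in range(5)]
    states = bidirectional_encode(SingleGateCell(8, 16), SingleGateCell(8, 16, seed=1), xs)
    print(len(states), states[0].shape)       # 5 time steps, each a 32-dim bidirectional state

In an actual SLU system the concatenated forward/backward states would feed a classification layer for theme identification; that layer, the training procedure, and the true IMG update are out of scope for this sketch.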

Notes

  1. Best configuration observed on the development set (number of neurons in the hidden layer) applied to the test data set.

  2. Best configuration observed on the development set (number of neurons in the hidden layer) applied to the test data set.

Funding

This work has been funded by the AISSPER project supported by the French National Research Agency (ANR) under contract ANR-19-CE23-0004-01.

Author information

Corresponding author

Correspondence to Mohamed Morchid.

Cite this article

Morchid, M. Bidirectional internal memory gate recurrent neural networks for spoken language understanding. Int J Speech Technol (2020). https://doi.org/10.1007/s10772-020-09708-9

Keywords

  • Bidirectional recurrent neural network
  • Internal memory gate
  • Spoken language understanding