
A Deep Learning Based Multi-task Ensemble Model for Intent Detection and Slot Filling in Spoken Language Understanding

  • Mauajama Firdaus
  • Shobhit Bhatnagar
  • Asif Ekbal
  • Pushpak Bhattacharyya
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11304)

Abstract

An important component of every dialog system is language understanding, popularly known as Spoken Language Understanding (SLU). Intent detection (ID) and slot filling (SF) are two important and inter-related SLU tasks. In this paper, we propose a deep learning based multi-task ensemble model that performs intent detection and slot filling jointly. We use deep bi-directional recurrent neural networks (RNNs) with long short-term memory (LSTM) and gated recurrent units (GRU) as the base-level classifiers, and a multi-layer perceptron (MLP) framework to combine their outputs. The model is trained on a combined word embedding representation obtained from both GloVe and word2vec, further augmented with syntactic Part-of-Speech (PoS) information. Experiments on the benchmark ATIS dataset show that the proposed ensemble multi-task model (MTM) achieves better results than the individual models and the existing state-of-the-art systems. Experiments on another dataset, TRAINS, also show that the proposed multi-task ensemble model is more effective than the individual models.
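To make the joint architecture concrete, the sketch below shows a minimal PyTorch implementation of a shared bi-directional recurrent encoder with two task-specific heads: a token-level head for slot filling and an utterance-level head for intent detection. It is an illustrative reconstruction, not the authors' code: it uses a single BiLSTM rather than the paper's LSTM and GRU base classifiers combined by an MLP, randomly initialised embeddings instead of the combined GloVe/word2vec + PoS representation, and toy dimensions; the class name JointSLUModel and all hyper-parameters are hypothetical.

```python
# Minimal sketch of a joint intent-detection / slot-filling model in PyTorch.
# Illustrative reconstruction only, NOT the authors' implementation: a single
# shared BiLSTM encoder stands in for the paper's LSTM + GRU ensemble with an
# MLP combiner, and embeddings are trained from scratch rather than built
# from concatenated GloVe + word2vec vectors with PoS features.
import torch
import torch.nn as nn


class JointSLUModel(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim,
                 num_intents, num_slots):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim,
                               bidirectional=True, batch_first=True)
        # Token-level head for slot filling (one label per word).
        self.slot_head = nn.Linear(2 * hidden_dim, num_slots)
        # Utterance-level head for intent detection.
        self.intent_head = nn.Linear(2 * hidden_dim, num_intents)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)          # (B, T, E)
        hidden_states, _ = self.encoder(embedded)     # (B, T, 2H)
        slot_logits = self.slot_head(hidden_states)   # (B, T, num_slots)
        # Mean-pool over time for the utterance representation; pooling
        # choice is an assumption of this sketch.
        utterance = hidden_states.mean(dim=1)         # (B, 2H)
        intent_logits = self.intent_head(utterance)   # (B, num_intents)
        return intent_logits, slot_logits


# Multi-task training combines both losses on the same forward pass,
# so both tasks update the shared encoder. Dimensions below are toy values.
model = JointSLUModel(vocab_size=1000, embed_dim=100, hidden_dim=128,
                      num_intents=21, num_slots=120)
tokens = torch.randint(0, 1000, (4, 12))   # batch of 4 utterances, 12 tokens
intents = torch.randint(0, 21, (4,))       # one intent label per utterance
slots = torch.randint(0, 120, (4, 12))     # one slot label per token

intent_logits, slot_logits = model(tokens)
loss = (nn.functional.cross_entropy(intent_logits, intents)
        + nn.functional.cross_entropy(slot_logits.reshape(-1, 120),
                                      slots.reshape(-1)))
loss.backward()
```

The key multi-task idea is visible in the last lines: both task losses are computed from one shared forward pass, so gradients from intent detection and slot filling update the same encoder parameters.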

Keywords

Intent detection · Slot filling · Deep learning · Ensemble · Multi-task

Notes

Acknowledgment

Asif Ekbal acknowledges the Young Faculty Research Fellowship (YFRF), funded by the Visvesvaraya PhD Scheme for Electronics and IT, Ministry of Electronics and Information Technology (MeitY), Government of India, and executed by Digital India Corporation (formerly Media Lab Asia).


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Mauajama Firdaus (1)
  • Shobhit Bhatnagar (1)
  • Asif Ekbal (1)
  • Pushpak Bhattacharyya (1)

  1. Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, India
