CEA-Net: a co-interactive external attention network for joint intent detection and slot filling

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

Intent detection and slot filling are two crucial and closely related tasks in spoken language understanding, and the accuracy of spoken language understanding depends strongly on how effectively intent and slot representations interact. However, previous studies have primarily explored this interaction within individual utterances while neglecting the relevance between different utterances. This paper proposes CEA-Net, which uses co-interactive external attention as its core mechanism to capture information across multiple utterances and to exchange information between the two tasks. Experimental results demonstrate that CEA-Net achieves competitive results on the ATIS and SNIPS benchmarks while using about 44% fewer parameters than the previous best open-source approach. Furthermore, because the framework models the correlation among multiple utterances, it remains effective and robust even with limited training resources or datasets.
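For intuition, external attention (Guo et al., 2022) replaces token-to-token self-attention with two small learnable memory matrices that are shared across all inputs; it is this sharing that lets a model relate different utterances. The sketch below is a minimal, framework-free illustration of that primitive only, not the authors' full co-interactive architecture; the names `external_attention`, `Mk`, and `Mv` are ours, and the memory sizes are arbitrary:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def external_attention(F, Mk, Mv):
    """Attend over a shared external memory instead of over the tokens themselves.

    F  : (N, d) token features for one utterance
    Mk : (S, d) learnable key memory, shared across ALL utterances
    Mv : (S, d) learnable value memory, shared across ALL utterances
    """
    attn = softmax(F @ Mk.T, axis=1)                        # (N, S): each token scored against S memory slots
    attn = attn / (attn.sum(axis=0, keepdims=True) + 1e-9)  # double normalization: l1-normalize each slot over tokens
    return attn @ Mv                                        # (N, d): tokens reconstructed from memory values

rng = np.random.default_rng(0)
tokens = rng.standard_normal((7, 16))   # 7 tokens, 16-dim features
Mk = rng.standard_normal((32, 16))      # 32 memory slots
Mv = rng.standard_normal((32, 16))
out = external_attention(tokens, Mk, Mv)
print(out.shape)                        # (7, 16)
```

Because `Mk` and `Mv` have a fixed size S independent of sequence length, the cost is linear in N rather than quadratic, which is consistent with the parameter savings the abstract reports.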


Data availability

The datasets analyzed during the current study are available in the GitHub repository, https://github.com/MiuLab/SlotGated-SLU/tree/master/data.


Acknowledgements

We thank all reviewers for their constructive comments. This work is supported by the Natural Science Foundation of China (61663044), the Opening Project of Key Laboratory of Xinjiang, China (2020D04047), the National Key R&D Program of China (2020AAA0107902), and the Excellent Doctoral Student Research Innovation Project of Xinjiang University (No. XJU2022BS077).

Author information

Corresponding author

Correspondence to Hao Huang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Result details of limited training resources

In this appendix, we report detailed results of the experiments with limited training resources. Specifically, Table 4 reports the results of different models when trained on partial training sets, and Table 5 reports the results of different models when trained for different numbers of epochs.

Table 4 Results of experiments with limited training datasets on ATIS (%)
Table 5 Results of experiments with limited training epochs on ATIS (%)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wu, D., Jiang, L., Yin, L. et al. CEA-Net: a co-interactive external attention network for joint intent detection and slot filling. Neural Comput & Applic (2024). https://doi.org/10.1007/s00521-024-09733-8

