Deep Learning in Conversational Language Understanding
Abstract
Recent advancements in AI have resulted in increased availability of conversational assistants that can help with tasks such as finding a time to schedule an event and creating a calendar entry for it, or finding a restaurant and booking a table there at a certain time. However, creating automated agents with human-level intelligence remains one of the most challenging problems in AI. One key component of such systems is conversational language understanding, which has been a holy grail of research for decades, as it is not a clearly defined task but depends heavily on the AI application it is used for. Nevertheless, this chapter attempts to compile the recent deep learning-based literature on such goal-oriented conversational language understanding studies, starting with a historical perspective and pre-deep learning era work, and moving toward the most recent advances in this field.
Keywords
Language Understanding; Slot Fillers; Air Travel Information System (ATIS); Finding Incentives; Deep Belief Network (DBNs)