Deep Learning in Spoken and Text-Based Dialog Systems

Celikyilmaz, Asli; Deng, Li; Hakkani-Tür, Dilek

doi:10.1007/978-981-10-5209-5_3

Chapter
First Online: 24 May 2018

Abstract

The last few decades have witnessed substantial breakthroughs in several areas of speech and language understanding research, specifically in building human-to-machine conversational dialog systems. Dialog systems, also known as interactive conversational agents, virtual agents, or sometimes chatbots, are useful in a wide range of applications, from technical support services to language learning tools and entertainment. Recent success in deep neural networks has spurred research in building data-driven dialog models. In this chapter, we present state-of-the-art neural network architectures and details on each of the components of building a successful dialog system using deep learning. Task-oriented dialog systems are the focus of this chapter; later, different networks are presented for building open-ended, non-task-oriented dialog systems. Furthermore, to facilitate research in this area, we survey publicly available datasets and software tools suitable for data-driven learning of dialog systems. Finally, the appropriate choice of evaluation metrics for the learning objective is discussed.

Notes

1. https://www.microsoft.com/en-us/mobile/experiences/cortana/.
2. http://www.apple.com/ios/siri/.
3. https://developer.amazon.com/alexa.
4. https://madeby.google.com/home.
5. https://developers.facebook.com/blog/post/2016/04/12/bots-for-messenger/.
6. We refer the reader to the “Deep Learning in Conversational Language Understanding” chapter in this book for a more detailed discussion of this issue.
7. https://www.microsoft.com/en-us/research/event/dialog-state-tracking-challenge/.
8. http://camdial.org/~mh521/dstc/.
9. http://www.colips.org/workshop/dstc4/.
10. http://workshop.colips.org/dstc5/.
11. https://datasets.maluuba.com/Frames.
12. https://github.com/facebookresearch/ParlAI.
13. https://github.com/rkadlec/ubuntu-ranking-dataset-creator.
14. https://github.com/plison/opendial.
15. https://github.com/facebookresearch/ParlAI.
16. https://github.com/UFAL-DSG/alex.
17. http://ufal.mff.cuni.cz/.
18. https://github.com/cuayahuitl/SimpleDS.
19. https://github.com/gunthercox/ChatterBot.
20. https://github.com/pender/chatbot-rnn.
21. http://meta-guide.com/software-meta-guide/100-best-github-chatbot.

Author information

Authors and Affiliations

Microsoft Research, Redmond, WA, USA
Asli Celikyilmaz
Citadel, Chicago & Seattle, USA
Li Deng
Google, Mountain View, CA, USA
Dilek Hakkani-Tür

Corresponding author

Correspondence to Asli Celikyilmaz.

Editor information

Editors and Affiliations

AI Research at Citadel, Chicago, Illinois, USA
Li Deng
Tsinghua University, Beijing, China
Yang Liu

About this chapter

Cite this chapter

Celikyilmaz, A., Deng, L., Hakkani-Tür, D. (2018). Deep Learning in Spoken and Text-Based Dialog Systems. In: Deng, L., Liu, Y. (eds) Deep Learning in Natural Language Processing. Springer, Singapore. https://doi.org/10.1007/978-981-10-5209-5_3

DOI: https://doi.org/10.1007/978-981-10-5209-5_3
Published: 24 May 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5208-8
Online ISBN: 978-981-10-5209-5
eBook Packages: Computer Science, Computer Science (R0)
