Deep Learning in Spoken and Text-Based Dialog Systems

Abstract

The last few decades have witnessed substantial breakthroughs in several areas of speech and language understanding research, specifically in building human-to-machine conversational dialog systems. Dialog systems, also known as interactive conversational agents, virtual agents, or sometimes chatbots, are useful in a wide range of applications, from technical support services to language learning tools and entertainment. Recent success with deep neural networks has spurred research on building data-driven dialog models. In this chapter, we present state-of-the-art neural network architectures and details on each of the components needed to build a successful dialog system using deep learning. Task-oriented dialog systems are the focus of this chapter; later sections present networks for building open-ended, non-task-oriented dialog systems. Furthermore, to facilitate research in this area, we survey publicly available datasets and software tools suitable for data-driven learning of dialog systems. Finally, the appropriate choice of evaluation metrics for the learning objective is discussed.

Notes

  1. https://www.microsoft.com/en-us/mobile/experiences/cortana/.

  2. http://www.apple.com/ios/siri/.

  3. https://developer.amazon.com/alexa.

  4. https://madeby.google.com/home.

  5. https://developers.facebook.com/blog/post/2016/04/12/bots-for-messenger/.

  6. We refer the reader to the “Deep Learning in Conversational Language Understanding” chapter in this book for a more detailed discussion of this issue.

  7. https://www.microsoft.com/en-us/research/event/dialog-state-tracking-challenge/.

  8. http://camdial.org/~mh521/dstc/.

  9. http://www.colips.org/workshop/dstc4/.

  10. http://workshop.colips.org/dstc5/.

  11. https://datasets.maluuba.com/Frames.

  12. https://github.com/facebookresearch/ParlAI.

  13. https://github.com/rkadlec/ubuntu-ranking-dataset-creator.

  14. https://github.com/plison/opendial.

  15. https://github.com/facebookresearch/ParlAI.

  16. https://github.com/UFAL-DSG/alex.

  17. http://ufal.mff.cuni.cz/.

  18. https://github.com/cuayahuitl/SimpleDS.

  19. https://github.com/gunthercox/ChatterBot.

  20. https://github.com/pender/chatbot-rnn.

  21. http://meta-guide.com/software-meta-guide/100-best-github-chatbot.

Author information

Corresponding author

Correspondence to Asli Celikyilmaz.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Celikyilmaz, A., Deng, L., Hakkani-Tür, D. (2018). Deep Learning in Spoken and Text-Based Dialog Systems. In: Deng, L., Liu, Y. (eds) Deep Learning in Natural Language Processing. Springer, Singapore. https://doi.org/10.1007/978-981-10-5209-5_3

  • DOI: https://doi.org/10.1007/978-981-10-5209-5_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-5208-8

  • Online ISBN: 978-981-10-5209-5

  • eBook Packages: Computer Science, Computer Science (R0)
