
Non-deterministic and emotional chatting machine: learning emotional conversation generation using conditional variational autoencoders

  • Original Article
Neural Computing and Applications


Generating conversational responses is non-trivial for artificial conversational agents. A response should not only be meaningful and plausible, but should also (1) carry an emotional context and (2) be non-deterministic (i.e., vary given the same input). Both requirements are difficult to satisfy, which is why previous studies have tackled them individually. This paper is the first to tackle them together. Specifically, we present two models, both based on conditional variational autoencoders. The first model learns disentangled latent representations to generate conversational responses given a specific emotion. The second model explicitly learns different emotions using a mixture of multivariate Gaussian distributions. Experiments show that, compared to baseline approaches, our proposed models generate more plausible and diverse conversational responses in accordance with designated emotions.
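To make the idea of an emotion-specific Gaussian mixture concrete, the following is a minimal sketch and not the authors' implementation. It assumes one diagonal Gaussian component per emotion, draws latent vectors with the standard reparameterization trick, and computes the closed-form KL divergence between two diagonal Gaussians, as commonly used in variational autoencoder objectives. The emotion names, the latent dimension, and all function names are hypothetical, and the component parameters are random placeholders where a trained model would use learned values.

```python
import math
import random

# Hypothetical emotion categories and latent size (not taken from the paper).
EMOTIONS = ["happy", "sad", "angry"]
LATENT_DIM = 4


def make_mixture_prior(emotions, dim, seed=0):
    """Build one diagonal Gaussian (mean, log-variance) per emotion.

    The means are random placeholders; a trained model would learn them.
    """
    rng = random.Random(seed)
    return {
        e: ([rng.uniform(-1.0, 1.0) for _ in range(dim)], [0.0] * dim)
        for e in emotions
    }


def sample_latent(mean, log_var, rng):
    """Reparameterized sample: z = mean + sigma * eps with eps ~ N(0, I)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, log_var)
    ]


def kl_diag_gaussians(q_mean, q_log_var, p_mean, p_log_var):
    """KL(q || p) for two diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for qm, qlv, pm, plv in zip(q_mean, q_log_var, p_mean, p_log_var):
        total += 0.5 * (
            plv - qlv + (math.exp(qlv) + (qm - pm) ** 2) / math.exp(plv) - 1.0
        )
    return total


prior = make_mixture_prior(EMOTIONS, LATENT_DIM)
rng = random.Random(42)
happy_mean, happy_log_var = prior["happy"]

# Two draws for the same emotion differ, which is the source of
# non-deterministic responses for an identical input.
z1 = sample_latent(happy_mean, happy_log_var, rng)
z2 = sample_latent(happy_mean, happy_log_var, rng)
```

In a full model, a decoder would take a latent vector such as `z1` together with the encoded input and produce a response; that part is omitted here because the abstract does not describe the architecture.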

  3. Content scores take the values 0, 1 and 2: 0 denotes irrelevant content, 1 denotes moderately relevant content, and 2 denotes relevant content.

  4. Emotion scores take the values 0 and 1: 0 denotes that the emotion in the response generated by our models is inconsistent with the given emotion category, and 1 denotes that it is consistent with the given emotion category.
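As an illustration of how the two scales above could be summarized, here is a small sketch. It assumes each rated response is recorded as a (content score, emotion score) pair and reports the mean content score and the fraction of responses rated emotion-consistent. The function name and the example ratings are hypothetical and are not data or procedures taken from the paper.

```python
def summarize_ratings(ratings):
    """Summarize (content_score, emotion_score) pairs.

    Content scores must be 0, 1 or 2; emotion scores must be 0 or 1.
    Returns (mean content score, fraction of emotion-consistent responses).
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    for content, emotion in ratings:
        if content not in (0, 1, 2) or emotion not in (0, 1):
            raise ValueError(f"invalid rating: {(content, emotion)}")
    n = len(ratings)
    mean_content = sum(c for c, _ in ratings) / n
    emotion_rate = sum(e for _, e in ratings) / n
    return mean_content, emotion_rate


# Hypothetical annotations for four responses.
example = [(2, 1), (1, 1), (0, 0), (2, 0)]
mean_content, emotion_rate = summarize_ratings(example)
```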




This work was supported by the National Natural Science Foundation of China (Grant No. 61807033) and the Key Research Program of Frontier Sciences, CAS (Grant No. ZDBS-LY-JSC038). Libo Zhang was supported by the Youth Innovation Promotion Association, CAS (2020111) and the Outstanding Youth Scientist Project of ISCAS.

Author information


Corresponding author

Correspondence to Libo Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.


About this article

Cite this article

Yao, K., Zhang, L., Luo, T. et al. Non-deterministic and emotional chatting machine: learning emotional conversation generation using conditional variational autoencoders. Neural Comput & Applic 33, 5581–5589 (2021).
