
Using reinforcement learning with external rewards for open-domain natural language generation

Published in: Journal of Intelligent Information Systems

Abstract

We propose a new approach to emotional natural language generation using a bidirectional seq2seq model. Our goal is to generate emotionally relevant language that accommodates the emotional tone of the prior context. To incorporate emotional information, we train our own word embeddings, appending emotion values in the form of valence, arousal and dominance scores. We use a reinforcement-learning framework tuned with the policy gradient method. Two of the internal rewards in our framework, Ease of Answering and Semantic Coherence, are based on prior state-of-the-art work. We propose a new internal reward, Emotional Intelligence, computed by minimizing the affective dissonance between the source and the generated text. We also train a separate external reward analyzer to predict the rewards and to maximize the expected rewards (both internal and external). We evaluate the system on two corpora commonly used for natural language generation tasks: the Cornell Movie Dialog Corpus and the Yelp Restaurant Review Corpus. We report standard evaluation metrics, including BLEU, ROUGE-L and perplexity, as well as human evaluation to validate our approach, and demonstrate that the proposed model generates emotionally appropriate responses on both corpora.
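The Emotional Intelligence reward described above can be sketched in code. This is a minimal illustration, not the paper's implementation: the tiny `VAD_LEXICON`, the `NEUTRAL` fallback vector, and the choice of Euclidean distance are all assumptions made for the example (in practice a full valence-arousal-dominance lexicon would supply the per-word scores).

```python
# Sketch of an "Emotional Intelligence" internal reward: the reward grows
# as the affective dissonance (distance between the average VAD vectors of
# the source and the generated text) shrinks. Lexicon, neutral fallback,
# and distance metric are illustrative assumptions.
import math

# Tiny stand-in for a full VAD lexicon:
# word -> (valence, arousal, dominance), each on a 1-9 scale.
VAD_LEXICON = {
    "happy": (8.5, 6.5, 7.0),
    "sad": (2.1, 3.8, 3.4),
    "angry": (2.5, 6.0, 5.0),
    "calm": (7.0, 2.0, 6.0),
}
NEUTRAL = (5.0, 1.0, 5.0)  # fallback for out-of-lexicon words

def mean_vad(text):
    """Average the VAD vectors of the words in `text`."""
    vecs = [VAD_LEXICON.get(w.lower(), NEUTRAL) for w in text.split()]
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(3))

def ei_reward(source, generated):
    """Negative Euclidean distance between mean VAD vectors, so that
    minimizing affective dissonance maximizes the reward."""
    s, g = mean_vad(source), mean_vad(generated)
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(s, g)))

# A response whose affect matches the source scores higher:
r_match = ei_reward("i am happy", "that is calm")
r_clash = ei_reward("i am happy", "i am so sad")
```

Under this toy lexicon, `r_match > r_clash`: the calm reply sits closer in VAD space to the happy source than the sad reply does, so the policy-gradient update would favor it.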

Fig. 1 and Fig. 2 (figures not reproduced here)


Notes

  1. The 10 responses are: “I don’t know.”, “I don’t know what I mean.”, “I don’t know what you’re talking about.”, “You don’t know.”, “You know what I mean.”, “You know what I’m saying.”, “You don’t know anything.”, “I am not sure.”, “I know what you mean.”, “I do not know anything.”

  2. https://www.yelp.com/dataset
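The dull responses listed in note 1 feed the Ease of Answering internal reward adopted from prior deep-RL dialogue work: a generated response is penalized in proportion to how likely it is to elicit one of these dull replies. A minimal sketch follows, where `toy_log_prob` is a hypothetical stand-in for a pretrained seq2seq model's log P(dull reply | response); it is not part of the paper.

```python
# Sketch of the Ease of Answering reward:
#   r = -(1/|S|) * sum over dull s in S of (1/N_s) * log P(s | response),
# where S is the dull-response set and N_s the token length of s.

# Subset of the 10 dull responses listed in note 1.
DULL = ["I don't know.", "I am not sure.", "You don't know."]

def toy_log_prob(target, context):
    """Toy log P(target | context): rises with word overlap.
    A hypothetical stand-in for a real seq2seq model's score."""
    tw = set(target.lower().replace(".", "").split())
    cw = set(context.lower().replace(".", "").split())
    return -5.0 + len(tw & cw)

def ease_of_answering(response, dull_set, log_prob):
    """Higher reward when the dull replies are unlikely after `response`."""
    total = sum(log_prob(s, response) / len(s.split()) for s in dull_set)
    return -total / len(dull_set)

r_vague = ease_of_answering("I don't know anything", DULL, toy_log_prob)
r_rich = ease_of_answering("The sunset painted the harbor gold", DULL, toy_log_prob)
# r_rich > r_vague: informative responses make the dull replies less likely.
```

The length normalization (dividing by `len(s.split())`) keeps long dull responses from dominating the average, mirroring the per-token normalization in the reward definition.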



Author information


Corresponding author

Correspondence to Samira Shaikh.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Srinivasan, V., Santhanam, S. & Shaikh, S. Using reinforcement learning with external rewards for open-domain natural language generation. J Intell Inf Syst 56, 189–206 (2021). https://doi.org/10.1007/s10844-020-00626-5

