
Using reinforcement learning with external rewards for open-domain natural language generation

Published in: Journal of Intelligent Information Systems

Abstract

We propose a new approach to emotional natural language generation using a bidirectional seq2seq model. Our goal is to generate emotionally relevant language that accommodates the emotional tone of the prior context. To incorporate emotional information, we train our own word embeddings, appending emotion values in the form of valence, arousal and dominance scores. We use a reinforcement-learning framework tuned with the policy gradient method. Two of the internal rewards in our framework, Ease of Answering and Semantic Coherence, are based on prior state-of-the-art work. We propose a new internal reward, Emotional Intelligence, computed by minimizing the affective dissonance between the source and the generated text. We also train a separate external reward analyzer to predict the rewards and to maximize the expected rewards (both internal and external). We evaluate the system on two corpora commonly used for natural language generation tasks: the Cornell Movie Dialog Corpus and the Yelp Restaurant Review Corpus. We report standard evaluation metrics, including BLEU, ROUGE-L and perplexity, as well as human evaluation to validate our approach, and demonstrate that the proposed model generates emotionally appropriate responses on both corpora.
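The Emotional Intelligence reward described above can be sketched in code. This is a minimal illustration, not the paper's implementation: the tiny `VAD_LEXICON`, the `NEUTRAL` fallback vector, and the choice of Euclidean distance are all assumptions made for the example (in practice a full valence-arousal-dominance lexicon would supply the per-word scores).

```python
# Sketch of an "Emotional Intelligence" internal reward: the reward grows
# as the affective dissonance (distance between the average VAD vectors of
# the source and the generated text) shrinks. Lexicon, neutral fallback,
# and distance metric are illustrative assumptions.
import math

# Tiny stand-in for a full VAD lexicon:
# word -> (valence, arousal, dominance), each on a 1-9 scale.
VAD_LEXICON = {
    "happy": (8.5, 6.5, 7.0),
    "sad": (2.1, 3.8, 3.4),
    "angry": (2.5, 6.0, 5.0),
    "calm": (7.0, 2.0, 6.0),
}
NEUTRAL = (5.0, 1.0, 5.0)  # fallback for out-of-lexicon words

def mean_vad(text):
    """Average the VAD vectors of the words in `text`."""
    vecs = [VAD_LEXICON.get(w.lower(), NEUTRAL) for w in text.split()]
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(3))

def ei_reward(source, generated):
    """Negative Euclidean distance between mean VAD vectors, so that
    minimizing affective dissonance maximizes the reward."""
    s, g = mean_vad(source), mean_vad(generated)
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(s, g)))

# A response whose affect matches the source scores higher:
r_match = ei_reward("i am happy", "that is calm")
r_clash = ei_reward("i am happy", "i am so sad")
```

Under this toy lexicon, `r_match > r_clash`: the calm reply sits closer in VAD space to the happy source than the sad reply does, so the policy-gradient update would favor it.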

Fig. 1 and Fig. 2 (figures not reproduced here)


Notes

  1. The 10 responses are: “I don’t know.”, “I don’t know what I mean.”, “I don’t know what you’re talking about.”, “You don’t know.”, “You know what I mean.”, “You know what I’m saying.”, “You don’t know anything.”, “I am not sure.”, “I know what you mean.”, “I do not know anything.”

  2. https://www.yelp.com/dataset
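The dull responses listed in note 1 feed the Ease of Answering internal reward adopted from prior deep-RL dialogue work: a generated response is penalized in proportion to how likely it is to elicit one of these dull replies. A minimal sketch follows, where `toy_log_prob` is a hypothetical stand-in for a pretrained seq2seq model's log P(dull reply | response); it is not part of the paper.

```python
# Sketch of the Ease of Answering reward:
#   r = -(1/|S|) * sum over dull s in S of (1/N_s) * log P(s | response),
# where S is the dull-response set and N_s the token length of s.

# Subset of the 10 dull responses listed in note 1.
DULL = ["I don't know.", "I am not sure.", "You don't know."]

def toy_log_prob(target, context):
    """Toy log P(target | context): rises with word overlap.
    A hypothetical stand-in for a real seq2seq model's score."""
    tw = set(target.lower().replace(".", "").split())
    cw = set(context.lower().replace(".", "").split())
    return -5.0 + len(tw & cw)

def ease_of_answering(response, dull_set, log_prob):
    """Higher reward when the dull replies are unlikely after `response`."""
    total = sum(log_prob(s, response) / len(s.split()) for s in dull_set)
    return -total / len(dull_set)

r_vague = ease_of_answering("I don't know anything", DULL, toy_log_prob)
r_rich = ease_of_answering("The sunset painted the harbor gold", DULL, toy_log_prob)
# r_rich > r_vague: informative responses make the dull replies less likely.
```

The length normalization (dividing by `len(s.split())`) keeps long dull responses from dominating the average, mirroring the per-token normalization in the reward definition.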



Author information


Corresponding author

Correspondence to Samira Shaikh.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Srinivasan, V., Santhanam, S. & Shaikh, S. Using reinforcement learning with external rewards for open-domain natural language generation. J Intell Inf Syst 56, 189–206 (2021). https://doi.org/10.1007/s10844-020-00626-5

