Abstract
AI assistants are gradually becoming embedded in our lives, used for everyday tasks like shopping or playing music. Beyond such everyday use, many users engage assistants with playful shopping requests, gauging their ability to understand – or simply seeking amusement. However, these requests often do not receive a matching playful response, causing dissatisfaction and even eroding user trust.
In this work, we focus on equipping AI assistants with the ability to respond playfully to irrational shopping requests. We first evaluate several neural generation models, which yield unsuitable results – showing that this task is non-trivial. We then devise a simple yet effective solution that uses a knowledge graph to generate template-based responses grounded in commonsense. While the commonsense-aware solution is slightly less diverse than the generative models, it provides better responses to playful requests. This highlights the gap in commonsense exhibited by neural language models.
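As a rough illustration of the template-based approach the abstract describes, the sketch below fills response templates from commonsense relations. The relation table, template wording, and function names here are invented stand-ins for the paper's actual knowledge graph and templates, which are not reproduced in this page:

```python
# Toy stand-in for a commonsense knowledge graph (head, relation) -> fact.
# The entries below are illustrative only, not the paper's actual data.
RELATIONS = {
    ("unicorn", "IsA"): "a mythical creature",
    ("the moon", "AtLocation"): "the sky",
}

# Hypothetical response templates keyed by relation type.
TEMPLATES = {
    "IsA": "Sorry, I can't add {item} to your cart -- {item} is {fact}, "
           "and we're fresh out of those!",
    "AtLocation": "I'd love to help, but {item} is usually found in {fact}, "
                  "not in our catalog.",
}

def playful_response(item: str) -> str:
    """Return a template-based playful response grounded in a known
    commonsense relation, or a generic fallback when none is known."""
    for (head, relation), fact in RELATIONS.items():
        if head == item:
            return TEMPLATES[relation].format(item=item, fact=fact)
    return f"Hmm, {item} isn't something I can shop for -- nice try!"

print(playful_response("unicorn"))
```

The design point is that the relation lookup grounds the joke in a true commonsense fact about the requested item, which is what the generative baselines in the paper reportedly struggled to do.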
N. Shapira and C. Shani—Work was done during an internship at Amazon.
Except for the first author, the authors are listed alphabetically by surname.
Notes
- 1.
- 2. The full list is included in the code repository. T5 had 95 prompts and GPT-2 had 89 (suffix-based prompts do not apply to GPT-2, which attends only to the prefix). Decoding settings: top-k = 50, top-p = 0.95, beam width = 10; max length: GPT-2 = 50, T5-3B = 20.
- 3. The full list of relations, templates, and filtering logic is included in the code repository.
- 4. The dataset of non-shoppable items and responses is included in the code repository.
- 5. Workers were paid 5 cents per generated non-shoppable item.
- 6. Preliminary experiments showed that annotators tended to rank responses with a discourse issue as worse than the baseline response (−1/−2).
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Shapira, N., Kalinsky, O., Libov, A., Shani, C., Tolmach, S. (2023). Evaluating Humorous Response Generation to Playful Shopping Requests. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13981. Springer, Cham. https://doi.org/10.1007/978-3-031-28238-6_53
DOI: https://doi.org/10.1007/978-3-031-28238-6_53
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28237-9
Online ISBN: 978-3-031-28238-6