Toward joint utilization of absolute and relative bandit feedback for conversational recommendation

User Modeling and User-Adapted Interaction

Abstract

Conversational recommendation is a promising way to address the cold-start problem suffered by traditional recommender systems. To actively elicit users’ dynamically changing preferences, conversational recommender systems periodically query users about their preferences on item attributes and collect conversational feedback. However, most existing conversational recommender systems allow users to provide only one type of feedback, either absolute or relative. In practice, absolute feedback can be biased and imprecise because users’ rating criteria vary. Relative feedback, meanwhile, has difficulty revealing users’ absolute attitudes. Hence, asking only one type of question throughout the whole conversation may not elicit users’ preferences efficiently and accurately. Moreover, many existing conversational recommender systems accept only binary feedback, which can be noisy when users have no particular inclination. To address these issues, we propose a generalized conversational recommendation framework, the hybrid rating-comparison conversational recommender system. The system can seamlessly ask absolute and relative questions and incorporate both types of feedback, including possible neutral responses. Although using different types of feedback together is promising, building a joint model that incorporates them can be difficult because they bear different interpretations of users’ preferences. To ensure that relative feedback can be leveraged effectively, we first propose a bandit algorithm, RelativeConUCB. Building on it, we further propose a new bandit algorithm, ArcUCB, which jointly utilizes absolute and relative feedback with possible neutral responses for preference elicitation. Experiments on both synthetic and real-world datasets validate the advantage of our proposed methods over existing bandit algorithms in conversational recommender systems.
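
To make the idea of combining the two feedback types concrete, the following Python sketch shows one simple way a linear bandit could absorb both absolute ratings and pairwise comparisons with a neutral option. It is only an illustration under stated assumptions and is not the RelativeConUCB or ArcUCB algorithm of this article: the class name `HybridFeedbackLinUCB`, the parameters `alpha` and `reg`, the encoding of a comparison outcome as +1, 0 or -1, and the choice to treat a comparison as a noisy observation on the difference of the two feature vectors are all hypothetical.

```python
import numpy as np


class HybridFeedbackLinUCB:
    """Toy linear bandit that accepts absolute and relative feedback.

    Illustrative sketch only; NOT the article's RelativeConUCB or ArcUCB.
    """

    def __init__(self, dim: int, alpha: float = 1.0, reg: float = 1.0):
        self.alpha = alpha              # exploration weight (assumed)
        self.M = reg * np.eye(dim)      # regularized Gram matrix
        self.b = np.zeros(dim)          # feedback-weighted feature sum

    @property
    def theta(self) -> np.ndarray:
        # Ridge-regression estimate of the user's preference vector.
        return np.linalg.solve(self.M, self.b)

    def update_absolute(self, x, rating: float) -> None:
        # Absolute feedback: a rating observed on a single feature vector.
        x = np.asarray(x, dtype=float)
        self.M += np.outer(x, x)
        self.b += rating * x

    def update_relative(self, x_a, x_b, outcome: int) -> None:
        # Relative feedback (assumed encoding): +1 if a is preferred,
        # -1 if b is preferred, 0 for a neutral response.
        if outcome not in (-1, 0, 1):
            raise ValueError("outcome must be -1, 0 or +1")
        z = np.asarray(x_a, dtype=float) - np.asarray(x_b, dtype=float)
        self.M += np.outer(z, z)
        self.b += outcome * z

    def ucb(self, x) -> float:
        # Optimistic score: estimated reward plus a confidence bonus.
        x = np.asarray(x, dtype=float)
        bonus = np.sqrt(x @ np.linalg.solve(self.M, x))
        return float(self.theta @ x + self.alpha * bonus)


# Usage: one rating on the first attribute, then one comparison in its favor.
model = HybridFeedbackLinUCB(dim=2)
model.update_absolute([1.0, 0.0], rating=1.0)
model.update_relative([1.0, 0.0], [0.0, 1.0], outcome=1)
print(model.theta, model.ucb([1.0, 0.0]), model.ucb([0.0, 1.0]))
```

In this sketch a neutral response still updates the Gram matrix but leaves the feedback sum unchanged, so it pulls the estimated preference gap between the two compared vectors toward zero. That is one possible reading of "no particular inclination", chosen here for simplicity.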

Notes

  1. Note that ‘iteration’ refers to interaction with multiple users, while ‘round’ refers to interaction with a specific user.

  2. Note that Fig. 2a differs slightly from Fig. 2 in our previous SIGIR paper. The difference is due to a small bug in our previous implementation, which negatively affected the final results of the Difference-type algorithms.

  3. http://www.lastfm.com.

  4. http://www.grouplens.org.

  5. http://vi.sualize.us.

  6. https://www.bibsonomy.org.

  7. http://www.imdb.com.

  8. http://www.rottentomatoes.com.

  9. It is a slight abuse of notation, as \(\boldsymbol{\theta}_u\) usually stands for the estimated user feature.

References

  • Agrawal, S., Jia, R.: Optimistic posterior sampling for reinforcement learning: Worst-case regret bounds. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 1184–1194. Curran Associates Inc., Red Hook, NIPS’17 (2017)

  • Aliannejadi, M., Zamani, H., Crestani, F., et al.: Asking clarifying questions in open-domain information-seeking conversations. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, New York, SIGIR’19, pp. 475–484 (2019) https://doi.org/10.1145/3331184.3331265

  • Chapelle, O., Joachims, T., Radlinski, F., et al.: Large-scale validation and analysis of interleaved search evaluation. ACM Trans. Inf. Syst. 30(1), 1–41 (2012)

  • Chen, Q., Lin, J., Zhang, Y., et al.: Towards knowledge-based recommender dialog system. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, pp. 1803–1813 (2019) https://doi.org/10.18653/v1/D19-1189, https://www.aclweb.org/anthology/D19-1189

  • Chin, W.S., Yuan, B.W., Yang, M.Y., et al.: Libmf: a library for parallel matrix factorization in shared-memory systems. J. Mach. Learn. Res. 17(86), 1–5 (2016)

  • Christakopoulou, K., Beutel, A., Li, R., et al.: Q&R: A two-stage approach toward interactive recommendation. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. Association for Computing Machinery, New York, KDD ’18, pp. 139–148 (2018) https://doi.org/10.1145/3219819.3219894

  • Christakopoulou, K., Radlinski, F., Hofmann, K.: Towards conversational recommender systems. In: Krishnapuram, B., Shah, M., Smola, A.J., et al. (eds.) Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, August 13-17, 2016, pp. 815–824. ACM (2016). https://doi.org/10.1145/2939672.2939746

  • Christiano, P. F., Leike, J., Brown, T. B., et al.: Deep reinforcement learning from human preferences. In: Guyon, I., von Luxburg, U., Bengio, S., et al. (eds.) Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, pp. 4299–4307 (2017) https://proceedings.neurips.cc/paper/2017/hash/d5e2c0adad503c91f91df240d0cd4e49-Abstract.html

  • Cui, Z., Sato, I.: Active classification with uncertainty comparison queries. Neural Comput. 34(3), 781–803 (2022). https://doi.org/10.1162/neco_a_01473

  • Das, A., Datar, M., Garg, A., et al.: Google news personalization: scalable online collaborative filtering. In: Williamson, C.L., Zurko, M.E., Patel-Schneider, P.F., et al. (eds) Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, Alberta, Canada, May 8-12, 2007, pp. 271–280. ACM (2007) https://doi.org/10.1145/1242572.1242610

  • Fu, Z., Xian, Y., Zhang, Y., et al.: Tutorial on conversational recommendation systems. In: Santos R.L.T., Marinho, L.B., Daly, E.M., et al (eds) RecSys 2020: Fourteenth ACM Conference on Recommender Systems, Virtual Event, Brazil, September 22-26, 2020, pp. 751–753. ACM (2020) https://doi.org/10.1145/3383313.3411548

  • Gao, C., Lei, W., He, X., et al.: Advances and challenges in conversational recommender systems: a survey. (2021) arXiv:2101.09459

  • Guo, H., Naeff, R., Nikulkov, A., et al.: Evaluating online bandit exploration in large-scale recommender system. In: KDD-23 Workshop on Multi-Armed Bandits and Reinforcement Learning: Advancing Decision Making in E-Commerce and Beyond (2023)

  • He, Z., Zhao, H., Yu, T., et al.: Bundle mcr: Towards conversational bundle recommendation. In: Proceedings of the 16th ACM Conference on Recommender Systems. Association for Computing Machinery, New York, RecSys ’22, pp. 288–298 (2022) https://doi.org/10.1145/3523227.3546755

  • Holladay, R., Javdani, S., Dragan, A., et al.: Active comparison based learning incorporating user uncertainty and noise. In: RSS Workshop on Model Learning for Human-Robot Communication (2016)

  • Osband, I., Van Roy, B., Russo, D.: (More) efficient reinforcement learning via posterior sampling. In: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, pp. 3003–3011. Curran Associates Inc., Red Hook, NIPS’13 (2013)

  • Ignatenko, T., Kondrashov, K., Cox, M., et al.: On preference learning based on sequential bayesian optimization with pairwise comparison. (2021) arXiv:2103.13192

  • Jameson, A., Willemsen, M., Felfernig, A., et al.: Human Decision Making And Recommender Systems, 2nd edn, pp. 611–648. Springer, Germany. (2015) https://doi.org/10.1007/978-1-4899-7637-6_18

  • Jawaheer, G., Szomszor, M., Kostkova, P.: Comparison of implicit and explicit feedback from an online music recommendation service. Association for Computing Machinery, New York, HetRec ’10, pp. 47–51 (2010) https://doi.org/10.1145/1869446.1869453

  • Joachims, T., Granka, L., Pan, B., et al.: Accurately interpreting clickthrough data as implicit feedback. In: ACM SIGIR Forum, ACM New York, pp. 4–11 (2017)

  • Kalloori, S., Li, T., Ricci, F.: Item recommendation by combining relative and absolute feedback data. Association for Computing Machinery, New York, SIGIR’19, pp. 933–936 (2019) https://doi.org/10.1145/3331184.3331295

  • Kalloori, S., Ricci, F., Tkalcic, M.: Pairwise preferences based matrix factorization and nearest neighbor recommendation techniques. In: Proceedings of the 10th ACM Conference on Recommender Systems. Association for Computing Machinery, New York, RecSys ’16, pp. 143–146 (2016) https://doi.org/10.1145/2959100.2959142

  • Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8), 30–37 (2009)

  • Lei, W., He, X., de Rijke, M., et al.: Conversational recommendation: Formulation, methods, and evaluation. In: Huang J, Chang Y, Cheng X, et al (eds) Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020, Virtual Event, China, July 25-30, 2020, pp. 2425–2428. ACM (2020b) https://doi.org/10.1145/3397271.3401419

  • Lei, W., He, X., Miao, Y., et al.: Estimation-action-reflection: Towards deep interaction between conversational and recommender systems. In: Proceedings of the 13th International Conference on Web Search and Data Mining. Association for Computing Machinery, New York, WSDM ’20, pp. 304–312 (2020a) https://doi.org/10.1145/3336191.3371769

  • Lei, W., Zhang, G., He, X., et al.: Interactive path reasoning on graph for conversational recommendation. In: Gupta, R., Liu, Y., Tang, J., et al (eds) KDD ’20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, August 23-27, 2020, pp. 2073–2083. ACM (2020c) https://dl.acm.org/doi/10.1145/3394486.3403258

  • Li, L., Chu, W., Langford, J., et al.: A contextual-bandit approach to personalized news article recommendation. In: Rappa, M., Jones, P., Freire, J., et al. (eds) Proceedings of the 19th International Conference on World Wide Web, WWW 2010, Raleigh, North Carolina, USA, April 26-30, 2010, pp. 661–670. ACM (2010) https://doi.org/10.1145/1772690.1772758

  • Li, R., Kahou, S. E., Schulz, H., et al.: Towards deep conversational recommendations. In: Bengio S, Wallach HM, Larochelle H, et al (eds) Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada, pp. 9748–9758 (2018) https://proceedings.neurips.cc/paper/2018/hash/800de15c79c8d840f4e78d3af937d4d4-Abstract.html

  • Li, S., Lei, W., Wu, Q., et al.: Seamlessly unifying attributes and items: Conversational recommendation for cold-start users. (2020) arXiv:2005.12979

  • Li, Q., Zhao, C., Yu, T., et al.: Clustering of conversational bandits with posterior sampling for user preference learning and elicitation. User Modeling and User-Adapted Interaction pp. 1–48 (2023)

  • Pazzani, M. J., Billsus, D.: Content-based recommendation systems. In: The adaptive web, pp. 325–341. Springer (2007)

  • Prathama, F., Senjaya, W.F., Yahya, B.N., et al.: Personalized recommendation by matrix co-factorization with multiple implicit feedback on pairwise comparison. Comput. Ind. Eng. 152, 107033 (2021). https://doi.org/10.1016/j.cie.2020.107033

  • Radlinski, F., Kurup, M., Joachims, T.: How does clickthrough data reflect retrieval quality? In: Proceedings of the 17th ACM Conference on Information and Knowledge Management, pp. 43–52 (2008)

  • Ren, X., Yin, H., Chen, T., et al.: CRSAL: conversational recommender systems with adversarial learning. ACM Trans. Inf. Syst. 38(4), 1–40 (2020)

  • Rendle, S.: Factorization machines. In: 2010 IEEE International Conference on Data Mining, IEEE, pp. 995–1000 (2010)

  • Rumelhart, D. E., Hinton, G. E., Williams, R. J.: Learning internal representations by error propagation. Tech. rep., California Univ San Diego La Jolla Inst for Cognitive Science (1985)

  • Sadigh, D., Dragan, A. D., Sastry, S., et al.: Active preference-based learning of reward functions. In: Robotics: Science and Systems (2017)

  • Saha, A., Gopalan, A.: Combinatorial bandits with relative feedback. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., et al. (eds) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 983–993 (2019) https://proceedings.neurips.cc/paper/2019/hash/5e388103a391daabe3de1d76a6739ccd-Abstract.html

  • Salakhutdinov, R., Mnih, A.: Probabilistic matrix factorization. In: Platt, J. C., Koller, D., Singer, Y., et al. (eds) Advances in Neural Information Processing Systems 20, Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 3-6, 2007. Curran Associates, Inc., pp. 1257–1264 (2007) https://proceedings.neurips.cc/paper/2007/hash/d7322ed717dedf1eb4e6e52a37ea7bcd-Abstract.html

  • Sui, Y., Zoghi, M., Hofmann, K., et al.: Advancements in dueling bandits. In: Lang J (ed) Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden. ijcai.org, pp. 5502–5510 (2018) https://doi.org/10.24963/ijcai.2018/776

  • Sun, Y., Zhang, Y.: Conversational recommender system. In: Collins-Thompson K, Mei Q, Davison BD, et al (eds) The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR 2018, Ann Arbor, MI, USA, July 08-12, 2018, pp. 235–244. ACM (2018) https://doi.org/10.1145/3209978.3210002

  • Tucker, M., Novoseller, E., Kann, C., et al.: Preference-based learning for exoskeleton gait optimization. In: 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 2351–2357. IEEE (2020)

  • Wang, Z., Liu, X., Li, S., et al.: Efficient explorative key-term selection strategies for conversational contextual bandits. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 10288–10295 (2023)

  • Wang, Z., Xu, Q., Ma, K., et al.: Adversarial preference learning with pairwise comparisons. In: Proceedings of the 27th ACM International Conference on Multimedia. Association for Computing Machinery, New York, MM ’19, pp. 656–664 (2019) https://doi.org/10.1145/3343031.3350919

  • Wirth, C., Akrour, R., Neumann, G., et al.: A survey of preference-based reinforcement learning methods. J. Mach. Learn. Res. 18(136), 1–46 (2017)

  • Wu, J., Zhao, C., Yu, T., et al.: Clustering of Conversational Bandits for User Preference Learning and Elicitation, Association for Computing Machinery, New York, pp. 2129–2139 (2021) https://doi.org/10.1145/3459637.3482328

  • Xia, Y., Wu, J., Yu, T., et al.: User-regulation deconfounded conversational recommender system with bandit feedback. In: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, New York, KDD ’23, pp. 2694–2704 (2023) https://doi.org/10.1145/3580305.3599539

  • Xie, Z., Yu, T., Zhao, C., et al.: Comparison-based conversational recommender system with relative bandit feedback. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, New York, pp. 1400–1409 (2021) https://doi.org/10.1145/3404835.3462920

  • Xu, Y., Balakrishnan, S., Singh, A., et al.: Regression with comparisons: Escaping the curse of dimensionality with ordinal information. J. Mach. Learn. Res. 21(162), 1–54 (2020)

  • Ye, P., Doermann, D.: Combining preference and absolute judgements in a crowd-sourced setting. In: ICML Workshop, Citeseer, pp. 1–7 (2013)

  • Yu, T., Shen, Y., Jin, H.: A visual dialog augmented interactive recommender system. In: Teredesai, A., Kumar, V., Li, Y., et al. (eds.) Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, August 4-8, 2019, pp. 157–165. ACM (2019) https://doi.org/10.1145/3292500.3330991

  • Yue, Y., Joachims, T.: Interactively optimizing information retrieval systems as a dueling bandits problem. In: Danyluk, A.P., Bottou, L., Littman, M.L. (eds.) Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, Montreal, Quebec, Canada, June 14-18, 2009, ACM International Conference Proceeding Series, vol 382, pp. 1201–1208. ACM (2009) https://doi.org/10.1145/1553374.1553527

  • Zamani, H., Dumais, S., Craswell, N., et al.: Generating clarifying questions for information retrieval. In: Proceedings of The Web Conference 2020. Association for Computing Machinery, New York, WWW ’20, pp. 418–428 (2020) https://doi.org/10.1145/3366423.3380126

  • Zhang, Y., Chen, X., Ai, Q., et al.: Towards conversational search and recommendation: System ask, user respond. In: Cuzzocrea, A., Allan, J., Paton, N.W., et al. (eds.) Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, Torino, Italy, October 22-26, 2018, pp. 177–186. ACM (2018) https://doi.org/10.1145/3269206.3271776

  • Zhang, X., Xie, H., Li, H., et al.: Conversational contextual bandit: Algorithm and application. In: Huang, Y., King, I., Liu, T., et al (eds) WWW ’20: The Web Conference 2020, Taipei, Taiwan, April 20-24, 2020. ACM / IW3C2, pp. 662–672 (2020) https://doi.org/10.1145/3366423.3380148

  • Zhang, R., Yu, T., Shen, Y., et al.: Text-based interactive recommendation via constraint-augmented reinforcement learning. Adv. Neural Inf. Process. Syst. 32 (2019)

  • Zhao, C., Yu, T., Xie, Z., et al.: Knowledge-aware conversational preference elicitation with bandit feedback. In: Proceedings of the ACM Web Conference 2022. Association for Computing Machinery, New York, WWW ’22, pp. 483–492 (2022) https://doi.org/10.1145/3485447.3512152

  • Zheng, Z., Zha, H., Zhang, T., et al.: A general boosting method and its application to learning ranking functions for web search. In: Platt, J., Koller, D., Singer, Y., et al. (eds.) Advances in Neural Information Processing Systems, vol 20. Curran Associates, Inc., (2007) https://proceedings.neurips.cc/paper/2007/file/8d317bdcf4aafcfc22149d77babee96d-Paper.pdf

  • Zhou, C., Jin, Y., Wang, X., et al.: Conversational music recommendation based on bandits. In: 2020 IEEE International Conference on Knowledge Graph (ICKG), IEEE, pp. 41–48 (2020a)

  • Zhou, K., Zhao, W. X., Bian, S., et al.: Improving conversational recommender systems via knowledge graph based semantic fusion. In: Gupta, R., Liu, Y., Tang, J., et al. (eds.) KDD ’20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, August 23-27, 2020, pp. 1006–1014. ACM (2020b) https://dl.acm.org/doi/10.1145/3394486.3403143

  • Zuo, J., Hu, S., Yu, T., et al.: Hierarchical conversational preference elicitation with bandit feedback. In: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pp. 2827–2836 (2022)

Acknowledgements

The corresponding author, Shuai Li, is supported by the National Natural Science Foundation of China (62376154, 62006151).

Author information

Contributions

SL and TY conceived and designed the algorithms. YX, ZX and CZ performed the experiments under the supervision of SL and TY. All authors jointly wrote and reviewed the manuscript.

Corresponding author

Correspondence to Shuai Li.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is an extended version of our earlier work (Xie et al. 2021), which appeared as a conference paper in the proceedings of SIGIR 2021.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xia, Y., Xie, Z., Yu, T. et al. Toward joint utilization of absolute and relative bandit feedback for conversational recommendation. User Model User-Adap Inter (2024). https://doi.org/10.1007/s11257-023-09388-5

  • DOI: https://doi.org/10.1007/s11257-023-09388-5
