Improving Search Through A3C Reinforcement Learning Based Conversational Agent

  • Milan Aggarwal
  • Aarushi Arora
  • Shagun Sodhani
  • Balaji Krishnamurthy
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10861)


Abstract

We develop a reinforcement learning based search assistant that guides users through a sequence of actions to help them realize their intent. Our approach targets subjective search, where the user seeks digital assets such as images; this is fundamentally different from tasks with objective goals and limited search modalities. Since labeled conversational data is generally unavailable for such search tasks, we propose a stochastic virtual user that impersonates a real user in order to train and bootstrap the agent. We develop a context-preserving architecture based on the A3C algorithm to train the agent, and we evaluate performance by the average reward the agent obtains while interacting with the virtual user. In a study with actual humans, users reported that the system drove their search forward with appropriate, non-repetitive actions, and that it was more engaging and easier to use than a conventional search interface.


Keywords: Subjective search · Reinforcement learning · Virtual user model · Context aggregation
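The training setup described in the abstract, an agent that learns a dialogue policy by interacting with a stochastic virtual user and is rewarded for useful actions, can be illustrated with a minimal sketch. This is a single-worker advantage actor-critic update (the core of A3C, without the asynchronous workers or the paper's context-preserving network); the discrete state/action spaces, the virtual user's preference rule, and all names are illustrative assumptions, not the authors' actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 4, 3  # toy dialogue states and agent actions (assumed)

def virtual_user_step(state, action):
    """Stochastic virtual user: rewards the agent when its action matches a
    hidden preferred action for the current dialogue state, then moves the
    conversation to a random next state."""
    preferred = state % N_ACTIONS          # toy preference rule (assumed)
    reward = 1.0 if action == preferred else -0.1
    next_state = int(rng.integers(N_STATES))
    return next_state, reward

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Tabular actor (policy logits) and critic (state values)
theta = np.zeros((N_STATES, N_ACTIONS))
V = np.zeros(N_STATES)
gamma, lr = 0.9, 0.1

def train(episodes=2000, horizon=10):
    for _ in range(episodes):
        s = int(rng.integers(N_STATES))
        for _ in range(horizon):
            probs = softmax(theta[s])
            a = int(rng.choice(N_ACTIONS, p=probs))
            s2, r = virtual_user_step(s, a)
            # Advantage = one-step TD error (A3C's n-step return with n = 1)
            adv = r + gamma * V[s2] - V[s]
            V[s] += lr * adv                   # critic update
            grad = -probs
            grad[a] += 1.0                     # d log pi(a|s) / d theta[s]
            theta[s] += lr * adv * grad        # actor (policy-gradient) update
            s = s2

train()
```

After training, the learned policy prefers each state's rewarded action, mirroring how the bootstrapped agent is evaluated by the average reward it collects from the virtual user before being exposed to real users.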



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Milan Aggarwal (1)
  • Aarushi Arora (2)
  • Shagun Sodhani (1)
  • Balaji Krishnamurthy (1)

  1. Adobe Systems Inc., Noida, India
  2. IIT Delhi, Hauz Khas, India
