Designing States, Actions, and Rewards for Using POMDP in Session Search

  • Jiyun Luo
  • Sicong Zhang
  • Xuchu Dong
  • Hui Yang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9022)


Session search is an information retrieval task that involves a sequence of queries issued for a complex information need. It is characterized by rich user-system interactions and by temporal dependencies between queries and between consecutive user behaviors. Recent efforts have modeled session search using the Partially Observable Markov Decision Process (POMDP). To best utilize the POMDP model, it is crucial to find suitable definitions for its fundamental elements: states, actions, and rewards. This paper investigates the best ways to design these components within a POMDP framework. We lay out the available design options for each, drawing on a variety of related work, and experiment with combinations of these options over the TREC 2012 & 2013 Session datasets. We report our findings on two evaluation aspects, retrieval accuracy and efficiency, and recommend practical design choices for using POMDP in session search.
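The POMDP elements the abstract enumerates (states, actions, rewards, plus the observations and beliefs that make the model "partially observable") can be illustrated with a minimal belief-update sketch. Everything below is an illustrative assumption for exposition, not the paper's actual design: the hidden states, actions, observations, and all probabilities are hypothetical placeholders.

```python
# Minimal POMDP sketch for session search (illustrative, not the paper's design).
# Hidden states: whether the user is exploring a new subtopic or exploiting
# the previous one. Actions: hypothetical retrieval strategies. Observations:
# the user's query-change behavior between consecutive queries.

STATES = ["exploration", "exploitation"]
ACTIONS = ["increase_term_weights", "decrease_term_weights"]
OBSERVATIONS = ["added_terms", "removed_terms"]

# T[a][s][s2]: transition probability P(s2 | s, a); made up for illustration.
T = {a: {"exploration":  {"exploration": 0.6, "exploitation": 0.4},
         "exploitation": {"exploration": 0.3, "exploitation": 0.7}}
     for a in ACTIONS}

# O[a][s2][o]: observation probability P(o | s2, a); made up for illustration.
O = {a: {"exploration":  {"added_terms": 0.8, "removed_terms": 0.2},
         "exploitation": {"added_terms": 0.3, "removed_terms": 0.7}}
     for a in ACTIONS}

def belief_update(belief, action, observation):
    """Standard Bayesian POMDP belief update:
    b'(s2) ∝ O(o | s2, a) * sum_s T(s2 | s, a) * b(s)."""
    new_belief = {}
    for s2 in STATES:
        prior = sum(T[action][s][s2] * belief[s] for s in STATES)
        new_belief[s2] = O[action][s2][observation] * prior
    norm = sum(new_belief.values())
    return {s: p / norm for s, p in new_belief.items()}

# Start with a uniform belief; seeing the user add query terms shifts the
# belief toward the "exploration" state.
b = {"exploration": 0.5, "exploitation": 0.5}
b = belief_update(b, "increase_term_weights", "added_terms")
```

A reward in this setting would typically score the retrieved ranking (e.g., an nDCG-style gain), and a policy would pick the action maximizing expected cumulative reward under the current belief; the paper's contribution is precisely in comparing such design options.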


Keywords: Session Search · POMDP · State · Action · Reward





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

Jiyun Luo, Sicong Zhang, Xuchu Dong, and Hui Yang: Department of Computer Science, Georgetown University, Washington DC, USA
