A Markovian Approach to Evaluate Session-Based IR Systems

  • David van Dijk
  • Marco Ferrante
  • Nicola Ferro
  • Evangelos Kanoulas
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11437)

Abstract

We investigate a new approach for evaluating session-based information retrieval systems, based on Markov chains. In particular, we develop a new family of evaluation measures, inspired by random walks, which account for the probability of moving to the next and previous documents in a result list, to the next query in a session, and to the end of the session. We leverage this Markov chain to substitute what in existing measures is a fixed discount linked to the rank of a document or to the position of a query in a session with a stochastic average time to reach a document and the probability of actually reaching a given query. We experimentally compare our new family of measures with existing measures – namely, session DCG, Cube Test, and Expected Utility – over the TREC Dynamic Domain track, showing the flexibility of the proposed measures and the transparency in modeling the user dynamics.
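To make the idea of an "average time to reach a document" concrete, the following is a minimal sketch and not the authors' measure. It assumes a simplified random walk over a single ranked list in which the user moves to the next document with probability `p_next`, back to the previous one with probability `p_prev`, and otherwise stays put; the function name, the parameters, and the omission of query and session-end transitions are all assumptions made for illustration.

```python
def expected_steps_to_reach(n_docs, p_next, p_prev):
    """Expected number of steps for a walk starting at rank 0 to first
    reach each rank of a list with n_docs documents.

    Hypothetical simplification: constant forward/backward probabilities,
    no transitions to other queries or to the end of the session.
    """
    if n_docs < 1:
        raise ValueError("n_docs must be at least 1")
    if not 0 < p_next <= 1 or p_prev < 0 or p_next + p_prev > 1:
        raise ValueError("need 0 < p_next <= 1, p_prev >= 0, p_next + p_prev <= 1")

    times = [0.0]   # the walk starts at rank 0
    step_up = 0.0   # expected steps to advance from rank i-1 to rank i
    for i in range(1, n_docs):
        # A backward move is impossible from the top of the list.
        q = p_prev if i - 1 > 0 else 0.0
        # First-step analysis for a birth-death chain:
        # t_i * p_next = 1 + q * t_{i-1}
        step_up = (1.0 + q * step_up) / p_next
        times.append(times[-1] + step_up)
    return times


if __name__ == "__main__":
    print(expected_steps_to_reach(3, 0.5, 0.25))  # [0.0, 2.0, 5.0]
```

With `p_next = 1` and `p_prev = 0` the expected times reduce to the rank positions themselves, which is the deterministic browsing assumption behind a fixed rank-based discount; lowering `p_next` or raising `p_prev` pushes deeper documents further away in expected time.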

Keywords

Information retrieval, Evaluation, Sessions, Markov chains

Notes

Acknowledgements

This research was supported by the NWO Innovational Research Incentives Scheme Vidi (016.Vidi.189.039). All content represents the opinion of the authors, which is not necessarily shared or endorsed by their respective employers and/or sponsors.

Supplementary material

Supplementary material 1: 482050_1_En_40_MOESM1_ESM.pdf (PDF, 192 KB)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Amsterdam, Amsterdam, The Netherlands
  2. University of Padua, Padua, Italy
