Groupwise Query Performance Prediction with BERT

  • Conference paper
Advances in Information Retrieval (ECIR 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13186)

Abstract

While large-scale pre-trained language models such as BERT have advanced the state of the art in IR, their application to query performance prediction (QPP) has so far been based on pointwise modeling of individual queries. Meanwhile, recent studies suggest that cross-attention modeling of a group of documents can effectively boost performance for both learning-to-rank algorithms and BERT-based re-ranking. To this end, a BERT-based groupwise QPP model is proposed, in which the ranking contexts of a list of queries are jointly modeled to predict the relative performance of individual queries. Extensive experiments on three standard TREC collections demonstrate the effectiveness of our approach. Our code is available at https://github.com/VerdureChen/Group-QPP.

Corresponding author

Correspondence to Ben He.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, X., He, B., Sun, L. (2022). Groupwise Query Performance Prediction with BERT. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13186. Springer, Cham. https://doi.org/10.1007/978-3-030-99739-7_8

  • DOI: https://doi.org/10.1007/978-3-030-99739-7_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-99738-0

  • Online ISBN: 978-3-030-99739-7

  • eBook Packages: Computer Science, Computer Science (R0)
