Groupwise Query Performance Prediction with BERT

Chen, Xiaoyang; He, Ben; Sun, Le

doi:10.1007/978-3-030-99739-7_8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13186)

Included in the following conference series: European Conference on Information Retrieval

Abstract

While large-scale pre-trained language models like BERT have advanced the state of the art in IR, their application in query performance prediction (QPP) has so far been based on pointwise modeling of individual queries. Meanwhile, recent studies suggest that cross-attention modeling of a group of documents can effectively boost performance for both learning-to-rank algorithms and BERT-based re-ranking. To this end, a BERT-based groupwise QPP model is proposed, in which the ranking contexts of a list of queries are jointly modeled to predict the relative performance of individual queries. Extensive experiments on three standard TREC collections demonstrate the effectiveness of our approach. Our code is available at https://github.com/VerdureChen/Group-QPP.
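
As a rough illustration of what "groupwise" means here, the sketch below lets each query in a group attend to every query in that group before a score is produced, so that a query's predicted performance depends on its peers rather than on itself alone. This is not the authors' implementation (see the repository linked above for that): the function names, the plain dot-product attention, the residual step, and the linear scoring weights are all assumptions made for this example, and the random vectors are hypothetical stand-ins for encoder outputs.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def groupwise_scores(query_vectors, weights):
    """Score each query after letting it attend to every query in the group.

    query_vectors: one feature vector per query (hypothetical stand-ins for
    encoder outputs; not taken from the paper).
    weights: hypothetical linear scoring weights, same length as a vector.
    """
    dim = len(query_vectors[0])
    scale = math.sqrt(dim)
    scores = []
    for q in query_vectors:
        # Attention of this query over all queries in the group (itself included).
        attn = softmax([dot(q, k) / scale for k in query_vectors])
        # Context vector: attention-weighted average of the group's vectors.
        context = [sum(a * k[i] for a, k in zip(attn, query_vectors))
                   for i in range(dim)]
        # Residual connection, then a linear scoring head.
        mixed = [x + c for x, c in zip(q, context)]
        scores.append(dot(mixed, weights))
    return scores


def rank_by_predicted_performance(scores):
    """Return query indices ordered from highest to lowest predicted score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy group of three queries with four-dimensional random features.
random.seed(0)
group = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
weights = [0.5, -0.25, 0.1, 0.3]
scores = groupwise_scores(group, weights)
ranking = rank_by_predicted_performance(scores)
```

Only the relative order in `ranking` is meaningful in this toy setup, which mirrors the abstract's focus on the relative performance of queries; a pointwise model would correspond to calling `groupwise_scores` on each query in isolation.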

Author information

Authors and Affiliations

University of Chinese Academy of Sciences, Beijing, China
Xiaoyang Chen & Ben He
Institute of Software, Chinese Academy of Sciences, Beijing, China
Xiaoyang Chen, Ben He & Le Sun

Corresponding author

Correspondence to Ben He.

Editor information

Editors and Affiliations

Martin Luther University Halle-Wittenberg, Halle, Germany
Matthias Hagen
Leiden University, Leiden, The Netherlands
Suzan Verberne
University of Glasgow, Glasgow, UK
Craig Macdonald
University of Duisburg-Essen, Essen, Germany
Christin Seifert
University of Stavanger, Stavanger, Norway
Krisztian Balog
Norwegian University of Science and Technology, Trondheim, Norway
Kjetil Nørvåg
University of Stavanger, Stavanger, Norway
Vinay Setty

Cite this paper

Chen, X., He, B., Sun, L. (2022). Groupwise Query Performance Prediction with BERT. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13186. Springer, Cham. https://doi.org/10.1007/978-3-030-99739-7_8

DOI: https://doi.org/10.1007/978-3-030-99739-7_8
Published: 05 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99738-0
Online ISBN: 978-3-030-99739-7
eBook Packages: Computer Science, Computer Science (R0)
