Human-Based Query Difficulty Prediction

  • Adrian-Gabriel Chifu
  • Sébastien Déjean
  • Stefano Mizzaro
  • Josiane Mothe
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10193)


The purpose of an automatic query difficulty predictor is to decide whether an information retrieval system can provide the most appropriate answer to a given query. Researchers have investigated many types of automatic query difficulty predictors. Most are tied to how search engines process queries and documents: they rely on the inner workings of the searching/ranking functions, and therefore provide no real insight into the reasons for the difficulty, and they neglect user-oriented aspects. In this paper we study whether humans can provide useful explanations, or reasons, for why they think a query will be easy or difficult for a search engine. We run two experiments that vary the TREC reference collection, the amount of information available about the query, and the method of annotation generation. We examine the correlation between the human predictions, the reasons the annotators provide, the automatic predictions, and the actual system effectiveness. The main findings of this study are twofold. First, we confirm the result of previous studies stating that human predictions correlate only weakly with system effectiveness. Second, and probably more important, after analyzing the reasons given by the annotators we find that: (i) overall, the reasons seem coherent, sensible, and informative; (ii) humans have an accurate picture of some query or term characteristics; and (iii) yet they cannot reliably predict system/query difficulty.
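The kind of correlation analysis described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the per-query numbers below are hypothetical, and it assumes human difficulty is rated on an ordinal scale while system effectiveness is measured by average precision (AP). A rank correlation such as Kendall's tau is a common choice here because difficulty ratings are ordinal; a good predictor would yield a strongly negative tau (higher predicted difficulty, lower effectiveness).

```python
from scipy.stats import kendalltau

# Hypothetical data: one entry per query.
# human_pred: annotator difficulty rating (1 = easy ... 5 = hard).
# system_ap: the system's average precision on that query.
human_pred = [2, 5, 1, 4, 3, 5, 2, 1]
system_ap = [0.61, 0.22, 0.74, 0.35, 0.48, 0.30, 0.55, 0.80]

# kendalltau returns (tau, p-value); tau-b corrects for tied ratings.
tau, p_value = kendalltau(human_pred, system_ap)
print(f"Kendall tau = {tau:.3f} (p = {p_value:.3f})")
```

With real annotations, the paper's finding is that this correlation, while often negative, is too weak for humans to serve as reliable difficulty predictors.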


Keywords: Free Text Query Term · Free Text Comment · Human Annotator · Query Suggestion



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Adrian-Gabriel Chifu, LSIS - UMR 7296 CNRS, Aix-Marseille Université, Marseille, France
  • Sébastien Déjean, IMT UMR 5219 CNRS, Univ. de Toulouse, Univ. Paul Sabatier, Toulouse, France
  • Stefano Mizzaro, University of Udine, Udine, Italy
  • Josiane Mothe, IRIT UMR 5505 CNRS, ESPE, Univ. de Toulouse, UT2J, Toulouse, France
