Understandability Biased Evaluation for Information Retrieval

Guido Zuccon
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9626)


Although relevance is known to be a multidimensional concept, information retrieval measures mainly consider one dimension of relevance: topicality. In this paper we propose a method to integrate multiple dimensions of relevance in the evaluation of information retrieval systems. This is done within the gain-discount evaluation framework, which underlies measures like rank-biased precision (RBP), cumulative gain, and expected reciprocal rank. Although the proposal is general and applicable to any dimension of relevance, we study specific instantiations of the approach in the context of evaluating retrieval systems with respect to both the topicality and the understandability of retrieved documents. This leads to the formulation of understandability biased evaluation measures based on RBP. We study these measures using both simulated experiments and real human assessments. The findings show that considering both understandability and topicality in the evaluation of retrieval systems leads to claims about system effectiveness that differ from those obtained when considering topicality alone.
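To make the gain-discount framing concrete, the sketch below shows one plausible instantiation of an understandability biased RBP: the topical gain at each rank is weighted by an understandability score before the standard RBP geometric discount is applied. This is a minimal illustration under the assumption of a multiplicative combination of the two gains, not necessarily the exact formulation studied in the paper; the function name `urbp` and the score ranges are illustrative choices.

```python
def urbp(topical, understandability, p=0.8):
    """Sketch of an understandability biased RBP.

    topical: per-rank topical relevance scores in [0, 1]
    understandability: per-rank understandability scores in [0, 1]
    p: RBP persistence parameter (probability the user moves to the next rank)

    The combined gain at rank k is the product of the two scores,
    discounted geometrically by p**k as in standard RBP.
    """
    return (1 - p) * sum(
        (p ** k) * t * u
        for k, (t, u) in enumerate(zip(topical, understandability))
    )
```

When every document is fully understandable (all understandability scores equal to 1), this reduces to standard RBP on the topical scores alone, which is one way to see how the extra dimension only ever down-weights gains.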


Keywords: User model · Information retrieval system · Readability score · Gain function · Discount function



The author is thankful to Bevan Koopman, Leif Azzopardi, Joao Palotti, Peter Bruza, Alistair Moffat and Lorraine Goeuriot for their comments on the ideas proposed in this paper.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Queensland University of Technology (QUT), Brisbane, Australia
