Tie-Breaking Bias: Effect of an Uncontrolled Parameter on Information Retrieval Evaluation

  • Guillaume Cabanac
  • Gilles Hubert
  • Mohand Boughanem
  • Claude Chrisment
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6360)


We consider Information Retrieval evaluation, especially at TREC with the trec_eval program. It appears that systems obtain scores based not only on the relevance of the retrieved documents, but also on the documents' names in the case of ties (i.e., when several documents are retrieved with the same score). We consider this tie-breaking strategy an uncontrolled parameter that influences measure scores, and argue the case for fairer tie-breaking strategies. A study of 22 TREC editions reveals significant differences between TREC's conventional, unfair strategy and the fairer strategies we propose. This experimental result advocates using these fairer strategies when conducting evaluations.
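The effect described above can be sketched with a toy example: under average precision, the very same run receives different scores depending solely on the order chosen inside a tie. All document names, scores, and relevance judgments below are invented for illustration; this sketch only mimics, not reproduces, trec_eval's actual name-based tie-breaking.

```python
# Tie-breaking bias, illustrated on hypothetical data.

def average_precision(ranking, relevant):
    """Average precision of a ranked list of doc ids against a relevant set."""
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)

# A run where d1 and d3 are retrieved with the same score (a tie).
scores = {"d2": 0.9, "d1": 0.5, "d3": 0.5}
relevant = {"d2", "d3"}

# Two rankings that differ only in how the d1/d3 tie is broken.
tie_favors_d1 = ["d2", "d1", "d3"]
tie_favors_d3 = ["d2", "d3", "d1"]

print(average_precision(tie_favors_d1, relevant))  # 0.8333...
print(average_precision(tie_favors_d3, relevant))  # 1.0
```

The relevance of the retrieved set is identical in both rankings; only the arbitrary ordering within the tie changes, yet average precision moves from about 0.83 to 1.0. This is the uncontrolled parameter the paper studies.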


Keywords: Relevant Document · Fair Evaluation · Realistic Strategy · Information Retrieval Evaluation · Uncontrolled Parameter





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Guillaume Cabanac (1)
  • Gilles Hubert (1)
  • Mohand Boughanem (1)
  • Claude Chrisment (1)

  1. Université de Toulouse — IRIT UMR 5505 CNRS, Toulouse cedex 9
