Judging Relevance Using Magnitude Estimation

  • Eddy Maddalena
  • Stefano Mizzaro
  • Falk Scholer
  • Andrew Turpin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9022)


Magnitude estimation is a psychophysical scaling technique in which numbers are assigned to stimuli to reflect the ratios of their perceived intensities. We report on a crowdsourcing experiment aimed at understanding whether magnitude estimation can be used to gather reliable relevance judgements for documents, as is commonly required for test-collection-based evaluation of information retrieval systems. Results on a small dataset show that: (i) magnitude estimation can produce relevance rankings that are consistent with more classical ordinal judgements; (ii) both an upper-bounded and an unbounded scale can be used effectively, though with some differences; (iii) the order in which documents are presented for judging has a limited effect, if any; and (iv) only a small number of repeat judgements is required to obtain reliable magnitude estimation scores.
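The abstract does not say how magnitude estimation scores were normalized, aggregated across repeat judgements, or compared with ordinal judgements. As a minimal sketch under stated assumptions (not necessarily the authors' procedure), the code below uses one common practice: each worker's scores are divided by that worker's geometric mean so that arbitrary personal scales cancel out, repeat judgements are then combined with a geometric mean, and Kendall's tau-b (which tolerates the ties typical of ordinal scales) measures agreement with ordinal relevance levels. It assumes all scores are strictly positive; the function names and toy data are hypothetical.

```python
from __future__ import annotations

import math
from statistics import geometric_mean


def normalize_worker(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one worker's scores by their geometric mean.

    Assumed normalization: magnitude estimates are ratios on a personal
    scale, so dividing by the geometric mean removes each worker's
    arbitrary modulus. Assumes every score is strictly positive.
    """
    gm = geometric_mean(scores.values())
    return {doc: s / gm for doc, s in scores.items()}


def aggregate(judgements: dict[str, dict[str, float]]) -> dict[str, float]:
    """Combine normalized repeat judgements per document via geometric mean."""
    per_doc: dict[str, list[float]] = {}
    for scores in judgements.values():
        for doc, s in normalize_worker(scores).items():
            per_doc.setdefault(doc, []).append(s)
    return {doc: geometric_mean(vals) for doc, vals in per_doc.items()}


def kendall_tau_b(x: list[float], y: list[float]) -> float:
    """Kendall's tau-b rank correlation, which accounts for tied values."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # pairs tied in both lists are excluded
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x_only)
                      * (concordant + discordant + ties_y_only))
    if denom == 0:
        raise ValueError("tau-b is undefined when one input is constant")
    return (concordant - discordant) / denom


# Toy example (hypothetical values, not data from the paper):
# two workers using very different personal scales for the same documents.
raw = {
    "w1": {"d1": 10, "d2": 50, "d3": 100},
    "w2": {"d1": 1, "d2": 4, "d3": 9},
}
ordinal = {"d1": 0, "d2": 1, "d3": 2}  # e.g. not / partially / highly relevant

scores = aggregate(raw)
docs = sorted(scores)
tau = kendall_tau_b([scores[d] for d in docs], [ordinal[d] for d in docs])
print(tau)  # 1.0
```

Because both workers rank the documents in the same order, the normalized and aggregated scores agree perfectly with the ordinal levels even though the raw numbers differ by an order of magnitude. The geometric mean is used rather than the arithmetic mean because magnitude estimates are interpreted as ratios.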

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Eddy Maddalena (1)
  • Stefano Mizzaro (1)
  • Falk Scholer (2)
  • Andrew Turpin (3)

  1. University of Udine, Udine, Italy
  2. RMIT University, Melbourne, Australia
  3. University of Melbourne, Melbourne, Australia