
An Intrinsic Framework of Information Retrieval Evaluation Measures

Conference paper in Intelligent Systems and Applications (IntelliSys 2023). Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 822).

Abstract

Information retrieval (IR) evaluation measures are cornerstones for determining the suitability and task performance efficiency of retrieval systems. Their metric and scale properties make it possible to compare one system against another and to establish differences or similarities between them. Based on the representational theory of measurement, this paper determines these properties by exploiting the information contained in a retrieval measure itself. It establishes the intrinsic framework of a retrieval measure, which is the common scenario when the domain set is not explicitly specified. A method to determine the metric and scale properties of any retrieval measure is provided, requiring knowledge of only some of its attained values. The method establishes three main categories of retrieval measures according to their intrinsic properties. Some common user-oriented and system-oriented evaluation measures are classified according to the presented taxonomy.


Notes

  1. Here, the commonly used term “IR evaluation metric” collides with the mathematical term “metric”, which is used later in this paper. To avoid this ambiguity, the rest of the paper refers to “IR evaluation metrics” as “IR evaluation measures”, reserving the term “metric” for its mathematical sense.

  2. Typically, a SERP includes content in a non-homogeneous manner, such as images, query suggestions, knowledge panels, etc. However, here we consider the classical ordered (or unordered) list of documents, since this is the common structure considered when the evaluation of ranking models is studied.

  3. The associated weak order, \(\preceq _f\), may be transformed into a total order by considering the following equivalence relation: \(\mathbf {\hat{r}_1} \sim _{f} \mathbf {\hat{r}_2} \Leftrightarrow f(\mathbf {\hat{r}_1}) = f(\mathbf {\hat{r}_2})\). Let \(\mathbf {R^{*}}\) be the set of equivalence classes, and let \(\mathbf {\hat{r}^{*}_1}\) and \(\mathbf {\hat{r}^{*}_2}\) be two elements of this set containing the individual system output rankings \(\mathbf {\hat{r}_1}\), \(\mathbf {\hat{r}_2} \in \textbf{R}\), respectively. The following ordering can be defined on \(\mathbf {R^{*}}\): \(\mathbf {\hat{r}^{*}_1} \preceq _{f}^{*} \mathbf {\hat{r}^{*}_2} \Leftrightarrow \mathbf {\hat{r}_1} \preceq _{f} \mathbf {\hat{r}_2}\). Then, \((\mathbf {R^{*}}, \preceq _{f}^{*})\) is called the reduction or quotient of \((\textbf{R}, \preceq _{f})\), where \(\preceq _{f}^{*}\) is well-defined and \((\mathbf {R^{*}}, \preceq _{f}^{*})\) is a totally ordered set [72]. (A concrete construction of this quotient is sketched below, after these notes.)

  4. Imagine hypothetical beings living on a two-dimensional surface, ignorant of the surrounding three-dimensional space (but endowed with a sense of Euclidean distance). These beings are local observers, whose view reaches only a two-coordinate Euclidean environment, \(\mathbb {R}^2\). The geometrical elements of this surface that such beings can observe or measure (essentially lengths) constitute what is called the intrinsic geometry of the surface. The intrinsic properties of the surface are those which depend exclusively on the surface itself.

  5. In basic algebra [36, 50, 51], f is an injective function if it maps distinct elements to distinct elements; formally, \(f(\mathbf {\hat{r}_1}) = f(\mathbf {\hat{r}_2})\) implies \(\mathbf {\hat{r}_1} = \mathbf {\hat{r}_2}\), \(\forall \mathbf {\hat{r}_1}\), \(\mathbf {\hat{r}_2} \in \textbf{R}\).

  6. As noted in Sect. 4, the intrinsic properties of a retrieval measure deduced with this framework are based on the RTM.
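To make Notes 2 and 3 concrete, the following minimal Python sketch (an illustration added here, not part of the paper's formalism) represents system output rankings as binary relevance vectors, takes precision at depth K as an example measure f, and builds the equivalence classes of \(\sim _f\) whose set \(\mathbf {R^{*}}\) forms the totally ordered quotient. The depth K and the choice of measure are assumptions of the example.

```python
from itertools import product

K = 3  # ranking depth (an assumption for this toy example)

def precision_at_k(ranking, k=K):
    """Example measure f: fraction of relevant documents among the top k."""
    return sum(ranking[:k]) / k

# R: the domain set, here all binary relevance vectors of length K.
R = list(product([0, 1], repeat=K))

# Group rankings into the equivalence classes of ~_f (Note 3):
# two rankings are equivalent iff f assigns them the same value.
classes = {}
for r in R:
    classes.setdefault(precision_at_k(r), []).append(r)

# The quotient R* is totally ordered by the attained values of f.
for value in sorted(classes):
    print(f"f = {value:.2f}: {classes[value]}")
```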

References

  1. Allan, J., Aslam, J., Belkin, N., Buckley, C., Callan, J., Croft, B., Dumais, S., Fuhr, N., Harman, D., Harper, D.J., et al.: Challenges in information retrieval and language modeling: report of a workshop held at the Center for Intelligent Information Retrieval, University of Massachusetts Amherst, September 2002. In: ACM SIGIR Forum, 1, pp. 31–47. ACM, New York, NY, USA (2003)
  2. Amigó, E., Gonzalo, J., Artiles, J., Verdejo, F.: A comparison of extrinsic clustering evaluation metrics based on formal constraints. Inf. Retrieval 12(4), 461–486 (2009)
  3. Amigó, E., Gonzalo, J., Mizzaro, S.: What is my problem? Identifying formal tasks and metrics in data mining on the basis of measurement theory. IEEE Trans. Knowl. Data Eng. (2021)
  4. Amigó, E., Gonzalo, J., Verdejo, F.: A general evaluation measure for document organization tasks. In: Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 643–652 (2013)
  5. Amigó, E., Mizzaro, S.: On the nature of information access evaluation metrics: a unifying framework. Inf. Retr. J. 23(3), 318–386 (2020)
  6. Azzopardi, L., Thomas, P., Craswell, N.: Measuring the utility of search engine result pages: an information foraging based measure. In: The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 605–614 (2018)
  7. Baccianella, S., Esuli, A., Sebastiani, F.: Evaluation measures for ordinal regression. In: 2009 Ninth International Conference on Intelligent Systems Design and Applications, pp. 283–287. IEEE (2009)
  8. Belew, R.K.: Finding Out About: A Cognitive Perspective on Search Engine Technology and the WWW. Cambridge University Press (2000)
  9. Blair, D.C.: Information Retrieval, 2nd ed., C.J. van Rijsbergen. London: Butterworths. JASIS 30(6), 374–375 (1979). https://doi.org/10.1002/asi.4630300621
  10. Bollmann, P.: Two axioms for evaluation measures in information retrieval. In: SIGIR, vol. 84, pp. 233–245. Citeseer (1984)
  11. Bollmann, P., Cherniavsky, V.S.: Measurement-theoretical investigation of the MZ-metric. In: Proceedings of the 3rd Annual ACM Conference on Research and Development in Information Retrieval, pp. 256–267. Citeseer (1980)
  12. Buckley, C., Voorhees, E.M.: Retrieval evaluation with incomplete information. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 25–32 (2004)
  13. Buckley, C., Voorhees, E.M.: Evaluating evaluation measure stability. In: ACM SIGIR Forum, 2, pp. 235–242. ACM, New York, NY, USA (2017)
  14. Busin, L., Mizzaro, S.: Axiometrics: an axiomatic approach to information retrieval effectiveness metrics. In: Proceedings of the 2013 Conference on the Theory of Information Retrieval, pp. 22–29 (2013)
  15. Büttcher, S., Clarke, C.L., Yeung, P.C., Soboroff, I.: Reliable information retrieval evaluation with incomplete and biased judgements. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 63–70 (2007)
  16. Carmel, D., Yom-Tov, E.: Estimating the query difficulty for information retrieval. Synth. Lect. Inf. Concepts, Retr., Serv. 2(1), 1–89 (2010)
  17. Carterette, B.: System effectiveness, user models, and user utility: a conceptual framework for investigation. In: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 903–912 (2011)
  18. Carterette, B.A.: Multiple testing in statistical analysis of systems-based information retrieval experiments. ACM Trans. Inf. Syst. (TOIS) 30(1), 1–34 (2012)
  19. Chapelle, O., Metlzer, D., Zhang, Y., Grinspan, P.: Expected reciprocal rank for graded relevance. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp. 621–630 (2009)
  20. Cleverdon, C.W.: The significance of the Cranfield tests on index languages. In: Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 3–12 (1991)
  21. Clinchant, S., Gaussier, E.: Is document frequency important for PRF? In: Conference on the Theory of Information Retrieval, pp. 89–100. Springer, Berlin (2011)
  22. Clinchant, S., Gaussier, E.: A theoretical analysis of pseudo-relevance feedback models. In: Proceedings of the 2013 Conference on the Theory of Information Retrieval, pp. 6–13 (2013)
  23. Cooper, W.S.: Expected search length: a single measure of retrieval effectiveness based on the weak ordering action of retrieval systems. Am. Doc. 19(1), 30–41 (1968)
  24. Croft, W.B., Metzler, D., Strohman, T.: Search Engines: Information Retrieval in Practice, vol. 520. Addison-Wesley Reading (2010)
  25. Do Carmo, M.P.: Differential Geometry of Curves and Surfaces: Revised and Updated, 2nd edn. Courier Dover Publications (2016)
  26. Fang, H.: An axiomatic approach to information retrieval. Technical report (2007)
  27. Fang, H., Tao, T., Zhai, C.: A formal study of information retrieval heuristics. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 49–56 (2004)
  28. Fang, H., Tao, T., Zhai, C.: Diagnostic evaluation of information retrieval models. ACM Trans. Inf. Syst. (TOIS) 29(2), 1–42 (2011)
  29. Fang, H., Zhai, C.: An exploration of axiomatic approaches to information retrieval. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 480–487 (2005)
  30. Fang, H., Zhai, C.: Semantic term matching in axiomatic approaches to information retrieval. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 115–122 (2006)
  31. Ferrante, M., Ferro, N., Fuhr, N.: Towards meaningful statements in IR evaluation: mapping evaluation measures to interval scales. IEEE Access 9, 136182–136216 (2021)
  32. Ferrante, M., Ferro, N., Fuhr, N.: Response to Moffat's comment on "Towards meaningful statements in IR evaluation: mapping evaluation measures to interval scales" (2022). https://doi.org/10.48550/ARXIV.2212.11735
  33. Ferrante, M., Ferro, N., Pontarollo, S.: A general theory of IR evaluation measures. IEEE Trans. Knowl. Data Eng. 31(3), 409–422 (2018)
  34. Ferro, N., Peters, C.: Information Retrieval Evaluation in a Changing World: Lessons Learned from 20 Years of CLEF, vol. 41. Springer, Berlin (2019)
  35. Flach, P.: Performance evaluation in machine learning: the good, the bad, the ugly, and the way forward. In: Proceedings of the AAAI Conference on Artificial Intelligence, 01, pp. 9808–9814 (2019)
  36. Fraleigh, J.B.: A First Course in Abstract Algebra. Pearson Education India (2003)
  37. Fréchet, M.M.: Sur quelques points du calcul fonctionnel. Rendiconti del Circolo Matematico di Palermo (1884–1940) 22(1), 1–72 (1906)
  38. Fuhr, N.: Some common mistakes in IR evaluation, and how they can be avoided. In: ACM SIGIR Forum, 3, pp. 32–41. ACM, New York, NY, USA (2018)
  39. Gaudette, L., Japkowicz, N.: Evaluation methods for ordinal classification. In: Canadian Conference on Artificial Intelligence, pp. 207–210. Springer, Berlin (2009)
  40. Gauss, C.F.: Disquisitiones Generales Circa Superficies Curvas, vol. 1. Typis Dieterichianis (1828)
  41. Giner, F.: A comment to "A general theory of IR evaluation measures" (2023). arXiv:2303.16061
  42. Guccione, J.A.: Espacios métricos [Metric spaces]. Texto, Universidad de Buenos Aires (2018)
  43. Han, L., Roitero, K., Maddalena, E., Mizzaro, S., Demartini, G.: On transforming relevance scales. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 39–48 (2019)
  44. Hand, D.J.: Statistics and the theory of measurement. J. R. Stat. Soc. A. Stat. Soc. 159(3), 445–473 (1996)
  45. Harman, D.: Information retrieval evaluation. Synth. Lect. Inf. Concepts, Retr., Serv. 3(2), 1–119 (2011)
  46. Hauff, C., de Jong, F.: Retrieval system evaluation: automatic evaluation versus incomplete judgments. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 863–864 (2010)
  47. Hausdorff, F.: Set Theory, vol. 119. American Mathematical Soc. (2005)
  48. Huibers, T.W.C.: An axiomatic theory for information retrieval. Ph.D. thesis (1996)
  49. Hull, D.: Using statistical testing in the evaluation of retrieval experiments. In: Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 329–338 (1993)
  50. Hungerford, T.W.: Algebra, vol. 73. Springer Science & Business Media (2012)
  51. Jacobson, N.: Basic Algebra I. Courier Corporation (2012)
  52. Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. (TOIS) 20(4), 422–446 (2002)
  53. Kando, N.: Information retrieval system evaluation using multi-grade relevance judgments: discussion on averageable single-numbered measures. IPSJ SIG Notes 63, 105–112 (2001)
  54. Karimzadehgan, M., Zhai, C.: Axiomatic analysis of translation language model for information retrieval. In: European Conference on Information Retrieval, pp. 268–280. Springer, Berlin (2012)
  55. Kazai, G.: Report of the INEX 2003 metrics working group. In: Initiative for the Evaluation of XML Retrieval (INEX): INEX 2003 Workshop Proceedings, Dagstuhl, Germany (2004)
  56. Kazai, G., Lalmas, M.: INEX 2005 evaluation measures. In: Fuhr, N., Lalmas, M., Malik, S., Kazai, G. (eds.) Advances in XML Information Retrieval and Evaluation, pp. 16–29. Springer, Berlin (2006)
  57. Kekäläinen, J., Järvelin, K.: Using graded relevance assessments in IR evaluation. J. Am. Soc. Inform. Sci. Technol. 53(13), 1120–1129 (2002)
  58. Korfhage, R.R.: Information Storage and Retrieval. Wiley, USA (1997)
  59. Krantz, D., Luce, D., Suppes, P., Tversky, A.: Foundations of Measurement, vol. I: Additive and Polynomial Representations (1971)
  60. Krantz, D.H.: Foundations of Measurement, vol. II: Geometrical, Threshold and Probabilistic Representations (1989)
  61. Luce, D., Krantz, D., Suppes, P., Tversky, A.: Foundations of Measurement, vol. III: Representation, Axiomatization, and Invariance (1990)
  62. Maddalena, E., Mizzaro, S.: Axiometrics: axioms of information retrieval effectiveness metrics. In: EVIA@NTCIR (2014)
  63. Michell, J.: Measurement scales and statistics: a clash of paradigms. Psychol. Bull. 100(3), 398 (1986)
  64. Michell, J.: An Introduction to the Logic of Psychological Measurement. Psychology Press (2014)
  65. Moffat, A.: Seven numeric properties of effectiveness metrics. In: Asia Information Retrieval Symposium, pp. 1–12. Springer, Berlin (2013)
  66. Moffat, A.: Batch evaluation metrics in information retrieval: measures, scales, and meaning. IEEE Access 10, 105564–105577 (2022)
  67. Moffat, A., Bailey, P., Scholer, F., Thomas, P.: Incorporating user expectations and behavior into the measurement of search effectiveness. ACM Trans. Inf. Syst. (TOIS) 35(3), 1–38 (2017)
  68. Moffat, A., Zobel, J.: Rank-biased precision for measurement of retrieval effectiveness. ACM Trans. Inf. Syst. (TOIS) 27(1), 1–27 (2008)
  69. Montazeralghaem, A., Zamani, H., Shakery, A.: Axiomatic analysis for improving the log-logistic feedback model. In: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 765–768 (2016)
  70. Pollock, S.M.: Measures for the comparison of information retrieval systems. Am. Doc. 19(4), 387–397 (1968)
  71. Rahimi, R., Montazeralghaem, A., Shakery, A.: An axiomatic approach to corpus-based cross-language information retrieval. Inf. Retr. J. 23(3), 191–215 (2020)
  72. Roberts, F.S.: Measurement theory. Encycl. Math. Appl. 7 (1985)
  73. Robertson, S.: On GMAP: and other transformations. In: Proceedings of the 15th ACM International Conference on Information and Knowledge Management, pp. 78–83 (2006)
  74. Robertson, S.: On the history of evaluation in IR. J. Inf. Sci. 34(4), 439–456 (2008)
  75. Rocchio, J.: Performance indices for document retrieval systems. In: Information Storage and Retrieval, p. 83 (1964)
  76. Rosset, C., Mitra, B., Xiong, C., Craswell, N., Song, X., Tiwary, S.: An axiomatic approach to regularizing neural ranking models. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 981–984 (2019)
  77. Sagara, Y.: Performance measures for ranked output retrieval systems. J. Jpn. Soc. Inf. Knowl. 12(2), 22–36 (2002)
  78. Sakai, T.: New performance metrics based on multigrade relevance: their application to question answering. In: NTCIR (2004)
  79. Sakai, T.: Metrics, statistics, tests. In: PROMISE Winter School, pp. 116–163. Springer, Berlin (2013)
  80. Sakai, T.: Statistical reform in information retrieval? In: ACM SIGIR Forum, vol. 48, pp. 3–12. ACM, New York, NY, USA (2014)
  81. Sakai, T.: On Fuhr's guideline for IR evaluation. In: ACM SIGIR Forum, vol. 54, pp. 1–8. ACM, New York, NY, USA (2021)
  82. Sakai, T., Kando, N.: On information retrieval metrics designed for evaluation with incomplete relevance assessments. Inf. Retr. 11(5), 447–470 (2008)
  83. Sakai, T., Oard, D.W., Kando, N.: Evaluating Information Retrieval and Access Tasks: NTCIR's Legacy of Research Impact. Springer Nature (2021)
  84. Salton, G.: Automatic Information Organization and Retrieval (1968)
  85. Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. McGraw-Hill (1983)
  86. Sanderson, M.: Test collection based evaluation of information retrieval systems. Found. Trends Inf. Retr. 4(4), 247–375 (2010)
  87. Savoy, J.: Statistical inference in retrieval effectiveness evaluation. Inf. Process. Manag. 33(4), 495–512 (1997)
  88. Sebastiani, F.: An axiomatically derived measure for the evaluation of classification algorithms. In: Proceedings of the 2015 International Conference on the Theory of Information Retrieval, pp. 11–20 (2015)
  89. Sirotkin, P.: On search engine evaluation metrics (2013). arXiv:1302.2318
  90. Stevens, S.S.: Mathematics, Measurement, and Psychophysics. Wiley, New York (1951)
  91. Stevens, S.S., et al.: On the Theory of Scales of Measurement. Bobbs-Merrill, College Division (1946)
  92. Swets, J.A.: Information retrieval systems. Science 141(3577), 245–250 (1963)
  93. Urbano, J., Lima, H., Hanjalic, A.: Statistical significance testing in information retrieval: an empirical analysis of type I, type II and type III errors. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 505–514 (2019)
  94. Van Rijsbergen, C.J.: Foundation of evaluation. J. Doc. 30(4), 365–373 (1974)
  95. Vanbelle, S., Albert, A.: A note on the linearly weighted kappa coefficient for ordinal scales. Stat. Methodol. 6(2), 157–163 (2009)
  96. Velleman, P.F., Wilkinson, L.: Nominal, ordinal, interval, and ratio typologies are misleading. Am. Stat. 47(1), 65–72 (1993)
  97. Voorhees, E.M.: The TREC 2005 robust track. In: ACM SIGIR Forum, vol. 40, pp. 41–48. ACM, New York, NY, USA (2006)
  98. Voorhees, E.M., Harman, D.K.: TREC: Experiment and Evaluation in Information Retrieval, vol. 63. Citeseer (2005)
  99. Voorhees, E.M., et al.: Overview of the TREC 2003 robust retrieval track. In: TREC, pp. 69–77 (2003)
  100. Wicaksono, A.F., Moffat, A.: Metrics, user models, and satisfaction. In: Proceedings of the 13th International Conference on Web Search and Data Mining, pp. 654–662 (2020)
  101. Zhang, F., Liu, Y., Li, X., Zhang, M., Xu, Y., Ma, S.: Evaluating web search with a bejeweled player model. In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 425–434 (2017)


Author information

Correspondence to Fernando Giner.

A Appendix


1.1 A.1 Formal Proofs

Proof

(Proposition 1) Symmetry is trivially verified, since \(d_{f}(\mathbf {\hat{r}_1},\mathbf {\hat{r}_2})= \vert f(\mathbf {\hat{r}_1}) - f(\mathbf {\hat{r}_2}) \vert = \vert f(\mathbf {\hat{r}_2}) - f(\mathbf {\hat{r}_1}) \vert = d_{f}(\mathbf {\hat{r}_2},\mathbf {\hat{r}_1})\). The triangle inequality is also trivial, by the triangle inequality on the real numbers: \(\vert f(\mathbf {\hat{r}_1}) - f(\mathbf {\hat{r}_2}) \vert \le \vert f(\mathbf {\hat{r}_1}) - f(\mathbf {\hat{r}_3}) \vert + \vert f(\mathbf {\hat{r}_3}) - f(\mathbf {\hat{r}_2}) \vert \).   \(\square \)
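As a sanity check of Proposition 1, the following Python sketch (an illustration under the same toy setting introduced after the notes; the measure and depth K are assumptions of the example, not the paper's) verifies numerically that the induced distance \(d_f(\mathbf {\hat{r}_1}, \mathbf {\hat{r}_2}) = \vert f(\mathbf {\hat{r}_1}) - f(\mathbf {\hat{r}_2}) \vert \) is symmetric and satisfies the triangle inequality over all rankings of the toy domain.

```python
from itertools import product

K = 3
R = list(product([0, 1], repeat=K))  # toy domain: binary relevance vectors
f = lambda r: sum(r[:K]) / K         # example measure: precision at K

# The induced distance d_f of Proposition 1.
d = lambda x, y: abs(f(x) - f(y))

# Symmetry: d_f(x, y) = d_f(y, x) for all pairs.
assert all(d(x, y) == d(y, x) for x in R for y in R)

# Triangle inequality: d_f(x, y) <= d_f(x, z) + d_f(z, y) for all triples.
assert all(d(x, y) <= d(x, z) + d(z, y)
           for x in R for y in R for z in R)

print("d_f is symmetric and satisfies the triangle inequality on R")
```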

Proof

(Proposition 2) An interesting result about metric spaces [42] states the following: “Let \((\mathbf {R_2}, d_2)\) be a metric space and let \(f:\mathbf {R_1} \longrightarrow \mathbf {R_2}\) be an injective (one-to-one) function; then \((\mathbf {R_1}, d_1)\) is a metric space, where \(d_1(\mathbf {\hat{r}_1}, \mathbf {\hat{r}_2}) = d_2(f(\mathbf {\hat{r}_1}), f(\mathbf {\hat{r}_2}))\), \(\forall \mathbf {\hat{r}_1}\), \(\mathbf {\hat{r}_2} \in \mathbf {R_1}\)”.

In the retrieval scenario, \((\mathbf {R_2}, d_2) = (\mathbb {R}, \vert \cdot \vert )\), which is the metric space of the real line endowed with the usual norm (the absolute value). Let f be a one-to-one IR evaluation measure; from the previous result, it follows that \((\mathbf {R_1}, d_1) = (\textbf{R}, d_f)\) is a metric space, i.e., \(d_f\) verifies the three postulates of a metric.   \(\square \)
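The role of injectivity in Proposition 2 can be illustrated numerically. The sketch below (an illustration; both measures are toy choices of this example, not the paper's) contrasts precision, which is not one-to-one and hence yields only a pseudometric, with a binary-encoding measure that is injective on binary relevance vectors and therefore satisfies the identity of indiscernibles.

```python
from itertools import product

K = 3
R = list(product([0, 1], repeat=K))  # toy domain: binary relevance vectors

prec = lambda r: sum(r) / K  # precision: NOT injective on R
enc = lambda r: sum(b / 2 ** (i + 1) for i, b in enumerate(r))  # injective

def separates(f):
    """Identity of indiscernibles: d_f(x, y) = 0 only when x = y."""
    return all(abs(f(x) - f(y)) > 0 for x in R for y in R if x != y)

print("precision separates distinct rankings:", separates(prec))  # False
print("binary encoding separates them:", separates(enc))          # True
```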

Proof

(Proposition 3) First, the implication from right to left will be shown. Consider a metric ordinal scale, f, whose attained values are equally spaced.

An interval is called prime if \([\mathbf {\hat{r}_1}, \mathbf {\hat{r}_2}] = \{\mathbf {\hat{r}_1}, \mathbf {\hat{r}_2}\}\). It will first be shown that the function \(F(\textbf{x}, \textbf{y}) = \vert f(\textbf{x}) - f(\textbf{y}) \vert \) attains its minimum value on any prime interval.

Let \([\mathbf {\hat{r}_1}, \mathbf {\hat{r}_3}] = \{\mathbf {\hat{r}_1}, \mathbf {\hat{r}_2}, \mathbf {\hat{r}_3}\}\) be a non-prime interval, where \(\mathbf {\hat{r}_1} \preceq _f \mathbf {\hat{r}_2} \preceq _f \mathbf {\hat{r}_3}\); then \(f(\mathbf {\hat{r}_1}) \le f(\mathbf {\hat{r}_2}) \le f(\mathbf {\hat{r}_3})\), since f is an ordinal scale. Hence \(\vert f(\mathbf {\hat{r}_3}) - f(\mathbf {\hat{r}_1}) \vert = \vert f(\mathbf {\hat{r}_3}) - f(\mathbf {\hat{r}_2}) \vert + \vert f(\mathbf {\hat{r}_2}) - f(\mathbf {\hat{r}_1}) \vert \), i.e., the minimum value of F is not attained at \([\mathbf {\hat{r}_1}, \mathbf {\hat{r}_3}]\). In addition, the function F assigns the same value to every prime interval. Given a prime interval, \([\mathbf {\hat{r}_1}, \mathbf {\hat{r}_2}]\), consider one of its consecutive prime intervals, \([\mathbf {\hat{r}_2}, \mathbf {\hat{r}_3}]\), which exists since \(\preceq _f\) is a weak order (every pair of elements is comparable). These two prime intervals verify \(f(\mathbf {\hat{r}_1}) < f(\mathbf {\hat{r}_2}) < f(\mathbf {\hat{r}_3})\), since f is a metric, and the attained values of f are equally spaced. Thus, \(F(\mathbf {\hat{r}_1},\mathbf {\hat{r}_2}) = k \in \mathbb {R}^{+}\) for any prime interval \([\mathbf {\hat{r}_1}, \mathbf {\hat{r}_2}]\).

Now it will be shown that equally spaced intervals (not necessarily prime) are assigned equal differences. Consider any non-prime interval, \([\mathbf {\hat{r}_1}, \mathbf {\hat{r}_m}] = \{\mathbf {\hat{r}_1}, \mathbf {\hat{r}_2}, \ldots , \mathbf {\hat{r}_m}\}\). As f is a metric, it attains different values for different elements; thus, it can be assumed that \(f(\mathbf {\hat{r}_1}) < f(\mathbf {\hat{r}_2}) < \cdots < f(\mathbf {\hat{r}_{m-1}}) < f(\mathbf {\hat{r}_m})\). Then, every interval \([\mathbf {\hat{r}_i}, \mathbf {\hat{r}_{i+1}}]\), \(i=1, \ldots , m-1\), is a prime interval, and F attains its minimum at these intervals. As \(f(\mathbf {\hat{r}_m}) - f(\mathbf {\hat{r}_1}) = (f(\mathbf {\hat{r}_m}) - f(\mathbf {\hat{r}_{m-1}})) + \cdots + (f(\mathbf {\hat{r}_2}) - f(\mathbf {\hat{r}_1}))\) and \(f(\mathbf {\hat{r}_{i+1}}) - f(\mathbf {\hat{r}_i}) = k\) for \(1 \le i \le m-1\), it follows that \(f(\mathbf {\hat{r}_m}) - f(\mathbf {\hat{r}_1}) = k \cdot (m-1)\), which depends only on the span of the interval, not on the particular elements considered. Therefore, equally spaced intervals are assigned equal differences, i.e., f is an interval scale.

Finally, the other implication will be shown. Consider any prime interval, \([\mathbf {\hat{r}_1}, \mathbf {\hat{r}_2}]\), of \(\textbf{R}\). As f is an interval scale, equally spaced intervals are assigned equal differences, i.e., the value \(\vert f(\mathbf {\hat{r}_2}) - f(\mathbf {\hat{r}_1}) \vert \) is constant for every prime interval of \(\textbf{R}\); moreover, it is strictly positive. To see that the attained values are equally spaced, it suffices to check that distinct elements of \(\textbf{R}\) are assigned distinct values of f, which holds since f is a metric.   \(\square \)
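Proposition 3 can likewise be probed numerically. In the sketch below (an illustration; both measures are toy choices of this example), precision attains the equally spaced values \(0, 1/K, \ldots , 1\), consistent with interval-scale behavior on the quotient, whereas a DCG-style measure with log-discounted gains does not attain equally spaced values.

```python
import math
from itertools import product

K = 3
R = list(product([0, 1], repeat=K))  # toy domain: binary relevance vectors

def equally_spaced(values, tol=1e-9):
    """Check whether the attained values form an equally spaced grid."""
    v = sorted(set(values))
    gaps = [b - a for a, b in zip(v, v[1:])]
    return all(abs(g - gaps[0]) < tol for g in gaps)

prec = lambda r: sum(r) / K  # attains 0, 1/K, ..., 1
dcg = lambda r: sum(b / math.log2(i + 2) for i, b in enumerate(r))

print("precision values equally spaced:", equally_spaced(map(prec, R)))  # True
print("DCG-style values equally spaced:", equally_spaced(map(dcg, R)))   # False
```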


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this paper

Giner, F. (2024). An Intrinsic Framework of Information Retrieval Evaluation Measures. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2023. Lecture Notes in Networks and Systems, vol 822. Springer, Cham. https://doi.org/10.1007/978-3-031-47721-8_47
