Measuring the Variability in Effectiveness of a Retrieval System
A typical evaluation of a retrieval system involves computing an effectiveness metric, e.g., average precision, for each topic of a test collection and then reporting the average of the metric, e.g., mean average precision, as the overall effectiveness. However, averages do not capture all the important aspects of effectiveness and, used alone, may not be an informative measure of a system’s effectiveness. Indeed, in addition to the average, we need to consider the variation of effectiveness across topics. We refer to this variation as the variability in effectiveness. In this paper we explore how the variance of a metric can be used as a measure of variability. We define a variability metric, and illustrate how the metric can be used in practice.
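The two quantities the abstract contrasts can be sketched in a few lines: the mean of a per-topic metric (here MAP) summarizes overall effectiveness, while the variance of the same per-topic scores captures variability across topics. This is a minimal illustration with hypothetical AP values, not the paper's own variability metric.

```python
# Sketch (not the paper's method): mean vs. variance of per-topic
# average precision (AP) scores for a single retrieval system.
from statistics import mean, pvariance

# Hypothetical per-topic AP scores over a 7-topic test collection
ap_scores = [0.32, 0.45, 0.12, 0.78, 0.51, 0.27, 0.63]

map_score = mean(ap_scores)         # overall effectiveness (MAP)
ap_variance = pvariance(ap_scores)  # variability in effectiveness across topics

print(f"MAP = {map_score:.4f}, variance = {ap_variance:.4f}")
```

Two systems with identical MAP can differ sharply in this variance, which is why the average alone can be uninformative.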
Keywords: Information Retrieval · Retrieval System · Average Precision · Test Collection · Robust Track