Using the Euclidean Distance for Retrieval Evaluation

  • Shengli Wu
  • Yaxin Bi
  • Xiaoqin Zeng
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7051)


Evaluating retrieval results is an essential part of information retrieval systems and digital libraries. To date, almost all commonly used metrics, such as average precision and recall-level precision, are ranking-based. In this work, we investigate whether a score-based method, the Euclidean distance, is a good option for retrieval evaluation. Two variations are discussed: one uses a linear model to estimate the relation between rank and relevance in resultant lists, and the other uses a more sophisticated cubic regression model. Our experiments with two groups of results submitted to TREC demonstrate that the newly introduced metrics correlate strongly with ranking-based metrics when averaged over all 50 queries. The experiments also show that one of the variations (the linear model) has better overall quality than all of the ranking-based metrics involved. Another surprising finding is that a commonly used metric, average precision, may not be as good as previously thought.
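
The abstract does not state the exact formulas, so the following is only an illustrative sketch of the general idea, not the authors' definition. It assumes a hypothetical linear rank-to-score model (rank 1 maps to 1.0, the last rank to 1/n) and binary relevance judgments, and measures the Euclidean distance between the estimated scores and the judged relevance of a ranked list. The function names `linear_scores`, `euclidean_distance`, and `evaluate` are invented for this example.

```python
import math


def linear_scores(n):
    """Assumed linear model: estimated relevance falls linearly with rank.

    Rank 1 receives 1.0 and rank n receives 1/n. This is an illustrative
    choice, not the model fitted in the paper.
    """
    return [(n - rank + 1) / n for rank in range(1, n + 1)]


def euclidean_distance(scores, relevance):
    """Euclidean distance between estimated scores and judged relevance."""
    return math.sqrt(sum((s - g) ** 2 for s, g in zip(scores, relevance)))


def evaluate(ranked_relevance):
    """Score a ranked list given the judged relevance (1 or 0) at each rank."""
    scores = linear_scores(len(ranked_relevance))
    return euclidean_distance(scores, ranked_relevance)


# Relevant documents ranked first versus ranked last.
good = evaluate([1, 1, 0, 0])
bad = evaluate([0, 0, 1, 1])
print(good < bad)
```

Under these assumptions a smaller distance indicates a better result list: placing the relevant documents at the top yields a shorter distance than placing them at the bottom.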


Euclidean Distance, Relevant Document, Average Precision, Information Retrieval System, Relevance Score
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Shengli Wu (1)
  • Yaxin Bi (1)
  • Xiaoqin Zeng (2)
  1. School of Computing and Mathematics, University of Ulster, Northern Ireland, UK
  2. College of Computer and Information Engineering, Hehai University, Nanjing, China
