A Framework for Benchmarking in CBIR

Abstract

Content-based image retrieval (CBIR) has been a very active research area for more than ten years. In the last few years the number of publications and retrieval systems produced has grown steadily. Despite this, there is still no agreed objective way to compare the performance of any two of these systems. This is blocking the further development of the field, since good or promising techniques cannot be identified objectively, and the potential commercial success of CBIR systems is hindered because it is hard to establish the quality of an application.

We are thus in the position in which other research areas, such as text retrieval or database systems, found themselves several years ago. Serious applications, as well as commercial success, require objective proof of system quality: in text retrieval the TREC benchmark is a widely accepted performance measure; in transaction processing for databases the TPC benchmark has wide support.

This paper describes a framework that enables the creation of a benchmark for CBIR. Parts of this framework have already been developed, and systems can be evaluated against a small, freely available database via a web interface. Much work remains to be done in making large, diverse image databases available and in obtaining relevance judgments for those large databases. We also need to establish an independent body, accepted by the entire community, that would organize a benchmarking event, give out official results and update the benchmark regularly. The Benchathlon could take on this role if it manages to gain the confidence of the field. Such a body should also help prevent the negative effects, e.g., “benchmarketing”, experienced with other benchmarks, such as the TPC predecessors.

This paper sets out our ideas for an open framework for performance evaluation. We hope to stimulate discussion on evaluation in image retrieval so that systems can be compared on the same grounds. We also identify query paradigms beyond query by example (QBE) that may be integrated into a benchmarking framework, and we give examples of application-based benchmarking areas.

Cite this article

Müller, H., Müller, W., Marchand-Maillet, S. et al. A Framework for Benchmarking in CBIR. Multimedia Tools and Applications 21, 55–73 (2003). https://doi.org/10.1023/A:1025034215859

  • DOI: https://doi.org/10.1023/A:1025034215859

  • evaluation
  • content-based image retrieval
  • benchmarking
  • Benchathlon
  • TREC