
Evaluation Method, Dataset Size or Dataset Content: How to Evaluate Algorithms for Image Matching?

Journal of Mathematical Imaging and Vision

Abstract

Most vision papers have to include some evaluation work in order to demonstrate that the proposed algorithm is an improvement on existing ones. Generally, these evaluation results are presented in tabular or graphical form. Neither is ideal, because there is no indication as to whether any performance differences are statistically significant. Moreover, the size and nature of the dataset used for evaluation will obviously have a bearing on the results, yet neither factor is usually discussed. This paper evaluates the effectiveness of commonly used performance characterization metrics for image feature detection and description in matching problems, and explores the use of statistical tests such as McNemar’s test and ANOVA as better alternatives.
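As an illustration of the kind of test the abstract refers to, the following is a minimal sketch of the standard continuity-corrected McNemar's test applied to two matching algorithms run on the same images. It is not taken from the paper: the function names, the per-image pass/fail criterion, and the example counts are all hypothetical, and the paper's own formulation may differ.

```python
import math
from typing import Sequence, Tuple


def discordant_counts(a_ok: Sequence[bool], b_ok: Sequence[bool]) -> Tuple[int, int]:
    """Count the image pairs on which exactly one algorithm succeeded.

    a_ok[i] / b_ok[i] say whether algorithm A / B passed on image pair i
    (the pass criterion itself is an assumption left to the caller).
    """
    if len(a_ok) != len(b_ok):
        raise ValueError("both algorithms must be run on the same cases")
    n_sf = sum(1 for a, b in zip(a_ok, b_ok) if a and not b)  # A passed, B failed
    n_fs = sum(1 for a, b in zip(a_ok, b_ok) if b and not a)  # A failed, B passed
    return n_sf, n_fs


def mcnemar_z(n_sf: int, n_fs: int) -> float:
    """Continuity-corrected McNemar z-score from the two discordant counts."""
    total = n_sf + n_fs
    if total == 0:
        return 0.0  # the algorithms never disagree, so there is no evidence of a difference
    return max(abs(n_sf - n_fs) - 1, 0) / math.sqrt(total)


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical counts: A alone succeeds on 25 cases, B alone on 10.
    z = mcnemar_z(25, 10)
    print(f"z = {z:.3f}, p = {two_sided_p(z):.4f}")
```

Only the cases on which the two algorithms disagree enter the statistic, which is why the test suits paired comparisons on a shared dataset. The normal approximation is usually considered unreliable when the number of discordant cases is small.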


Notes

  1. http://www.robots.ox.ac.uk/~vgg/research/affine/.

  2. http://www.robots.ox.ac.uk/~vgg/research/affine/.

  3. http://www.featurespace.org/.


Author information


Correspondence to Nadia Kanwal.


Cite this article

Kanwal, N., Bostanci, E. & Clark, A.F. Evaluation Method, Dataset Size or Dataset Content: How to Evaluate Algorithms for Image Matching?. J Math Imaging Vis 55, 378–400 (2016). https://doi.org/10.1007/s10851-015-0626-4


  • DOI: https://doi.org/10.1007/s10851-015-0626-4
