Abstract
Most vision papers include some evaluation work to demonstrate that the proposed algorithm improves on existing ones. These evaluation results are generally presented in tabular or graphical form. Neither is ideal, because neither indicates whether any performance differences are statistically significant. Moreover, the size and nature of the dataset used for evaluation have a bearing on the results, yet neither factor is usually discussed. This paper evaluates the effectiveness of commonly used performance characterization metrics for image feature detection and description in matching problems, and explores the use of statistical tests such as McNemar’s test and ANOVA as better alternatives.
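As a rough illustration of the kind of test the abstract refers to, the sketch below applies the standard continuity-corrected form of McNemar’s test to paired pass/fail outcomes of two matching algorithms on the same test cases. It is not taken from the paper: the function name `mcnemar_z`, the per-case success flags, and the example counts are all hypothetical, and the p-value uses the usual normal approximation, which is only reasonable when the number of discordant cases is not too small.

```python
import math


def mcnemar_z(outcomes_a, outcomes_b):
    """Continuity-corrected McNemar test on paired boolean outcomes.

    outcomes_a[i] and outcomes_b[i] say whether algorithms A and B
    succeeded on the same test case i (hypothetical success criterion).
    Returns (z, two_sided_p) under the normal approximation.
    """
    if len(outcomes_a) != len(outcomes_b):
        raise ValueError("outcome lists must be paired and of equal length")

    # Only discordant cases (one algorithm succeeds, the other fails) matter.
    a_only = sum(1 for a, b in zip(outcomes_a, outcomes_b) if a and not b)
    b_only = sum(1 for a, b in zip(outcomes_a, outcomes_b) if b and not a)
    discordant = a_only + b_only

    if discordant == 0:
        # The algorithms never disagree: no evidence of a difference.
        return 0.0, 1.0

    # Continuity correction subtracts 1; clamp at 0 so z is never negative.
    z = max(abs(a_only - b_only) - 1, 0) / math.sqrt(discordant)
    p = math.erfc(z / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p


# Hypothetical data: 100 paired test cases.
# Both succeed on 60, A alone on 20, B alone on 5, both fail on 15.
a = [True] * 60 + [True] * 20 + [False] * 5 + [False] * 15
b = [True] * 60 + [False] * 20 + [True] * 5 + [False] * 15
z, p = mcnemar_z(a, b)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

Unlike a table of raw success rates, the statistic depends only on the cases where the two algorithms disagree, so it directly addresses whether an observed difference could plausibly be due to chance.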
Cite this article
Kanwal, N., Bostanci, E. & Clark, A.F. Evaluation Method, Dataset Size or Dataset Content: How to Evaluate Algorithms for Image Matching?. J Math Imaging Vis 55, 378–400 (2016). https://doi.org/10.1007/s10851-015-0626-4
DOI: https://doi.org/10.1007/s10851-015-0626-4