Evaluation Method, Dataset Size or Dataset Content: How to Evaluate Algorithms for Image Matching?

Kanwal, Nadia; Bostanci, Erkan; Clark, Adrian F.

doi:10.1007/s10851-015-0626-4

Published: 29 January 2016

Volume 55, pages 378–400, (2016)

Journal of Mathematical Imaging and Vision

Nadia Kanwal¹,
Erkan Bostanci² &
Adrian F. Clark³

Abstract

Most vision papers have to include some evaluation work in order to demonstrate that the algorithm proposed is an improvement on existing ones. Generally, these evaluation results are presented in tabular or graphical forms. Neither of these is ideal because there is no indication as to whether any performance differences are statistically significant. Moreover, the size and nature of the dataset used for evaluation will obviously have a bearing on the results, and neither of these factors is usually discussed. This paper evaluates the effectiveness of commonly used performance characterization metrics for image feature detection and description for matching problems and explores the use of statistical tests such as McNemar’s test and ANOVA as better alternatives.
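
As a rough illustration of the kind of test the abstract names, the sketch below applies McNemar's test (with the usual continuity correction) to paired pass/fail outcomes of two matching algorithms on the same image pairs. It is a minimal sketch under stated assumptions, not the authors' evaluation procedure: the function name `mcnemar`, the per-pair success criterion, and the outcome lists are all hypothetical and invented for illustration.

```python
import math


def mcnemar(a_correct, b_correct):
    """McNemar's test with continuity correction on paired boolean outcomes.

    a_correct[i] and b_correct[i] say whether algorithm A and algorithm B
    succeeded on the same test case i (hypothetical success criterion).
    Returns (chi-square statistic, two-sided p-value with 1 degree of freedom).
    """
    if len(a_correct) != len(b_correct):
        raise ValueError("outcome lists must be paired (same length)")

    # Only discordant pairs carry information: cases where exactly one
    # of the two algorithms succeeded.
    only_a = sum(1 for a, b in zip(a_correct, b_correct) if a and not b)
    only_b = sum(1 for a, b in zip(a_correct, b_correct) if b and not a)
    n = only_a + only_b
    if n == 0:
        return 0.0, 1.0  # the algorithms never disagree

    # (|only_a - only_b| - 1)^2 / (only_a + only_b); the numerator is clamped
    # at zero so equal discordant counts give a statistic of exactly 0.
    chi2 = max(abs(only_a - only_b) - 1, 0) ** 2 / n

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up outcomes for 20 image pairs (NOT data from the paper):
# 9 both succeed, 8 only A succeeds, 1 only B succeeds, 2 both fail.
algo_a = [True] * 9 + [True] * 8 + [False] * 1 + [False] * 2
algo_b = [True] * 9 + [False] * 8 + [True] * 1 + [False] * 2

chi2, p_value = mcnemar(algo_a, algo_b)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

With these invented outcomes the statistic is 4.0 and the p-value is roughly 0.046, so the difference would be judged significant at the 5% level. The point of the paired design is that agreement between the two algorithms is ignored; only the cases where they disagree affect the result. With so few discordant pairs an exact binomial version of the test is usually preferred over the chi-square approximation.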

Author information

Authors and Affiliations

Lahore College for Women University, Lahore, Pakistan
Nadia Kanwal
Computer Engineering Department, Ankara University, Ankara, Turkey
Erkan Bostanci
School of Computer Science & Electronic Engineering, University of Essex, Colchester, UK
Adrian F. Clark

Corresponding author

Correspondence to Nadia Kanwal.

About this article

Cite this article

Kanwal, N., Bostanci, E. & Clark, A.F. Evaluation Method, Dataset Size or Dataset Content: How to Evaluate Algorithms for Image Matching?. J Math Imaging Vis 55, 378–400 (2016). https://doi.org/10.1007/s10851-015-0626-4

Received: 17 November 2014
Accepted: 21 December 2015
Published: 29 January 2016
Issue Date: July 2016
DOI: https://doi.org/10.1007/s10851-015-0626-4
