Image Patch Matching Using Convolutional Descriptors with Euclidean Distance

  • Iaroslav Melekhov
  • Juho Kannala
  • Esa Rahtu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10118)


In this work we propose a neural-network-based image descriptor for image patch matching, an important task in many computer vision applications. Our approach is motivated by the recent success of deep convolutional neural networks (CNNs) in object detection and classification. We develop a model that maps a raw input patch to a low-dimensional feature vector such that the distance between representations is small for similar patches and large otherwise. As the distance metric we use the \(L_2\) norm, i.e. Euclidean distance, which is fast to evaluate and is used by most popular hand-crafted descriptors, such as SIFT. According to our results, the approach outperforms state-of-the-art \(L_2\)-based descriptors and can be considered a direct replacement for SIFT. In addition, we conducted experiments with batch normalization and with histogram equalization as a preprocessing step for the input data. The results confirm that these techniques further improve the performance of the proposed descriptor. Finally, we show promising preliminary results obtained by augmenting our CNNs with the recently proposed spatial transformer networks, and we provide a visualisation and interpretation of their impact.
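The core idea in the abstract, that descriptors are compared by Euclidean distance and that matching patches should lie close together, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the 128-dimensional descriptor size, and the margin-based pair loss are assumptions chosen for illustration, since the abstract does not state the training objective.

```python
import numpy as np


def pairwise_l2(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Euclidean distance between every row of a (n, d) and every row of b (m, d)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def match_nearest(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """For each descriptor in a, return the index of the closest descriptor in b."""
    return pairwise_l2(a, b).argmin(axis=1)


def contrastive_loss(dist, same, margin: float = 1.0) -> float:
    """Margin-based pair loss (an assumed objective, not taken from the abstract).

    Matching pairs (same == 1) are penalised by their squared distance;
    non-matching pairs (same == 0) are penalised only while closer than `margin`.
    """
    dist = np.asarray(dist, dtype=float)
    same = np.asarray(same, dtype=float)
    pos = same * dist ** 2
    neg = (1.0 - same) * np.maximum(0.0, margin - dist) ** 2
    return float(np.mean(pos + neg))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 128-dimensional descriptors standing in for network outputs.
    ref = rng.normal(size=(5, 128))
    # Queries are slightly perturbed copies of reference rows 3, 0 and 4.
    query = ref[[3, 0, 4]] + 0.01 * rng.normal(size=(3, 128))
    print(match_nearest(query, ref))
```

Because the comparison is a plain \(L_2\) distance, descriptors of this kind could in principle be dropped into any nearest-neighbour matching pipeline built for SIFT-style vectors, which is the sense in which the abstract speaks of a direct replacement.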


Keywords: Interest Point, Image Patch, Convolutional Neural Network, Histogram Equalization, Interest Point Detector



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Computer Science, Aalto University, Espoo, Finland
  2. Center for Machine Vision Research, University of Oulu, Oulu, Finland
