Weakly Supervised Deep Metric Learning for Template Matching

  • Davit Buniatyan (email author)
  • Sergiy Popovych
  • Dodam Ih
  • Thomas Macrina
  • Jonathan Zung
  • H. Sebastian Seung
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 943)


Template matching by normalized cross correlation (NCC) is widely used for finding image correspondences. NCCNet improves the robustness of this algorithm by transforming image features with siamese convolutional nets trained to maximize the contrast between NCC values of true and false matches. The main technical contribution is a weakly supervised learning algorithm for the training. Unlike fully supervised approaches to metric learning, the method can improve upon vanilla NCC without receiving locations of true matches during training. The improvement is quantified through patches of brain images from serial section electron microscopy. Relative to a parameter-tuned bandpass filter, siamese convolutional nets significantly reduce false matches. The improved accuracy of the method could be essential for connectomics, because emerging petascale datasets may require billions of template matches during assembly. Our method is also expected to generalize to other computer vision applications that use template matching to find image correspondences.
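The vanilla NCC baseline described above can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce the siamese convolutional nets. The function names `ncc_map` and `peak_contrast`, the brute-force loop, and the `exclude` window size are all assumptions made for illustration. The second function shows one plausible way to score how well a true match stands out from false ones: the gap between the highest NCC peak and the best value outside a small neighbourhood of that peak.

```python
import numpy as np


def ncc_map(template, image):
    """Brute-force normalized cross correlation at every valid offset.

    Returns an array of shape (H - h + 1, W - w + 1) with values in [-1, 1].
    """
    th, tw = template.shape
    ih, iw = image.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    out = np.zeros((ih - th + 1, iw - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            w = image[y:y + th, x:x + tw]
            w = w - w.mean()
            denom = t_norm * np.sqrt((w ** 2).sum())
            # A constant patch has zero variance; define its NCC as 0.
            out[y, x] = (t * w).sum() / denom if denom > 0 else 0.0
    return out


def peak_contrast(ncc, exclude=2):
    """Gap between the top NCC peak and the best value away from it.

    `exclude` (hypothetical parameter) is the half-width of the window
    around the top peak that is ignored when looking for the runner-up.
    Returns inf if the window covers the whole map.
    """
    y, x = np.unravel_index(np.argmax(ncc), ncc.shape)
    masked = ncc.copy()
    masked[max(0, y - exclude):y + exclude + 1,
           max(0, x - exclude):x + exclude + 1] = -np.inf
    return ncc[y, x] - masked.max()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((32, 32))
    template = image[5:13, 7:15].copy()  # template cut from a known location
    ncc = ncc_map(template, image)
    print("best offset:", np.unravel_index(np.argmax(ncc), ncc.shape))
    print("peak contrast:", peak_contrast(ncc))
```

In a learned variant, both the template and the source image would first pass through the same network before `ncc_map` is applied; a contrast score of this kind is one candidate for a training signal that needs no true match locations, although the exact objective used by NCCNet is not specified in this abstract.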


Metric learning · Weak supervision · Siamese convolutional neural networks · Normalized cross correlation



This work has been supported by an AWS Machine Learning Award and the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior/Interior Business Center (DoI/IBC) contract number D16PC0005. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/IBC, or the U.S. Government.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Davit Buniatyan (1, email author)
  • Sergiy Popovych (1)
  • Dodam Ih (1)
  • Thomas Macrina (1)
  • Jonathan Zung (1)
  • H. Sebastian Seung (1)

  1. Princeton University, Princeton, USA
