Weakly Supervised Learning of Dense Semantic Correspondences and Segmentation

  • Nikolai Ufer
  • Kam To Lui
  • Katja Schwarz
  • Paul Warkentin
  • Björn Ommer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11824)


Finding semantic correspondences is a challenging problem. With the breakthrough of CNNs, stronger features have become available for tasks like classification, but they are not tailored to the requirements of semantic matching. We present a weakly supervised learning approach that generates stronger features by encoding far more context than previous methods. First, we generate more suitable training data using a geometrically informed correspondence mining method that is less prone to spurious matches and requires only image category labels as supervision. Second, we introduce a new convolutional layer, a learned mixture of differently strided convolutions, which allows the network to encode much more context while preserving matching accuracy. The strong geometric encoding on the feature side enables us to learn a semantic flow network that generates more natural deformations than models based on parametric transformations and predicts foreground regions at the same time. Our semantic flow network outperforms the current state of the art on several semantic matching benchmarks, and the learned features perform remarkably well even with simple nearest-neighbor matching.
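The abstract does not say how the mixture of differently strided convolutions is parameterized. The sketch below is a minimal illustration only, assuming that "differently strided" means different dilation rates at stride 1 (so the output keeps full resolution) and that the mixture is a softmax over learnable per-branch logits. The names `dilated_conv2d` and `MixedDilationConv` and the dilation set (1, 2, 4) are hypothetical choices, not the authors' implementation. Only the forward pass is shown; training the logits would require an autodiff framework.

```python
import numpy as np


def dilated_conv2d(x, w, dilation):
    """Stride-1, zero-padded ('same') 2D cross-correlation with a given dilation.

    x: (C_in, H, W) feature map; w: (C_out, C_in, k, k) kernel with odd k.
    The output keeps the input resolution, which matters for dense matching.
    """
    c_out, _, k, _ = w.shape
    pad = dilation * (k // 2)
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Shifted view of the padded input for kernel tap (i, j);
            # taps are spaced `dilation` pixels apart.
            patch = xp[:, i * dilation:i * dilation + h, j * dilation:j * dilation + wd]
            # Contract over input channels: (C_out, C_in) x (C_in, H, W) -> (C_out, H, W).
            out += np.tensordot(w[:, :, i, j], patch, axes=([1], [0]))
    return out


class MixedDilationConv:
    """Hypothetical layer: a softmax-weighted mixture of convolutions at several dilations."""

    def __init__(self, c_in, c_out, k=3, dilations=(1, 2, 4), seed=0):
        rng = np.random.default_rng(seed)
        self.dilations = dilations
        # One kernel per branch (assumption: branches do not share weights).
        self.kernels = [rng.normal(0.0, 0.1, (c_out, c_in, k, k)) for _ in dilations]
        # Mixture logits; in a real model these would be trained by backprop.
        self.logits = np.zeros(len(dilations))

    def mixture_weights(self):
        # Numerically stable softmax so the branch weights are positive and sum to 1.
        e = np.exp(self.logits - self.logits.max())
        return e / e.sum()

    def __call__(self, x):
        alphas = self.mixture_weights()
        return sum(a * dilated_conv2d(x, w, d)
                   for a, w, d in zip(alphas, self.kernels, self.dilations))
```

With 3×3 kernels and dilations (1, 2, 4), the branches cover 3×3, 5×5 and 9×9 windows (side length (k - 1)·d + 1) while the output stays at input resolution, so learning the mixture weights lets the network trade additional context against localization accuracy. Whether the paper uses dilation, true strides with upsampling, or a different weighting scheme is not stated in the abstract.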



This work has been supported in part by the DFG grant OM81/1-1 and a hardware donation from NVIDIA Corporation.

Supplementary material

Supplementary material 1 (pdf 4223 KB)



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Nikolai Ufer (1), corresponding author
  • Kam To Lui (1)
  • Katja Schwarz (1)
  • Paul Warkentin (1)
  • Björn Ommer (1)
  1. Heidelberg University, HCI/IWR, Heidelberg, Germany
