Weakly Supervised Learning of Dense Semantic Correspondences and Segmentation

Ufer, Nikolai; Lui, Kam To; Schwarz, Katja; Warkentin, Paul; Ommer, Björn

doi:10.1007/978-3-030-33676-9_32

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11824)

Included in the following conference series:

German Conference on Pattern Recognition

Abstract

Finding semantic correspondences is a challenging problem. With the breakthrough of CNNs, stronger features have become available for tasks like classification, but they are not tailored to the requirements of semantic matching. We present a weakly supervised learning approach that generates stronger features by encoding far more context than previous methods. First, we generate more suitable training data using a geometrically informed correspondence mining method, which is less prone to spurious matches and requires only image category labels as supervision. Second, we introduce a new convolutional layer, a learned mixture of differently strided convolutions, which allows the network to encode much more context while preserving matching accuracy. The strong geometric encoding on the feature side enables us to learn a semantic flow network that generates more natural deformations than models based on parametric transformations and predicts foreground regions at the same time. Our semantic flow network outperforms the current state of the art on several semantic matching benchmarks, and the learned features perform strongly even with simple nearest-neighbor matching.
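
As an illustration of the simple nearest neighbor matching mentioned at the end of the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that every image location is already described by a feature vector and that cosine similarity is the matching score; both are assumptions, since the abstract specifies neither. The function names and the toy descriptors are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbor_matches(
    source: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
) -> List[int]:
    """For each source descriptor, return the index of the most similar target descriptor."""
    if not target:
        raise ValueError("target must contain at least one descriptor")
    matches = []
    for s in source:
        sims = [cosine_similarity(s, t) for t in target]
        # Index of the highest similarity; ties resolve to the lowest index.
        matches.append(max(range(len(target)), key=sims.__getitem__))
    return matches


# Toy 2-D descriptors, made up purely for illustration.
source_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target_feats = [[0.1, 0.9], [0.9, 0.1], [0.7, 0.7]]
matches = nearest_neighbor_matches(source_feats, target_feats)
print(matches)  # [1, 0, 2]
```

In this toy example each source descriptor is assigned to the target descriptor pointing in the most similar direction. A dense matcher would apply the same rule to every location of a feature map, typically with vectorized operations rather than Python loops.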

N. Ufer and K. T. Lui contributed equally.

Acknowledgment

This work has been supported in part by the DFG grant OM81/1-1 and a hardware donation from NVIDIA Corporation.

Author information

Authors and Affiliations

Heidelberg University, HCI/IWR, Heidelberg, Germany
Nikolai Ufer, Kam To Lui, Katja Schwarz, Paul Warkentin & Björn Ommer

Corresponding author

Correspondence to Nikolai Ufer.

Editor information

Editors and Affiliations

TU Dortmund University, Dortmund, Germany
Gernot A. Fink
University of Hamburg, Hamburg, Germany
Simone Frintrop
University of Münster, Münster, Germany
Xiaoyi Jiang

Cite this paper

Ufer, N., Lui, K.T., Schwarz, K., Warkentin, P., Ommer, B. (2019). Weakly Supervised Learning of Dense Semantic Correspondences and Segmentation. In: Fink, G., Frintrop, S., Jiang, X. (eds) Pattern Recognition. DAGM GCPR 2019. Lecture Notes in Computer Science, vol 11824. Springer, Cham. https://doi.org/10.1007/978-3-030-33676-9_32

DOI: https://doi.org/10.1007/978-3-030-33676-9_32
Published: 25 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33675-2
Online ISBN: 978-3-030-33676-9
eBook Packages: Computer Science, Computer Science (R0)