Weakly Supervised Learning of Dense Semantic Correspondences and Segmentation

  • Conference paper
Pattern Recognition (DAGM GCPR 2019)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11824)

Abstract

Finding semantic correspondences is a challenging problem. With the breakthrough of CNNs, stronger features have become available for tasks such as classification, but they are not tailored to the requirements of semantic matching. We present a weakly supervised learning approach that generates stronger features by encoding far more context than previous methods. First, we generate more suitable training data using a geometrically informed correspondence mining method that is less prone to spurious matches and requires only image category labels as supervision. Second, we introduce a new convolutional layer, a learned mixture of differently strided convolutions, which allows the network to encode much more context while preserving matching accuracy. The strong geometric encoding on the feature side enables us to learn a semantic flow network that generates more natural deformations than models based on parametric transformations and simultaneously predicts foreground regions. Our semantic flow network outperforms the current state of the art on several semantic matching benchmarks, and the learned features perform strongly even under simple nearest neighbor matching.
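
As an illustration of what simple nearest neighbor matching over dense features involves, the following is a minimal sketch rather than the authors' implementation. It assumes each image location is described by one feature vector and that similarity is measured by cosine similarity; the abstract does not state the metric. The function names and toy descriptor values are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def nearest_neighbor_matches(source_feats, target_feats):
    """For each source descriptor, return the index of the most similar target descriptor.

    Both arguments are lists of equal-length feature vectors, one per image location.
    Cosine similarity is an assumption made for this sketch.
    """
    src = [l2_normalize(f) for f in source_feats]
    tgt = [l2_normalize(f) for f in target_feats]
    matches = []
    for s in src:
        # Dot product of unit vectors equals cosine similarity.
        sims = [sum(a * b for a, b in zip(s, t)) for t in tgt]
        matches.append(max(range(len(tgt)), key=sims.__getitem__))
    return matches


# Toy descriptors (hypothetical values, not taken from the paper).
source = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
target = [[0.0, 1.0], [3.0, 3.0], [5.0, 0.1]]
matches = nearest_neighbor_matches(source, target)
```

Because each source location is matched independently, this baseline imposes no geometric consistency between neighboring matches, which is why its quality depends almost entirely on how discriminative the features are.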

N. Ufer and K. T. Lui contributed equally.

Acknowledgment

This work has been supported in part by the DFG grant OM81/1-1 and a hardware donation from NVIDIA Corporation.

Author information

Corresponding author

Correspondence to Nikolai Ufer.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Ufer, N., Lui, K.T., Schwarz, K., Warkentin, P., Ommer, B. (2019). Weakly Supervised Learning of Dense Semantic Correspondences and Segmentation. In: Fink, G., Frintrop, S., Jiang, X. (eds) Pattern Recognition. DAGM GCPR 2019. Lecture Notes in Computer Science, vol 11824. Springer, Cham. https://doi.org/10.1007/978-3-030-33676-9_32

  • DOI: https://doi.org/10.1007/978-3-030-33676-9_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33675-2

  • Online ISBN: 978-3-030-33676-9

  • eBook Packages: Computer Science, Computer Science (R0)
