PARN: Pyramidal Affine Regression Networks for Dense Semantic Correspondence

  • Sangryul Jeon
  • Seungryong Kim
  • Dongbo Min
  • Kwanghoon Sohn
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11210)


This paper presents a deep architecture for dense semantic correspondence, called pyramidal affine regression networks (PARN), that estimates locally-varying affine transformation fields across images. To handle the intra-class appearance and shape variations that commonly occur among different instances of the same object category, we leverage a pyramidal model in which affine transformation fields are progressively estimated in a coarse-to-fine manner, so that the smoothness constraint is naturally imposed within deep networks. PARN estimates residual affine transformations at each level and composes them to obtain the final affine transformations. Furthermore, to overcome the lack of sufficient training data for semantic correspondence, we propose a novel weakly-supervised training scheme that generates progressive supervision by leveraging correspondence consistency across image pairs. Our method is fully learnable in an end-to-end manner and does not require quantizing the infinite, continuous space of affine transformations. To the best of our knowledge, it is the first work that attempts to estimate dense affine transformation fields in a coarse-to-fine manner within deep networks. Experimental results demonstrate that PARN outperforms state-of-the-art methods for dense semantic correspondence on various benchmarks.
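The coarse-to-fine composition of residual affine transformations described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the grid layout (level l holds a 2^l x 2^l grid of 2x3 affine matrices), the nearest-neighbour upsampling, the composition order (each residual applied after the accumulated transform), and all names (`compose`, `upsample`, `compose_pyramid`) are assumptions made for illustration. In the paper the residuals are regressed by a network; here they are supplied by hand.

```python
from typing import List

Affine = List[List[float]]  # 2x3 matrix [[a, b, tx], [c, d, ty]]
Field = List[List[Affine]]  # square grid of per-cell affine matrices

IDENTITY: Affine = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]


def compose(outer: Affine, inner: Affine) -> Affine:
    """Return the 2x3 matrix of the map x -> outer(inner(x))."""
    result = []
    for r in range(2):
        # Linear part times [linear | translation] of the inner map.
        row = [
            outer[r][0] * inner[0][c] + outer[r][1] * inner[1][c]
            for c in range(3)
        ]
        row[2] += outer[r][2]  # add the outer translation
        result.append(row)
    return result


def upsample(field: Field) -> Field:
    """Nearest-neighbour 2x upsampling: each cell is copied to its four children."""
    size = len(field)
    return [
        [field[i // 2][j // 2] for j in range(2 * size)]
        for i in range(2 * size)
    ]


def compose_pyramid(residuals: List[Field]) -> Field:
    """Accumulate residual fields from the coarsest (1x1) to the finest level.

    Assumption: residuals[l] is a 2^l x 2^l grid, and each residual is
    applied after the transform accumulated at the coarser levels.
    """
    current: Field = [[IDENTITY]]
    for level, residual in enumerate(residuals):
        if level > 0:
            current = upsample(current)
        if len(residual) != len(current):
            raise ValueError(f"level {level}: expected a {len(current)}x{len(current)} grid")
        size = len(residual)
        current = [
            [compose(residual[i][j], current[i][j]) for j in range(size)]
            for i in range(size)
        ]
    return current


# Hand-made example: a global shift at level 0, then a local scale at level 1.
shift_x4: Affine = [[1.0, 0.0, 4.0], [0.0, 1.0, 0.0]]
scale_2: Affine = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
example_field = compose_pyramid([
    [[shift_x4]],
    [[scale_2, IDENTITY], [IDENTITY, IDENTITY]],
])
```

In this example only the top-left cell of the finer level receives a non-identity residual, so the other three cells simply inherit the coarse shift. This is one way to see how a coarse estimate acts as a smoothness prior for the finer levels.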


Keywords: Dense semantic correspondence, Hierarchical graph model



This research was supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (NRF-2017M3C4A7069370).

Supplementary material

Supplementary material 1: 474211_1_En_22_MOESM1_ESM.pdf (PDF, 26661 KB)



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Sangryul Jeon (1)
  • Seungryong Kim (1)
  • Dongbo Min (2)
  • Kwanghoon Sohn (1, corresponding author)

  1. Yonsei University, Seoul, South Korea
  2. Ewha Womans University, Seoul, South Korea
