Correction to: International Journal of Computer Vision https://doi.org/10.1007/s11263-022-01602-y

After publication of the original article, it came to the authors' attention that the last line of the abstract should be removed. The correct abstract can be found below. The original article has been corrected.


Abstract


Dense correspondence across semantically related images has been extensively studied, but still faces two challenges: (1) large variations in appearance, scale, and pose exist even for objects from the same category, and (2) labeling pixel-level dense correspondences is labor-intensive and infeasible to scale. Most existing methods focus on designing various matching modules using fully-supervised ImageNet pretrained networks. On the other hand, while a variety of self-supervised approaches have been proposed to explicitly measure image-level similarities, correspondence matching at the pixel level remains under-explored. In this work, we propose a multi-level contrastive learning approach for semantic matching, which does not rely on any ImageNet pretrained model. We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects, while the performance can be further enhanced by regularizing cross-instance cycle-consistency at intermediate feature levels. Experimental results on the PF-PASCAL, PF-WILLOW, and SPair-71k benchmark datasets demonstrate that our method performs favorably against the state-of-the-art approaches.
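
To make the two ingredients named in the abstract concrete, the following is a minimal, illustrative sketch, not the method of the original article: an image-level InfoNCE contrastive loss over pooled embeddings and a soft cross-instance cycle-consistency regularizer over dense feature maps. The function names, temperature values, and the soft-assignment formulation of cycle-consistency are assumptions introduced here for illustration only.

```python
import torch
import torch.nn.functional as F


def image_level_contrastive_loss(feat_a, feat_b, temperature=0.07):
    """Illustrative InfoNCE loss over pooled image embeddings of two views.

    feat_a, feat_b: (B, D) tensors; row i of feat_a and row i of feat_b come
    from the same image (positive pair), all other rows act as negatives.
    The temperature is an assumed value, not taken from the paper.
    """
    a = F.normalize(feat_a, dim=1)
    b = F.normalize(feat_b, dim=1)
    logits = a @ b.t() / temperature                   # (B, B) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)


def cycle_consistency_loss(feat_src, feat_tgt, temperature=0.05):
    """Illustrative soft cycle-consistency on intermediate feature maps.

    feat_src, feat_tgt: (C, H, W) feature maps from two different images.
    A source location matched forward to the target and then back should
    return to its own position, i.e. the round-trip assignment matrix
    should be close to the identity.
    """
    c, h, w = feat_src.shape
    src = F.normalize(feat_src.reshape(c, -1), dim=0)  # (C, HW)
    tgt = F.normalize(feat_tgt.reshape(c, -1), dim=0)  # (C, HW)
    sim = src.t() @ tgt / temperature                  # (HW, HW) similarities
    fwd = sim.softmax(dim=1)                           # soft src -> tgt assignment
    bwd = sim.t().softmax(dim=1)                       # soft tgt -> src assignment
    round_trip = fwd @ bwd                             # (HW, HW) src -> tgt -> src
    identity = torch.eye(h * w, device=feat_src.device)
    return F.mse_loss(round_trip, identity)
```

In a full pipeline, terms of this kind would typically be weighted, computed at several feature levels of the backbone, and combined into a single training objective.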