Abstract
A metric for natural image patches is an important tool for analyzing images. An efficient way to learn one is to train a deep network to map image patches to a vector space in which Euclidean distance reflects patch similarity. Previous attempts learned such an embedding in a supervised manner, requiring many annotated images. In this paper, we present an unsupervised embedding of natural image patches that avoids the need for annotated images. The key idea is that the similarity of two patches can be learned from the prevalence of their spatial proximity in natural images. Of course, under this simple principle many spatially nearby pairs are outliers; however, as we show, these outliers do not harm the convergence of the metric learning. We show that our unsupervised embedding approach is more effective than a supervised one, or one that uses deep patch representations. Moreover, it lends itself naturally to an efficient self-supervised domain adaptation technique for a target domain containing a common foreground object.
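The core idea above — treating spatially adjacent patches as positive pairs and distant patches as negatives — can be sketched with a standard triplet hinge loss. The sketch below is our own minimal illustration, not the paper's implementation: the embedding network is omitted (the loss is computed on raw patch pixels as a stand-in for embedded vectors), and the function names (`make_triplet`, `triplet_loss`) are hypothetical.

```python
import numpy as np

def sample_patch(img, y, x, s):
    """Extract an s x s patch with top-left corner (y, x)."""
    return img[y:y + s, x:x + s]

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss pulling the anchor toward the positive and pushing it
    away from the negative; here applied to raw pixels for illustration,
    in place of the learned embedding vectors."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def make_triplet(img, s=8, rng=None):
    """Anchor and positive are spatially adjacent patches; the negative
    is drawn from a far-away location in the same image. Some such
    triplets are outliers, which the abstract argues is tolerable."""
    rng = rng if rng is not None else np.random.default_rng(0)
    h, w = img.shape[:2]
    y = int(rng.integers(0, h - 2 * s))
    x = int(rng.integers(0, w - 2 * s))
    anchor = sample_patch(img, y, x, s)
    positive = sample_patch(img, y, x + s, s)  # horizontal neighbour
    ny = (y + h // 2) % (h - s)                # distant location
    nx = (x + w // 2) % (w - s)
    negative = sample_patch(img, ny, nx, s)
    return anchor, positive, negative
```

In a full training loop, each patch would first pass through the embedding network, and the loss would be minimized over many such triplets sampled from unlabeled natural images.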
Author information
Dov Danon is a Ph.D. student at the School of Computer Science, Tel-Aviv University. He received his B.Sc. (summa cum laude) degree in computer science and mathematics from Ben-Gurion University of the Negev in 2007 and his M.Sc. degree in computer science from Tel-Aviv University in 2016. His research interests include machine learning and, in particular, unsupervised learning in image processing.
Hadar Averbuch-Elor is a Ph.D. student at the School of Electrical Engineering, Tel-Aviv University, and a research scientist at Amazon. She received her B.Sc. (cum laude) degree in electrical engineering from the Technion in 2012. She worked as a computer vision algorithm developer in the defense industry from 2011 to 2015. Her research interests include computer vision and computer graphics, focusing on unstructured image collections and unsupervised techniques.
Ohad Fried is a postdoctoral research scholar at the School of Computer Science, Stanford University, and a fellow in the Brown Institute for Media Innovation. He received his B.Sc. (magna cum laude) degree in computer science and computational biology and M.Sc. (cum laude) degree in computer science, both from the Hebrew University, in 2010 and 2012 respectively. He received his Ph.D. degree from the Department of Computer Science at Princeton University in 2017. Currently, his main interests are visual communication methods at the intersection of graphics, vision, and HCI.
Daniel Cohen-Or is a professor at the School of Computer Science, Tel-Aviv University. He received his B.Sc. (cum laude) degree in mathematics and computer science and M.Sc. (cum laude) degree in computer science, both from Ben-Gurion University, in 1985 and 1986, respectively. He received his Ph.D. degree from the Department of Computer Science at the State University of New York at Stony Brook in 1991. He received the 2005 Eurographics Outstanding Technical Contributions Award, and in 2015 he was named a Thomson Reuters Highly Cited Researcher. His current interests span image synthesis, analysis and reconstruction, motion and transformations, and shapes and surfaces.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Danon, D., Averbuch-Elor, H., Fried, O. et al. Unsupervised natural image patch learning. Comp. Visual Media 5, 229–237 (2019). https://doi.org/10.1007/s41095-019-0147-y