Abstract
Existing perceptual similarity metrics assume that an image and its reference are well aligned. As a result, these metrics are often sensitive to a small alignment error that is imperceptible to the human eye. This paper studies the effect of small misalignment, specifically a small shift between the input and reference images, on existing metrics, and accordingly develops a shift-tolerant similarity metric. It builds upon LPIPS, a widely used learned perceptual similarity metric, and explores architectural design considerations that make it robust against imperceptible misalignment. Specifically, we study a wide spectrum of neural network elements, such as anti-aliasing filtering, pooling, striding, padding, and skip connections, and discuss their roles in making a robust metric. Based on our studies, we develop a new deep neural network-based perceptual similarity metric. Our experiments show that our metric is tolerant to imperceptible shifts while remaining consistent with human similarity judgments. Code is available at https://tinyurl.com/5n85r28r.
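To make one of the abstract's design elements concrete, the following is a minimal PyTorch sketch of anti-aliased downsampling (blur pooling in the spirit of Zhang's "Making convolutional networks shift-invariant again"): the feature map is low-pass filtered before subsampling instead of being subsampled directly. This is an illustrative sketch only; the class name BlurPool2d, the 3x3 binomial kernel, and the reflect padding are assumptions for demonstration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    """Anti-aliased downsampling: low-pass filter each channel with a
    fixed binomial kernel, then subsample with the given stride."""
    def __init__(self, channels: int, stride: int = 2):
        super().__init__()
        self.channels = channels
        self.stride = stride
        k = torch.tensor([1.0, 2.0, 1.0])
        k = torch.outer(k, k)   # 3x3 binomial (approximately Gaussian) kernel
        k = k / k.sum()         # normalize so overall intensity is preserved
        # One copy of the kernel per channel, applied as a depthwise convolution.
        self.register_buffer("kernel", k.expand(channels, 1, 3, 3).clone())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.pad(x, (1, 1, 1, 1), mode="reflect")  # avoid border artifacts
        return F.conv2d(x, self.kernel, stride=self.stride, groups=self.channels)

if __name__ == "__main__":
    x = torch.randn(1, 64, 32, 32)
    shifted = torch.roll(x, shifts=1, dims=-1)  # one-pixel horizontal shift
    pool = BlurPool2d(channels=64)
    # Blur-pooled features of the shifted input tend to stay close to the
    # originals, whereas plain stride-2 subsampling can change them sharply.
    print((pool(x) - pool(shifted)).abs().mean())
```

In a metric like LPIPS, such a layer would replace the strided downsampling inside the backbone, which is one way to reduce the sensitivity to small shifts that the abstract describes.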
Acknowledgments
Figure 1 uses frames from https://www.youtube.com/watch?v=jW7pFhkVNYY under a Creative Commons license.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ghildyal, A., Liu, F. (2022). Shift-Tolerant Perceptual Similarity Metric. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision – ECCV 2022. Lecture Notes in Computer Science, vol. 13678. Springer, Cham. https://doi.org/10.1007/978-3-031-19797-0_6
Print ISBN: 978-3-031-19796-3
Online ISBN: 978-3-031-19797-0