Abstract
Learning-based approaches have made substantial progress in capturing spatially-varying bidirectional reflectance distribution functions (SVBRDFs) from a single image with unknown lighting and geometry. However, most existing networks consider only per-pixel losses, which limits their ability to recover local features such as smooth glossy regions. A few generative adversarial networks use multiple discriminators for different parameter maps, which increases network complexity. We present a novel end-to-end generative adversarial network (GAN) that recovers appearance from a single flash-lit picture of a nearly flat surface. We use a single unified adversarial framework for each parameter map. An attention module guides the network to focus on details of the maps. Furthermore, an SVBRDF map loss is included to keep the network from paying excessive attention to specular highlights. We demonstrate and evaluate our method on both public datasets and real data. Quantitative analysis and visual comparisons indicate that our method achieves better results than the state of the art in most cases.
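
As a rough illustration of how a per-pixel map loss can be combined with an adversarial term, the sketch below computes a generator objective from predicted parameter maps and discriminator scores. It is a minimal sketch under stated assumptions, not the formulation used in the paper: the function names, the L1 form of the map loss, the non-saturating adversarial term, and the weight `map_weight` are all hypothetical choices made for illustration.

```python
import math

def l1_map_loss(pred_maps, true_maps):
    """Mean absolute error over all parameter maps.

    Each argument is a dict mapping a map name (e.g. "normal", "diffuse")
    to a flat list of floats. The map names are illustrative only.
    """
    total, count = 0.0, 0
    for name, truth in true_maps.items():
        pred = pred_maps[name]
        total += sum(abs(p - t) for p, t in zip(pred, truth))
        count += len(truth)
    return total / count

def adversarial_loss(disc_scores, eps=1e-8):
    """Non-saturating generator loss: -mean(log D(G(x))).

    disc_scores are discriminator outputs in (0, 1] for generated maps.
    """
    return -sum(math.log(s + eps) for s in disc_scores) / len(disc_scores)

def generator_loss(pred_maps, true_maps, disc_scores, map_weight=10.0):
    """Adversarial term plus a weighted per-pixel map term.

    map_weight is an assumed hyperparameter, not a value from the paper.
    """
    return adversarial_loss(disc_scores) + map_weight * l1_map_loss(pred_maps, true_maps)

if __name__ == "__main__":
    truth = {"diffuse": [0.2, 0.4], "roughness": [0.5, 0.5]}
    pred = {"diffuse": [0.2, 0.6], "roughness": [0.5, 0.3]}
    print(generator_loss(pred, truth, disc_scores=[0.7, 0.9]))
```

With this kind of weighting, the map term anchors every predicted pixel to the ground truth, while the adversarial term rewards maps that look plausible as a whole.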

Acknowledgements
The authors would like to thank Jie Guo from Nanjing University for his kind help with the comparison. Ying Song was partially supported by the National Natural Science Foundation of China (No. 61602416) and Shaoxing Science and Technology Plan Project (No. 2020B41006).
Ethics declarations
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Zeqi Shi is a master's student at the School of Information Science and Technology of Zhejiang Sci-Tech University. He received his B.S. degree from Zhejiang Sci-Tech University in 2019. His research interests include deep learning and computer graphics.
Xiangyu Lin is a lecturer in the School of Information Science and Technology of Zhejiang Sci-Tech University. He obtained his B.S. and Ph.D. degrees in electronic information technology and instruments from Zhejiang University. His main research interests are image processing and machine learning.
Ying Song is an associate professor in the School of Information Science and Technology of Zhejiang Sci-Tech University. She obtained her B.S. and Ph.D. degrees in computer science and technology from Zhejiang University. Her main research interests are appearance modeling and realistic rendering.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
About this article
Cite this article
Shi, Z., Lin, X. & Song, Y. An attention-embedded GAN for SVBRDF recovery from a single image. Comp. Visual Media 9, 551–561 (2023). https://doi.org/10.1007/s41095-022-0289-1
DOI: https://doi.org/10.1007/s41095-022-0289-1
Keywords
- spatially-varying bidirectional reflectance distribution function (SVBRDF)
- appearance capture
- generative adversarial network (GAN)
- attention mechanism