Transform Your Smartphone into a DSLR Camera: Learning the ISP in the Wild

Shekhar Tripathi, Ardhendu; Danelljan, Martin; Shukla, Samarth; Timofte, Radu; Van Gool, Luc

doi:10.1007/978-3-031-20068-7_36

Ardhendu Shekhar Tripathi, Martin Danelljan, Samarth Shukla, Radu Timofte & Luc Van Gool

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13666)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We propose a trainable Image Signal Processing (ISP) framework that produces DSLR quality images given RAW images captured by a smartphone. To address the color misalignments between training image pairs, we employ a color-conditional ISP network and optimize a novel parametric color mapping between each input RAW and reference DSLR image. During inference, we predict the target color image by designing a color prediction network with efficient Global Context Transformer modules. The latter effectively leverage global information to learn consistent color and tone mappings. We further propose a robust masked aligned loss to identify and discard regions with inaccurate motion estimation during training. Lastly, we introduce the ISP in the Wild (ISPW) dataset, consisting of weakly paired phone RAW and DSLR sRGB images. We extensively evaluate our method, setting a new state-of-the-art on two datasets. The code is available at https://github.com/4rdhendu/TransformPhone2DSLR.
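
The idea of discarding unreliable regions in a loss can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `masked_l1_loss`, the flat-list pixel representation, and the assumption that a binary validity mask is already available (for example, from some motion-consistency check) are all hypothetical choices made for illustration. The sketch only shows how masked pixels are excluded from an L1 error.

```python
def masked_l1_loss(pred, target, mask, eps=1e-8):
    """Mean absolute error over pixels whose mask value is 1.

    pred, target: flat sequences of pixel values of equal length.
    mask: flat sequence of 0/1 flags (hypothetical input); 0 marks a pixel
    whose alignment is considered unreliable, so it is ignored.
    """
    if not (len(pred) == len(target) == len(mask)):
        raise ValueError("pred, target and mask must have equal length")
    # Sum the absolute error only where the mask keeps the pixel.
    weighted_error = sum(m * abs(p - t) for p, t, m in zip(pred, target, mask))
    # Normalise by the number of kept pixels; eps avoids division by zero.
    valid = sum(mask)
    return weighted_error / (valid + eps)


pred = [0.2, 0.5, 0.9, 0.4]
target = [0.1, 0.5, 0.1, 0.4]
mask = [1, 1, 0, 1]  # third pixel is treated as misaligned and dropped

loss_masked = masked_l1_loss(pred, target, mask)
loss_unmasked = masked_l1_loss(pred, target, [1, 1, 1, 1])
```

In this toy example the large error at the third pixel contributes to `loss_unmasked` but not to `loss_masked`, which is the behaviour a masked loss is meant to provide when the reference image is poorly aligned in some regions.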

Acknowledgement

This work was supported by the ETH Zürich Fund (OK), Huawei Technologies Oy (Finland) and Alexander von Humboldt Foundation.

Author information

Authors and Affiliations

ETH Zurich, Zürich, Switzerland
Ardhendu Shekhar Tripathi, Martin Danelljan, Samarth Shukla, Radu Timofte & Luc Van Gool
University of Würzburg, Würzburg, Germany
Radu Timofte
KU Leuven, Leuven, Belgium
Luc Van Gool

Corresponding author

Correspondence to Ardhendu Shekhar Tripathi.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 16579 KB)

Cite this paper

Shekhar Tripathi, A., Danelljan, M., Shukla, S., Timofte, R., Van Gool, L. (2022). Transform Your Smartphone into a DSLR Camera: Learning the ISP in the Wild. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13666. Springer, Cham. https://doi.org/10.1007/978-3-031-20068-7_36

DOI: https://doi.org/10.1007/978-3-031-20068-7_36
Published: 11 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20067-0
Online ISBN: 978-3-031-20068-7
eBook Packages: Computer Science (R0)
