ColorFormer: Image Colorization via Color Memory Assisted Hybrid-Attention Transformer

Ji, Xiaozhong; Jiang, Boyuan; Luo, Donghao; Tao, Guangpin; Chu, Wenqing; Xie, Zhifeng; Wang, Chengjie; Tai, Ying

doi:10.1007/978-3-031-19787-1_2

Xiaozhong Ji¹²,
Boyuan Jiang¹²,
Donghao Luo¹²,
Guangpin Tao¹²,
Wenqing Chu¹²,
Zhifeng Xie¹³,
Chengjie Wang¹² &
…
Ying Tai¹²

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13676))

Included in the following conference series:

European Conference on Computer Vision

2377 Accesses
6 Citations

Abstract

Automatic image colorization is a challenging task that attracts a lot of research interest. Previous methods employing deep neural networks have produced impressive results. However, these colorization images are still unsatisfactory and far from practical applications. The reason is that semantic consistency and color richness are two key elements ignored by existing methods. In this work, we propose an automatic image colorization method via color memory assisted hybrid-attention transformer, namely ColorFormer. Our network consists of a transformer-based encoder and a color memory decoder. The core module of the encoder is our proposed global-local hybrid attention operation, which improves the ability to capture global receptive field dependencies. With the strong power to model contextual semantic information of grayscale image in different scenes, our network can produce semantic-consistent colorization results. In decoder part, we design a color memory module which stores various semantic-color mapping for image-adaptive queries. The queried color priors are used as reference to help the decoder produce more vivid and diverse results. Experimental results show that our method can generate more realistic and semantically matched color images compared with state-of-the-art methods. Moreover, owing to the proposed end-to-end architecture, the inference speed reaches 40 FPS on a V100 GPU, which meets the real-time requirement.

X. Ji and B. Jiang—Equal contribution.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 89.00; Price excludes VAT (USA)

Softcover Book: USD 119.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
https://www.boredpanda.com/colorized-history-black-and-white-pictures-restored-in-color/.

References

Antic, J.: A deep learning based project for colorizing and restoring old images (2018)
Google Scholar
Anwar, S., Tahir, M., Li, C., Mian, A., Khan, F.S., Muzaffar, A.W.: Image colorization: a survey and dataset. arXiv preprint arXiv:2008.10774 (2020)
Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. arXiv preprint arXiv:1607.06450 (2016)
Caesar, H., Uijlings, J., Ferrari, V.: COCO-Stuff: thing and stuff classes in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1209–1218 (2018)
Google Scholar
Cheng, Z., Yang, Q., Sheng, B.: Deep colorization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 415–423 (2015)
Google Scholar
Chu, X., et al.: Twins: revisiting the design of spatial attention in vision transformers. In: Advances in Neural Information Processing Systems, vol. 34 (2021)
Google Scholar
Deng, J.: A large-scale hierarchical image database. In: Proceedings of IEEE Computer Vision and Pattern Recognition 2009 (2009)
Google Scholar
Dosovitskiy, A., et al.: An image is worth \(16\times 16\) words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems (NeurIPS), pp. 2672–2680 (2014)
Google Scholar
Hasler, D., Suesstrunk, S.E.: Measuring colorfulness in natural images. In: Human Vision and Electronic Imaging VIII, vol. 5007, pp. 87–95. International Society for Optics and Photonics (2003)
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016
Google Scholar
He, M., Chen, D., Liao, J., Sander, P.V., Yuan, L.: Deep exemplar-based colorization. ACM Trans. Graph. (TOG) 37(4), 1–16 (2018)
Google Scholar
Hendrycks, D., Gimpel, K.: Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415 (2016)
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Google Scholar
Ho, J., Kalchbrenner, N., Weissenborn, D., Salimans, T.: Axial attention in multidimensional transformers. arXiv preprint arXiv:1912.12180 (2019)
Huang, Y.C., Tung, Y.S., Chen, J.C., Wang, S.W., Wu, J.L.: An adaptive edge detection based colorization algorithm and its applications. In: Proceedings of the 13th Annual ACM International Conference on Multimedia, pp. 351–354 (2005)
Google Scholar
Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134 (2017)
Google Scholar
Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 694–711. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6_43
Chapter Google Scholar
Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. In: International Conference on Learning Representations (2018)
Google Scholar
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (Poster) (2015)
Google Scholar
Kumar, M., Weissenborn, D., Kalchbrenner, N.: Colorization transformer. In: International Conference on Learning Representations (2021)
Google Scholar
Larsson, G., Maire, M., Shakhnarovich, G.: Learning representations for automatic colorization. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 577–593. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_35
Chapter Google Scholar
Levin, A., Lischinski, D., Weiss, Y.: Colorization using optimization. In: ACM SIGGRAPH 2004 Papers, pp. 689–694 (2004)
Google Scholar
Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
Google Scholar
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Su, J.W., Chu, H.K., Huang, J.B.: Instance-aware image colorization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7968–7977 (2020)
Google Scholar
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Google Scholar
Vitoria, P., Raad, L., Ballester, C.: ChromaGAN: adversarial picture colorization with semantic class distribution. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 2445–2454 (2020)
Google Scholar
Welsh, T., Ashikhmin, M., Mueller, K.: Transferring color to greyscale images. In: Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques, pp. 277–280 (2002)
Google Scholar
Wu, Y., Wang, X., Li, Y., Zhang, H., Zhao, X., Shan, Y.: Towards vivid and diverse image colorization with generative color prior. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 14377–14386 (2021)
Google Scholar
Xu, Z., Wang, T., Fang, F., Sheng, Y., Zhang, G.: Stylization-based architecture for fast deep exemplar colorization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9363–9372 (2020)
Google Scholar
Yoo, S., Bahng, H., Chung, S., Lee, J., Chang, J., Choo, J.: Coloring with limited data: few-shot colorization via memory-augmented networks. IEEE (2019)
Google Scholar
Zhang, B., et al.: Deep exemplar-based video colorization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8052–8061 (2019)
Google Scholar
Zhang, R., Isola, P., Efros, A.A.: Colorful image colorization. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 649–666. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_40
Chapter Google Scholar
Zhang, R., et al.: Real-time user-guided image colorization with learned deep priors. ACM Trans. Graph. (TOG) 36(4), 1–11 (2017)
Google Scholar
Zhao, J., Liu, L., Snoek, C.G., Han, J., Shao, L.: Pixel-level semantics guided image colorization. arXiv preprint arXiv:1808.01597 (2018)

Download references

Author information

Authors and Affiliations

Youtu Lab, Tencent, Shanghai, China
Xiaozhong Ji, Boyuan Jiang, Donghao Luo, Guangpin Tao, Wenqing Chu, Chengjie Wang & Ying Tai
Shanghai University, Shanghai, China
Zhifeng Xie

Authors

Xiaozhong Ji
View author publications
You can also search for this author in PubMed Google Scholar
Boyuan Jiang
View author publications
You can also search for this author in PubMed Google Scholar
Donghao Luo
View author publications
You can also search for this author in PubMed Google Scholar
Guangpin Tao
View author publications
You can also search for this author in PubMed Google Scholar
Wenqing Chu
View author publications
You can also search for this author in PubMed Google Scholar
Zhifeng Xie
View author publications
You can also search for this author in PubMed Google Scholar
Chengjie Wang
View author publications
You can also search for this author in PubMed Google Scholar
Ying Tai
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Chengjie Wang or Ying Tai .

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 4621 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Ji, X. et al. (2022). ColorFormer: Image Colorization via Color Memory Assisted Hybrid-Attention Transformer. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13676. Springer, Cham. https://doi.org/10.1007/978-3-031-19787-1_2

Download citation

DOI: https://doi.org/10.1007/978-3-031-19787-1_2
Published: 21 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19786-4
Online ISBN: 978-3-031-19787-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

ColorFormer: Image Colorization via Color Memory Assisted Hybrid-Attention Transformer