Abstract
Automatic image colorization is an ill-posed problem with multi-modal uncertainty, and two main challenges remain with previous methods: incorrect semantic colors and under-saturation. In this paper, we propose an end-to-end transformer-based model to overcome these challenges. Benefiting from the long-range context extraction of the transformer and our holistic architecture, our method can colorize images with more diverse colors. In addition, we introduce color tokens into our approach and treat the colorization task as a classification problem, which increases the saturation of the results. We also propose a series of modules that let image features interact with color tokens and restrict the range of possible color candidates, which makes our results visually pleasing and reasonable. Moreover, our method does not require any additional external priors, which ensures good generalization capability. Extensive experiments and user studies demonstrate that our method achieves superior performance to previous works.
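To make the classification formulation concrete, here is a minimal sketch of the underlying idea of treating colorization as prediction over quantized color tokens: the ab plane of Lab color space is divided into a grid of bins, and each pixel's chrominance is mapped to a discrete token id (so a network can output a softmax over tokens instead of regressing ab values). The grid size and value range below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Quantize the ab plane of Lab color space into a GRID x GRID grid of
# "color tokens". Per-pixel colorization then becomes a GRID*GRID-way
# classification problem rather than continuous ab regression.
# AB_MIN/AB_MAX and GRID are illustrative choices, not the paper's values.
AB_MIN, AB_MAX, GRID = -110.0, 110.0, 16  # 16 x 16 = 256 color tokens

def ab_to_token(ab):
    """Map ab values of shape (..., 2) to integer token ids in [0, GRID*GRID)."""
    idx = np.clip(((ab - AB_MIN) / (AB_MAX - AB_MIN) * GRID).astype(int),
                  0, GRID - 1)
    return idx[..., 0] * GRID + idx[..., 1]

def token_to_ab(token):
    """Map token ids back to the ab value at the center of each bin."""
    a_idx, b_idx = token // GRID, token % GRID
    step = (AB_MAX - AB_MIN) / GRID
    return np.stack([AB_MIN + (a_idx + 0.5) * step,
                     AB_MIN + (b_idx + 0.5) * step], axis=-1)

# Round trip: a pixel's ab value maps to a token and back to its bin center,
# with error bounded by half the bin width.
ab = np.array([25.0, -40.0])
tok = ab_to_token(ab)
recon = token_to_ab(tok)
```

Because the decoded color is a bin center rather than an expectation over bins, classification-based decoding tends to avoid the desaturated averages that plague regression, which is the motivation the abstract gives for the token formulation.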
S. Weng and J. Sun contributed equally.
Acknowledgements
This project is supported by National Natural Science Foundation of China under Grant No. 62136001.
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Weng, S., Sun, J., Li, Y., Li, S., Shi, B. (2022). \(\mathrm {CT^2}\): Colorization Transformer via Color Tokens. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13667. Springer, Cham. https://doi.org/10.1007/978-3-031-20071-7_1
Print ISBN: 978-3-031-20070-0
Online ISBN: 978-3-031-20071-7