Structural Consistency and Controllability for Diverse Colorization

  • Safa Messaoud
  • David Forsyth
  • Alexander G. Schwing
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11210)

Abstract

Colorizing a given gray-level image is an important task in the media and advertising industry. Due to the ambiguity inherent to colorization (many shades are often plausible), recent approaches have started to explicitly model diversity. However, one of the most obvious artifacts, structural inconsistency, is rarely considered by existing methods, which predict chrominance independently for every pixel. To address this issue, we develop a conditional random field based variational auto-encoder formulation which is able to achieve diversity while taking into account structural consistency. Moreover, we introduce a controllability mechanism that can incorporate external constraints from diverse sources including a user interface. Compared to existing baselines, we demonstrate that our method obtains more diverse and globally consistent colorizations on the LFW, LSUN-Church and ILSVRC-2015 datasets.
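
A Gaussian CRF is attractive here because it admits exact sampling, which is what makes structurally consistent yet diverse colorizations possible. The following is a minimal, self-contained sketch (not the authors' implementation; all names and parameter choices are hypothetical) of drawing a chrominance field from a Gaussian CRF N(A^{-1}b, A^{-1}) whose precision matrix couples neighboring pixels. In the paper's model, the unary term b and the pairwise weights would be produced by a VAE decoder conditioned on the gray-level input and a latent code z; here they are stand-ins.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def grid_laplacian(h, w):
    # Combinatorial Laplacian L = D - W of a 4-connected h x w pixel grid.
    idx = np.arange(h * w).reshape(h, w)
    rows, cols = [], []
    for a, b in ((idx[:, :-1], idx[:, 1:]), (idx[:-1, :], idx[1:, :])):
        rows.extend(a.ravel()); cols.extend(b.ravel())
    W = sp.coo_matrix((np.ones(len(rows)), (rows, cols)), shape=(h * w, h * w))
    W = W + W.T
    return sp.diags(np.asarray(W.sum(axis=1)).ravel()) - W

def sample_gcrf(unary, lam=5.0, seed=0):
    # One exact sample c ~ N(A^{-1} b, A^{-1}) with A = I + lam * L, b = unary.
    h, w = unary.shape
    A = (sp.identity(h * w) + lam * grid_laplacian(h, w)).tocsc()
    mu = spsolve(A, unary.ravel())       # posterior mean
    C = np.linalg.cholesky(A.toarray())  # A = C C^T (dense: toy sizes only)
    eps = np.random.default_rng(seed).standard_normal(h * w)
    # c = mu + C^{-T} eps has covariance C^{-T} C^{-1} = (C C^T)^{-1} = A^{-1}.
    return (mu + np.linalg.solve(C.T, eps)).reshape(h, w)

# Toy usage: independent per-pixel predictions become a spatially smooth,
# yet still random, field; different seeds give distinct consistent samples.
noisy_ab = np.random.default_rng(1).standard_normal((16, 16))
sample = sample_gcrf(noisy_ab, lam=5.0, seed=2)

In this sketch, diversity would come from resampling the latent code z (and hence the unary term), while the pairwise coupling lam enforces the spatial smoothness that per-pixel models lack.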

Keywords

Colorization · Gaussian-Conditional Random Field · VAE

Acknowledgments

This material is based upon work supported in part by the National Science Foundation under Grant No. 1718221, Samsung, and 3M. We thank NVIDIA for providing the GPUs used for this research.

Supplementary material

Supplementary material 1: 474211_1_En_37_MOESM1_ESM.pdf (PDF, 50.7 MB)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. University of Illinois at Urbana-Champaign, Champaign, USA
