Abstract
We present CoGS, a novel method for the style-conditioned, sketch-driven synthesis of images. CoGS enables exploration of diverse appearance possibilities for a given sketched object, decoupling control over the structure and the appearance of the output. Coarse-grained control over structure and appearance is provided by an input sketch and an exemplar “style” conditioning image, which a transformer-based sketch-and-style encoder maps to a discrete codebook representation. We project this codebook representation into a metric space, enabling fine-grained selection of, and interpolation between, multiple synthesis options before the image is generated via a vector quantized GAN (VQGAN) decoder. Our framework thereby unifies search and synthesis: a sketch-and-style pair drives an initial synthesis, which can then be refined by combining it with similar results from a search corpus to produce an image that more closely matches the user’s intent. We show that our model, trained on the 125 object classes of our newly created Pseudosketches dataset, produces a diverse gamut of semantic content and appearance styles.
C. Ham and G. C. Tarrés contributed equally.
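To make the pipeline described in the abstract concrete, the following is a minimal, hypothetical sketch of its stages in PyTorch: a transformer encoder that maps a (sketch, style exemplar) pair to discrete codebook logits, plus an interpolation step in a continuous embedding space. All module names, dimensions, and architectural choices below are our own assumptions for illustration; this is not the authors' released implementation.

```python
# Minimal, hypothetical sketch of the CoGS pipeline described above.
# Module names, dimensions, and architecture details are assumptions;
# this is NOT the authors' implementation.
import torch
import torch.nn as nn

class SketchStyleEncoder(nn.Module):
    """Transformer encoder mapping a (sketch, style exemplar) pair to a
    grid of discrete codebook logits (the coarse-grained control)."""
    def __init__(self, codebook_size=1024, dim=512):
        super().__init__()
        # Patchify the 1-channel sketch and the 3-channel style exemplar.
        self.sketch_embed = nn.Conv2d(1, dim, kernel_size=16, stride=16)
        self.style_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=6)
        self.to_logits = nn.Linear(dim, codebook_size)

    def forward(self, sketch, style):
        s = self.sketch_embed(sketch).flatten(2).transpose(1, 2)  # (B, N, D)
        a = self.style_embed(style).flatten(2).transpose(1, 2)    # (B, M, D)
        tokens = self.transformer(torch.cat([s, a], dim=1))
        # Predict one codebook entry per sketch-token position; a VQGAN
        # decoder would then map the chosen codes back to an image.
        return self.to_logits(tokens[:, : s.shape[1]])

def interpolate_codes(z_a, z_b, alpha):
    """Fine-grained control: blend two candidate syntheses in the
    continuous embedding space (a simple linear blend is assumed)."""
    return (1.0 - alpha) * z_a + alpha * z_b

# Usage with assumed shapes: a 256x256 sketch and style exemplar yield
# a 16x16 grid (256 tokens) of discrete code indices.
enc = SketchStyleEncoder()
sketch = torch.randn(1, 1, 256, 256)
style = torch.randn(1, 3, 256, 256)
indices = enc(sketch, style).argmax(-1)  # (1, 256)
```

The linear blend in interpolate_codes stands in for whatever interpolation the learned metric space actually supports; the point is that once codes live in a continuous space, selecting and blending between multiple synthesis options becomes a simple vector operation performed before the VQGAN decoder is invoked.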
Cite this paper
Ham, C., Tarrés, G.C., Bui, T., Hays, J., Lin, Z., Collomosse, J. (2022). CoGS: Controllable Generation and Search from Sketch and Style. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision – ECCV 2022. Lecture Notes in Computer Science, vol. 13676. Springer, Cham. https://doi.org/10.1007/978-3-031-19787-1_36