CoGS: Controllable Generation and Search from Sketch and Style

  • Conference paper
  • Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13676)

Abstract

We present CoGS, a novel method for the style-conditioned, sketch-driven synthesis of images. CoGS enables exploration of diverse appearance possibilities for a given sketched object, providing decoupled control over the structure and the appearance of the output. Coarse-grained control over object structure and appearance is achieved by feeding an input sketch and an exemplar “style” conditioning image to a transformer-based sketch and style encoder, which generates a discrete codebook representation. We map the codebook representation into a metric space, enabling fine-grained control over the selection of, and interpolation between, multiple synthesis options before generating the image via a vector quantized GAN (VQGAN) decoder. Our framework thereby unifies search and synthesis: a sketch and style pair may be used to produce an initial synthesis, which can then be refined by combining it with similar results from a search corpus to more closely match the user’s intent. We show that our model, trained on the 125 object classes of our newly created Pseudosketches dataset, is capable of producing a diverse gamut of semantic content and appearance styles.

C. Ham and G. C. Tarrés contributed equally.
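To make the pipeline described in the abstract concrete, the sketch below illustrates the overall flow in PyTorch. It is not the authors' implementation: every module name (SketchStyleEncoder, interpolate_codes), dimension, and the codebook size are hypothetical placeholders standing in for the three steps the abstract describes, i.e. encoding a sketch and style pair to discrete codes, blending options in a continuous metric space, and re-quantising for a VQGAN decoder.

```python
import torch
import torch.nn as nn

class SketchStyleEncoder(nn.Module):
    """Schematic transformer that fuses patch tokens from a sketch and a
    style exemplar and predicts one discrete codebook index per token."""

    def __init__(self, patch_dim=3 * 16 * 16, d_model=256, codebook_size=1024):
        super().__init__()
        self.sketch_proj = nn.Linear(patch_dim, d_model)   # embed sketch patches
        self.style_proj = nn.Linear(patch_dim, d_model)    # embed style patches
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.to_logits = nn.Linear(d_model, codebook_size)

    def forward(self, sketch_patches, style_patches):
        # Jointly attend over both token sequences, then pick the most
        # likely codebook entry for each output position.
        tokens = torch.cat([self.sketch_proj(sketch_patches),
                            self.style_proj(style_patches)], dim=1)
        return self.to_logits(self.encoder(tokens)).argmax(dim=-1)

# Embedding the discrete codes in a continuous space allows two synthesis
# options (e.g. an initial result and a similar search hit) to be blended,
# then snapped back to valid codebook entries for the VQGAN decoder.
codebook = nn.Embedding(1024, 256)

def interpolate_codes(idx_a, idx_b, alpha=0.5):
    # Linear blend in the continuous embedding space ...
    z = (1 - alpha) * codebook(idx_a) + alpha * codebook(idx_b)   # (B, T, D)
    # ... then re-quantise each token to its nearest codebook entry.
    dists = (z.unsqueeze(-2) - codebook.weight).pow(2).sum(-1)    # (B, T, K)
    return dists.argmin(dim=-1)
```

A VQGAN decoder (omitted here) would map the re-quantised indices back to pixels; this blend-and-decode step is how an initial synthesis can be pulled toward similar results retrieved from the search corpus.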



Author information

Corresponding author

Correspondence to Cusuh Ham.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 19,572 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ham, C., Tarrés, G.C., Bui, T., Hays, J., Lin, Z., Collomosse, J. (2022). CoGS: Controllable Generation and Search from Sketch and Style. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13676. Springer, Cham. https://doi.org/10.1007/978-3-031-19787-1_36

  • DOI: https://doi.org/10.1007/978-3-031-19787-1_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19786-4

  • Online ISBN: 978-3-031-19787-1

  • eBook Packages: Computer Science, Computer Science (R0)
