A Self-supervised Inverse Graphics Approach for Sketch Parametrization

Part of the Lecture Notes in Computer Science book series (LNIP, volume 12916)

Abstract

The study of neural generative models of handwritten text and human sketches is a hot topic in the computer vision field. The landmark SketchRNN provided a breakthrough by generating sketches as a sequence of waypoints, and more recent articles have managed to generate fully vector sketches by encoding the strokes as Bézier curves. However, all previous attempts with this approach require ground truth consisting of the sequence of points that make up each stroke, which seriously limits the datasets the model can be trained on. In this work, we present a self-supervised end-to-end inverse graphics approach that learns to embed each image to its best fit of Bézier curves. The self-supervised nature of the training process allows us not only to train the model on a wider range of datasets, but also to improve post-training predictions by overfitting the model to the input binary image. We report qualitative and quantitative evaluations on the MNIST and Quick, Draw! datasets.
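To make the idea of scoring a Bézier fit against a binary image concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a symmetric Chamfer distance between points sampled on a curve and the coordinates of the "ink" pixels; the function names `bezier_points` and `chamfer_distance`, the toy 8×8 image, and the sample count are all hypothetical choices made for this example.

```python
import numpy as np
from math import comb


def bezier_points(control_points, num_samples=50):
    """Sample points on a Bézier curve using the Bernstein basis."""
    cp = np.asarray(control_points, dtype=float)   # shape (n + 1, 2)
    n = len(cp) - 1                                # curve degree
    t = np.linspace(0.0, 1.0, num_samples)         # curve parameter values
    # basis[k, i] = C(n, i) * t_k^i * (1 - t_k)^(n - i)
    basis = np.stack(
        [comb(n, i) * t**i * (1.0 - t) ** (n - i) for i in range(n + 1)],
        axis=1,
    )
    return basis @ cp                              # shape (num_samples, 2)


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two 2-D point sets."""
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Mean nearest-neighbour distance in both directions.
    return d.min(axis=1).mean() + d.min(axis=0).mean()


# Toy binary image (assumption): a single diagonal stroke.
image = np.eye(8, dtype=bool)
pixels = np.argwhere(image).astype(float)          # (row, col) of ink pixels

# Two candidate quadratic curves: one along the stroke, one across it.
good = bezier_points([(0, 0), (3.5, 3.5), (7, 7)])
bad = bezier_points([(0, 7), (3.5, 3.5), (7, 0)])

print("fit along the stroke: ", chamfer_distance(good, pixels))
print("fit across the stroke:", chamfer_distance(bad, pixels))
```

The curve that follows the stroke yields the lower distance. In a self-supervised setting, a score of this kind could in principle serve as the training signal, since it needs only the image itself and no ground-truth stroke sequence.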

Keywords

  • Inverse graphics
  • Sketch parametrization
  • Bézier curve
  • Chamfer distance
  • Symbol recognition

Acknowledgment

This work has been partially supported by the Spanish projects RTI2018-095645-B-C21 and FCT-19-15244, the Catalan project 2017-SGR-1783, and the CERCA Program/Generalitat de Catalunya.

Author information

Corresponding author

Correspondence to Albert Suso.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Suso, A., Riba, P., Terrades, O.R., Lladós, J. (2021). A Self-supervised Inverse Graphics Approach for Sketch Parametrization. In: Barney Smith, E.H., Pal, U. (eds) Document Analysis and Recognition – ICDAR 2021 Workshops. ICDAR 2021. Lecture Notes in Computer Science, vol 12916. Springer, Cham. https://doi.org/10.1007/978-3-030-86198-8_3

  • DOI: https://doi.org/10.1007/978-3-030-86198-8_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86197-1

  • Online ISBN: 978-3-030-86198-8

  • eBook Packages: Computer Science, Computer Science (R0)