Abstract
The study of neural generative models of handwritten text and human sketches is an active research topic in computer vision. The landmark SketchRNN achieved a breakthrough by generating sketches as a sequence of waypoints, and more recent articles have managed to generate fully vector sketches by encoding the strokes as Bézier curves. However, previous attempts with this approach all require a ground truth consisting of the sequence of points that make up each stroke, which seriously limits the datasets on which the model can be trained. In this work, we present a self-supervised end-to-end inverse graphics approach that learns to map each image to its best-fitting set of Bézier curves. The self-supervised nature of the training process allows us to train the model on a wider range of datasets, and also to improve predictions after training by overfitting the model to the input binary image. We report qualitative and quantitative evaluations on the MNIST and the Quick, Draw! datasets.
Keywords
- Inverse graphics
- Sketch parametrization
- Bézier curve
- Chamfer distance
- Symbol recognition
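To make the abstract's central idea concrete, the sketch below samples a Bézier curve and scores how well it fits the foreground pixels of a binary image using a symmetric Chamfer distance. It is only an illustration under stated assumptions, not the authors' implementation: the function names, the toy 5x5 image, the sampling density, and the averaged two-way form of the distance are all hypothetical choices made for this example.

```python
import math

def bezier_point(ctrl, t):
    """Evaluate a Bézier curve of any degree at t in [0, 1] (De Casteljau)."""
    pts = list(ctrl)
    while len(pts) > 1:
        # Repeatedly interpolate between consecutive control points.
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]

def sample_bezier(ctrl, n=50):
    """Return n points evenly spaced in the parameter t along the curve."""
    return [bezier_point(ctrl, i / (n - 1)) for i in range(n)]

def foreground_pixels(image):
    """Collect (x, y) coordinates of the non-zero pixels of a binary image."""
    return [(float(c), float(r))
            for r, row in enumerate(image)
            for c, value in enumerate(row) if value]

def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbour distance, both ways."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)

# Toy binary image (assumed for illustration): a diagonal stroke.
image = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
]
pixels = foreground_pixels(image)

# Two candidate quadratic curves: one along the stroke, one bent away from it.
good = sample_bezier([(0, 0), (2, 2), (4, 4)])
bad = sample_bezier([(0, 0), (4, 0), (4, 4)])

print("good fit:", chamfer_distance(good, pixels))
print("bad fit: ", chamfer_distance(bad, pixels))
```

The curve that follows the stroke yields the lower distance. In a self-supervised setting, a quantity of this kind can serve as a training signal because it compares the predicted curves directly with the image, without needing stroke-level point sequences as ground truth.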
Acknowledgment
This work has been partially supported by the Spanish projects RTI2018-095645-B-C21 and FCT-19-15244, the Catalan project 2017-SGR-1783, and the CERCA Program/Generalitat de Catalunya.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Suso, A., Riba, P., Terrades, O.R., Lladós, J. (2021). A Self-supervised Inverse Graphics Approach for Sketch Parametrization. In: Barney Smith, E.H., Pal, U. (eds) Document Analysis and Recognition – ICDAR 2021 Workshops. ICDAR 2021. Lecture Notes in Computer Science, vol 12916. Springer, Cham. https://doi.org/10.1007/978-3-030-86198-8_3
DOI: https://doi.org/10.1007/978-3-030-86198-8_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86197-1
Online ISBN: 978-3-030-86198-8
eBook Packages: Computer Science, Computer Science (R0)
Published in cooperation with
http://www.iapr.org/