Generating Handwriting via Decoupled Style Descriptors

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12357)


Representing a space of handwriting stroke styles includes the challenge of representing both the style of each character and the overall style of the human writer. Existing VRNN approaches to representing handwriting often do not distinguish between these different style components, which can reduce model capability. Instead, we introduce the Decoupled Style Descriptor (DSD) model for handwriting, which factors both character- and writer-level styles and allows our model to represent an overall greater space of styles. This approach also increases flexibility: given a few examples, we can generate handwriting in new writer styles, and can now also generate handwriting of new characters across writer styles. In experiments, our generated results were preferred over a state-of-the-art baseline method 88% of the time, and in a writer identification task on 20 held-out writers, our DSDs achieved 89.38% accuracy from a single sample word. Overall, DSDs allow us to improve both the quality and flexibility of existing handwriting stroke generation approaches.
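The core idea of the factorization above can be illustrated with a minimal sketch. Here, each writer is represented by a latent style vector, each character by a writer-independent matrix, and the two are combined multiplicatively into a character-and-writer descriptor. All names, dimensions, and the random "learned" parameters below are illustrative assumptions, not the paper's actual trained components:

```python
import numpy as np

rng = np.random.default_rng(0)

STYLE_DIM = 8  # latent size (illustrative, not the paper's setting)

# Writer-level style: one latent vector per writer, inferred from examples.
writer_style = rng.standard_normal(STYLE_DIM)

# Character-level style: a writer-independent matrix per character that
# maps a writer vector to a character-and-writer style descriptor.
char_matrices = {c: rng.standard_normal((STYLE_DIM, STYLE_DIM))
                 for c in "abc"}

def character_dsd(char, w):
    """Combine character and writer factors: w_char = C(char) @ w."""
    return char_matrices[char] @ w

# Because the factors are decoupled, generating for a new writer only
# requires a new writer vector; character matrices transfer unchanged.
new_writer = rng.standard_normal(STYLE_DIM)
dsd_known = character_dsd("a", writer_style)
dsd_novel = character_dsd("a", new_writer)
```

This multiplicative factoring is what lets a few examples of a new writer suffice: the character-specific structure is shared across all writers, so only the low-dimensional writer vector must be inferred.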



This work was supported by the Sloan Foundation and the National Science Foundation under award number IIS-1652561. We thank Kwang In Kim for fruitful discussions and for being our matrix authority. We thank Naveen Srinivasan and Purvi Goel for the ECCV deadline snack delivery service. Finally, we thank all anonymous writers who contributed to our dataset.

Supplementary material

Supplementary material 1 (PDF, 5.3 MB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

Brown University, Providence, USA
