
Synthesizing data for text recognition with style transfer

  • Published in: Multimedia Tools and Applications


Most existing datasets for scene text recognition contain only a few thousand training samples with a very limited vocabulary, which cannot meet the requirements of state-of-the-art deep learning based text recognition methods. Meanwhile, although synthetic datasets (e.g., SynthText90k) usually contain millions of samples, they cannot completely fit the data distribution of small target datasets in natural scenes. To address these problems, we propose a word data generation method called SynthText-Transfer, which is capable of emulating the distribution of the target dataset. SynthText-Transfer uses a style transfer method to generate samples with arbitrary text content that preserve the texture of a reference sample in the target dataset. The generated images are not only visually similar to real images, but also capable of improving the accuracy of state-of-the-art text recognition methods, especially on English and Chinese datasets with a large alphabet (in which many characters appear in only a few samples, making them hard for sequence models to learn). Moreover, the proposed method is fast and flexible, with a speed competitive among common style transfer methods.
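The general mechanism behind this family of methods — transferring the texture statistics of a reference word image onto features of newly rendered text — can be sketched with adaptive instance normalization (Huang and Belongie, 2017, one of the style transfer techniques in this line of work). This is a minimal illustration on raw feature arrays, not the paper's actual implementation; the shapes and variable names are assumptions:

```python
import numpy as np

def adain(content_feat, style_feat, eps=1e-5):
    """Adaptive instance normalization: re-scale the content features so
    that each channel matches the mean/std of the corresponding style
    channel. Arrays have shape (channels, height, width)."""
    c_mean = content_feat.mean(axis=(1, 2), keepdims=True)
    c_std = content_feat.std(axis=(1, 2), keepdims=True)
    s_mean = style_feat.mean(axis=(1, 2), keepdims=True)
    s_std = style_feat.std(axis=(1, 2), keepdims=True)
    # Whiten the content statistics, then re-color with the style statistics.
    return s_std * (content_feat - c_mean) / (c_std + eps) + s_mean

# Toy check: content stands in for rendered target text, style for a
# reference word crop from the target dataset.
rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, size=(4, 8, 8))
style = rng.normal(2.0, 3.0, size=(4, 8, 8))
out = adain(content, style)
```

After the transform, `out` keeps the spatial structure of the rendered text (the content) while its per-channel statistics match the reference crop (the style) — which is why the synthesized word inherits the texture of the target-domain sample.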







This work is supported by the National Natural Science Foundation of China under Grant 61673029. It is also a research achievement of the Key Laboratory of Science, Technology and Standard in Press Industry (Key Laboratory of Intelligent Press Media Technology).

Author information



Corresponding author

Correspondence to Yongtao Wang.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Jiahui Li and Siwei Wang made equal contributions to this paper.


About this article


Cite this article

Li, J., Wang, S., Wang, Y. et al. Synthesizing data for text recognition with style transfer. Multimed Tools Appl 78, 29183–29196 (2019).
