SynthTIGER: Synthetic Text Image GEneratoR Towards Better Text Recognition Models

Yim, Moonbin; Kim, Yoonsik; Cho, Han-Cheol; Park, Sungrae

doi:10.1007/978-3-030-86337-1_8

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12824)

Included in the following conference series:

International Conference on Document Analysis and Recognition

Abstract

Synthetic text image generators have alleviated the shortage of annotated real-world text images for training scene text recognition (STR) models. They generate many text images with diverse backgrounds, font styles, and text shapes, enabling STR models to learn visual patterns that might not be available in manually annotated data. In this paper, we introduce a new synthetic text image generator, SynthTIGER, built by analyzing techniques used for text image synthesis and integrating the effective ones into a single algorithm. Moreover, we propose two techniques that alleviate the long-tail problem in the length and character distributions of training data. In our experiments, SynthTIGER achieves better STR performance than the combination of the synthetic datasets MJSynth (MJ) and SynthText (ST). Our ablation study demonstrates the benefits of the sub-components of SynthTIGER and provides guidelines for generating synthetic text images for STR models. Our implementation is publicly available at https://github.com/clovaai/synthtiger.
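
The abstract does not spell out the two long-tail techniques, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes one plausible approach: with some probability, replace a corpus word with a string whose length and characters are drawn uniformly, so that rare lengths and rare characters appear more often in the training text. The names `sample_text`, `char_histogram`, `p_random`, and `max_len` are hypothetical and do not come from the paper or the SynthTIGER repository.

```python
import random
from collections import Counter


def sample_text(corpus, vocab, rng, p_random=0.5, max_len=10):
    """Return a training string (hypothetical helper, not SynthTIGER's API).

    With probability p_random, build a string whose length is uniform in
    [1, max_len] and whose characters are uniform over vocab; otherwise
    return a word drawn from the corpus.
    """
    if rng.random() < p_random:
        length = rng.randint(1, max_len)
        return "".join(rng.choice(vocab) for _ in range(length))
    return rng.choice(corpus)


def char_histogram(texts):
    """Count how often each character occurs across all texts."""
    return Counter(ch for text in texts for ch in text)


# Toy corpus in which 'q' and 'z' never occur (an extreme long tail).
corpus = ["the", "and", "that", "there", "then"]
vocab = "abcdefghijklmnopqrstuvwxyz"
rng = random.Random(0)

corpus_only = [sample_text(corpus, vocab, rng, p_random=0.0) for _ in range(1000)]
mixed = [sample_text(corpus, vocab, rng, p_random=0.5) for _ in range(1000)]

print("q count, corpus only:", char_histogram(corpus_only)["q"])
print("q count, mixed:", char_histogram(mixed)["q"])
print("lengths seen, corpus only:", sorted({len(t) for t in corpus_only}))
print("lengths seen, mixed:", sorted({len(t) for t in mixed}))
```

In this toy setup, sampling only from the corpus never produces characters or lengths that the corpus lacks, while the mixed sampler covers them. The mixing probability is a free parameter in this sketch, and the source gives no value for it.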

Author information

Authors and Affiliations

CLOVA AI Research, NAVER Corporation, Seongnam-si, South Korea
Moonbin Yim, Yoonsik Kim & Han-Cheol Cho
Upstage AI Research, Yongin-si, South Korea
Sungrae Park

Corresponding author

Correspondence to Sungrae Park.

Editor information

Editors and Affiliations

Universitat Autònoma de Barcelona, Barcelona, Spain
Josep Lladós
Lehigh University, Bethlehem, PA, USA
Daniel Lopresti
Kyushu University, Fukuoka-shi, Japan
Seiichi Uchida

About this paper

Cite this paper

Yim, M., Kim, Y., Cho, HC., Park, S. (2021). SynthTIGER: Synthetic Text Image GEneratoR Towards Better Text Recognition Models. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12824. Springer, Cham. https://doi.org/10.1007/978-3-030-86337-1_8

DOI: https://doi.org/10.1007/978-3-030-86337-1_8
Published: 02 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86336-4
Online ISBN: 978-3-030-86337-1
eBook Packages: Computer Science (R0)