Generation of Synthetic Images of Full-Text Documents

Bureš, Lukáš; Neduchal, Petr; Hlaváč, Miroslav; Hrúz, Marek

doi:10.1007/978-3-319-99579-3_8

Conference paper
First Online: 25 August 2018

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11096)

Abstract

In this paper, we present an algorithm for generating images of full-text documents. Such images can be used to train and evaluate optical character recognition (OCR) models. The algorithm is modular: individual parts can be changed and tuned to generate the desired images. We describe a method for obtaining background images of paper from already digitized documents. We train a Variational Autoencoder as a generative model of these backgrounds, which enables on-the-fly generation of background images similar to the training ones. The module for printing the text uses large text corpora, a font, and suitable positional and brightness noise to obtain believable results. We use Tesseract OCR to compare real-world and generated images and observe that the recognition rates are very similar, which indicates that the synthetic images have a realistic appearance. Furthermore, the mistakes made by the OCR system in both cases are alike. Finally, the system generates a detailed, structured annotation of the synthesized image.
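
The text-printing step and the structured annotation can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed-width glyph cells, the Gaussian noise model, the line/word/character nesting of the annotation, and every name and default value (`Glyph`, `layout_text`, `to_annotation`, `pos_sigma`, `ink_sigma`, and so on) are assumptions made for illustration. The sketch only computes where each character would be placed and how dark its ink would be; actual rasterization onto a generated background is left out.

```python
import json
import random
from dataclasses import dataclass, asdict


@dataclass
class Glyph:
    char: str
    x: float          # left edge in pixels, after positional jitter
    y: float          # top edge in pixels, after positional jitter
    width: int
    height: int
    brightness: int   # ink gray level, 0 = black, 255 = white


def layout_text(text, page_width=400, margin=20, glyph_w=10, glyph_h=16,
                line_gap=6, pos_sigma=0.5, ink_mean=40, ink_sigma=10, seed=0):
    """Place characters on a page with positional and brightness noise.

    Returns a list of lines; each line is a list of words; each word is a
    list of Glyph records. All parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    lines, line = [], []
    x, y = margin, margin
    for word in text.split():
        word_w = len(word) * glyph_w
        # Wrap to the next line when the word would cross the right margin.
        if line and x + word_w > page_width - margin:
            lines.append(line)
            line = []
            x, y = margin, y + glyph_h + line_gap
        glyphs = []
        for ch in word:
            gray = round(rng.gauss(ink_mean, ink_sigma))
            glyphs.append(Glyph(
                char=ch,
                x=x + rng.gauss(0, pos_sigma),
                y=y + rng.gauss(0, pos_sigma),
                width=glyph_w,
                height=glyph_h,
                # Clamp the noisy gray level to the valid 8-bit range.
                brightness=max(0, min(255, gray)),
            ))
            x += glyph_w
        line.append(glyphs)
        x += glyph_w  # one blank cell between words
    if line:
        lines.append(line)
    return lines


def to_annotation(lines):
    """Convert the layout into a nested, JSON-serialisable annotation."""
    return {"lines": [
        {"words": [
            {"text": "".join(g.char for g in word),
             "chars": [asdict(g) for g in word]}
            for word in line]}
        for line in lines]}


if __name__ == "__main__":
    page = layout_text("Synthetic documents help train OCR models",
                       page_width=200)
    print(json.dumps(to_annotation(page), indent=2)[:300])
```

Because the annotation is produced from the same records that drive the placement, every character in the output has a known position and gray level, which is the property that makes synthetic pages convenient as OCR ground truth.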

Acknowledgments

This paper was supported by the Ministry of Education, Youth and Sports of the Czech Republic, project No. LO1506. The work has also been supported by a grant of the University of West Bohemia, project No. SGS-2016-039. Access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum, provided under the programme “Projects of Large Research, Development, and Innovations Infrastructures” (CESNET LM2015042), is greatly appreciated.

Author information

Authors and Affiliations

Faculty of Applied Sciences, NTIS, UWB, Pilsen, Czech Republic
Lukáš Bureš, Petr Neduchal, Miroslav Hlaváč & Marek Hrúz
Department of Cybernetics, Faculty of Applied Sciences, UWB, Pilsen, Czech Republic
Lukáš Bureš, Petr Neduchal & Miroslav Hlaváč
ITMO University, St. Petersburg, Russia
Miroslav Hlaváč

Corresponding author

Correspondence to Marek Hrúz.

Editor information

Editors and Affiliations

SPIIRAS, St. Petersburg, Russia
Alexey Karpov
Leipzig University of Telecommunications, Leipzig, Germany
Oliver Jokisch
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

About this paper

Cite this paper

Bureš, L., Neduchal, P., Hlaváč, M., Hrúz, M. (2018). Generation of Synthetic Images of Full-Text Documents. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science, vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_8

DOI: https://doi.org/10.1007/978-3-319-99579-3_8
Published: 25 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99578-6
Online ISBN: 978-3-319-99579-3
eBook Packages: Computer Science, Computer Science (R0)