One-Model Ensemble-Learning for Text Recognition of Historical Printings

Wick, Christoph; Reul, Christian

doi:10.1007/978-3-030-86549-8_25

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12821)

Included in the following conference series:

International Conference on Document Analysis and Recognition

Abstract

In this paper, we propose a novel method for Automatic Text Recognition (ATR) on early printed books. Our approach significantly reduces the Character Error Rates (CERs) for book-specific training when only a few lines of Ground Truth (GT) are available, and it considerably outperforms previous methods. An ensemble of models is trained simultaneously by optimising each model independently and also with respect to a fused output obtained by averaging the individual confidence matrices. Experiments on five early printed books show that this approach alone outperforms the current state of the art by up to 20%, and by 10% on average. Replacing the averaging of the confidence matrices during prediction with confidence-based voting improves our results by a further 8%, giving a total average improvement of about 17%.
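
The fusion step described in the abstract can be illustrated with a minimal sketch. It assumes that each model emits a confidence matrix of shape (time steps, characters) with a blank label at index 0, as is usual for CTC-style outputs, and that the outputs of the jointly trained models are aligned in time. The function names `fuse_by_averaging` and `greedy_decode` and the toy matrices are hypothetical and are not taken from the authors' implementation. The last lines show one reading of the reported figures, in which the two relative improvements compound, which reproduces the total of roughly 17%.

```python
import numpy as np


def fuse_by_averaging(conf_matrices):
    """Average per-model confidence matrices, each of shape (T, C)."""
    stacked = np.stack(conf_matrices, axis=0)  # shape (n_models, T, C)
    return stacked.mean(axis=0)                # shape (T, C)


def greedy_decode(conf, blank=0):
    """Best-path decoding: collapse repeated labels, then drop blanks."""
    best = conf.argmax(axis=1)
    decoded, prev = [], blank
    for label in best:
        if label != prev and label != blank:
            decoded.append(int(label))
        prev = label
    return decoded


# Toy outputs of three models over 4 time steps and 3 labels (0 = blank).
m1 = np.array([[0.1, 0.8, 0.1], [0.7, 0.2, 0.1], [0.2, 0.1, 0.7], [0.9, 0.05, 0.05]])
m2 = np.array([[0.2, 0.7, 0.1], [0.6, 0.3, 0.1], [0.1, 0.6, 0.3], [0.8, 0.1, 0.1]])
m3 = np.array([[0.1, 0.6, 0.3], [0.8, 0.1, 0.1], [0.1, 0.2, 0.7], [0.9, 0.05, 0.05]])

fused = fuse_by_averaging([m1, m2, m3])
single = greedy_decode(m2)      # -> [1, 1]  (this model errs at the third step)
combined = greedy_decode(fused)  # -> [1, 2]  (the average corrects it)

# Assumption: the 10% and 8% relative improvements compound.
total_improvement = 1 - (1 - 0.10) * (1 - 0.08)  # about 0.172
```

In this toy case the second model alone misreads the third time step, while the averaged matrix recovers the label the other two models agree on.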

Notes

1. https://zenodo.org/record/1344132
2. See our implementation at https://github.com/Calamari-OCR/calamari/blob/master/calamari_ocr/ocr/model/ensemblegraph.py.
3. Since it is usually unclear how many lines of GT are required to achieve a certain CER, and since the transcription effort correlates with the number of errors in the ATR result, it is usually advantageous to produce the GT iteratively: starting from the often quite erroneous output of an existing mixed model, only a minimal amount of GT (for example 100 lines) is produced and used to train a first book-specific model. In most cases, applying this model to unseen data (for example 150 further lines) already yields a significantly better ATR output, which can be corrected much faster than before. After another model has been trained, these steps are repeated until a satisfactory CER is reached or the whole book is transcribed.
4. Since the traditional cross-fold training approach consists of n, usually five, independent training subprocesses, the training duration can be minimised by running these processes in parallel if several, ideally n, GPUs are available. However, we think that the presence of, at most, a single GPU should be considered the default case.
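
Note 4 refers to n independent training runs. A minimal sketch of how such a cross-fold split might be formed is given below, assuming that each of the n models validates on one fold and trains on the remaining n-1 folds. The helper `make_folds` is hypothetical, and the exact splitting used by the authors is not specified here.

```python
def make_folds(lines, n=5):
    """Return n (train, validation) splits; each split holds out one fold."""
    # Assumption: lines are dealt round-robin into n folds.
    folds = [lines[i::n] for i in range(n)]
    splits = []
    for i in range(n):
        validation = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, validation))
    return splits


# 100 GT lines, represented here only by their indices.
gt_lines = list(range(100))
splits = make_folds(gt_lines, n=5)
```

Each of the n splits could then be handed to a separate training process; with a single GPU they would run one after another.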

Author information

Authors and Affiliations

Planet AI GmbH, Warnowufer 60, 18057, Rostock, Germany
Christoph Wick
University of Würzburg, Am Hubland, 97074, Würzburg, Germany
Christian Reul

Corresponding author

Correspondence to Christoph Wick.

Editor information

Editors and Affiliations

Universitat Autònoma de Barcelona, Barcelona, Spain
Josep Lladós
Lehigh University, Bethlehem, PA, USA
Daniel Lopresti
Kyushu University, Fukuoka-shi, Japan
Seiichi Uchida

About this paper

Cite this paper

Wick, C., Reul, C. (2021). One-Model Ensemble-Learning for Text Recognition of Historical Printings. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12821. Springer, Cham. https://doi.org/10.1007/978-3-030-86549-8_25

DOI: https://doi.org/10.1007/978-3-030-86549-8_25
Published: 02 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86548-1
Online ISBN: 978-3-030-86549-8
eBook Packages: Computer Science, Computer Science (R0)
