Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes

Wick, Christoph; Zöllner, Jochen; Grüning, Tobias

doi:10.1007/978-3-031-06555-2_18

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13237)

Included in the following conference series:

International Workshop on Document Analysis Systems

Abstract

In contrast to Connectionist Temporal Classification (CTC) approaches, Sequence-to-Sequence (S2S) models for Handwritten Text Recognition (HTR) suffer from errors such as skipped or repeated words, which often occur at the end of a sequence. In this paper, to combine the best of both approaches, we propose to use the CTC-Prefix-Score during S2S decoding: during beam search, paths that are invalid according to the CTC confidence matrix are penalised. Our network architecture is composed of a Convolutional Neural Network (CNN) as visual backbone, bidirectional Long Short-Term Memory cells (LSTMs) as encoder, and a decoder which is a Transformer with inserted mutual attention layers. The CTC confidences are computed on the encoder, while the Transformer is only used for character-wise S2S decoding. We evaluate this setup on three HTR data sets: IAM, Rimes, and StAZH. On IAM, we achieve a competitive Character Error Rate (CER) of 2.95% when pretraining our model on synthetic data and including a character-based language model for contemporary English. Compared to other state-of-the-art approaches, our model requires about 10–20 times fewer parameters. Our implementation is available on GitHub: https://github.com/Planet-AI-GmbH/tfaip-hybrid-ctc-s2s.
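The abstract describes rescoring S2S beam-search hypotheses with the CTC prefix score, which is computed from the encoder's CTC confidence matrix. The code below is a minimal pure-Python sketch of that idea under stated assumptions. It is not the authors' implementation, which lives in the tfaip-hybrid-ctc-s2s repository.

The sketch follows the prefix-score recursion known from hybrid CTC/attention speech recognition (Watanabe et al., 2017) and works in log space over a T x K matrix given as nested lists. The following are hypothetical choices made only for illustration:

* blank index 0 and the EOS sentinel -1;
* all helper names (`initial_state`, `extend_prefix`, `sequence_logprob`, `joint_score`, `ctc_bruteforce`);
* the interpolation weight `ctc_weight=0.3`.

```python
import itertools
import math

NEG_INF = float("-inf")


def logadd(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def initial_state(log_probs, blank):
    """Forward variables (gamma_n, gamma_b) of the empty prefix.

    log_probs is a T x K nested list of per-frame log-probabilities (the CTC
    confidence matrix). The empty prefix can only be explained by blanks.
    """
    gamma_b, acc = [], 0.0
    for frame in log_probs:
        acc += frame[blank]
        gamma_b.append(acc)
    return [NEG_INF] * len(log_probs), gamma_b


def extend_prefix(log_probs, prefix, state, c, blank, eos):
    """Log CTC prefix score of prefix + [c] and the forward state of the new prefix."""
    gamma_n, gamma_b = state
    if c == eos:
        # A finished hypothesis must explain all T frames.
        return logadd(gamma_n[-1], gamma_b[-1]), None
    T = len(log_probs)
    new_n, new_b = [NEG_INF] * T, [NEG_INF] * T
    # Only the empty prefix can emit its first label c at frame 0.
    new_n[0] = NEG_INF if prefix else log_probs[0][c]
    psi = new_n[0]
    for t in range(1, T):
        # Mass of the old prefix at t-1 that may be followed by a fresh c;
        # a repeated label must be separated from its predecessor by a blank.
        if prefix and prefix[-1] == c:
            phi = gamma_b[t - 1]
        else:
            phi = logadd(gamma_b[t - 1], gamma_n[t - 1])
        new_n[t] = logadd(new_n[t - 1], phi) + log_probs[t][c]
        new_b[t] = logadd(new_b[t - 1], new_n[t - 1]) + log_probs[t][blank]
        psi = logadd(psi, phi + log_probs[t][c])
    return psi, (new_n, new_b)


def sequence_logprob(log_probs, labels, blank, eos):
    """Score a complete hypothesis: extend label by label, then close with EOS."""
    state, prefix = initial_state(log_probs, blank), []
    for c in labels:
        _, state = extend_prefix(log_probs, prefix, state, c, blank, eos)
        prefix.append(c)
    final, _ = extend_prefix(log_probs, prefix, state, eos, blank, eos)
    return final


def joint_score(ctc_logp, s2s_logp, ctc_weight=0.3):
    """Interpolate cumulative CTC prefix and S2S log-scores (weight is an assumption)."""
    return ctc_weight * ctc_logp + (1.0 - ctc_weight) * s2s_logp


def ctc_bruteforce(log_probs, labels, blank):
    """Reference check: sum over all frame paths that collapse to `labels` (tiny T only)."""
    T, K = len(log_probs), len(log_probs[0])
    total = NEG_INF
    for path in itertools.product(range(K), repeat=T):
        # CTC collapse: merge consecutive repeats, then drop blanks.
        collapsed = [k for i, k in enumerate(path)
                     if k != blank and (i == 0 or k != path[i - 1])]
        if collapsed == list(labels):
            total = logadd(total, sum(log_probs[t][k] for t, k in enumerate(path)))
    return total


# Toy 3-frame confidence matrix over {blank, "a", "b"} (made-up numbers).
probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.5, 0.2, 0.3]]
log_probs = [[math.log(p) for p in row] for row in probs]
BLANK, EOS = 0, -1
ctc_ab = sequence_logprob(log_probs, [1, 2], BLANK, EOS)  # log p_ctc("ab" | x)
```

In a beam search, each candidate extension `prefix + [c]` would be ranked by `joint_score(ctc_logp, s2s_logp)`. A hypothesis that the CTC matrix cannot explain, such as an extra repeated label that does not fit into the available frames, receives a CTC score of `-inf` and therefore drops out of the beam. `ctc_bruteforce` exists only to sanity-check the recursion on tiny inputs.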

Notes

1. Arbitrary ordering is required, e.g., for translation tasks.
2. https://github.com/Planet-AI-GmbH/tfaip-hybrid-ctc-s2s
3. https://github.com/jpuigcerver/Laia/tree/master/egs/iam

Acknowledgments

This work was partially funded by the European Social Fund (ESF) and the Ministry of Education, Science and Culture of Mecklenburg-Western Pomerania (Germany) within the project Neural Extraction of Information, Structure and Symmetry in Images (NEISS) under grant no. ESF/14-BM-A55-0006/19.

Author information

Authors and Affiliations

Planet AI GmbH, Warnowufer 60, 18057 Rostock, Germany
Christoph Wick, Jochen Zöllner & Tobias Grüning
Computational Intelligence Technology Lab, Department of Mathematics, University of Rostock, 18051 Rostock, Germany
Jochen Zöllner

Corresponding author

Correspondence to Jochen Zöllner.

Editor information

Editors and Affiliations

Kyushu University, Fukuoka, Japan
Seiichi Uchida
Boise State University, Boise, ID, USA
Elisa Barney
LIRIS UMR CNRS, Villeurbanne, France
Véronique Eglin

About this paper

Cite this paper

Wick, C., Zöllner, J., Grüning, T. (2022). Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes. In: Uchida, S., Barney, E., Eglin, V. (eds) Document Analysis Systems. DAS 2022. Lecture Notes in Computer Science, vol 13237. Springer, Cham. https://doi.org/10.1007/978-3-031-06555-2_18

DOI: https://doi.org/10.1007/978-3-031-06555-2_18
Published: 18 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06554-5
Online ISBN: 978-3-031-06555-2
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The International Association for Pattern Recognition