Abstract
Transformer-based architectures achieve excellent results on handwritten text recognition and have become the standard for modern datasets. However, they require a significant amount of annotated data to reach competitive performance, a problem typically addressed with synthetic data. Historical handwritten text recognition remains challenging due to document degradations, writer-specific hands for which few examples are available, and ancient languages that vary over time. These limitations also make it difficult to generate realistic synthetic data. Given sufficient and appropriate data, Transformer-based architectures could alleviate these concerns, thanks to their global view of textual images and their language-modeling capabilities. In this paper, we propose a lightweight Transformer model for historical handwritten text recognition. To train the architecture, we introduce realistic-looking synthetic data that reproduce the style of historical handwriting. We also present a dedicated training and prediction strategy for historical documents where only a limited amount of training data is available. We evaluate our approach on the ICFHR 2018 READ dataset, which is dedicated to handwriting recognition in specific historical documents. The results show that our Transformer-based approach outperforms existing methods.
Notes
We use books available at https://www.gutenberg.org/.
The fonts are available at https://fonts.google.com, https://www.dafont.com, and https://www.p22.com.
On the IAM dataset, the Aachen split is used.
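As a rough illustration only, the synthetic-data pipeline outlined in the abstract and notes (historical-style fonts rendering text from Project Gutenberg books, followed by degradations) might be sketched as follows. None of this code comes from the paper: the corpus lines, font names, rendering stand-in, and degradation parameters are invented placeholders, and a real implementation would rasterise text with an actual font engine.

```python
import numpy as np

# Placeholder corpus lines and font names; the paper draws text from
# Project Gutenberg books and fonts from Google Fonts, dafont, and p22.
CORPUS = ["the quick brown fox", "jumps over the lazy dog"]
FONTS = ["HistoricalScript-Regular", "OldPrint-Italic"]  # hypothetical names

def render_line(text, font, height=64, char_width=32):
    """Stand-in for real font rasterisation: returns a blank white canvas
    whose width grows with the text length. Real code would draw `text`
    in `font` with a rasteriser such as Pillow's ImageFont."""
    width = max(1, len(text)) * char_width
    return np.full((height, width), 255, dtype=np.float32)

def degrade(img, rng):
    """Apply simple degradations: additive Gaussian noise plus a darkened
    rectangular patch, loosely imitating stains on aged paper."""
    noisy = img + rng.normal(0.0, 12.0, img.shape)
    h, w = img.shape
    y, x = rng.integers(0, h // 2), rng.integers(0, w // 2)
    noisy[y:y + h // 4, x:x + w // 4] *= 0.8  # faded stain
    return np.clip(noisy, 0, 255).astype(np.uint8)

def synth_sample(rng):
    """Produce one (degraded line image, transcription) training pair."""
    text = CORPUS[rng.integers(len(CORPUS))]
    font = FONTS[rng.integers(len(FONTS))]
    return degrade(render_line(text, font), rng), text

rng = np.random.default_rng(0)
img, label = synth_sample(rng)
```

In a real pipeline the degradation step would be richer (blur, bleed-through, ink fading, geometric warping), but the overall structure (sample text, render with a period-appropriate font, degrade, pair with the ground-truth transcription) stays the same.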
Acknowledgements
This work was performed using HPC/AI resources from GENCI-IDRIS (Grant 2021-AD011012550). We would also like to thank Solène Tarride for her contribution to the degradations used to generate the synthetic data.
Cite this article
Barrere, K., Soullard, Y., Lemaitre, A. et al. Training transformer architectures on few annotated data: an application to historical handwritten text recognition. IJDAR (2024). https://doi.org/10.1007/s10032-023-00459-2