Full Page Handwriting Recognition via Image to Sequence Extraction

  • Conference paper, published in Document Analysis and Recognition – ICDAR 2021
  • Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12823)
  • Included in the ICDAR conference series

Abstract

We present a neural network based Handwritten Text Recognition (HTR) model architecture that can be trained to recognize full pages of handwritten or printed text without image segmentation. Built on an Image to Sequence architecture, it extracts the text present in an image and sequences it correctly without imposing any constraints on the orientation, layout, or size of text and non-text. It can further be trained to generate auxiliary markup describing formatting, layout, and content. We use a character-level vocabulary, which accommodates the language and terminology of any subject. The model achieves a new state of the art in paragraph-level recognition on the IAM dataset. When evaluated on scans of real-world handwritten free-form test answers, which are beset with curved and slanted lines, drawings, tables, and math, chemistry, and other symbols, it performs better than all commercially available HTR cloud APIs. It is deployed in production as part of a commercial web application.
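To make the Image to Sequence idea concrete: a convolutional encoder turns the full page image into a sequence of visual feature vectors, and an autoregressive Transformer decoder attends over them to emit one character (or markup token) at a time. The following is a minimal, hypothetical PyTorch sketch under those assumptions; all layer sizes and names are illustrative, positional encodings are omitted for brevity, and none of it should be read as the authors' exact design.

```python
import torch
import torch.nn as nn

class ImageToSeqHTR(nn.Module):
    """Hypothetical image-to-sequence HTR skeleton (not the paper's exact model)."""

    def __init__(self, vocab_size, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        # CNN encoder: downsamples the page image into a feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Character-level output vocabulary, as described in the abstract.
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, image, tgt_tokens):
        # image: (B, 1, H, W) page scan; tgt_tokens: (B, T) characters so far.
        feats = self.encoder(image)                 # (B, d_model, H', W')
        memory = feats.flatten(2).transpose(1, 2)   # (B, H'*W', d_model)
        tgt = self.embed(tgt_tokens)                # (B, T, d_model)
        # Causal mask so position t only attends to positions <= t.
        T = tgt_tokens.size(1)
        mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=mask)
        return self.out(out)                        # (B, T, vocab_size) logits
```

Training such a model would minimize cross-entropy between these logits and the ground-truth character sequence; the markup vocabulary and data augmentation the abstract mentions are left out of the sketch.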


Notes

  1. This becomes relevant when text is not horizontal, or when it is inserted using a circumflex or arrow.

  2. Except for a limit set at prediction time to prevent an endless loop (see the sketch after these notes).

  3. We view synthetic WikiText-based data as an augmentation method, since it does not rely on proprietary data or methods.

  4. Results from [2] are not included because that model was trained on much more than the IAM data, and 30% of its training data was proprietary.

  5. We evaluated the Microsoft, Google, and Mathpix cloud APIs. Microsoft performed the best, and its results are reported here. This is not intended as a comparison of models, but rather a practical data point that can be used to make build-vs-buy decisions.
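As a companion to note 2, here is a minimal, hypothetical sketch of greedy autoregressive decoding with such a limit. `max_len`, `bos_id`, and `eos_id` are assumed parameter names, and `model` is any image-to-sequence model shaped like the sketch after the abstract; the paper itself does not spell out this loop.

```python
import torch

@torch.no_grad()
def greedy_decode(model, image, bos_id, eos_id, max_len=1024):
    # max_len is the hard cap from note 2: without it, a model that never
    # emits <eos> would keep generating characters forever.
    tokens = torch.tensor([[bos_id]])              # (1, 1) running output
    for _ in range(max_len):
        logits = model(image, tokens)              # (1, T, vocab_size)
        next_id = logits[:, -1].argmax(dim=-1)     # most likely next character
        tokens = torch.cat([tokens, next_id.unsqueeze(1)], dim=1)
        if next_id.item() == eos_id:               # normal stopping condition
            break
    return tokens.squeeze(0)
```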

References

  1. Beltagy, I., Peters, M.E., Cohan, A.: Longformer: the long-document transformer (2020)

  2. Bluche, T., Messina, R.: Gated convolutional recurrent neural networks for multilingual handwriting recognition. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 646–651 (2017)

  3. Bluche, T.: Joint line segmentation and transcription for end-to-end handwritten paragraph recognition. arXiv:1604.08352 (2016)

  4. Bluche, T., Louradour, J., Messina, R.O.: Scan, attend and read: end-to-end handwritten paragraph recognition with MDLSTM attention. arXiv:1604.03286 (2016)

  5. Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Le, Q.V., Salakhutdinov, R.: Transformer-XL: attentive language models beyond a fixed-length context (2019)

  6. Deng, Y., Kanervisto, A., Ling, J., Rush, A.M.: Image-to-markup generation with coarse-to-fine attention. In: ICML (2017)

  7. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding (2019)

  8. Graves, A.: Supervised sequence labelling with recurrent neural networks. In: Studies in Computational Intelligence (2008)

  9. Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML (2006)

  10. Graves, A., Schmidhuber, J.: Offline handwriting recognition with multidimensional recurrent neural networks. In: NIPS (2008)

  11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv:1512.03385 (2015)

  12. Hendrycks, D., Gimpel, K.: Bridging nonlinearities and stochastic regularizers with Gaussian error linear units. arXiv:1606.08415 (2016)

  13. Kang, L., Riba, P., Rusiñol, M., Fornés, A., Villegas, M.: Pay attention to what you read: non-recurrent handwritten text-line recognition (2020)

  14. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization (2017)

  15. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)

  16. Ly, N.T., Nguyen, C.T., Nakagawa, M.: An attention-based end-to-end model for multiple text lines recognition in Japanese historical documents. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 629–634 (2019). https://doi.org/10.1109/ICDAR.2019.00106

  17. Marti, U.V., Bunke, H.: The IAM-database: an English sentence database for offline handwriting recognition. Int. J. Doc. Anal. Recogn. 5, 39–46 (2002). https://doi.org/10.1007/s100320200071

  18. Merity, S., Xiong, C., Bradbury, J., Socher, R.: Pointer sentinel mixture models. arXiv:1609.07843 (2016)

  19. Open SLR: Aachen data splits (train, test, val) for the IAM dataset. https://www.openslr.org/56/. Identifier: SLR56

  20. Parmar, N., et al.: Image transformer. In: ICML (2018)

  21. Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32, pp. 8024–8035. Curran Associates, Inc. (2019). http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf

  22. Pham, V., Kermorvant, C., Louradour, J.: Dropout improves recurrent neural networks for handwriting recognition. arXiv:1312.4569 (2013)

  23. Puigcerver, J.: Are multidimensional recurrent layers really necessary for handwritten text recognition? In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 67–72 (2017)

  24. Radford, A.: Improving language understanding by generative pre-training (2018)

  25. Raffel, C., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer (2020)

  26. Singh, S.S.: Teaching machines to code: neural markup generation with visual attention. arXiv:1802.05415 (2018)

  27. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. arXiv:1409.3215 (2014)

  28. Vaswani, A., et al.: Attention is all you need. arXiv:1706.03762 (2017)

  29. Vaswani, A., et al.: Tensor2Tensor for neural machine translation. arXiv:1803.07416 (2018)

  30. Voigtlaender, P., Doetsch, P., Ney, H.: Handwriting recognition with large multidimensional long short-term memory recurrent neural networks. In: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 228–233 (2016)

  31. Wang, T., et al.: Decoupled attention network for text recognition (2019)

  32. Wigington, C., Tensmeyer, C., Davis, B., Barrett, W., Price, B., Cohen, S.: Start, follow, read: end-to-end full-page handwriting recognition. In: Proceedings of the European Conference on Computer Vision (ECCV) (2018)

  33. Xu, K., et al.: Show, attend and tell: neural image caption generation with visual attention. In: ICML (2015)


Acknowledgements

We would like to thank Saurabh Bipin Chandra for implementing the fast O(N^2) inference path of the Transformer decoder, which was lacking in PyTorch.
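For context on that O(N^2) figure: a naive decoder re-runs attention over the entire prefix at every generation step, costing O(N^3) in total over N tokens, whereas caching each step's keys and values makes every step linear and the whole generation O(N^2). The sketch below illustrates that standard caching idea for a single attention head, with assumed shapes and weight names; it is not the contributed implementation.

```python
import torch

def cached_self_attention(x_new, k_cache, v_cache, Wq, Wk, Wv):
    # x_new: (B, 1, D) embedding of only the newest token; caches start at (B, 0, D).
    q = x_new @ Wq                                      # (B, 1, D)
    k_cache = torch.cat([k_cache, x_new @ Wk], dim=1)   # (B, T, D), grows by one
    v_cache = torch.cat([v_cache, x_new @ Wv], dim=1)
    # One row of attention per step: O(T) work instead of re-doing O(T^2).
    att = (q @ k_cache.transpose(1, 2)) / k_cache.size(-1) ** 0.5  # (B, 1, T)
    out = att.softmax(dim=-1) @ v_cache                 # (B, 1, D)
    return out, k_cache, v_cache
```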

Author information

Correspondence to Sumeet S. Singh.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Singh, S.S., Karayev, S. (2021). Full Page Handwriting Recognition via Image to Sequence Extraction. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. Lecture Notes in Computer Science, vol. 12823. Springer, Cham. https://doi.org/10.1007/978-3-030-86334-0_4


  • DOI: https://doi.org/10.1007/978-3-030-86334-0_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86333-3

  • Online ISBN: 978-3-030-86334-0

  • eBook Packages: Computer Science, Computer Science (R0)
