Start, Follow, Read: End-to-End Full-Page Handwriting Recognition

  • Curtis Wigington
  • Chris Tensmeyer
  • Brian Davis
  • William Barrett
  • Brian Price
  • Scott Cohen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11210)

Abstract

Despite decades of research, offline handwriting recognition (HWR) of degraded historical documents remains a challenging problem, which, if solved, could greatly improve the searchability of online cultural heritage archives. HWR models are often limited by the accuracy of the preceding steps of text detection and segmentation. Motivated by this, we present a deep learning model that jointly learns text detection, segmentation, and recognition using mostly images without detection or segmentation annotations. Our Start, Follow, Read (SFR) model is composed of a Region Proposal Network that finds the start position of text lines, a novel line follower network that incrementally follows and preprocesses lines of (perhaps curved) text into dewarped images, and a CNN-LSTM network that recognizes the dewarped line images. SFR exceeds the performance of the winner of the ICDAR2017 handwriting recognition competition, even when not using the provided competition region annotations.
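
The three-stage structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Pose`, `follow_line`, and `read_page`, the fixed step size, and the fixed number of steps are all assumptions made for illustration, and the learned networks are replaced by plain callables passed in by the caller.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Pose:
    """Hypothetical position along a text line (not from the paper)."""
    x: float
    y: float
    angle: float  # direction of travel in radians


def follow_line(start: Pose,
                predict_turn: Callable[[Pose], float],
                step: float,
                n_steps: int) -> List[Pose]:
    """Walk along a line from its start pose, one step at a time.

    `predict_turn` stands in for the learned line follower: given the
    current pose, it returns the change in angle for the next step.
    """
    path = [start]
    pose = start
    for _ in range(n_steps):
        angle = pose.angle + predict_turn(pose)
        pose = Pose(pose.x + step * math.cos(angle),
                    pose.y + step * math.sin(angle),
                    angle)
        path.append(pose)
    return path


def read_page(page: Any,
              find_starts: Callable[[Any], List[Pose]],
              predict_turn: Callable[[Pose], float],
              dewarp: Callable[[Any, List[Pose]], Any],
              recognize: Callable[[Any], str],
              step: float = 16.0,
              n_steps: int = 4) -> List[str]:
    """Start (find line starts), Follow (trace each line), Read (recognize)."""
    transcriptions = []
    for start in find_starts(page):                          # Start
        path = follow_line(start, predict_turn, step, n_steps)  # Follow
        line_image = dewarp(page, path)                      # straighten the line
        transcriptions.append(recognize(line_image))         # Read
    return transcriptions
```

In this sketch a straight line corresponds to `predict_turn` always returning zero, while a curved line would correspond to small non-zero turns at each step. How the actual model decides where a line ends is not stated in the abstract, so the fixed `n_steps` here is only a placeholder.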

Keywords

  • Handwriting recognition
  • Document analysis
  • Historical document processing
  • Text detection
  • Text line segmentation

Supplementary material

Supplementary material 1: 474211_1_En_23_MOESM1_ESM.pdf (PDF, 33182 KB)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Curtis Wigington (1, 2), corresponding author
  • Chris Tensmeyer (1, 2)
  • Brian Davis (1)
  • William Barrett (1)
  • Brian Price (2)
  • Scott Cohen (2)

  1. Brigham Young University, Provo, USA
  2. Adobe Research, San Jose, USA
