
Pho(SC)-CTC—a hybrid approach towards zero-shot word image recognition

Abstract

Annotating words in a historical document image archive for word image recognition demands time and skilled human resources (such as historians and paleographers). In a real-life scenario, obtaining sample images for all possible words is also not feasible. However, zero-shot learning methods are well suited to recognizing unseen/out-of-lexicon words in such historical document images. Building on the previous state-of-the-art method for zero-shot word recognition, “Pho(SC)Net”, we propose a hybrid model, Pho(SC)-CTC, that takes the rich features learned by Pho(SC)Net and passes them to a “connectionist temporal classification” (CTC) framework to perform the final classification. Encouraging results were obtained on two publicly available historical document datasets and one synthetic handwritten dataset, which supports the efficacy of both Pho(SC)-CTC and Pho(SC)Net.
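As a rough illustration of the kind of final classification step a CTC framework performs, the sketch below shows generic best-path (greedy) CTC decoding: take the most likely label per frame, merge consecutive repeats, then drop blanks. It is not taken from the Pho(SC)-CTC implementation; the function name `ctc_greedy_decode`, the blank index of 0, and the toy alphabet are assumptions made only for this example.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Best-path CTC decoding over a sequence of per-frame score vectors.

    frame_scores: list of lists, one score per label (blank + characters).
    alphabet: string whose character k-1 corresponds to label index k.
    """
    # 1. Pick the highest-scoring label independently for each frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # 2. Merge runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Remove blanks and map the remaining labels to characters.
    return "".join(alphabet[label - 1] for label in collapsed if label != blank)


# Toy example: five frames over the labels (blank, 'a', 'b', 'c').
toy_scores = [
    [0.10, 0.80, 0.05, 0.05],  # 'a'
    [0.10, 0.80, 0.05, 0.05],  # 'a' again (merged with the previous frame)
    [0.90, 0.05, 0.03, 0.02],  # blank (separates the two 'a' runs)
    [0.10, 0.70, 0.10, 0.10],  # 'a'
    [0.10, 0.10, 0.70, 0.10],  # 'b'
]
decoded = ctc_greedy_decode(toy_scores, "abc")
```

The blank between the two runs of 'a' is what allows a repeated character to survive the merge step, so the toy input decodes to a word with a doubled letter.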


Notes

  1. https://github.com/first20hours/google-10000-english.

  2. https://github.com/raviRB/Pho-SC--CTC.


Author information

Corresponding author

Correspondence to Ravi Bhatt.


About this article

Cite this article

Bhatt, R., Rai, A., Chanda, S. et al. Pho(SC)-CTC—a hybrid approach towards zero-shot word image recognition. IJDAR (2022). https://doi.org/10.1007/s10032-022-00407-6

  • DOI: https://doi.org/10.1007/s10032-022-00407-6

Keywords

  • Pho(SC)Net
  • CTC
  • Zero-shot word recognition
  • Historical documents
  • Zero-shot learning
  • Word recognition