HWNet v3: a joint embedding framework for recognition and retrieval of handwritten text

  • Original Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Learning an efficient label embedding framework for word images enables effective word spotting in handwritten documents. In this work, we propose different label embedding schemes for word images using deep neural architectures and their representations. We refer to our first scheme as the two-stage label embedding technique, which projects both word images and their corresponding textual strings into a common subspace. We further introduce an end-to-end label embedding scheme using a deep neural architecture, which simplifies the embedding process and reports state-of-the-art performance for word spotting and recognition. We also validate the role of synthetic data as a complementary modality to further enhance the embedding process. On the challenging IAM handwritten dataset, we report an mAP of 0.9753 for query-by-string-based word spotting, while under lexicon-based word recognition, our proposed method reports character and word error rates of 1.67 and 3.62, respectively. We also present a detailed ablation study on several variants of our end-to-end embedding architecture and analyze performance under varying embedding sizes. We further validate the embedding scheme on degraded printed document datasets from both Latin and Indic scripts.
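To make the two reported metrics concrete, the following is a minimal sketch of how query-by-string retrieval in a common subspace can be scored with mean average precision (mAP), and how a character error rate can be derived from edit distance. It is not the authors' implementation: the cosine-similarity ranking, the function names (`query_by_string_map`, `character_error_rate`), and the toy two-dimensional embeddings are all assumptions made for illustration, and the paper's exact evaluation protocol is not given in this preview.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average precision of one ranked list; True marks a relevant item."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank  # precision at this recall point
    return total / hits if hits else 0.0


def query_by_string_map(
    text_emb: Dict[str, List[float]],
    image_emb: List[List[float]],
    image_labels: List[str],
) -> float:
    """mAP when each text embedding queries the word-image embeddings.

    Assumes (hypothetically) that text and images already live in the
    same subspace and that retrieval ranks images by cosine similarity.
    """
    aps = []
    for word, query in text_emb.items():
        order = sorted(
            range(len(image_emb)),
            key=lambda i: cosine(query, image_emb[i]),
            reverse=True,
        )
        aps.append(average_precision([image_labels[i] == word for i in order]))
    return sum(aps) / len(aps) if aps else 0.0


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def character_error_rate(predictions: List[str], references: List[str]) -> float:
    """Total character edits divided by total reference length, in percent."""
    edits = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    length = sum(len(r) for r in references)
    return 100.0 * edits / length if length else 0.0


if __name__ == "__main__":
    # Toy, hand-made embeddings; real embeddings would come from a trained network.
    text_emb = {"the": [1.0, 0.0], "cat": [0.0, 1.0]}
    image_emb = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.6, 0.4]]
    image_labels = ["the", "cat", "cat", "the"]
    print(query_by_string_map(text_emb, image_emb, image_labels))
    print(character_error_rate(["kitten"], ["sitting"]))
```

A word error rate can be obtained in the same way by comparing whole predicted words against references instead of characters.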

Author information

Corresponding author

Correspondence to Praveen Krishnan.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Krishnan, P., Dutta, K. & Jawahar, C.V. HWNet v3: a joint embedding framework for recognition and retrieval of handwritten text. IJDAR 26, 401–417 (2023). https://doi.org/10.1007/s10032-022-00423-6

  • DOI: https://doi.org/10.1007/s10032-022-00423-6
