HWNet v2: an efficient word image representation for handwritten documents

Krishnan, Praveen; Jawahar, C. V.

doi:10.1007/s10032-019-00336-x

HWNet v2: an efficient word image representation for handwritten documents

Original Paper
Published: 31 July 2019

Volume 22, pages 387–405, (2019)
Cite this article

International Journal on Document Analysis and Recognition (IJDAR) Aims and scope Submit manuscript

865 Accesses
44 Citations
3 Altmetric
Explore all metrics

Abstract

We present a framework for learning an efficient holistic representation for handwritten word images. The proposed method uses a deep convolutional neural network with traditional classification loss. The major strengths of our work lie in: (i) the efficient usage of synthetic data to pre-train a deep network, (ii) an adapted version of the ResNet-34 architecture with the region of interest pooling (referred to as HWNet v2) which learns discriminative features for variable sized word images, and (iii) a realistic augmentation of training data with multiple scales and distortions which mimics the natural process of handwriting. We further investigate the process of transfer learning to reduce the domain gap between synthetic and real domain and also analyze the invariances learned at different layers of the network using visualization techniques proposed in the literature. Our representation leads to a state-of-the-art word spotting performance on standard handwritten datasets and historical manuscripts in different languages with minimal representation size. On the challenging iam dataset, our method is first to report an mAP of around 0.90 for word spotting with a representation size of just 32 dimensions. Furthermore, we also present results on printed document datasets in English and Indic scripts which validates the generic nature of the proposed framework for learning word image representation.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Document Image Analysis Using Deep Multi-modular Features

Article 15 October 2022

Handwritten Kazakh and Russian (HKR) database for text recognition

Article 13 August 2021

HWNet v3: a joint embedding framework for recognition and retrieval of handwritten text

Article 28 January 2023

Notes

We use ImageMagick for rendering the word images. URL: http://www.imagemagick.org/script/index.php.
http://cvit.iiit.ac.in/research/projects/cvit-projects/hwnet.

References

Aldavert, D., Rusinol, M., Toledo, R., Lladós, J.: Integrating visual and textual cues for query-by-string word spotting. In: ICDAR (2013)
Aldavert, D., Rusiñol, M., Toledo, R., Lladós, J.: A study of bag-of-visual-words representations for handwritten keyword spotting. In: IJDAR (2015)
Almazán, J., Gordo, A., Fornés, A., Valveny, E.: Segmentation-free word spotting with exemplar SVMs. In: PR (2014)
Almazán, J., Gordo, A., Fornés, A., Valveny, E.: Word spotting and recognition with embedded attributes. In: PAMI (2014)
Ambati, V., Balakrishnan, N., Reddy, R., Pratha, L., Jawahar, C.V.: The digital library of India Project: process, policies and architecture. In: ICDL (2007)
Axler, G., Wolf, L.: Toward a dataset-agnostic word segmentation method. In: ICIP (2018)
Balasubramanian, A., Meshesha, M., Jawahar, C.V.: Retrieval from document image collections. In: DAS (2006)
Google Scholar
Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: ICML (2009)
Causer, T., Wallace, V.: Building a volunteer community: results and findings from Transcribe Bentham. In: Digital Humanities Quarterly (2012)
Chen, K., Seuret, M., Liwicki, M., Hennebert, J., Ingold, R.: Page segmentation of historical document images with convolutional autoencoders. In: ICDAR (2015)
Csurka, G., Dance, C., Fan, L., Willamowski, J., Bray, C.: Visual categorization with bags of keypoints. In: Workshop on Statistical Learning in Computer Vision, ECCV (2004)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41(6), 391–407 (1990)
Article Google Scholar
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR09 (2009)
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., Darrell, T.: Decaf: a deep convolutional activation feature for generic visual recognition. In: ICML (2014)
Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The pascal visual object classes (VOC) challenge. In: IJCV (2010)
Fischer, A., Frinken, V., Bunke, H., Suen, C.Y.: Improving HMM-based keyword spotting with character language models. In: 2013 12th International Conference on Document Analysis and Recognition (2013)
Fischer, A., Keller, A., Frinken, V., Bunke, H.: Lexicon-free handwritten word spotting using character HMMs. In: PRL (2012)
Ghosh, S., Valveny, E.: Text box proposals for handwritten word spotting from documents. In: IJDAR (2018)
Article Google Scholar
Ghosh, S.K., Valveny, E.: A sliding window framework for word spotting based on word attributes. In: IbPRIA (2015)
Giotis, A.P., Sfikas, G., Gatos, B., Nikou, C.: A survey of document image word spotting techniques. Pattern Recognit. 68, 310–332 (2017)
Article Google Scholar
Girshick, R.: Fast R-CNN. In: ICCV (2015)
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014)
Gómez, L., Rusinol, M., Karatzas, D.: LSDE: Levenshtein space deep embedding for query-by-string word spotting. In: ICDAR (2017)
Gordo, A., Almazán, J., Murray, N., Perronin, F.: LEWIS: latent embeddings for word images and their semantics. In: ICCV (2015)
Harley, A.W., Ufkes, A., Derpanis, K.G.: Evaluation of deep convolutional nets for document image classification and retrieval. In: ICDAR (2015)
Harris, C., Stephens, M.: A combined corner and edge detector. In: Alvey Vision Conference (1988)
He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: ICCV (2015)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: CoRR (2015)
Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman, A.: Reading text in the wild with convolutional neural networks. In: IJCV (2014)
Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman, A.: Synthetic data and artificial neural networks for natural scene text recognition. In: CoRR (2014)
Jaderberg, M., Vedaldi, A., Zisserman, A.: Deep features for text spotting. In: ECCV (2014)
Kovalchuk, A., Wolf, L., Dershowitz, N.: A simple and fast word spotting method. In: ICFHR (2014)
Krishnan, P., Dutta, K., Jawahar, C.V.: Deep feature embedding for accurate recognition and retrieval of handwritten text. In: ICFHR (2016)
Krishnan, P., Dutta, K., Jawahar, C.V.: Word spotting and recognition using deep embedding. In: DAS (2018)
Krishnan, P., Jawahar, C.V.: Matching handwritten document images. In: ECCV (2016)
Krishnan, P., Shekhar, R., Jawahar, C.: Content level access to digital library of India pages. In: ICVGIP (2012)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NIPS (2012)
Kumar, A., Jawahar, C.V., Manmatha, R.: Efficient search in document image collections. In: ACCV (2007)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. In: IJCV (2004)
Article Google Scholar
Maaten, L.v.d., Hinton, G.: Visualizing data using t-SNE. In: JMLR (2008)
Mahendran, A., Vedaldi, A.: Understanding deep image representations by inverting them. In: CVPR (2015)
Malisiewicz, T., Gupta, A., Efros, A.A.: Ensemble of exemplar-SVMs for object detection and beyond. In: ICCV (2011)
Manmatha, R., Han, C., Riseman, E.M.: Word spotting: A new approach to indexing handwriting. In: CVPR (1996)
Marti, U., Bunke, H.: Using a statistical language model to improve the performance of an HMM-based cursive handwriting recognition system. In: IJPRAI (2001)
Marti, U., Bunke, H.: The IAM-database: an English sentence database for offline handwriting recognition. In: IJDAR (2002)
Meshesha, M., Jawahar, C.V.: Matching Word Images for Content-based Retrieval from Printed Document Images. In: IJDAR (2008)
Myers, C., Rabiner, L., Rosenberg, A.: Performance tradeoffs in dynamic time warping algorithms for isolated word recognition. IEEE Trans. Acoust. Speech Signal Process. 28(6), 623–635 (1980)
Article Google Scholar
Perronnin, F., Dance, C.: Fisher kernels on visual vocabularies for image categorization. In: CVPR (2007)
Perronnin, F., Rodríguez-Serrano, J.A.: Fisher kernels for handwritten word-spotting. In: ICDAR (2009)
Poznanski, A., Wolf, L.: CNN-N-Gram for handwriting word recognition. In: CVPR (2016)
Pratikakis, I., Zagoris, K., Gatos, B., Puigcerver, J., Toselli, A.H., Vidal, E.: ICFHR2016 handwritten keyword spotting competition (H-KWS 2016). In: ICFHR (2016)
Rath, T.M., Manmatha, R.: Word image matching using dynamic time warping. In: CVPR (2003)
Rath, T.M., Manmatha, R.: Word spotting for historical documents. In: IJDAR (2007)
Razavian, A.S., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf: an astounding baseline for recognition. In: CVPR (2014)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems (2015)
Rodriguez, J.A., Perronnin, F.: Local gradient histogram features for word spotting in unconstrained handwritten documents (2008)
Rodríguez-Serrano, J.A., Perronnin, F.: A model-based sequence similarity with application to handwritten word spotting. In: PAMI (2012)
Rohlicek, J.R., Russell, W., Roukos, S., Gish, H.: Continuous hidden Markov modeling for speaker-independent word spotting. In: ICASSP (1989)
Ros, G., Sellart, L., Materzynska, J., Vazquez, D., Lopez, A.M.: The synthia dataset: a large collection of synthetic images for semantic segmentation of urban scenes. In: CVPR (2016)
Rosten, E., Drummond, T.: Machine learning for high-speed corner detection. In: ECCV (2006)
Rothacker, L., Rusinol, M., Fink, G.A.: Bag-of-features HMMs for segmentation-free word spotting in handwritten documents. In: ICDAR (2013)
Rothacker, L., Sudholt, S., Rusakov, E., Kasperidus, M., Fink, G.A.: Word hypotheses for segmentation-free word spotting in historic document images. In: ICDAR
Roy, P.P., Rayar, F., Ramel, J.Y.: Word spotting in historical documents using primitive codebook and dynamic programming. Image Vis. Comput. 44, 15–28 (2015)
Article Google Scholar
Rozantsev, A., Lepetit, V., Fua, P.: On rendering synthetic images for training an object detector. In: CVIU (2015)
Rusiñol, M., Aldavert, D., Toledo, R., Lladós, J.: Browsing heterogeneous document collections by a segmentation-free word spotting method. In: ICDAR (2011)
Rusiñol, M., Aldavert, D., Toledo, R., Lladós, J.: Efficient segmentation-free keyword spotting in historical document collections. In: PR (2015)
Sakoe, H., Chiba, S.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech Signal Process. 26(1), 43–49 (1978)
Article Google Scholar
Shekhar, R., Jawahar, C.V.: Word image retrieval using bag of visual words. In: DAS (2012)
Simard, P.Y., Steinkraus, D., Platt, J.C.: Best practices for convolutional neural networks applied to visual document analysis. In: ICDAR (2003)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: CoRR (2014)
Sivic, J., Zisserman, A.: Video Google: a text retrieval approach to object matching in videos. In: ICCV (2003)
Sudholt, S., Fink, G.A.: PHOCNet: A deep convolutional neural network for word spotting in handwritten documents. In: ICFHR (2016)
Sudholt, S., Fink, G.A.: Evaluating word string embeddings and loss functions for CNN-based word spotting. In: ICDAR (2017)
Sudholt, S., Fink, G.A.: Attribute CNNs for word spotting in handwritten documents. Int. J. Doc. Anal. Recognit. (IJDAR) 21(3), 199–218 (2018)
Article Google Scholar
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: CVPR (2015)
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013)
Terasawa, K., Tanaka, Y.: Slit style HOG feature for document image word spotting. In: ICDAR (2009)
Toselli, A.H., Vidal, E., Romero, V., Frinken, V.: HMM word graph based keyword spotting in handwritten document images. Inf. Sci. 370–371, 497–518 (2016)
Article Google Scholar
Vinciarelli, A., Bengio, S.: Offline cursive word recognition using continuous density hidden markov models trained with PCA or ICA features. In: ICPR (2002)
Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., Gong, Y.: Locality-constrained linear coding for image classification. In: CVPR (2010)
Wilkinson, T., Brun, A.: Semantic and verbatim word spotting using deep neural networks. In: ICFHR (2016)
Wilkinson, T., Lindstrom, J., Brun, A.: Neural Ctrl-F: segmentation-free query-by-string word spotting in handwritten manuscript collections. In: ICCV (2017)
Wilkinson, T., Lindström, J., Brun, A.: Neural word search in historical manuscript collections. In: CoRR arXiv:1812.02771 (2018)
Yalniz, I.Z., Manmatha, R.: An efficient framework for searching text in noisy document images. In: DAS (2012)
Yang, J., Yu, K., Gong, Y., Huang, T.: Linear spatial pyramid matching using sparse coding for image classification. In: CVPR (2009)
Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks? In: NIPS (2014)
Yosinski, J., Clune, J., Nguyen, A., Fuchs, T., Lipson, H.: Understanding neural networks through deep visualization. arXiv preprint arXiv:1506.06579 (2015)
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: ECCV (2014)

Download references

Acknowledgements

This work is partly supported by IMPRINT. Praveen Krishnan is supported by Amazon Alexa Graduate Fellowship.

Author information

Authors and Affiliations

Center for Visual Information Technology, IIIT Hyderabad, Hyderabad, India
Praveen Krishnan & C. V. Jawahar

Authors

Praveen Krishnan
View author publications
You can also search for this author in PubMed Google Scholar
C. V. Jawahar
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Praveen Krishnan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Krishnan, P., Jawahar, C.V. HWNet v2: an efficient word image representation for handwritten documents. IJDAR 22, 387–405 (2019). https://doi.org/10.1007/s10032-019-00336-x

Download citation

Received: 23 February 2018
Revised: 16 March 2019
Accepted: 15 July 2019
Published: 31 July 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s10032-019-00336-x

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

HWNet v2: an efficient word image representation for handwritten documents

Abstract

Access this article

Similar content being viewed by others

Document Image Analysis Using Deep Multi-modular Features

Handwritten Kazakh and Russian (HKR) database for text recognition

HWNet v3: a joint embedding framework for recognition and retrieval of handwritten text

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

HWNet v2: an efficient word image representation for handwritten documents

Abstract

Access this article

Similar content being viewed by others

Document Image Analysis Using Deep Multi-modular Features

Handwritten Kazakh and Russian (HKR) database for text recognition

HWNet v3: a joint embedding framework for recognition and retrieval of handwritten text

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation