Word searching in scene image and video frame in multi-script scenario using dynamic shape coding

Roy, Partha Pratim; Bhunia, Ayan Kumar; Bhattacharyya, Avirup; Pal, Umapada

doi:10.1007/s11042-018-6484-5

Published: 16 August 2018

Volume 78, pages 7767–7801, (2019)

Multimedia Tools and Applications

Partha Pratim Roy ORCID: orcid.org/0000-0003-4526-2015¹,
Ayan Kumar Bhunia²,
Avirup Bhattacharyya² &
Umapada Pal³

284 Accesses
11 Citations

Abstract

Retrieving text from natural scene images and video frames is challenging because of inherent problems such as complex character shapes, low resolution, and background noise. Available OCR systems often fail to retrieve such information from scene images and video frames. Keyword spotting, an alternative way to retrieve information, performs efficient text searching in such scenarios. However, current word spotting techniques for scene and video images are script-specific and have mainly been developed for Latin script. This paper presents a novel word spotting framework that uses dynamic shape coding for text retrieval in natural scene images and video frames. The framework is designed to search for a query keyword across multiple scripts by generating the keyword on the fly in each corresponding script. We use a two-stage word spotting approach based on Hidden Markov Models (HMMs): the script of a given text line is identified first, and the translated keyword is then detected in that line. A novel unsupervised scheme based on dynamic shape coding groups characters of similar shape to avoid confusion and to improve text alignment. Next, the hypothesized locations are verified to improve retrieval performance. To evaluate the proposed system for keyword searching in natural scene images and video frames, we consider two popular Indic scripts, Bangla (Bengali) and Devanagari, along with English. Inspired by the zone-wise recognition approach for Indic scripts [37], zone-wise text information is used to improve traditional word spotting performance in Indic scripts. For our experiments, a dataset consisting of scene images and video frames containing English, Bangla and Devanagari scripts was considered. The results demonstrate the effectiveness of the proposed word spotting approach.
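
The idea of grouping similar-shape characters into shared codes can be illustrated with a small sketch. This is not the authors' algorithm, whose details are not given in this preview; it is a toy illustration that assumes a precomputed pairwise shape-similarity table and merges characters whose similarity exceeds a threshold. The function names (`group_similar_shapes`, `encode`), the similarity values, and the threshold are all hypothetical.

```python
def group_similar_shapes(chars, similarity, threshold):
    """Assign one shared code to every set of characters linked by
    pairwise similarity >= threshold (union-find grouping)."""
    parent = {c: c for c in chars}

    def find(c):
        # Follow parent links to the group representative, compressing the path.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), score in similarity.items():
        if score >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                # Use the lexicographically smaller character as the code.
                parent[max(ra, rb)] = min(ra, rb)

    return {c: find(c) for c in chars}


def encode(word, shape_code):
    """Replace each character of a keyword by the code of its shape group."""
    return tuple(shape_code[ch] for ch in word)


# Made-up similarity scores, for illustration only.
chars = list("ceolit")
similarity = {
    ("c", "e"): 0.90,
    ("e", "o"): 0.85,
    ("i", "l"): 0.92,
    ("l", "t"): 0.40,
}
shape_code = group_similar_shapes(chars, similarity, threshold=0.8)

# Characters in the same group become indistinguishable after coding,
# so two differently spelled strings can map to the same code sequence.
print(shape_code)
print(encode("cite", shape_code), encode("olte", shape_code))
```

In a spotting pipeline of this kind, matching on the coarser code sequence would tolerate confusions between look-alike characters, which is presumably why the abstract describes a later verification stage for the hypothesized locations.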

Notes

https://code.google.com/p/tesseract-ocr/

References

Banerjee P, Chaudhuri BB (2013) An approach for Bangla and Devanagari video text recognition, in Proceedings of the 4th International Workshop on Multilingual OCR, p. 8
Bhunia AK, Das A, Roy PP, and Pal U (2015) A comparative study of features for handwritten Bangla text recognition, in 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 636–640
Bhunia AK, Kumar G, Roy PP, Balasubramanian R, Pal U (2018) Text recognition in scene image and video frame using color channel selection. Multimed Tools Appl 77(7):8551–8578
Bhunia AK, Roy PP, Mohata A, Pal U (2018) Cross-language framework for word recognition and spotting of Indic scripts. Pattern Recogn 79:12–31
Bianne-Bernard AL, Menasri F, Al-Hajj Mohamad R, Mokbel C, Kermorvant C, Likforman-Sulem L (2011) Dynamic and contextual information in HMM modeling for handwritten word recognition. IEEE Trans Pattern Anal Mach Intell 33(10):2066–2080
Cao H, Prasad R, Natarajan P (2011) Handwritten and typewritten text identification and recognition using hidden Markov models, in Document Analysis and Recognition (ICDAR), 2011 International Conference on, pp. 744–748
Chaudhuri BB, Pal U (1998) A complete printed Bangla OCR system. Pattern Recogn 31(5):531–549
Chen D, Odobez J-M (2005) Video text recognition using sequential Monte Carlo and error voting methods. Pattern Recogn Lett 26(9):1386–1403
Chen X, Yuille AL (2004) Detecting and reading text in natural scenes, in Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, vol. 2, pp. II--II
Chen H, Tsai SS, Schroth G, Chen DM, Grzeszczuk R, and Girod B (2011) Robust text detection in natural images with edge-enhanced maximally stable extremal regions, in Proceedings - International Conference on Image Processing, ICIP, pp. 2609–2612
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection, in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1, pp. 886–893
Fischer A, Keller A, Frinken V, Bunke H (2012) Lexicon-free handwritten word spotting using character HMMs. Pattern Recogn Lett 33(7):934–942
Frinken V, Fischer A, Manmatha R, Bunke H (2012) A novel word spotting method based on recurrent neural networks. IEEE Trans Pattern Anal Mach Intell 34(2):211–224
Gatos B et al. (2015) GRPOLY-DB: An old Greek polytonic document image database, in Document Analysis and Recognition (ICDAR), 2015 13th International Conference on, pp. 646–650
Giotis AP, Sfikas G, Gatos B, Nikou C (2017) A survey of document image word spotting techniques. Pattern Recogn 68:310–332
Guo JK, Ma MY (2001) Separating handwritten material from machine printed text using hidden markov models, in Document Analysis and Recognition, Proceedings. Sixth International Conference on, 2001, pp. 439–443
He P, Huang W, Qiao Y, Loy CC, Tang X (2016) Reading scene text in deep convolutional sequences, in AAAI, pp. 3501–3508
Huang W, Lin Z, Yang J, Wang J (2013) Text localization in natural images using stroke feature transform and text covariance descriptors, in Proceedings of the IEEE International Conference on Computer Vision, pp. 1241–1248
Jaderberg M, Vedaldi A, Zisserman A (2014) Deep features for text spotting, in European conference on computer vision, pp. 512–528
Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2016) Reading text in the wild with convolutional neural networks. Int J Comput Vis 116(1):1–20
Khotanzad A, Hong YH (1990) Invariant image recognition by Zernike moments. IEEE Trans Pattern Anal Mach Intell 12(5):489–497
Krishnan P, Dutta K, and Jawahar CV (2016) Deep feature embedding for accurate recognition and retrieval of handwritten text, In International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 289–294
Kumar G, Govindaraju V (2017) Bayesian background models for keyword spotting in handwritten documents. Pattern Recogn 64:84–91
Li K, He FZ, Yu HP, Chen X (2017) A correlative classifiers approach based on particle filter and sample set for tracking occluded target. Applied Mathematics-A Journal of Chinese Universities 32(3):294–312
Li K, He FZ, Yu HP (2018) Robust visual tracking based on convolutional features with illumination and occlusion handing. J Comput Sci Technol 33(1):223–236
Lu S, Li L, Tan CL (2008) Document image retrieval through word shape coding. IEEE Trans Pattern Anal Mach Intell 30(11):1913–1918
Marti U-V, Bunke H (2001) Using a statistical language model to improve the performance of an HMM-based cursive handwriting recognition system. Int J Pattern Recognit Artif Intell 15(1):65–90
Nakayama T (1994) Modeling content identification from document images, in Proceedings of the fourth conference on Applied natural language processing, pp. 22–27
Neumann L, Matas J (2011) A method for text localization and recognition in real-world images, in Computer Vision - ACCV 2010, pp. 770–783
Neumann L, Matas J (2012) Real-time scene text localization and recognition, in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3538–3545
Quy Phan T, Shivakumara P, Tian S, and Lim Tan C (2013) Recognizing text with perspective distortion in natural scenes, in Proceedings of the IEEE International Conference on Computer Vision, pp. 569–576
Rath TM, Manmatha R (2006) Word spotting for historical documents. Int J Doc Anal Recognit 9(2–4):139–152
Rodriguez-Serrano JA, Perronnin F (2009) Handwritten word-spotting using hidden Markov models and universal vocabularies. Pattern Recogn 42(9):2106–2116
Roy S, Shivakumara P, Roy PP, Tan CL (2012) Wavelet-gradient-fusion for video text binarization, in International Conference on Pattern Recognition (ICPR), pp. 3300–3303
Roy PP, Rayar F, Ramel J-Y (2015) Word spotting in historical documents using primitive codebook and dynamic programming. Image Vis Comput 44:15–28
Roy S, Shivakumara P, Roy PP, Pal U, Tan CL, Lu T (2015) Bayesian classifier for multi-oriented video text recognition system. Expert Syst Appl 42(13):5554–5566
Roy PP, Bhunia AK, Das A, Dey P, Pal U (2016) HMM-based Indic handwritten word recognition using zone segmentation. Pattern Recogn 60:1057–1075
Roy PP, Bhunia AK, Pal U (2017) Date-field retrieval in scene image and video frames using text enhancement and shape coding, Neurocomputing
Roy PP, Bhunia AK, Das A, Dhar P, Pal U (2017) Keyword spotting in doctor’s handwriting on medical prescriptions. Expert Syst Appl 76:113–128
Rusiñol M, Aldavert D, Toledo R, Lladós J (2015) Efficient segmentation-free keyword spotting in historical document collections. Pattern Recogn 48(2):545–555
Saidane Z, Garcia C (2007) Robust binarization for video text recognition, in Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, vol. 2, pp. 874–878
Sain A, Bhunia AK, Roy PP, Pal U (2018) Multi-oriented text detection and verification in video frames and scene images. Neurocomputing 275:1531–1549
Sharma N, Shivakumara P, Pal U, Blumenstein M, and Tan CL (2012) A new method for arbitrarily-oriented text detection in video, in Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on, pp. 74–78
Shivakumara P, Liang G, Roy S, Pal U, and Lu T (2015) New texture-spatial features for keyword spotting in video images, in Pattern Recognition (ACPR), 2015 3rd IAPR Asian Conference on, pp. 391–395
Srihari SN, Srinivasan H, Huang C, Shetty S (2006) Spotting words in Latin, Devanagari and Arabic scripts. Vivek-Bombay 16(3):2
Sudholt S, Fink GA (2016) PHOCNet: A deep convolutional neural network for word spotting in handwritten documents, In International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 277–282
Sun L, Huo Q, Jia W, Chen K (2015) A robust approach for text detection from natural scene images. Pattern Recogn 48(9):2906–2920
Sun J, He FZ, Chen YL, Chen X (2016) A multiple template approach for robust tracking of fast motion target. Applied Mathematics-A Journal of Chinese Universities 31(2):177–197
Tarafdar A, Mondal R, Pal S, Pal U, and Kimura F (2010) Shape code based word-image matching for retrieval of Indian multi-lingual documents, in Pattern Recognition (ICPR), 2010 20th International Conference on, pp. 1989–1992
Thomas S, Chatelain CC, Heutte L, Paquet T, Kessentini Y (2015) A deep HMM model for multiple keywords spotting in handwritten documents. Pattern Anal Appl 18(4):1003–1015
Toselli AH, Vidal E, Romero V, Frinken V (2016) HMM word graph based keyword spotting in handwritten document images. Inf Sci (Ny) 370–371:497–518
Wang K, Babenko B, Belongie S (2011) End-to-end scene text recognition, in Computer Vision (ICCV), 2011 IEEE International Conference on, pp. 1457–1464
Wang T, Wu DJ, Coates A, and Ng AY (2012) End-to-end text recognition with convolutional neural networks, in Pattern Recognition (ICPR), 2012 21st International Conference on, pp. 3304–3308
Wang R, Sang N, Gao C (2015) Text detection approach based on confidence map and context information. Neurocomputing 157:153–165
Wilkinson T, Lindström J, and Brun A (2017) Neural Ctrl-F: segmentation-free query-by-string word spotting in handwritten manuscript collections. In International Conference Computer Vision (ICCV), pp. 4443–4452
Wshah S, Kumar G, Govindaraju V (2014) Statistical script independent word spotting in offline handwritten documents. Pattern Recogn 47(3):1039–1050
Yao C, Bai X, Liu W, Ma Y, Tu Z (2012) Detecting texts of arbitrary orientations in natural images, in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pp. 1083–1090
Ye Q, Doermann D (2015) Text detection and recognition in imagery: a survey. IEEE Trans Pattern Anal Mach Intell 37(7):1480–1500
Yin X-C, Yin X, Huang K, Hao H-W (2014) Robust text detection in natural scene images. IEEE Trans Pattern Anal Mach Intell 36(5):970–983
Yin X-C, Pei W-Y, Zhang J, Hao H-W (2015) Multi-orientation scene text detection with adaptive clustering. IEEE Trans Pattern Anal Mach Intell 37(9):1930–1937
Young SJ et al. (2009) The HTK Book (for HTK Version 3.4), p. 384
Yu C, Song Y, Zhang Y (2016) Scene text localization using edge analysis and feature pool. Neurocomputing 175:652–661
Yu H, He F, Pan Y (2018) A novel region-based active contour model via local patch similarity measure for image segmentation. Multimedia Tools and Applications:1–23
Zagoris K, Pratikakis I, and Gatos B (2017) Unsupervised Word Spotting in Historical Handwritten Document Images using Document-oriented Local Features, IEEE Trans. Image Process
Zhang X, Pal U, and Tan CL (2014) Segmentation-free Keyword spotting for Bangla handwritten documents, in Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on, pp. 381–386
Zhang Z, Shen W, Yao C, and Bai X (2015) Symmetry-based text line detection in natural scenes, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2558–2567
Zhou Z, Li L, Tan CL (2010) Edge based binarization for video text images, in Proceedings - International Conference on Pattern Recognition, pp. 133–136
Zhou Y, He F, Qiu Y (2017) Dynamic strategy based parallel ant colony optimization on GPUs for TSPs. SCIENCE CHINA Inf Sci 60(6):068102

Author information

Authors and Affiliations

Department of CSE, Indian Institute of Technology Roorkee, Roorkee, India
Partha Pratim Roy
Department of ECE, Institute of Engineering & Management, Kolkata, India
Ayan Kumar Bhunia & Avirup Bhattacharyya
CVPR Unit, Indian Statistical Institute, Kolkata, India
Umapada Pal

Corresponding author

Correspondence to Partha Pratim Roy.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Roy, P.P., Bhunia, A.K., Bhattacharyya, A. et al. Word searching in scene image and video frame in multi-script scenario using dynamic shape coding. Multimed Tools Appl 78, 7767–7801 (2019). https://doi.org/10.1007/s11042-018-6484-5

Received: 25 November 2017
Revised: 30 May 2018
Accepted: 30 July 2018
Published: 16 August 2018
Issue Date: March 2019
DOI: https://doi.org/10.1007/s11042-018-6484-5
