Word searching in scene image and video frame in multi-script scenario using dynamic shape coding

Multimedia Tools and Applications

Abstract

Retrieving text information from natural scene images and video frames is challenging because of inherent problems such as complex character shapes, low resolution, and background noise. Available OCR systems often fail to retrieve such information in scene/video frames. Keyword spotting, an alternative way to retrieve information, performs efficient text searching in such scenarios. However, current word spotting techniques for scene/video images are script-specific and have mainly been developed for Latin script. This paper presents a novel word spotting framework using dynamic shape coding for text retrieval in natural scene images and video frames. The framework is designed to search for a query keyword across multiple scripts by generating the keyword on the fly in the corresponding script. We use a two-stage word spotting approach based on the Hidden Markov Model (HMM) to detect the translated keyword in a given text line by first identifying the script of the line. A novel unsupervised scheme based on dynamic shape coding groups similarly shaped characters to avoid confusion and to improve text alignment. Next, the hypothesis locations are verified to improve retrieval performance. To evaluate the proposed system for keyword searching in natural scene images and video frames, we consider two popular Indic scripts, Bangla (Bengali) and Devanagari, along with English. Inspired by the zone-wise recognition approach in Indic scripts [37], zone-wise text information is used to improve traditional word spotting performance in Indic scripts. For our experiments, a dataset consisting of scene images and video frames in English, Bangla and Devanagari scripts was considered. The results show the effectiveness of the proposed word spotting approach.
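The shape-coding idea in the abstract can be illustrated with a small sketch. This is not the authors' method: the abstract does not specify the features, the grouping procedure, or how the coding is made dynamic. The toy feature vectors, the greedy threshold grouping, and the names `TOY_FEATURES`, `build_shape_codes` and `encode` are all assumptions made for illustration. The sketch only shows how characters with similar shapes might be merged into one code without supervision, so that a keyword is matched as a sequence of shape codes and the remaining ambiguity is left to a later verification step.

```python
from math import dist

# Hypothetical toy shape descriptors (ascender, descender, closed loop).
# These values are invented for illustration and are not from the paper.
TOY_FEATURES = {
    "c": (0.0, 0.0, 0.0),
    "e": (0.0, 0.0, 1.0),
    "o": (0.0, 0.0, 1.0),
    "l": (1.0, 0.0, 0.0),
    "I": (1.0, 0.0, 0.0),
    "g": (0.0, 1.0, 1.0),
    "q": (0.0, 1.0, 1.0),
}


def build_shape_codes(features, threshold=0.5):
    """Group characters without labels: a character joins the first group
    whose representative vector lies within `threshold`, otherwise it
    starts a new group. Returns a dict mapping character to group code."""
    representatives = []  # representatives[code] is that group's first vector
    codes = {}
    for ch, vec in features.items():
        for code, rep in enumerate(representatives):
            if dist(vec, rep) <= threshold:
                codes[ch] = code
                break
        else:
            # No existing group is close enough, so open a new one.
            codes[ch] = len(representatives)
            representatives.append(vec)
    return codes


def encode(word, codes):
    """Translate a word into its sequence of shape codes."""
    return [codes[ch] for ch in word]


codes = build_shape_codes(TOY_FEATURES)
# Words built from confusable characters collapse to the same code sequence.
same = encode("log", codes) == encode("Ioq", codes)
```

In this toy setup "log" and "Ioq" receive the same code sequence, which is the intended effect of grouping: confusable characters no longer compete during spotting, and a separate verification stage would be needed to tell such candidates apart.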

Notes

  1. https://code.google.com/p/tesseract-ocr/

References

  1. Banerjee P, Chaudhuri BB (2013) An approach for Bangla and Devanagari video text recognition, in Proceedings of the 4th International Workshop on Multilingual OCR, p. 8

  2. Bhunia AK, Das A, Roy PP, and Pal U (2015) A comparative study of features for handwritten Bangla text recognition, in 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 636–640

  3. Bhunia AK, Kumar G, Roy PP, Balasubramanian R, Pal U (2018) Text recognition in scene image and video frame using Color Channel selection, Multimed Tools Appl, 77(7):8551–8578

  4. Bhunia AK, Roy PP, Mohata A, Pal U (2018) Cross-language framework for word recognition and spotting of Indic scripts. Pattern Recogn 79:12–31

  5. Bianne-Bernard AL, Menasri F, Al-Hajj Mohamad R, Mokbel C, Kermorvant C, Likforman-Sulem L (2011) Dynamic and contextual information in HMM modeling for handwritten word recognition. IEEE Trans Pattern Anal Mach Intell 33(10):2066–2080

  6. Cao H, Prasad R, Natarajan P (2011) Handwritten and typewritten text identification and recognition using hidden Markov models, in Document Analysis and Recognition (ICDAR), 2011 International Conference on, pp. 744–748

  7. Chaudhuri BB, Pal U (1998) A complete printed Bangla OCR system. Pattern Recogn 31(5):531–549

  8. Chen D, Odobez J-M (2005) Video text recognition using sequential Monte Carlo and error voting methods. Pattern Recogn Lett 26(9):1386–1403

  9. Chen X, Yuille AL (2004) Detecting and reading text in natural scenes, in Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, vol. 2, pp. II--II

  10. Chen H, Tsai SS, Schroth G, Chen DM, Grzeszczuk R, and Girod B (2011) Robust text detection in natural images with edge-enhanced maximally stable extremal regions, in Proceedings - International Conference on Image Processing, ICIP, pp. 2609–2612

  11. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection, in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1, pp. 886–893

  12. Fischer A, Keller A, Frinken V, Bunke H (2012) Lexicon-free handwritten word spotting using character HMMs. Pattern Recogn Lett 33(7):934–942

  13. Frinken V, Fischer A, Manmatha R, Bunke H (2012) A novel word spotting method based on recurrent neural networks. IEEE Trans Pattern Anal Mach Intell 34(2):211–224

  14. Gatos B et al. (2015) GRPOLY-DB: An old Greek polytonic document image database, in Document Analysis and Recognition (ICDAR), 2015 13th International Conference on, pp. 646–650

  15. Giotis AP, Sfikas G, Gatos B, Nikou C (2017) A survey of document image word spotting techniques. Pattern Recogn 68:310–332

  16. Guo JK, Ma MY (2001) Separating handwritten material from machine printed text using hidden markov models, in Document Analysis and Recognition, Proceedings. Sixth International Conference on, 2001, pp. 439–443

  17. He P, Huang W, Qiao Y, Loy CC, and Tang X (2016) Reading scene text in deep convolutional sequences, in AAAI, pp. 3501–3508

  18. Huang W, Lin Z, Yang J, Wang J (2013) Text localization in natural images using stroke feature transform and text covariance descriptors, in Proceedings of the IEEE International Conference on Computer Vision, pp. 1241–1248

  19. Jaderberg M, Vedaldi A, Zisserman A (2014) Deep features for text spotting, in European conference on computer vision, pp. 512–528

  20. Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2016) Reading text in the wild with convolutional neural networks. Int J Comput Vis 116(1):1–20

  21. Khotanzad A, Hong YH (1990) Invariant image recognition by Zernike moments. IEEE Trans Pattern Anal Mach Intell 12(5):489–497

  22. Krishnan P, Dutta K, and Jawahar CV (2016) Deep feature embedding for accurate recognition and retrieval of handwritten text, In International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 289–294

  23. Kumar G, Govindaraju V (2017) Bayesian background models for keyword spotting in handwritten documents. Pattern Recogn 64:84–91

  24. Li K, He FZ, Yu HP, Chen X (2017) A correlative classifiers approach based on particle filter and sample set for tracking occluded target. Applied Mathematics-A Journal of Chinese Universities 32(3):294–312

  25. Li K, He FZ, Yu HP (2018) Robust visual tracking based on convolutional features with illumination and occlusion handing. J Comput Sci Technol 33(1):223–236

  26. Lu S, Li L, Tan CL (2008) Document image retrieval through word shape coding. IEEE Trans Pattern Anal Mach Intell 30(11):1913–1918

  27. Marti U-V, Bunke H (2001) Using a statistical language model to improve the performance of an HMM-based cursive handwriting recognition system. Int J Pattern Recognit Artif Intell 15(1):65–90

  28. Nakayama T (1994) Modeling content identification from document images, in Proceedings of the fourth conference on Applied natural language processing, pp. 22–27

  29. Neumann L, Matas J (2011) A method for text localization and recognition in real-world images. Comput Vision--ACCV 2010:770–783

  30. Neumann L, Matas J (2012) Real-time scene text localization and recognition, in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3538–3545

  31. Quy Phan T, Shivakumara P, Tian S, and Lim Tan C (2013) Recognizing text with perspective distortion in natural scenes, in Proceedings of the IEEE International Conference on Computer Vision, pp. 569–576

  32. Rath TM, Manmatha R (2006) Word spotting for historical documents. Int J Doc Anal Recognit 9(2–4):139–152

  33. Rodriguez-Serrano JA, Perronnin F (2009) Handwritten word-spotting using hidden Markov models and universal vocabularies. Pattern Recogn 42(9):2106–2116

  34. Roy S, Shivakumara P, Roy PP, and Tan CL (2012) Wavelet-gradient-fusion for video text binarization, in International Conference on Pattern Recognition (ICPR), pp. 3300–3303

  35. Roy PP, Rayar F, Ramel J-Y (2015) Word spotting in historical documents using primitive codebook and dynamic programming. Image Vis Comput 44:15–28

  36. Roy S, Shivakumara P, Roy PP, Pal U, Tan CL, Lu T (2015) Bayesian classifier for multi-oriented video text recognition system. Expert Syst Appl 42(13):5554–5566

  37. Roy PP, Bhunia AK, Das A, Dey P, Pal U (2016) HMM-based Indic handwritten word recognition using zone segmentation. Pattern Recogn 60:1057–1075

  38. Roy PP, Bhunia AK, Pal U (2017) Date-field retrieval in scene image and video frames using text enhancement and shape coding, Neurocomputing

  39. Roy PP, Bhunia AK, Das A, Dhar P, Pal U (2017) Keyword spotting in doctor’s handwriting on medical prescriptions. Expert Syst Appl 76:113–128

  40. Rusiñol M, Aldavert D, Toledo R, Lladós J (2015) Efficient segmentation-free keyword spotting in historical document collections. Pattern Recogn 48(2):545–555

  41. Saidane Z, Garcia C (2007) Robust binarization for video text recognition, in Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, vol. 2, pp. 874–878

  42. Sain A, Bhunia AK, Roy PP, Pal U (2018) Multi-oriented text detection and verification in video frames and scene images. Neurocomputing 275:1531–1549

  43. Sharma N, Shivakumara P, Pal U, Blumenstein M, and Tan CL (2012) A new method for arbitrarily-oriented text detection in video, in Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on, pp. 74–78

  44. Shivakumara P, Liang G, Roy S, Pal U, and Lu T (2015) New texture-spatial features for keyword spotting in video images, in Pattern Recognition (ACPR), 2015 3rd IAPR Asian Conference on, pp. 391–395

  45. Srihari SN, Srinivasan H, Huang C, Shetty S (2006) Spotting words in Latin, Devanagari and Arabic scripts. Vivek-Bombay 16(3):2

  46. Sudholt S, Fink GA (2016) PHOCNet: A deep convolutional neural network for word spotting in handwritten documents, In International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 277–282

  47. Sun L, Huo Q, Jia W, Chen K (2015) A robust approach for text detection from natural scene images. Pattern Recogn 48(9):2906–2920

  48. Sun J, He FZ, Chen YL, Chen X (2016) A multiple template approach for robust tracking of fast motion target. Applied Mathematics-A Journal of Chinese Universities 31(2):177–197

  49. Tarafdar A, Mondal R, Pal S, Pal U, and Kimura F (2010) Shape code based word-image matching for retrieval of Indian multi-lingual documents, in Pattern Recognition (ICPR), 2010 20th International Conference on, pp. 1989–1992

  50. Thomas S, Chatelain CC, Heutte L, Paquet T, Kessentini Y (2015) A deep HMM model for multiple keywords spotting in handwritten documents. Pattern Anal Appl 18(4):1003–1015

  51. Toselli AH, Vidal E, Romero V, Frinken V (2016) HMM word graph based keyword spotting in handwritten document images. Inf Sci (Ny) 370–371:497–518

  52. Wang K, Babenko B, Belongie S (2011) End-to-end scene text recognition, in Computer Vision (ICCV), 2011 IEEE International Conference on, pp. 1457–1464

  53. Wang T, Wu DJ, Coates A, and Ng AY (2012) End-to-end text recognition with convolutional neural networks, in Pattern Recognition (ICPR), 2012 21st International Conference on, pp. 3304–3308

  54. Wang R, Sang N, Gao C (2015) Text detection approach based on confidence map and context information. Neurocomputing 157:153–165

  55. Wilkinson T, Lindström J, and Brun A (2017) Neural Ctrl-F: segmentation-free query-by-string word spotting in handwritten manuscript collections. In International Conference Computer Vision (ICCV), pp. 4443–4452

  56. Wshah S, Kumar G, Govindaraju V (2014) Statistical script independent word spotting in offline handwritten documents. Pattern Recogn 47(3):1039–1050

  57. Yao C, Bai X, Liu W, Ma Y, Tu Z (2012) Detecting texts of arbitrary orientations in natural images, in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pp. 1083–1090

  58. Ye Q, Doermann D (2015) Text detection and recognition in imagery: a survey. IEEE Trans Pattern Anal Mach Intell 37(7):1480–1500

  59. Yin X-C, Yin X, Huang K, Hao H-W (2014) Robust text detection in natural scene images. IEEE Trans Pattern Anal Mach Intell 36(5):970–983

  60. Yin X-C, Pei W-Y, Zhang J, Hao H-W (2015) Multi-orientation scene text detection with adaptive clustering. IEEE Trans Pattern Anal Mach Intell 37(9):1930–1937

  61. Young SJ et al. (2009) The HTK Book (for HTK Version 3.4), Construction, no. July 2000, p. 384

  62. Yu C, Song Y, Zhang Y (2016) Scene text localization using edge analysis and feature pool. Neurocomputing 175:652–661

  63. Yu H, He F, Pan Y (2018) A novel region-based active contour model via local patch similarity measure for image segmentation. Multimedia Tools and Applications:1–23

  64. Zagoris K, Pratikakis I, and Gatos B (2017) Unsupervised Word Spotting in Historical Handwritten Document Images using Document-oriented Local Features, IEEE Trans. Image Process

  65. Zhang X, Pal U, and Tan CL (2014) Segmentation-free Keyword spotting for Bangla handwritten documents, in Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on, pp. 381–386

  66. Zhang Z, Shen W, Yao C, and Bai X (2015) Symmetry-based text line detection in natural scenes, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2558–2567

  67. Zhou Z, Li L, Tan CL (2010) Edge based binarization for video text images, in Proceedings - International Conference on Pattern Recognition, pp. 133–136

  68. Zhou Y, He F, Qiu Y (2017) Dynamic strategy based parallel ant colony optimization on GPUs for TSPs. SCIENCE CHINA Inf Sci 60(6):068102

Author information

Corresponding author

Correspondence to Partha Pratim Roy.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Roy, P.P., Bhunia, A.K., Bhattacharyya, A. et al. Word searching in scene image and video frame in multi-script scenario using dynamic shape coding. Multimed Tools Appl 78, 7767–7801 (2019). https://doi.org/10.1007/s11042-018-6484-5

  • DOI: https://doi.org/10.1007/s11042-018-6484-5
