Text recognition in scene image and video frame using Color Channel selection

Abstract

In recent years, recognition of text in natural scene images and video frames has attracted increasing attention from researchers because of its many complexities and challenges. Low resolution, blur, complex backgrounds, and variation in the font, color and alignment of text within images and video frames make text recognition in such scenarios difficult. Most current approaches apply a binarization algorithm to convert the input into a binary image and then apply OCR to obtain the recognition result. In this paper, we present a novel approach based on color channel selection for text recognition in scene images and video frames. In this approach, a color channel is first selected automatically, and the selected channel is then used for text recognition. Our text recognition framework is based on a Hidden Markov Model (HMM) that uses Pyramidal Histogram of Oriented Gradient features extracted from the selected color channel. For each sliding window, our color channel selection approach analyzes the image properties within the window, and a multi-label Support Vector Machine (SVM) classifier then selects the color channel that will provide the best recognition result in that window. Selecting a color channel for each sliding window has been found to be more effective than using a single color channel for the whole word image. Five different features have been analyzed for multi-label SVM based color channel selection, of which the wavelet transform based feature outperforms the others. Our color channel selection framework is script-independent; it has been tested on English (Roman) and Devanagari (Indic) scripts. For English, we tested our approach on publicly available datasets covering both scene images and video (ICDAR 2003, ICDAR 2013, MSRA-TD500, IIIT5K, SVT, YVT). For Devanagari script, we collected our own dataset. The experimental results are encouraging and show the advantage of the proposed method.
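The per-window selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the window width and step, and the single Haar-style feature are assumptions made for illustration, and the highest-score rule is a hypothetical stand-in for the trained multi-label SVM used in the paper.

```python
from typing import Callable, Dict, List

Channel = List[List[float]]  # 2-D grid of pixel intensities, rows x columns


def sliding_windows(width: int, win: int, step: int) -> List[range]:
    """Column ranges of fixed-width windows moving left to right."""
    if width < win:
        return [range(0, width)]
    return [range(x, x + win) for x in range(0, width - win + 1, step)]


def haar_detail_energy(channel: Channel, cols: range) -> float:
    """Mean squared horizontal Haar detail coefficient inside the window.

    A one-level Haar transform pairs neighbouring pixels (a, b) and keeps
    the detail coefficient (a - b) / 2. This is an assumed, minimal
    wavelet-style feature, not the feature set used in the paper.
    """
    total, count = 0.0, 0
    for row in channel:
        for x in range(cols.start, cols.stop - 1, 2):
            d = (row[x] - row[x + 1]) / 2.0
            total += d * d
            count += 1
    return total / count if count else 0.0


def select_channels(
    channels: Dict[str, Channel],
    win: int = 4,
    step: int = 2,
    score: Callable[[Channel, range], float] = haar_detail_energy,
) -> List[str]:
    """Pick one channel name per sliding window.

    The paper makes this decision with a trained multi-label SVM; taking
    the channel with the highest `score` is only a placeholder rule.
    """
    width = len(next(iter(channels.values()))[0])
    picks = []
    for cols in sliding_windows(width, win, step):
        picks.append(max(channels, key=lambda name: score(channels[name], cols)))
    return picks


# Toy word image: the red channel has contrast on the left half,
# the green channel has contrast on the right half.
red = [[0, 9, 0, 9, 5, 5, 5, 5]] * 2
green = [[5, 5, 5, 5, 0, 9, 0, 9]] * 2
print(select_channels({"R": red, "G": green}, win=4, step=4))  # ['R', 'G']
```

In the toy example a different channel is chosen for each half of the word image, which is the behaviour that distinguishes per-window selection from choosing one channel for the whole word. In a fuller pipeline the chosen channel for each window would then feed the feature extraction and HMM recognition stages.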





Corresponding author

Correspondence to Partha Pratim Roy.


Cite this article

Bhunia, A.K., Kumar, G., Roy, P.P. et al. Text recognition in scene image and video frame using Color Channel selection. Multimed Tools Appl 77, 8551–8578 (2018). https://doi.org/10.1007/s11042-017-4750-6


Keywords

  • Scene text recognition
  • Color channel selection
  • Hidden Markov model
  • Multi script recognition