Exploiting colour information for better scene text detection and recognition

  • Special Issue Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

This paper presents an approach to text detection and recognition in scene images. Its main contribution is to demonstrate that the colour information within an image, if efficiently exploited, is sufficient to separate text regions from the surrounding noise. In the same way, the colour information present in character and word images can be used to significantly improve character and word recognition. The proposed pipeline combines colour information with low-level image processing operations to enhance text, which improves the overall performance of text detection and recognition in the wild. The method offers two main advantages. First, it enhances text regions to a level of clarity at which a simple off-the-shelf feature representation and classification method achieves state-of-the-art recognition performance. Second, the framework is computationally fast compared with other text detection and recognition techniques that offer good accuracy at the cost of significantly higher latency. We evaluated the method extensively on challenging benchmark datasets (Chars74K, ICDAR03, ICDAR11 and SVT), and the results show a considerable performance improvement.

Author information

Corresponding author

Correspondence to Muhammad Fraz.

About this article

Cite this article

Fraz, M., Sarfraz, M.S. & Edirisinghe, E.A. Exploiting colour information for better scene text detection and recognition. IJDAR 18, 153–167 (2015). https://doi.org/10.1007/s10032-015-0239-x

  • DOI: https://doi.org/10.1007/s10032-015-0239-x
