A Novel Text Localization Scheme for Camera Captured Document Images

  • Tauseef Khan
  • Ayatullah Faruk Mollah
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 703)


In this paper, a hybrid model for detecting text regions in scene images as well as document images is presented. First, the background is suppressed to isolate foreground regions. Then, morphological operations are applied to the isolated foreground regions to obtain appropriate region boundaries for these objects. Statistical features are extracted from the objects to classify them as text or non-text using a multi-layer perceptron (MLP). Components classified as text are localized, and non-text components are ignored. In experiments on a data set of 227 camera-captured images, the object isolation accuracy is 0.8638 and the text/non-text classification accuracy is 0.9648. For images with a nearly homogeneous background, the present method yields accuracy that is reasonably satisfactory for practical applications.
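The first three stages summarized above (background suppression, morphological clean-up, and per-object feature extraction) can be illustrated with a minimal sketch. The abstract does not specify the actual techniques, so everything concrete here is an assumption made for illustration: Otsu thresholding stands in for background suppression, a 3x3 morphological closing stands in for the morphological operations, and aspect ratio and fill ratio stand in for the statistical features. All function names are hypothetical, and the MLP classification step is omitted.

```python
from collections import deque

def otsu_threshold(gray):
    """Return the gray level t that maximizes between-class variance (<= t vs > t)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t

def suppress_background(gray):
    """Assumption: foreground is darker than background. Returns a 0/1 mask."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]

def _morph(mask, combine):
    """Apply `combine` (any = dilation, all = erosion) over each 3x3 neighborhood.
    Neighbors that fall outside the image are ignored."""
    h, w = len(mask), len(mask[0])
    return [[int(combine(mask[ny][nx]
                         for ny in range(max(0, y - 1), min(h, y + 2))
                         for nx in range(max(0, x - 1), min(w, x + 2))))
             for x in range(w)] for y in range(h)]

def close_regions(mask):
    """Morphological closing: dilation followed by erosion (fills small holes)."""
    return _morph(_morph(mask, any), all)

def extract_components(mask):
    """Label 8-connected foreground objects and compute simple features for each."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            x0 = x1 = x
            y0 = y1 = y
            area = 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny in range(max(0, cy - 1), min(h, cy + 2)):
                    for nx in range(max(0, cx - 1), min(w, cx + 2)):
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            bw, bh = x1 - x0 + 1, y1 - y0 + 1
            comps.append({
                "bbox": (x0, y0, x1, y1),           # inclusive corner coordinates
                "aspect_ratio": bw / bh,            # illustrative feature
                "fill_ratio": area / (bw * bh),     # illustrative feature
            })
    return comps

def isolate_objects(gray):
    """Run the three sketched stages on a grayscale image given as a list of rows."""
    return extract_components(close_regions(suppress_background(gray)))
```

In a full pipeline, the feature values of each component would be passed to a trained classifier (an MLP in the paper), and only the bounding boxes of components labeled as text would be kept.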


Keywords: Text detection, Feature map, Background suppression, Textness features, Text/non-text classification, MLP



The authors are thankful to the Department of Computer Science and Engineering of Aliah University for its support in carrying out this work. The first author is also thankful to Aliah University for providing a research fellowship.



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Department of Computer Science and Engineering, Aliah University, Kolkata, India
