
A learning-based method to detect and segment text from scene images


This paper proposes a learning-based method for text detection and text segmentation in natural scene images. First, the input image is decomposed into multiple connected components (CCs) by the Niblack clustering algorithm. Then all the CCs, both text and non-text, are verified on their text features by a two-stage classification module: an attentional cascade classifier discards most non-text CCs, and an SVM further verifies the remaining ones. All accepted CCs are output to form a text-only binary image. Experiments on many images from different scenes showed satisfactory performance of the proposed method.
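The pipeline described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. It assumes the common Niblack local threshold T = m + k·s (local mean m, local standard deviation s), and the window size, the value of k, the three features, the cascade tests, and the hand-set linear verifier standing in for the trained SVM are all hypothetical choices made only for illustration.

```python
from collections import deque
from math import sqrt


def niblack_binarize(img, window=3, k=-0.2):
    """Mark a pixel as foreground when it is darker than the local
    Niblack threshold T = mean + k * std (window and k are assumed values)."""
    h, w = len(img), len(img[0])
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            mean = sum(vals) / len(vals)
            std = sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            out[y][x] = 1 if img[y][x] < mean + k * std else 0
    return out


def connected_components(mask):
    """Group foreground pixels into 4-connected components (lists of (y, x))."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, pixels = deque([(y, x)]), []
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                comps.append(pixels)
    return comps


def features(pixels):
    """Hypothetical CC features: pixel count, aspect ratio, bounding-box fill."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return {"area": len(pixels),
            "aspect": width / height,
            "fill": len(pixels) / (width * height)}


# Stage 1: cheap tests applied in order; a CC is dropped at the first failure.
# These thresholds are placeholders, not values from the paper.
CASCADE = [
    lambda f: f["area"] >= 3,
    lambda f: 0.1 <= f["aspect"] <= 10.0,
]


def final_verifier(f):
    """Stage 2 stand-in for the SVM: a hand-set linear decision function."""
    return 1.0 * f["fill"] - 0.2 > 0


def extract_text_mask(img):
    """Return a binary image containing only the CCs accepted by both stages."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for pixels in connected_components(niblack_binarize(img)):
        f = features(pixels)
        if all(stage(f) for stage in CASCADE) and final_verifier(f):
            for y, x in pixels:
                out[y][x] = 1
    return out
```

Calling `extract_text_mask` on a grayscale image given as a list of rows returns a mask of the same size. The ordering mirrors the design in the abstract: the inexpensive cascade tests run first so that the costlier final verifier only sees the components that survive them.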







Additional information

Project supported by the OMRON and SJTU Collaborative Foundation under PVS project (2005.03–2005.10)


About this article

Cite this article

Jiang, Rj., Qi, Fh., Xu, L. et al. A learning-based method to detect and segment text from scene images. J. Zhejiang Univ. - Sci. A 8, 568–574 (2007).


Key words

  • Text detection
  • Text segmentation
  • Text feature
  • Attentional cascade

CLC number

  • TP391.41