
K-NN Based Text Segmentation from Digital Images Using a New Binarization Scheme

  • Conference paper
Computational Intelligence, Communications, and Business Analytics (CICBA 2017)

Abstract

Text segmentation in digital images is a prerequisite for many image analysis and interpretation tasks. In this article, we propose an effective binarization method for text segmentation in digital images. The method produces a set of connected components comprising both text and non-text. The next task is to identify the likely text components among these. To distinguish text from non-text components, a set of features is identified. During training, we use two feature files prepared by us, one for text and one for non-text. A K-Nearest Neighbour (K-NN) classifier is employed for this two-class classification problem. The experiments are based on the ICDAR 2011 Born-Digital Dataset. The proposed approach accomplishes both binarization and the segmentation of text from non-text.
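
The abstract outlines a three-stage pipeline: binarize the image into connected components, describe each component with a feature vector, and label it as text or non-text with a K-NN classifier. The following Python sketch illustrates that flow under stated assumptions only; Otsu thresholding and the simple geometric features are placeholders, since the paper's own binarization scheme and feature set are not detailed in this abstract, and the training arrays are hypothetical.

```
# Sketch of the pipeline described in the abstract:
# binarize -> connected components -> per-component features -> K-NN (text / non-text).
import cv2
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def binarize(gray):
    # Placeholder binarization (Otsu); the paper proposes its own scheme.
    _, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return bw

def component_features(bw):
    # Label connected components and compute simple geometric features
    # (aspect ratio, fill ratio, relative size) as stand-ins for the
    # feature set, which the abstract does not enumerate.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(bw, connectivity=8)
    h_img, w_img = bw.shape
    feats = []
    for i in range(1, n):                      # skip background label 0
        x, y, w, h, area = stats[i]
        feats.append([
            w / float(h),                      # aspect ratio
            area / float(w * h),               # fill ratio of bounding box
            area / float(h_img * w_img),       # relative component size
        ])
    return np.array(feats), stats[1:]

# Training: feature vectors from components already labelled as text (1) or
# non-text (0), e.g. prepared from the ICDAR 2011 Born-Digital training set.
# X_train and y_train are hypothetical arrays built that way.
# knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Inference on a new image:
# gray = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)
# X, boxes = component_features(binarize(gray))
# is_text = knn.predict(X)                     # 1 = text, 0 = non-text
```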

Author information

Corresponding author

Correspondence to Ranjit Ghoshal.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Ghoshal, R., Das, S., Saha, A. (2017). K-NN Based Text Segmentation from Digital Images Using a New Binarization Scheme. In: Mandal, J., Dutta, P., Mukhopadhyay, S. (eds) Computational Intelligence, Communications, and Business Analytics. CICBA 2017. Communications in Computer and Information Science, vol 776. Springer, Singapore. https://doi.org/10.1007/978-981-10-6430-2_17

  • DOI: https://doi.org/10.1007/978-981-10-6430-2_17

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6429-6

  • Online ISBN: 978-981-10-6430-2

  • eBook Packages: Computer Science (R0)
