Text Detection with Deep Neural Network System Based on Overlapped Labels and a Hierarchical Segmentation of Feature Maps

  • Hong-Hyun Kim
  • Jea-Ho Jo
  • Zhu Teng
  • Dong-Joong Kang
Article

Abstract

This paper proposes a three-stage framework for detecting text in a single image. First, a salient text feature map is extracted with a Fully Convolutional Network (FCN), an architecture that performs well in semantic segmentation. To improve detection of the uneven boundaries of text regions, a label combination that uses both word-level and character-level boxes is proposed. Second, the text area is separated from the background with Hierarchical Cluster Analysis (HCA), which exploits two characteristics of the FCN feature map: text regions have higher probability values than the background, and the coordinates of pixels within a character region lie very close to one another. Finally, a Convolutional Neural Network (CNN) classifies each candidate region as text or non-text. This CNN distinguishes four classes in total: background and three text classes (one character, two characters, and three or more characters). The proposed framework performs well on the ICDAR 2015 benchmark and achieves particularly high recall, finding more text than the other algorithms it was compared with.
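The second stage (HCA on the feature map) and the four-class labelling of the third stage can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the authors' implementation: the probability threshold `prob_thresh`, the single-linkage criterion, the pixel distance cut-off `dist_thresh`, the bounding-box output, and the helper names `cluster_text_pixels` and `char_count_to_class` are all assumptions. The clustering itself uses SciPy's `linkage` and `fcluster`.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_text_pixels(prob_map, prob_thresh=0.5, dist_thresh=3.0):
    """Group high-probability pixels of a text saliency map into candidate boxes.

    Hypothetical sketch: fixed threshold, single-linkage HCA on (row, col)
    coordinates, and a cut at `dist_thresh` pixels are assumed values,
    not parameters taken from the paper.
    """
    # Text pixels score higher than background, so keep only those.
    rows, cols = np.nonzero(prob_map > prob_thresh)
    coords = np.column_stack([rows, cols]).astype(float)
    if len(coords) == 0:
        return []
    if len(coords) == 1:
        # linkage() needs at least two observations.
        labels = np.array([1])
    else:
        # Pixels of the same text region lie close together, so agglomerative
        # clustering on their coordinates separates neighbouring regions.
        tree = linkage(coords, method="single", metric="euclidean")
        labels = fcluster(tree, t=dist_thresh, criterion="distance")
    # Turn each cluster into an axis-aligned box (row0, col0, row1, col1).
    boxes = []
    for k in np.unique(labels):
        pts = coords[labels == k]
        r0, c0 = pts.min(axis=0)
        r1, c1 = pts.max(axis=0)
        boxes.append((int(r0), int(c0), int(r1), int(c1)))
    return sorted(boxes)


def char_count_to_class(n_chars):
    """Map a region to one of the four classes named in the abstract.

    Hypothetical convention: 0 = background (no characters),
    1 = one character, 2 = two characters, 3 = three or more characters.
    """
    return min(max(n_chars, 0), 3)


if __name__ == "__main__":
    prob = np.zeros((10, 20))
    prob[2:4, 2:5] = 0.9    # first text blob
    prob[6:8, 12:16] = 0.8  # second blob, well separated from the first
    print(cluster_text_pixels(prob))  # [(2, 2, 3, 4), (6, 12, 7, 15)]
    print([char_count_to_class(n) for n in (0, 1, 2, 3, 7)])  # [0, 1, 2, 3, 3]
```

Single linkage was chosen here because it chains adjacent pixels along a stroke, so a cluster keeps growing as long as no gap exceeds `dist_thresh`; the CNN that would actually predict the class of each candidate box is not shown.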

Keywords

Deep neural network detection framework, text detection, text localization

Copyright information

© ICROS, KIEE and Springer 2019

Authors and Affiliations

  • Hong-Hyun Kim (1)
  • Jea-Ho Jo (1)
  • Zhu Teng (2)
  • Dong-Joong Kang (1), corresponding author

  1. School of Mechanical Engineering, Pusan National University, Busan, Korea
  2. School of Computer and Information Technology, Beijing Jiaotong University, Beijing, P. R. China
