Detecting Uyghur text in complex background images with convolutional neural network

Multimedia Tools and Applications

Abstract

Uyghur text detection is crucial to a variety of real-world applications, yet it has received little research attention. In this paper, we develop an effective and efficient region-based convolutional neural network for detecting Uyghur text in complex background images. The network has three main characteristics: (1) three region proposal networks, which simultaneously use feature maps from different convolutional layers, are employed to improve recall; (2) the overall architecture is a fully convolutional network, in which global average pooling replaces the fully connected layers in the classification and bounding box regression layers; (3) to make full use of the baseline information, Uyghur text lines are detected directly by the network in an end-to-end fashion. Experimental results on a benchmark dataset show that our method achieves an F-measure of 0.83 with a detection time of 0.6 s per image on a single K20c GPU, which is much faster than state-of-the-art methods while maintaining competitive accuracy.
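Point (2) of the abstract can be illustrated with a small sketch. It assumes a Network-in-Network-style head [19]: a 1×1 convolution produces one map per output (class score or box offset), and global average pooling then reduces each map to a single number, so no fully connected layer is needed. The layer sizes, the `fmap[channel][row][col]` layout, and the names `conv1x1`, `global_average_pool` and `gap_head` are hypothetical; the paper's actual layer configuration is not given in this preview.

```python
from typing import List, Tuple

# Hypothetical layout: a feature map is indexed as fmap[channel][row][col].
FeatureMap = List[List[List[float]]]


def conv1x1(fmap: FeatureMap, weights: List[List[float]], bias: List[float]) -> FeatureMap:
    """1x1 convolution: each output channel is a per-pixel linear mix of the input channels."""
    height, width = len(fmap[0]), len(fmap[0][0])
    out: FeatureMap = []
    for w_row, b in zip(weights, bias):
        plane = [
            [b + sum(w * fmap[c][y][x] for c, w in enumerate(w_row)) for x in range(width)]
            for y in range(height)
        ]
        out.append(plane)
    return out


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Collapse each channel to the mean of its H x W values."""
    return [sum(sum(row) for row in plane) / (len(plane) * len(plane[0])) for plane in fmap]


def gap_head(roi: FeatureMap, cls_w, cls_b, reg_w, reg_b) -> Tuple[List[float], List[float]]:
    """Class scores and box-regression outputs computed without fully connected layers."""
    scores = global_average_pool(conv1x1(roi, cls_w, cls_b))
    deltas = global_average_pool(conv1x1(roi, reg_w, reg_b))
    return scores, deltas


# Toy usage: a 2-channel, 2 x 3 region feature (values made up for illustration).
roi = [[[1, 2, 3], [4, 5, 6]],
       [[0, 0, 0], [1, 1, 1]]]
cls_w, cls_b = [[1, 0], [0, 2]], [0, 1]          # 2 outputs: text vs. background
reg_w = [[1, 1], [0, 0], [1, -1], [0.5, 0]]      # 4 box offsets
reg_b = [0, 0, 0, 0]
scores, deltas = gap_head(roi, cls_w, cls_b, reg_w, reg_b)
print(scores, deltas)  # [3.5, 2.0] [4.0, 0.0, 3.0, 1.75]
```

Because the head's parameters are only the 1×1 weights, their number does not depend on the spatial size of the region feature: the same weights work on a 2×3 or a 7×7 input, which is one reason a fully convolutional design tends to be lighter than one with fully connected layers.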

Figs. 1–9

Notes

  1. https://github.com/BVLC/caffe/wiki/Model-Zoo.

  2. It is an approximate joint training method because it ignores the derivative with respect to the proposal coordinates, as discussed in [24].

References

  1. Ahmad AMA, Alqutami A, Atoum J (2012) A robust algorithm for Arabic video text detection. In: Proceedings of the 2011 2nd international congress on computer applications and computational science. Springer, pp 261–266

  2. Bai J, Chen Z, Feng B, Xu B (2014) Chinese image text recognition on grayscale pixels. In: ICASSP. IEEE, pp 1380–1384

  3. Bai J, Chen Z, Feng B, Xu B (2014) Image character recognition using deep convolutional neural network learned from different languages. In: ICIP. IEEE, pp 2560–2564

  4. Chen J, Song Y, Xie H, Chen X, Deng H, Liu Y (2016) Robust Uyghur text localization in complex background images. In: PCM, volume 9917 of lecture notes in computer science. Springer, pp 406–416

  5. Chen Z, Chen Y, Gao X, Wang S, Hu L, Yan CC, Lane ND, Miao C (2015) Unobtrusive sensing incremental social contexts using fuzzy class incremental learning. In: ICDM. IEEE Computer Society, pp 71–80

  6. Epshtein B, Ofek E, Wexler Y (2010) Detecting text in natural scenes with stroke width transform. In: CVPR. IEEE Computer Society, pp 2963–2970

  7. Girshick RB (2015) Fast R-CNN. In: ICCV. IEEE Computer Society, pp 1440–1448

  8. Girshick RB, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR. IEEE Computer Society, pp 580–587

  9. Halima MB, Karray H, Alimi AM (2010) A comprehensive method for Arabic video text detection, localization, extraction and recognition. In: PCM, volume 6298 of lecture notes in computer science. Springer, pp 648–659

  10. He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. CoRR, arXiv:1512.03385

  11. He T, Huang W, Qiao Y, Yao J (2016) Text-attentional convolutional neural network for scene text detection. IEEE Trans Image Process 25(6):2529–2541

  12. Huang W, Lin Z, Yang J, Wang J (2013) Text localization in natural images using stroke feature transform and text covariance descriptors. In: ICCV. IEEE Computer Society, pp 1241–1248

  13. Huang W, Qiao Y, Tang X (2014) Robust scene text detection with convolution neural network induced MSER trees. In: ECCV, volume 8692 of lecture notes in computer science. Springer, pp 497–511

  14. Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2016) Reading text in the wild with convolutional neural networks. Int J Comput Vis 116(1):1–20

  15. Jaderberg M, Vedaldi A, Zisserman A (2014) Deep features for text spotting. In: ECCV, volume 8692 of lecture notes in computer science. Springer, pp 512–528

  16. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick RB, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. In: ACM multimedia. ACM, pp 675–678

  17. Kang L, Li Y, Doermann DS (2014) Orientation robust text line detection in natural images. In: CVPR. IEEE Computer Society, pp 4034–4041

  18. Karatzas D, Shafait F, Uchida S, Iwamura M, Gomez i Bigorda L, Mestre SR, Mas J, Mota DF, Almazán J, de las Heras L-P (2013) ICDAR 2013 robust reading competition. In: ICDAR. IEEE Computer Society, pp 1484–1493

  19. Lin M, Chen Q, Yan S (2013) Network in network. CoRR, arXiv:1312.4400

  20. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: CVPR. IEEE Computer Society, pp 3431–3440

  21. Moradi M, Mozaffari S, Orouji AA (2010) Farsi/Arabic text extraction from video images by corner detection. In: 2010 6th Iranian conference on machine vision and image processing. IEEE, pp 1–6

  22. Neumann L, Matas J (2012) Real-time scene text localization and recognition. In: CVPR. IEEE Computer Society, pp 3538–3545

  23. Neumann L, Matas J (2016) Real-time lexicon-free scene text localization and recognition. IEEE Trans Pattern Anal Mach Intell 38(9):1872–1885

  24. Ren S, He K, Girshick RB, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS, pp 91–99

  25. Saudagar AKJ, Mohammed HV, Iqbal K, Gyani YJ (2015) Efficient Arabic text extraction and recognition using thinning and dataset comparison technique. In: 2015 international conference on communication, information & computing technology (ICCICT). IEEE, pp 1–5

  26. Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y (2013) OverFeat: integrated recognition, localization and detection using convolutional networks. CoRR, arXiv:1312.6229

  27. Shahab A, Shafait F, Dengel A (2011) ICDAR 2011 robust reading competition challenge 2: reading text in scene images. In: ICDAR. IEEE Computer Society, pp 1491–1496

  28. Shivakumara P, Dutta A, Tan CL, Pal U (2014) Multi-oriented scene text detection in video based on wavelet and angle projection boundary growing. Multimed Tools Appl 72(1):515–539

  29. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. CoRR, arXiv:1409.1556

  30. Szegedy C, Liu W, Jia Y, Sermanet P, Reed SE, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: CVPR. IEEE Computer Society, pp 1–9

  31. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2015) Rethinking the inception architecture for computer vision. CoRR, arXiv:1512.00567

  32. Tian S, Pan Y, Huang C, Lu S, Yu K, Tan CL (2015) Text flow: a unified text detection system in natural scene images. In: ICCV. IEEE Computer Society, pp 4651–4659

  33. Wang K, Babenko B, Belongie SJ (2011) End-to-end scene text recognition. In: ICCV. IEEE Computer Society, pp 1457–1464

  34. Wang T, Wu DJ, Coates A, Ng AY (2012) End-to-end text recognition with convolutional neural networks. In: ICPR. IEEE Computer Society, pp 3304–3308

  35. Wolf C, Jolion J-M (2006) Object count/area graphs for the evaluation of object detection and segmentation algorithms. IJDAR 8(4):280–296

  36. Xie H, Gao K, Zhang Y, Li J, Ren H (2011) Common visual pattern discovery via graph matching. In: ACM multimedia. ACM, pp 1385–1388

  37. Xie H, Gao K, Zhang Y, Li J, Liu Y (2011) Pairwise weak geometric consistency for large scale image search. In: ICMR. ACM, p 42

  38. Xie H, Zhang Y, Gao K, Tang S, Xu K, Guo L, Li J (2013) Robust common visual pattern discovery using graph matching. J Vis Commun Image Represent 24(5):635–646

  39. Xu Z, Hu C, Lin M (2016) Video structured description technology based intelligence analysis of surveillance videos for public security applications. Multimed Tools Appl 75(19):12155–12172

  40. Xu Z, Lin M, Hu C, Liu Y (2016) The big data analytics and applications of the surveillance system using video structured description technology. Clust Comput 19(3):1283–1292

  41. Xu Z, Mei L, Liu Y, Hu C, Chen L (2016) Semantic enhanced cloud environment for surveillance data management using video structural description. Computing 98(1–2):35–54

  42. Yan J, Zhu M, Liu H, Liu Y (2010) Visual saliency detection via sparsity pursuit. IEEE Signal Process Lett 17(8):739–742

  43. Yao C, Bai X, Liu W, Ma Y, Tu Z (2012) Detecting texts of arbitrary orientations in natural images. In: CVPR. IEEE Computer Society, pp 1083–1090

  44. Ye Q, Doermann DS (2015) Text detection and recognition in imagery: a survey. IEEE Trans Pattern Anal Mach Intell 37(7):1480–1500

  45. Yin X-C, Pei W-Y, Zhang J, Hao H-W (2015) Multi-orientation scene text detection with adaptive clustering. IEEE Trans Pattern Anal Mach Intell 37(9):1930–1937

  46. Yin X-C, Yin X, Huang K, Hao H-W (2014) Robust text detection in natural scene images. IEEE Trans Pattern Anal Mach Intell 36(5):970–983

  47. Yousfi S, Berrani S-A, Garcia C (2015) ALIF: a dataset for Arabic embedded text recognition in TV broadcast. In: ICDAR. IEEE Computer Society, pp 1221–1225

  48. Yuan J, Wei B, Liu Y, Zhang Y, Wang L (2015) A method for text line detection in natural images. Multimed Tools Appl 74(3):859–884

  49. Zayene O, Hennebert J, Touj SM, Ingold R, Amara NEB (2015) A dataset for Arabic text detection, tracking and recognition in news videos - AcTiV. In: ICDAR. IEEE Computer Society, pp 996–1000

  50. Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: ECCV, volume 8689 of lecture notes in computer science. Springer, pp 818–833

  51. Zhang Z, Shen W, Yao C, Bai X (2015) Symmetry-based text line detection in natural scenes. In: CVPR. IEEE Computer Society, pp 2558–2567

  52. Zhang C, Yan J, Li C, Rui X, Liu L, Bie R (2016) On estimating air pollution from photos using convolutional neural network. In: ACM Multimedia. ACM, pp 297–301

  53. Zhang Z, Zhang C, Shen W, Yao C, Liu W, Bai X (2016) Multi-oriented text detection with fully convolutional networks. CoRR, arXiv:1604.04018

Acknowledgments

This work is supported by the National Natural Science Foundation of China (61303171, 61303175) and the “Strategic Priority Research Program” of the Chinese Academy of Sciences (XDA06031000).

Author information

Correspondence to Hongtao Xie.

About this article

Cite this article

Fang, S., Xie, H., Chen, Z. et al. Detecting Uyghur text in complex background images with convolutional neural network. Multimed Tools Appl 76, 15083–15103 (2017). https://doi.org/10.1007/s11042-017-4538-8

  • DOI: https://doi.org/10.1007/s11042-017-4538-8