Detecting Uyghur text in complex background images with convolutional neural network

Fang, Shancheng; Xie, Hongtao; Chen, Zhineng; Zhu, Shiai; Gu, Xiaoyan; Gao, Xingyu

doi:10.1007/s11042-017-4538-8

Published: 09 March 2017

Multimedia Tools and Applications, Volume 76, pages 15083–15103 (2017)

Abstract

Uyghur text detection is crucial to a variety of real-world applications, yet it has received little research attention. In this paper, we develop an effective and efficient region-based convolutional neural network for Uyghur text detection in complex background images. The network has three key characteristics: (1) Three region proposal networks, which simultaneously utilize feature maps from different convolutional layers, are used to improve recall. (2) The overall architecture takes the form of a fully convolutional network, and global average pooling replaces the fully connected layers in the classification and bounding box regression layers. (3) To fully utilize the baseline information, Uyghur text lines are detected directly by the network in an end-to-end fashion. Experimental results on a benchmark dataset show that our method achieves an F-measure of 0.83 with a detection time of 0.6 s per image on a single K20c GPU, which is much faster than state-of-the-art methods while maintaining competitive accuracy.
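Point (2) of the abstract, replacing fully connected layers with global average pooling (GAP), follows the idea introduced in Network in Network (Lin et al. 2013, listed in the references). The sketch below is a minimal, pure-Python illustration under stated assumptions, not the authors' implementation: each pooled region of interest is assumed to be a C×H×W feature map, a 1×1 convolution is assumed to produce one channel per class (here two, background and text) plus four channels for box deltas, and GAP reduces each channel to a single value. The function names `conv1x1`, `global_average_pool`, `softmax` and `gap_head`, and all shapes and weights, are hypothetical.

```python
import math
from typing import List

# Hypothetical layout: a feature map is indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def conv1x1(fmap: FeatureMap, weights: List[List[float]], bias: List[float]) -> FeatureMap:
    """1x1 convolution: each output channel is a per-pixel linear mix of input channels."""
    h, w = len(fmap[0]), len(fmap[0][0])
    out = []
    for w_row, b in zip(weights, bias):
        out.append([[b + sum(wc * fmap[c][i][j] for c, wc in enumerate(w_row))
                     for j in range(w)] for i in range(h)])
    return out


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Collapse each H x W channel to its spatial mean, giving one value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gap_head(roi_features: FeatureMap, cls_w, cls_b, reg_w, reg_b):
    """Hypothetical GAP-based head for one pooled RoI: class probabilities and box deltas."""
    cls_scores = softmax(global_average_pool(conv1x1(roi_features, cls_w, cls_b)))
    box_deltas = global_average_pool(conv1x1(roi_features, reg_w, reg_b))
    return cls_scores, box_deltas


if __name__ == "__main__":
    # Toy RoI with 2 input channels of size 2 x 2 (values are illustrative only).
    roi = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
    scores, deltas = gap_head(
        roi,
        cls_w=[[0.0, 0.0], [1.0, 0.0]], cls_b=[0.0, 0.0],
        reg_w=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.0], [0.0, 0.0]], reg_b=[0.0, 0.0, 0.0, 1.0],
    )
    print(scores)
    print(deltas)  # deltas -> [2.5, 0.0, 1.25, 1.0]
```

Because a 1×1 convolution and spatial averaging are both linear, pooling first and then applying a small linear layer yields the same outputs. Either way, the head needs only C×K weights per output set, whereas a fully connected layer over the flattened C×H×W map would need C×H×W×K.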

Notes

1. https://github.com/BVLC/caffe/wiki/Model-Zoo
2. It is an approximate joint training method because it ignores the derivatives with respect to the proposal coordinates, as discussed in [24].

References

[1] Ahmad AMA, Alqutami A, Atoum J (2012) A robust algorithm for Arabic video text detection. In: Proceedings of the 2011 2nd international congress on computer applications and computational science. Springer, pp 261–266
[2] Bai J, Chen Z, Feng B, Xu B (2014) Chinese image text recognition on grayscale pixels. In: ICASSP. IEEE, pp 1380–1384
[3] Bai J, Chen Z, Feng B, Xu B (2014) Image character recognition using deep convolutional neural network learned from different languages. In: ICIP. IEEE, pp 2560–2564
[4] Chen J, Song Y, Xie H, Chen X, Deng H, Liu Y (2016) Robust Uyghur text localization in complex background images. In: PCM, volume 9917 of lecture notes in computer science. Springer, pp 406–416
[5] Chen Z, Chen Y, Gao X, Wang S, Hu L, Yan CC, Lane ND, Miao C (2015) Unobtrusive sensing incremental social contexts using fuzzy class incremental learning. In: ICDM. IEEE Computer Society, pp 71–80
[6] Epshtein B, Ofek E, Wexler Y (2010) Detecting text in natural scenes with stroke width transform. In: CVPR. IEEE Computer Society, pp 2963–2970
[7] Girshick RB (2015) Fast R-CNN. In: ICCV. IEEE Computer Society, pp 1440–1448
[8] Girshick RB, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR. IEEE Computer Society, pp 580–587
[9] Halima MB, Karray H, Alimi AM (2010) A comprehensive method for Arabic video text detection, localization, extraction and recognition. In: PCM, volume 6298 of lecture notes in computer science. Springer, pp 648–659
[10] He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. CoRR, arXiv:1512.03385
[11] He T, Huang W, Qiao Y, Yao J (2016) Text-attentional convolutional neural network for scene text detection. IEEE Trans Image Process 25(6):2529–2541
[12] Huang W, Lin Z, Yang J, Wang J (2013) Text localization in natural images using stroke feature transform and text covariance descriptors. In: ICCV. IEEE Computer Society, pp 1241–1248
[13] Huang W, Qiao Y, Tang X (2014) Robust scene text detection with convolution neural network induced MSER trees. In: ECCV, volume 8692 of lecture notes in computer science. Springer, pp 497–511
[14] Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2016) Reading text in the wild with convolutional neural networks. Int J Comput Vis 116(1):1–20
[15] Jaderberg M, Vedaldi A, Zisserman A (2014) Deep features for text spotting. In: ECCV, volume 8692 of lecture notes in computer science. Springer, pp 512–528
[16] Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick RB, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. In: ACM multimedia. ACM, pp 675–678
[17] Kang L, Li Y, Doermann DS (2014) Orientation robust text line detection in natural images. In: CVPR. IEEE Computer Society, pp 4034–4041
[18] Karatzas D, Shafait F, Uchida S, Iwamura M, Gomez i Bigorda L, Mestre SR, Mas J, Mota DF, Almazán J, de las Heras L-P (2013) ICDAR 2013 robust reading competition. In: ICDAR. IEEE Computer Society, pp 1484–1493
[19] Lin M, Chen Q, Yan S (2013) Network in network. CoRR, arXiv:1312.4400
[20] Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: CVPR. IEEE Computer Society, pp 3431–3440
[21] Moradi M, Mozaffari S, Orouji AA (2010) Farsi/Arabic text extraction from video images by corner detection. In: 2010 6th Iranian conference on machine vision and image processing. IEEE, pp 1–6
[22] Neumann L, Matas J (2012) Real-time scene text localization and recognition. In: CVPR. IEEE Computer Society, pp 3538–3545
[23] Neumann L, Matas J (2016) Real-time lexicon-free scene text localization and recognition. IEEE Trans Pattern Anal Mach Intell 38(9):1872–1885
[24] Ren S, He K, Girshick RB, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS, pp 91–99
[25] Saudagar AKJ, Mohammed HV, Iqbal K, Gyani YJ (2015) Efficient Arabic text extraction and recognition using thinning and dataset comparison technique. In: 2015 international conference on communication, information & computing technology (ICCICT). IEEE, pp 1–5
[26] Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y (2013) OverFeat: integrated recognition, localization and detection using convolutional networks. CoRR, arXiv:1312.6229
[27] Shahab A, Shafait F, Dengel A (2011) ICDAR 2011 robust reading competition challenge 2: reading text in scene images. In: ICDAR. IEEE Computer Society, pp 1491–1496
[28] Shivakumara P, Dutta A, Tan CL, Pal U (2014) Multi-oriented scene text detection in video based on wavelet and angle projection boundary growing. Multimed Tools Appl 72(1):515–539
[29] Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. CoRR, arXiv:1409.1556
[30] Szegedy C, Liu W, Jia Y, Sermanet P, Reed SE, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: CVPR. IEEE Computer Society, pp 1–9
[31] Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2015) Rethinking the inception architecture for computer vision. CoRR, arXiv:1512.00567
[32] Tian S, Pan Y, Huang C, Lu S, Yu K, Tan CL (2015) Text flow: a unified text detection system in natural scene images. In: ICCV. IEEE Computer Society, pp 4651–4659
[33] Wang K, Babenko B, Belongie SJ (2011) End-to-end scene text recognition. In: ICCV. IEEE Computer Society, pp 1457–1464
[34] Wang T, Wu DJ, Coates A, Ng AY (2012) End-to-end text recognition with convolutional neural networks. In: ICPR. IEEE Computer Society, pp 3304–3308
[35] Wolf C, Jolion J-M (2006) Object count/area graphs for the evaluation of object detection and segmentation algorithms. IJDAR 8(4):280–296
[36] Xie H, Gao K, Zhang Y, Li J, Ren H (2011) Common visual pattern discovery via graph matching. In: ACM multimedia. ACM, pp 1385–1388
[37] Xie H, Gao K, Zhang Y, Li J, Liu Y (2011) Pairwise weak geometric consistency for large scale image search. In: ICMR. ACM, p 42
[38] Xie H, Zhang Y, Ke G, Tang S, Kefu X, Li G, Li J (2013) Robust common visual pattern discovery using graph matching. J Vis Commun Image Represent 24(5):635–646
[39] Xu Z, Hu C, Lin M (2016) Video structured description technology based intelligence analysis of surveillance videos for public security applications. Multimed Tools Appl 75(19):12155–12172
[40] Xu Z, Lin M, Hu C, Liu Y (2016) The big data analytics and applications of the surveillance system using video structured description technology. Clust Comput 19(3):1283–1292
[41] Xu Z, Mei L, Liu Y, Hu C, Chen L (2016) Semantic enhanced cloud environment for surveillance data management using video structural description. Computing 98(1–2):35–54
[42] Yan J, Zhu M, Liu H, Liu Y (2010) Visual saliency detection via sparsity pursuit. IEEE Signal Process Lett 17(8):739–742
[43] Yao C, Bai X, Liu W, Ma Y, Tu Z (2012) Detecting texts of arbitrary orientations in natural images. In: CVPR. IEEE Computer Society, pp 1083–1090
[44] Ye Q, Doermann DS (2015) Text detection and recognition in imagery: a survey. IEEE Trans Pattern Anal Mach Intell 37(7):1480–1500
[45] Yin X-C, Pei W-Y, Zhang J, Hao H-W (2015) Multi-orientation scene text detection with adaptive clustering. IEEE Trans Pattern Anal Mach Intell 37(9):1930–1937
[46] Yin X-C, Yin X, Huang K, Hao H-W (2014) Robust text detection in natural scene images. IEEE Trans Pattern Anal Mach Intell 36(5):970–983
[47] Yousfi S, Berrani S-A, Garcia C (2015) ALIF: a dataset for Arabic embedded text recognition in TV broadcast. In: ICDAR. IEEE Computer Society, pp 1221–1225
[48] Yuan J, Wei B, Liu Y, Zhang Y, Wang L (2015) A method for text line detection in natural images. Multimed Tools Appl 74(3):859–884
[49] Zayene O, Hennebert J, Touj SM, Ingold R, Amara NEB (2015) A dataset for Arabic text detection, tracking and recognition in news videos - AcTiV. In: ICDAR. IEEE Computer Society, pp 996–1000
[50] Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: ECCV, volume 8689 of lecture notes in computer science. Springer, pp 818–833
[51] Zhang Z, Shen W, Yao C, Bai X (2015) Symmetry-based text line detection in natural scenes. In: CVPR. IEEE Computer Society, pp 2558–2567
[52] Zhang C, Yan J, Li C, Rui X, Liu L, Bie R (2016) On estimating air pollution from photos using convolutional neural network. In: ACM multimedia. ACM, pp 297–301
[53] Zhang Z, Zhang C, Shen W, Yao C, Liu W, Bai X (2016) Multi-oriented text detection with fully convolutional networks. CoRR, arXiv:1604.04018

Acknowledgments

This work is supported by the National Natural Science Foundation of China (61303171, 61303175) and the "Strategic Priority Research Program" of the Chinese Academy of Sciences (XDA06031000).

Author information

Authors and Affiliations

National Engineering Laboratory for Information Security Technologies, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Shancheng Fang, Hongtao Xie & Xiaoyan Gu
School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Shancheng Fang, Hongtao Xie & Xiaoyan Gu
Interactive Digital Media Technology Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Zhineng Chen
University of Ottawa, Ottawa, Canada
Shiai Zhu
Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing, China
Xingyu Gao

Corresponding author

Correspondence to Hongtao Xie.

About this article

Cite this article

Fang, S., Xie, H., Chen, Z. et al. Detecting Uyghur text in complex background images with convolutional neural network. Multimed Tools Appl 76, 15083–15103 (2017). https://doi.org/10.1007/s11042-017-4538-8

Received: 30 October 2016
Revised: 15 February 2017
Accepted: 20 February 2017
Published: 09 March 2017
Issue Date: July 2017
DOI: https://doi.org/10.1007/s11042-017-4538-8
