A comparative approach on detecting multi-lingual and multi-oriented text in natural scene images

Yegnaraman, Aparna; Valli, S.

doi:10.1007/s10489-020-01972-1

Published: 17 November 2020

Volume 51, pages 3696–3717, (2021)

Applied Intelligence

Abstract

Text in a scene conveys the intended message to users accurately. This work addresses text detection in natural scene images for both quadrilateral-type and polygon-type datasets. For quadrilateral-type datasets, a regression-based method built on a modified You Only Look Once (YOLOv4) network is used. The hyperparameters for training the network are optimized with a Genetic Algorithm, which proves to be a more suitable candidate than traditional methods. The Pixels-IoU (PIoU) loss is introduced to derive accurate bounding boxes, and it appears effective under challenging scenarios involving high aspect ratios and complex backgrounds. This approach yielded quick results for quadrilateral-type datasets but did not scale to arbitrarily shaped and curved scene text. The approach is therefore changed to a segmentation-based one, which introduces a binarization operation into the segmentation network to boost detection accuracy on polygon-type datasets. A new module, DiffBiSeg (Differentiable Binarization in Segmentation network), facilitates post-processing and improves text detection performance by setting the binarization thresholds flexibly within the segmentation network. The efficacy of both approaches is evident in their respective experimental results.
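
The idea of binarizing inside the network with flexible thresholds can be illustrated with the commonly used differentiable binarization function, B = 1 / (1 + exp(-k (P - T))), where P is a probability map and T is a per-pixel threshold map. The sketch below is a minimal plain-Python illustration under stated assumptions and is not the authors' DiffBiSeg module: the function names, the sample maps, and the amplification factor k = 50 are hypothetical choices made for demonstration.

```python
import math


def differentiable_binarization(prob, thresh, k=50.0):
    """Approximate the hard step (prob > thresh) with a steep sigmoid.

    prob and thresh are same-shaped 2D lists of floats in [0, 1].
    k is an amplification factor (assumed value, not taken from the article).
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(prow, trow)]
        for prow, trow in zip(prob, thresh)
    ]


def fixed_binarization(prob, t=0.5):
    """Conventional post-processing: one fixed threshold for every pixel."""
    return [[1 if p >= t else 0 for p in row] for row in prob]


# Hypothetical 2x2 probability map and per-pixel threshold map.
prob_map = [[0.9, 0.6], [0.4, 0.1]]
thresh_map = [[0.5, 0.7], [0.3, 0.5]]

approx = differentiable_binarization(prob_map, thresh_map)
fixed = fixed_binarization(prob_map)

for row in approx:
    print([round(v, 3) for v in row])
print(fixed)
```

In this toy input the top-right pixel counts as text under the fixed 0.5 threshold but is suppressed by its higher per-pixel threshold of 0.7, while the bottom-left pixel behaves the opposite way. Because the sigmoid is smooth, the threshold map could in principle be learned jointly with the probability map instead of being tuned by hand after training.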

Funding

This paper was funded by the Department of Science and Technology (DST), New Delhi, India, under the INnovation in Science Pursuit for Inspired REsearch (INSPIRE) Fellowship Program.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, College of Engineering, Guindy, Anna University, Chennai, 600025, India
Aparna Yegnaraman & S. Valli

Corresponding author

Correspondence to Aparna Yegnaraman.

Ethics declarations

Conflict of interests

All the authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yegnaraman, A., Valli, S. A comparative approach on detecting multi-lingual and multi-oriented text in natural scene images. Appl Intell 51, 3696–3717 (2021). https://doi.org/10.1007/s10489-020-01972-1

Accepted: 23 September 2020
Published: 17 November 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s10489-020-01972-1
