A comparative approach on detecting multi-lingual and multi-oriented text in natural scene images

Applied Intelligence

Abstract

Text conveys an intended message to users accurately. The primary scope of this work is detecting text in natural scene images for quadrilateral-type and polygon-type datasets. A regression-based method using a modified You Only Look Once (YOLOv4) network is used for quadrilateral-type datasets. Hyperparameters for training the network are optimized using a Genetic Algorithm, which proves to be a more suitable candidate than traditional methods. The Pixels-IoU (PIoU) loss is introduced to derive accurate bounding boxes, and it appears effective under challenging scenarios with high aspect ratios and complex backgrounds. This approach yielded quick results for quadrilateral-type datasets but did not scale to arbitrarily shaped and curved scene text. The approach is therefore changed to a segmentation-based one, which introduces a binarization operation into a segmentation network to boost its detection accuracy for polygon-type datasets. The new module, DiffBiSeg (Differentiable Binarization in Segmentation network), simplifies post-processing and improves text detection performance by setting the binarization thresholds flexibly within the segmentation network. The efficacy of both approaches is evident in their respective experimental results.
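
To make the idea of flexible thresholds concrete, the following is a minimal sketch of the differentiable binarization step as formulated by Liao et al. (2020). It is not the DiffBiSeg module itself, whose details are not given in the abstract. The function names, the toy 2x2 maps, and the fixed 0.5 baseline are illustrative assumptions; the steepness k = 50 is the value reported by Liao et al.

```python
import math


def differentiable_binarization(prob, thresh, k=50.0):
    """Soft binarization B = 1 / (1 + exp(-k * (P - T))), applied per pixel.

    prob   -- probability map P (nested lists, values in [0, 1])
    thresh -- threshold map T of the same shape (learned in a real network)
    k      -- steepness of the sigmoid; larger k approaches a hard step
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob, thresh)
    ]


def hard_binarization(prob, t=0.5):
    """Baseline for comparison: one fixed, non-differentiable threshold."""
    return [[1 if p >= t else 0 for p in row] for row in prob]


# Toy maps (hypothetical values, for illustration only).
prob = [[0.90, 0.40],
        [0.55, 0.10]]
thresh = [[0.50, 0.50],
          [0.60, 0.30]]

soft = differentiable_binarization(prob, thresh)
hard = hard_binarization(prob)
```

In this toy example the pixel with probability 0.55 counts as text under the fixed 0.5 threshold but is suppressed by the per-pixel threshold of 0.6. Because the soft function is smooth, gradients can flow through the binarization step during training, which is what allows a threshold map to be learned jointly with the segmentation output.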



Funding

This work was funded by the Department of Science and Technology (DST), New Delhi, India, under the INnovation in Science Pursuit for Inspired REsearch (INSPIRE) Fellowship Program.

Author information

Corresponding author

Correspondence to Aparna Yegnaraman.

Ethics declarations

Conflict of interests

All the authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yegnaraman, A., Valli, S. A comparative approach on detecting multi-lingual and multi-oriented text in natural scene images. Appl Intell 51, 3696–3717 (2021). https://doi.org/10.1007/s10489-020-01972-1

  • DOI: https://doi.org/10.1007/s10489-020-01972-1
