Scene text understanding: recapitulating the past decade

Artificial Intelligence Review

Abstract

Computer vision has been dramatically reshaped by the shift from handcrafted feature-based techniques to deep learning. Scene text detection and recognition, an important area of machine vision, have inevitably been swept up in this wave of change and have entered the deep learning era. Over this period, the research community has seen significant improvements in thinking, approach, and effectiveness. This study summarizes and analyzes the important developments and notable advances in scene text detection and recognition over the past decade. We first discuss the significant handcrafted feature-based techniques that were once regarded as flagship systems. We then discuss the deep learning-based techniques that succeeded them, from their inception to the complex models that have taken scene text detection to the next stage.

Notes

  1. http://www.freeocr.net/ as visited on 01.11.2022.

  2. https://www.simpleocr.com/ as visited on 01.11.2022.

  3. https://jocr.sourceforge.net/ as visited on 01.11.2022.

  4. https://easyscreenocr.com/ as visited on 01.11.2022.

  5. https://tesseract-ocr.github.io/tessdoc/Downloads.html as visited on 01.11.2022.

  6. https://pdf.abbyy.com/ as visited on 01.11.2022.

  7. https://pypi.org/project/PyPDF2/ as visited on 01.11.2022.

References

  • Aberdam A, Litman R, Tsiper S, Anschel O, Slossberg R, Mazor S, Manmatha R, Perona P (2021) Sequence-to-sequence contrastive learning for text recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 15302–15312

  • Afzal MZ, Pastor-Pellicer J, Shafait F, Breuel TM, Dengel A, Liwicki M (2015) Document image binarization using lstm: a sequence learning approach. In: Proceedings of the 3rd international workshop on historical document imaging and processing, pp 79–84

  • Agrawal P, Varma R (2012) Text extraction from images. IJCSET 2(4):1083–1087

    Google Scholar 

  • Akata Z, Perronnin F, Harchaoui Z, Schmid C (2013) Label-embedding for attribute-based classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 819–826

  • Ammirato P, Berg AC (2019) A mask-rcnn baseline for probabilistic object detection. arXiv:1908.03621

  • Angadi S, Kodabagi M (2010) Text region extraction from low resolution natural scene images using texture features. In: 2010 IEEE 2nd international advance computing conference (IACC). IEEE, pp 121–128

  • Atienza R (2021a) Data augmentation for scene text recognition. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 1561–1570

  • Atienza R (2021b) Vision transformer for fast and efficient scene text recognition. In: International conference on document analysis and recognition. Springer, New York, pp 319–334

  • Azadboni MK, Samadhiya A, Khatri P (2014) Multi-orientation text detection by skeletonization (motds). In: 2014 2nd international symposium on computational and business intelligence. IEEE, pp 5–9

  • Baek J, Kim G, Lee J, Park S, Han D, Yun S, Oh SJ, Lee H (2019) What is wrong with scene text recognition model comparisons? Dataset and model analysis. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 4715–4723

  • Bai X, Shi B, Zhang C, Cai X, Qi L (2017) Text/non-text image classification in the wild with convolutional neural networks. Pattern Recogn 66:437–446

    Google Scholar 

  • Bai F, Cheng Z, Niu Y, Pu S, Zhou S (2018) Edit probability for scene text recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1508–1516

  • Bhattacharyya S, Kumar J, Ghoshal K (2020) Mathematical modeling and computational tools: ICACM 2018, Kharagpur, India, November 23–25, vol 320. Springer, New York

  • Bhunia AK, Konwer A, Bhunia AK, Bhowmick A, Roy PP, Pal U (2019) Script identification in natural scene image and video frames using an attention based convolutional-lstm network. Pattern Recogn 85:172–184

    Google Scholar 

  • Bissacco A, Cummins M, Netzer Y, Neven H (2013) Photoocr: reading text in uncontrolled conditions. In: Proceedings of the Ieee international conference on computer vision, pp 785–792

  • Borisyuk F, Gordo A, Sivakumar V (2018) Rosetta: large scale system for text detection and recognition in images. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, pp 71–79

  • Boureau Y-L, Bach F, LeCun Y, Ponce J (2010) Learning mid-level features for recognition. In: 2010 IEEE computer society conference on computer vision and pattern recognition. IEEE, pp 2559–2566

  • Busta M, Neumann L, Matas J (2017) Deep textspotter: an end-to-end trainable scene text localization and recognition framework. In: Proceedings of the IEEE international conference on computer vision, pp 2204–2212

  • Calvo-Zaragoza J, Gallego A-J (2019) A selectional auto-encoder approach for document image binarization. Pattern Recogn 86:37–47

    Google Scholar 

  • Cao Y, Ma S, Pan H (2020) Fdta: fully convolutional scene text detection with text attention. IEEE Access 8:155441–155449

    Google Scholar 

  • Cao D, Dang J, Zhong Y (2021) Towards accurate scene text detection with bidirectional feature pyramid network. Symmetry 13(3):486

    Google Scholar 

  • Chen X, Yuille AL (2004) Detecting and reading text in natural scenes. In: Proceedings of the 2004 IEEE computer society conference on computer vision and pattern recognition. CVPR 2004, vol. 2. IEEE

  • Chen H, Tsai SS, Schroth G, Chen DM, Grzeszczuk R, Girod B (2011) Robust text detection in natural images with edge-enhanced maximally stable extremal regions. In: 2011 18th IEEE international conference on image processing. IEEE, pp 2609–2612

  • Chen J, Ma T, Xiao C (2018) Fastgcn: fast learning with graph convolutional networks via importance sampling. arXiv:1801.10247

  • Chen X, Jin L, Zhu Y, Luo C, Wang T (2021) Text recognition in the wild: a survey. ACM Comput Surv (CSUR) 54(2):1–35

    Google Scholar 

  • Cheng Z, Bai F, Xu Y, Zheng G, Pu S, Zhou S (2017) Focusing attention: towards accurate text recognition in natural images. In: Proceedings of the IEEE international conference on computer vision, pp 5076–5084

  • Cheng Z, Xu Y, Bai F, Niu Y, Pu S, Zhou S (2018) Aon: towards arbitrarily-oriented text recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5571–5579

  • Cheng C, Huang Q, Bai X, Feng B, Liu W (2019) Patch aggregator for scene text script identification. In: 2019 international conference on document analysis and recognition (ICDAR). IEEE, pp 1077–1083

  • Ch’ng CK, Chan CS (2017) Total-text: a comprehensive dataset for scene text detection and recognition. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), vol 1. IEEE, pp 935–942

  • Chng CK, Liu Y, Sun Y, Ng CC, Luo C, Ni Z, Fang C, Zhang S, Han J, Ding E, et al (2019) Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: 2019 international conference on document analysis and recognition (ICDAR). IEEE, pp 1571–1576

  • Chowdhury AR, Bhattacharya U, Parui SK (2011) Text detection of two major Indian scripts in natural scene images. In: International workshop on camera-based document analysis and recognition. Springer, New York, pp 42–57

  • Coates A, Carpenter B, Case C, Satheesh S, Suresh B, Wang T, Wu DJ, Ng AY (2011) Text detection and character recognition in scene images with unsupervised feature learning. In: 2011 international conference on document analysis and recognition. IEEE, pp 440–445

  • Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), vol 1, pp 886–893

  • Darab M, Rahmati M (2012) A hybrid approach to localize farsi text in natural scene images. Procedia Comput. Sci. 13:171–184

    Google Scholar 

  • Dargan S, Kumar M, Ayyagari MR, Kumar G (2020) A survey of deep learning and its applications: a new paradigm to machine learning. Arch. Comput. Methods Eng. 27(4):1071–1092

    MathSciNet  Google Scholar 

  • Dasgupta K, Das S, Bhattacharya U (2020) Scale-invariant multi-oriented text detection in wild scene image. In: 2020 IEEE international conference on image processing (ICIP), pp 2041–2045. IEEE

  • Dauphin YN, Fan A, Auli M, Grangier D (2017) Language modeling with gated convolutional networks. In: International conference on machine learning. PMLR, pp 933–941

  • De Campos TE, Babu BR, Varma M et al (2009) Character recognition in natural images. VISAPP 7:1–10

    Google Scholar 

  • Decker LGL, Pinto A, Campana JLF, Neira MC, dos Santos AA, Conceiçao JS, Angeloni MA, Li LT, et al (2020) MobText: a compact method for scene text localization. VISAPP

  • Del Gobbo J, Herrera RM (2020) Unconstrained text detection in manga: a new dataset and baseline. In: European conference on computer vision. Springer, New York, pp 629–646

  • Deng D, Liu H, Li X, Cai D (2018) Pixellink: detecting scene text via instance segmentation. In: Proceedings of the AAAI conference on artificial intelligence, vol 32

  • Dey S, Shivakumara P, Raghunandan K, Pal U, Lu T, Kumar GH, Chan CS (2017) Script independent approach for multi-oriented text detection in scene image. Neurocomputing 242:96–112

    Google Scholar 

  • Dhar D, Chakraborty N, Choudhury S, Paul A, Mollah AF, Basu S, Sarkar R (2020) Multilingual scene text detection using gradient morphology. Int J Comput Vis Image Process (IJCVIP) 10(3):31–43

    Google Scholar 

  • Dizaji KG, Zheng F, Sadoughi N, Yang Y, Deng C, Huang H (2018) Unsupervised deep generative adversarial hashing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3664–3673

  • Epshtein B, Ofek E, Wexler Y (2010) Detecting text in natural scenes with stroke width transform. In: 2010 IEEE computer society conference on computer vision and pattern recognition. IEEE, pp 2963–2970

  • Fang S, Xie H, Zha Z-J, Sun N, Tan J, Zhang Y (2018) Attention and language ensemble for scene text recognition with convolutional sequence modeling. In: Proceedings of the 26th ACM international conference on multimedia, pp 248–256

  • Fasil O, Manjunath S, Aradhya VM (2017) Word-level script identification from scene images. In: Proceedings of the 5th international conference on frontiers in intelligent computing: theory and applications. Springer, New York, pp 417–426

  • Feng Y, Song Y, Zhang Y (2016) Scene text detection based on multi-scale swt and edge filtering. In: 2016 23rd international conference on pattern recognition (ICPR). IEEE, pp 645–650

  • Fernando B, Fromont E, Tuytelaars T (2014) Mining mid-level features for image classification. Int J Comput Vision 108(3):186–203

    MathSciNet  Google Scholar 

  • Ganin Y, Lempitsky V (2015) Unsupervised domain adaptation by backpropagation. In: International conference on machine learning. PMLR, pp 1180–1189

  • Gao H, Li Y, Wang X, Han J, Li R (2019) Ensemble attention for text recognition in natural images. In: 2019 international joint conference on neural networks (IJCNN). IEEE, pp 1–8

  • Gao D, Li K, Wang R, Shan S, Chen X (2020) Multi-modal graph neural network for joint reasoning on vision and scene text. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12746–12756

  • Garcia C, Apostolidis X (2000) Text detection and segmentation in complex color images. In: 2000 IEEE international conference on acoustics, speech, and signal processing. Proceedings (Cat. No. 00CH37100), vol. 4. IEEE, pp 2326–2329

  • Ghosh M, Obaidullah SM, Santosh K, Das N, Roy K (2018) Artistic multi-character script identification using iterative isotropic dilation algorithm. In: International conference on recent trends in image processing and pattern recognition. Springer, New York, pp 49–62

  • Ghosh M, Mukherjee H, Obaidullah SM, Santosh K, Das N, Roy K (2019a) Artistic multi-character script identification. In: Document processing using machine learning. Chapman and Hall/CRC, Boston, pp 28–42

  • Ghosh M, Mukherjee H, Obaidullah SM, Santosh K, Das N, Roy K (2019b) Identifying the presence of graphical texts in scene images using cnn. In: 2019 international conference on document analysis and recognition workshops (ICDARW), vol 1. IEEE, pp 86–91

  • Ghosh M, Roy SS, Mukherjee H, Obaidullah SM, Santosh K, Roy K (2019c) Automatic text localization in scene images: a transfer learning based approach. In: National conference on computer vision, pattern recognition, image processing, and graphics. Springer, New York, pp 470–479

  • Ghosh M, Mukherjee H, Obaidullah SM, Santosh K, Das N, Roy K (2020) Artistic multi-script identification at character level with extreme learning machine. Procedia Comput. Sci. 167:496–505

    Google Scholar 

  • Ghosh M, Mukherjee H, Obaidullah SM, Santosh K, Das N, Roy K (2021a) Lwsinet: a deep learning-based approach towards video script identification. Multimed Tools Appl 1:1–34

    Google Scholar 

  • Ghosh M, Roy SS, Mukherjee H, Obaidullah SM, Gao X-Z, Roy K (2021b) Movie title extraction and script separation using shallow convolution neural network. IEEE Access 9:125184–125201

    Google Scholar 

  • Ghosh M, Roy SS, Mukherjee H, Obaidullah SM, Santosh K, Roy K (2022) Understanding movie poster: transfer-deep learning approach for graphic-rich text recognition. Vis Comput 38(5):1645–1664

    Google Scholar 

  • Ghoshal R, Banerjee A (2020) Svm and mlp based segmentation and recognition of text from scene images through an effective binarization scheme. In: Computational intelligence in pattern recognition. Springer, New York, pp 237–246

  • Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448

  • Gkioxari G, Girshick R, Malik J (2015) Contextual action recognition with r* cnn. In: Proceedings of the IEEE international conference on computer vision, pp 1080–1088

  • Gllavata J, Ewerth R, Freisleben B (2004) Text detection in images based on unsupervised classification of high-frequency wavelet coefficients. In: Proceedings of the 17th international conference on pattern recognition, 2004. ICPR 2004, vol 1. IEEE, pp 425–428

  • Gllavata J, Freisleben B (2005) Script recognition in images with complex backgrounds. In: Proceedings of the fifth IEEE international symposium on signal processing and information technology, 2005. IEEE, pp 589–594

  • Goel V, Mishra A, Alahari K, Jawahar C (2013) Whole is greater than sum of parts: Recognizing scene text words. In: 2013 12th international conference on document analysis and recognition. IEEE, pp 398–402

  • Gomez L, Karatzas D (2013) Multi-script text extraction from natural scenes. In: 2013 12th international conference on document analysis and recognition. IEEE, pp 467–471

  • Gomez L, Karatzas D (2016) A fine-grained approach to scene text script identification. In: 2016 12th IAPR workshop on document analysis systems (DAS). IEEE, pp 192–197

  • Gomez L, Nicolaou A, Karatzas D (2017) Improving patch-based scene text script identification with ensembles of conjoined networks. Pattern Recogn 67:85–96

    Google Scholar 

  • Gonzalez A, Bergasa LM, Yebes JJ, Bronte S (2012) Text location in complex images. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012). IEEE, pp 617–620

  • Goodfellow IJ, Bulatov Y, Ibarz J, Arnoud S, Shet V (2013a) Multi-digit number recognition from street view imagery using deep convolutional neural networks. arXiv:1312.6082

  • Goodfellow I, Warde-Farley D, Mirza M, Courville A, Bengio Y (2013b) Maxout networks. In: International conference on machine learning. PMLR, pp 1319–1327

  • Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Adv Neural Inf Process Syst 27:1–10

    Google Scholar 

  • Gupta A, Vedaldi A, Zisserman A (2016) Synthetic data for text localisation in natural images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2315–2324

  • He T, Huang W, Qiao Y, Yao J (2016a) Text-attentional convolutional neural network for scene text detection. IEEE Trans Image Process 25(6):2529–2541

    MathSciNet  MATH  Google Scholar 

  • He K, Zhang X, Ren S, Sun J (2016b) Deep residual learning for image recognition. In: Proceedings of the ieee conference on computer vision and pattern recognition, pp 770–778

  • He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 2961–2969

  • He W, Zhang X-Y, Yin F, Liu C-L (2018) Multi-oriented and multi-lingual scene text detection with direct regression. IEEE Trans Image Process 27(11):5406–5419

    MathSciNet  Google Scholar 

  • Howe NR (2011) A Laplacian energy for document binarization. In: 2011 international conference on document analysis and recognition. IEEE, pp 6–10

  • Hu Z, Pi P, Wu Z, Xue Y, Shen J, Tan J, Lian X, Wang Z, Liu J (2021) E2vts: energy-efficient video text spotting from unmanned aerial vehicles. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 905–913

  • Huang W, Lin Z, Yang J, Wang J (2013a) Text localization in natural images using stroke feature transform and text covariance descriptors. In: Proceedings of the IEEE international conference on computer vision, pp 1241–1248

  • Huang R, Shivakumara P, Uchida S (2013b) Scene character detection by an edge-ray filter. In: 2013 12th international conference on document analysis and recognition. IEEE, pp 462–466

  • Huang W, Qiao Y, Tang X (2014) Robust scene text detection with convolution neural network induced mser trees. In: European conference on computer vision. Springer, New York, pp 497–511

  • Huang Z, Zhong Z, Sun L, Huo Q (2019) Mask r-cnn with pyramid attention network for scene text detection. In: 2019 IEEE winter conference on applications of computer vision (WACV). IEEE, pp 764–772

  • Huang J, Pang G, Kovvuri R, Toh M, Liang KJ, Krishnan P, Yin X, Hassner T (2021) A multiplexed network for end-to-end, multilingual ocr. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4547–4557

  • Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2014a) Synthetic data and artificial neural networks for natural scene text recognition. arXiv:1406.2227

  • Jaderberg M, Vedaldi A, Zisserman A (2014b) Deep features for text spotting. In: European conference on computer vision. Springer, New York, pp 512–528

  • Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2016) Reading text in the wild with convolutional neural networks. Int J Comput Vis 116(1):1–20

    MathSciNet  Google Scholar 

  • Jang I, Ko B, Byun H, Choi Y (2002) Automatic text extraction in news images using morphology. In: Visual communications and image processing 2002, vol 4671. International Society for Optics and Photonics, pp 521–530

  • Juneja M, Vedaldi A, Jawahar C, Zisserman A (2013) Blocks that shout: distinctive parts for scene classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 923–930

  • Karatzas D, Shafait F, Uchida S, Iwamura M, Bigorda LG, Mestre SR, Mas J, Mota DF, Almazan JA, De Las Heras LP (2013) Icdar 2013 robust reading competition. In: 2013 12th international conference on document analysis and recognition. IEEE, pp 1484–1493

  • Karatzas D, Gomez-Bigorda L, Nicolaou A, Ghosh S, Bagdanov A, Iwamura M, Matas J, Neumann L, Chandrasekhar VR, Lu S, et al (2015) Icdar 2015 competition on robust reading. In: 2015 13th international conference on document analysis and recognition (ICDAR). IEEE, pp 1156–1160

  • Kasar T, Ramakrishnan AG (2011) Multi-script and multi-oriented text localization from scene images. In: International workshop on camera-based document analysis and recognition. Springer, New York, pp 1–14

  • Khalil A, Jarrah M, Al-Ayyoub M, Jararweh Y (2021) Text detection and script identification in natural scene images using deep learning. Comput. Electr. Eng. 91:107043

    Google Scholar 

  • Khan T, Mollah AF (2019) Autnt-a component level dataset for text non-text classification and benchmarking with novel script invariant feature descriptors and d-cnn. Multimed Tools Appl 78(22):32159–32186

    Google Scholar 

  • Khan T, Sarkar R, Mollah AF (2021) Deep learning approaches to scene text detection: a comprehensive review. Artif Intell Rev 54(5):3239–3298

    Google Scholar 

  • Kim KI, Jung K, Kim JH (2003) Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm. IEEE Trans Pattern Anal Mach Intell 25(12):1631–1639

    Google Scholar 

  • Kim K-H, Hong S, Roh B, Cheon Y, Park M (2016) Pvanet: deep but lightweight neural networks for real-time object detection. arXiv:1608.08021

  • Kim S, Hori T, Watanabe S (2017) Joint ctc-attention based end-to-end speech recognition using multi-task learning. In: 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 4835–4839

  • Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. arXiv:1609.02907

  • Kong H, Tang D, Meng X, Lu T (2019) Garn: a novel generative adversarial recognition network for end-to-end scene character recognition. In: 2019 international conference on document analysis and recognition (ICDAR). IEEE, pp 689–694

  • Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25:1097–1105

    Google Scholar 

  • Kumuda T, Basavaraj L (2015) Detection and localization of text from natural scene images using texture features. In: 2015 IEEE international conference on computational intelligence and computing research (ICCIC). IEEE, pp 1–4

  • Lee C-Y, Bhardwaj A, Di W, Jagadeesh V, Piramuthu R (2014) Region-based discriminative feature pooling for scene text recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4050–4057

  • Lee CY, Baek Y, Lee H (2019) Tedeval: a fair evaluation metric for scene text detectors. In: 2019 international conference on document analysis and recognition workshops (ICDARW), vol 7. IEEE, pp 14–17

  • Lei Z, Zhao S, Song H, Shen J (2018) Scene text recognition using residual convolutional recurrent neural network. Mach Vis Appl 29(5):861–871

    Google Scholar 

  • Li H, Doermann D, Kia O (2000) Automatic text detection and tracking in digital video. IEEE Trans Image Process 9(1):147–156

    Google Scholar 

  • Li H, Wang P, Shen C (2017) Towards end-to-end text spotting with convolutional recurrent neural networks. In: Proceedings of the IEEE international conference on computer vision, pp 5238–5246

  • Li H, Wang P, Shen C, Zhang G (2019a) Show, attend and read: a simple and strong baseline for irregular text recognition. In: Proceedings of the AAAI conference on artificial intelligence, vol 33, pp 8610–8617

  • Li K, Zhang Y, Li K, Li Y, Fu Y (2019b) Visual semantic reasoning for image-text matching. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 4654–4662

  • Liao M, Zhang J, Wan Z, Xie F, Liang J, Lyu P, Yao C, Bai X (2019) Scene text recognition from two-dimensional perspective. In: Proceedings of the AAAI conference on artificial intelligence, vol 33, pp 8714–8721

  • Lim JJ, Zitnick CL, Dollár P (2013) Sketch tokens: A learned mid-level representation for contour and object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3158–3165

  • Lin G, Milan A, Shen C, Reid I (2017) Refinenet: multi-path refinement networks for high-resolution semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1925–1934

  • Lin H, Yang P, Zhang F (2020) Review of scene text detection and recognition. Arch Comput Methods Eng 27(2):433–454

    Google Scholar 

  • Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016a) Ssd: single shot multibox detector. In: European conference on computer vision. Springer, New York, pp 21–37

  • Liu W, Chen C, Wong K-YK, Su Z, Han J (2016b) Star-net: a spatial attention residue network for scene text recognition. In: BMVC, vol 2, p 7

  • Liu Z, Lin G, Yang S, Feng J, Lin W, Goh WL (2018a) Learning markov clustering networks for scene text detection. arXiv:1805.08365

  • Liu Z, Li Y, Ren F, Goh WL, Yu H (2018b) Squeezedtext: a real-time scene text recognition by binary convolutional encoder-decoder network. In: Thirty-second AAAI conference on artificial intelligence

  • Liu X, Meng G, Pan C (2019) Scene text detection and recognition with advances in deep learning: a survey. Int J Doc Anal Recogn (IJDAR) 22(2):143–162

    Google Scholar 

  • Liu H, Guo A, Jiang D, Hu Y, Ren B (2020) Puzzlenet: scene text detection by segment context graph learning. arXiv:2002.11371

  • Liu Y, He T, Chen H, Wang X, Luo C, Zhang S, Shen C, Jin L (2021) Exploring the capacity of an orderless box discretization network for multi-orientation scene text detection. Int J Comput Vis 129(6):1972–1992

    Google Scholar 

  • Long S, He X, Yao C (2021) Scene text detection and recognition: the deep learning era. Int J Comput Vis 129(1):161–184

    Google Scholar 

  • Long M, Cao Y, Wang J, Jordan M (2015) Learning transferable features with deep adaptation networks. In: International conference on machine learning. PMLR, pp 97–105

  • Long S, Ruan J, Zhang W, He X, Wu W, Yao C (2018) Textsnake: a flexible representation for detecting text of arbitrary shapes. In: Proceedings of the European conference on computer vision (ECCV), pp 20–36

  • Lu S, Su B, Tan CL (2010) Document image binarization using background estimation and stroke edges. Int J Doc Anal Recogn (IJDAR) 13(4):303–314

    Google Scholar 

  • Lu L, Yi Y, Huang F, Wang K, Wang Q (2019) Integrating local CNN and global CNN for script identification in natural scene images. IEEE Access 7:52669–52679

    Google Scholar 

  • Lucas SM (2005) Icdar 2005 text locating competition results. In: Eighth international conference on document analysis and recognition (ICDAR’05), pp 80–84

  • Lucas SM, Panaretos A, Sosa L, Tang A, Wong S, Young R, Ashida K, Nagai H, Okamoto M, Yamamoto H et al (2005) Icdar 2003 robust reading competitions: entries, results, and future directions. IJDAR 7(2–3):105–122

    Google Scholar 

  • Luo C, Jin L, Sun Z (2019) Moran: a multi-object rectified attention network for scene text recognition. Pattern Recogn 90:109–118

    Google Scholar 

  • Lyu MR, Song J, Cai M (2005) A comprehensive method for multilingual video text detection, localization, and extraction. IEEE Trans Circuits Syst Video Technol 15(2):243–255

    Google Scholar 

  • Lyu P, Liao M, Yao C, Wu W, Bai X (2018a) Mask textspotter: an end-to-end trainable neural network for spotting text with arbitrary shapes. In: Proceedings of the European conference on computer vision (ECCV), pp 67–83

  • Lyu P, Yao C, Wu W, Yan S, Bai X (2018b) Multi-oriented scene text detection via corner localization and region segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7553–7563

  • Ma J, Shao W, Ye H, Wang L, Wang H, Zheng Y, Xue X (2018) Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans Multimed 20(11):3111–3122

    Google Scholar 

  • Ma C, Sun L, Zhong Z, Huo Q (2021a) Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recogn 111:107684

    Google Scholar 

  • Ma M, Wang Q-F, Huang S, Huang S, Goulermas Y, Huang K (2021b) Residual attention-based multi-scale script identification in scene text images. Neurocomputing 421:222–233

    Google Scholar 

  • Mafla A, Dey S, Biten AF, Gomez L, Karatzas D (2021) Multi-modal reasoning graph for scene-text based fine-grained image classification and retrieval. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 4023–4033

  • Mahajan S, Rani R (2021) Text detection and localization in scene images: a broad review. Artif Intell Rev 54(6):4317–4377

    Google Scholar 

  • Mathew M, Jain M, Jawahar C (2017) Benchmarking scene text recognition in devanagari, telugu and malayalam. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), vol 7. IEEE, pp 42–46

  • Mei J, Dai L, Shi B, Bai X (2016) Scene text script identification with convolutional recurrent neural networks. In: 2016 23rd international conference on pattern recognition (ICPR). IEEE, pp 4053–4058

  • Mishra A, Alahari K, Jawahar C (2012a) Scene text recognition using higher order language priors. In: BMVC-British Machine Vision Conference. BMVA

  • Mishra A, Alahari K, Jawahar C (2012b) Top-down and bottom-up cues for scene text recognition. In: 2012 IEEE conference on computer vision and pattern recognition, pp 2687–2694

  • Munjal RS, Goyal M, Moharir R, Moharana S (2021) Telcos: on device text localization with clustering of script. arXiv:2104.08045

  • Nagaoka Y, Miyazaki T, Sugaya Y, Omachi S (2021) Text detection using multi-stage region proposal network sensitive to text scale. Sensors 21(4):1232

    Google Scholar 

  • Naiemi F, Ghods V, Khalesi H (2021) A novel pipeline framework for multi oriented scene text image detection and recognition. Expert Syst Appl 170:114549

    Google Scholar 

  • Nayef N, Yin F, Bizid I, Choi H, Feng Y, Karatzas D, Luo Z, Pal U, Rigaud C, Chazalon J, et al (2017) Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), vol 1. IEEE, pp 1454–1459

  • Nayef N, Patel Y, Busta M, Chowdhury PN, Karatzas D, Khlif W, Matas J, Pal U, Burie J-C, Liu C-l, et al (2019) Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition-rrc-mlt-2019. In: 2019 international conference on document analysis and recognition (ICDAR). IEEE, pp 1582–1587

  • Neumann L, Matas J (2010) A method for text localization and recognition in real-world images. In: Asian conference on computer vision. Springer, New York, pp 770–783

  • Neumann L, Matas J (2012) Real-time scene text localization and recognition. In: 2012 ieee conference on computer vision and pattern recognition. IEEE, pp 3538–3545

  • Neumann L, Matas J (2013) Scene text localization and recognition with oriented stroke detection. In: Proceedings of the Ieee international conference on computer vision, pp 97–104

  • Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 9(1):62–66

    Google Scholar 

  • Pan Y-F, Hou X, Liu C-L (2009) Text localization in natural scene images based on conditional random field. In: 2009 10th international conference on document analysis and recognition. IEEE, pp 6–10

  • Pan Y-F, Liu C-L, Hou X (2010a) Fast scene text localization by learning-based filtering and verification. In: 2010 IEEE international conference on image processing. IEEE, pp 2269–2272

  • Pan Y-F, Hou X, Liu C-L (2010b) A hybrid approach to detect and localize texts in natural scene images. IEEE Trans Image Process 20(3):800–813

  • Pandey D, Pandey BK, Wairya S (2021) Hybrid deep neural network with adaptive galactic swarm optimization for text extraction from scene images. Soft Comput 25(2):1563–1580

  • Pastor-Pellicer J, España-Boquera S, Zamora-Martínez F, Afzal MZ, Castro-Bleda MJ (2015) Insights on the use of convolutional neural networks for document image binarization. In: International work-conference on artificial neural networks. Springer, New York, pp 115–126

  • Paul S, Saha S, Basu S, Saha PK, Nasipuri M (2019) Text localization in camera captured images using fuzzy distance transform based adaptive stroke filter. Multimed Tools Appl 78(13):18017–18036

  • Pei Z, Cao Z, Long M, Wang J (2018) Multi-adversarial domain adaptation. In: Thirty-second AAAI conference on artificial intelligence

  • Peng X, Cao H, Natarajan P (2017) Using convolutional encoder-decoder for document image binarization. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), vol 1. IEEE, pp 708–713

  • Phan TQ, Shivakumara P, Ding Z, Lu S, Tan CL (2011) Video script identification based on text lines. In: 2011 international conference on document analysis and recognition. IEEE, pp 1240–1244

  • Phan TQ, Shivakumara P, Tan CL (2012) Detecting text in the real world. In: Proceedings of the 20th ACM international conference on multimedia, pp 765–768

  • Phan TQ, Shivakumara P, Tian S, Tan CL (2013) Recognizing text with perspective distortion in natural scenes. In: Proceedings of the IEEE international conference on computer vision, pp 569–576

  • Pratikakis I, Gatos B, Ntirogiannis K (2013) Icdar 2013 document image binarization contest (dibco 2013). In: 2013 12th international conference on document analysis and recognition. IEEE, pp 1471–1476

  • Qin X, Jiang J, Yuan C-A, Qiao S, Fan W (2020) Arbitrary shape natural scene text detection method based on soft attention mechanism and dilated convolution. IEEE Access 8:122685–122694

  • Raghunandan K, Shivakumara P, Roy S, Kumar GH, Pal U, Lu T (2018) Multi-script-oriented text detection and recognition in video/scene/born digital images. IEEE Trans Circuits Syst Video Technol 29(4):1145–1162

  • Rainarli E et al (2021) A decade: review of scene text detection methods. Comput Sci Rev 42:100434

  • Raisi Z, Naiel MA, Fieguth P, Wardell S, Zelek J (2020) 2d positional embedding-based transformer for scene text recognition. J Comput Vis Imaging Syst 6(1):1–4

  • Raisi Z, Naiel MA, Younes G, Wardell S, Zelek JS (2021) Transformer-based text detection in the wild. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3162–3171

  • Rashmi V, Nayak SN (2018) A hybrid approach to localize text in natural scene images. Int J Eng Appl Sci Technol 3(1):53–60

  • Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263–7271

  • Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv:1804.02767

  • Ren X, Ramanan D (2013) Histograms of sparse codes for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3246–3253

  • Risnumawan A, Shivakumara P, Chan CS, Tan CL (2014) A robust arbitrary text detection system for natural scene images. Expert Syst Appl 41(18):8027–8048

  • Risnumawan A, Sulistijono IA, Abawajy J (2016) Text detection in low resolution scene images using convolutional neural network. In: International conference on soft computing and data mining. Springer, New York, pp 366–375

  • Sajid U, Chow M, Zhang J, Kim T, Wang G (2021) Parallel scale-wise attention network for effective scene text recognition. arXiv:2104.12076

  • Selvam P, Koilraj JAS, Romero CAT, Alharbi M, Mehbodniya A, Webber JL, Sengan S (2022) A transformer-based framework for scene text recognition. IEEE Access 10:100895–100910

  • Sengupta P, Mollah AF (2021) Scene character recognition with morphological filtering and hog features. In: Soft computing techniques and applications. Springer, New York, pp 1–9

  • Sermanet P, Chintala S, LeCun Y (2012) Convolutional neural networks applied to house numbers digit classification. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012). IEEE, pp 3288–3291

  • Shahab A, Shafait F, Dengel A (2011) Icdar 2011 robust reading competition challenge 2: Reading text in scene images. In: 2011 international conference on document analysis and recognition, pp 1491–1496

  • Sharma N, Mandal R, Sharma R, Pal U, Blumenstein M (2015) Icdar2015 competition on video script identification (cvsi 2015). In: 2015 13th international conference on document analysis and recognition (ICDAR). IEEE, pp 1196–1200

  • Sheng F, Chen Z, Xu B (2019) Nrtr: a no-recurrence sequence-to-sequence model for scene text recognition. In: 2019 international conference on document analysis and recognition (ICDAR). IEEE, pp 781–786

  • Shi C, Xiao B, Wang C, Zhang Y (2012) Graph-based background suppression for scene text detection. In: 2012 10th IAPR international workshop on document analysis systems. IEEE, pp 210–214

  • Shi C, Wang C, Xiao B, Zhang Y, Gao S, Zhang Z (2013) Scene text recognition using part-based tree-structured character detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2961–2968

  • Shi B, Yao C, Zhang C, Guo X, Huang F, Bai X (2015) Automatic script identification in the wild. In: 2015 13th international conference on document analysis and recognition (ICDAR). IEEE, pp 531–535

  • Shi B, Bai X, Yao C (2016a) Script identification in the wild via discriminative convolutional neural network. Pattern Recogn 52:448–458

  • Shi B, Bai X, Yao C (2016b) An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans Pattern Anal Mach Intell 39(11):2298–2304

  • Shi B, Wang X, Lyu P, Yao C, Bai X (2016c) Robust scene text recognition with automatic rectification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4168–4176

  • Shi B, Bai X, Belongie S (2017a) Detecting oriented text in natural images by linking segments. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2550–2558

  • Shi B, Yao C, Liao M, Yang M, Xu P, Cui L, Belongie S, Lu S, Bai X (2017b) Icdar2017 competition on reading Chinese text in the wild (rctw-17). In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), vol 1. IEEE, pp 1429–1434

  • Shi B, Yang M, Wang X, Lyu P, Yao C, Bai X (2018) Aster: an attentional scene text recognizer with flexible rectification. IEEE Trans Pattern Anal Mach Intell 41(9):2035–2048

  • Shinde A, Patil M (2021) Street view text detection methods. In: 2021 international conference on artificial intelligence and smart systems (ICAIS). IEEE, pp 961–965

  • Shivakumara P, Phan TQ, Tan CL (2010) A Laplacian approach to multi-oriented text detection in video. IEEE Trans Pattern Anal Mach Intell 33(2):412–419

  • Shivakumara P, Sreedhar RP, Phan TQ, Lu S, Tan CL (2012) Multioriented video scene text detection through Bayesian classification and boundary growing. IEEE Trans Circuits Syst Video Technol 22(8):1227–1235

  • Shivakumara P, Yuan Z, Zhao D, Lu T, Tan CL (2015) New gradient-spatial-structural features for video script identification. Comput Vis Image Underst 130:35–53

  • Simanjuntak GD, Nugroho H (2021) Scene text detection with quadtree-based candidate text regions and convolutional neural network. Int J Electr Eng Inf 13(1):152–162

  • Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556

  • Singh AK, Mishra A, Dabral P, Jawahar C (2016) A simple and effective solution for script identification in the wild. In: 2016 12th IAPR workshop on document analysis systems (DAS). IEEE, pp 428–433

  • Soni R, Kumar B, Chand S (2019) Text detection and localization in natural scene images based on text awareness score. Appl Intell 49(4):1376–1405

  • Sravani M, Maheswararao A, Murthy MK (2021) Robust detection of video text using an efficient hybrid method via key frame extraction and text localization. Multimed Tools Appl 80(6):9671–9686

  • Sriman B, Schomaker L (2019) Multi-script text versus non-text classification of regions in scene images. J Vis Commun Image Represent 62:23–42

  • Su B, Lu S (2014) Accurate scene text recognition based on recurrent neural network. In: Asian conference on computer vision. Springer, New York, pp 35–48

  • Su Y-M, Peng H-W, Huang K-W, Yang C-S (2019) Image processing technology for text recognition. In: 2019 international conference on technologies and applications of artificial intelligence (TAAI). IEEE, pp 1–5

  • Sun L, Huo Q, Jia W, Chen K (2015) A robust approach for text detection from natural scene images. Pattern Recogn 48(9):2906–2920

  • Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9

  • Tang J, Yang Z, Wang Y, Zheng Q, Xu Y, Bai X (2019) Seglink++: detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recogn 96:106954

  • Tao Y, Jia Z, Ma R, Xu S (2021) Trig: transformer-based text recognizer with initial embedding guidance. Electronics 10(22):2780

  • Tian Z, Shen C, Chen H, He T (2019) Fcos: fully convolutional one-stage object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9627–9636

  • Tounsi M, Moalla I, Lebourgeois F, Alimi AM (2017) Cnn based transfer learning for scene script identification. In: International conference on neural information processing. Springer, New York, pp 702–711

  • Turki H, Halima MB, Alimi AM (2016) Text detection in natural scene images using two masks filtering. In: 2016 IEEE/ACS 13th international conference of computer systems and applications (AICCSA). IEEE, pp 1–6

  • Turki H, Halima MB, Alimi AM (2017) A hybrid method of natural scene text detection using msers masks in hsv space color. In: Ninth international conference on machine vision (ICMV 2016), vol 10341. International Society for Optics and Photonics, p 1034111

  • Tzeng E, Hoffman J, Saenko K, Darrell T (2017) Adversarial discriminative domain adaptation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7167–7176

  • Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30:1–10

  • Veit A, Matera T, Neumann L, Matas J, Belongie S (2016) Coco-text: dataset and benchmark for text detection and recognition in natural images. arXiv:1601.07140

  • Verma M, Sood N, Roy PP, Raman B (2017) Script identification in natural scene images: a dataset and texture-feature based performance evaluation. In: Proceedings of international conference on computer vision and image processing. Springer, New York, pp 309–319

  • Wang K, Belongie S (2010) Word spotting in the wild. In: European conference on computer vision. Springer, New York, pp 591–604

  • Wang J, Hu X (2017) Gated recurrent convolution neural network for ocr. Adv Neural Inf Process Syst 30:1–10

  • Wang K, Babenko B, Belongie S (2011) End-to-end scene text recognition. In: 2011 international conference on computer vision. IEEE, pp 1457–1464

  • Wang T, Wu DJ, Coates A, Ng AY (2012) End-to-end text recognition with convolutional neural networks. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012). IEEE, pp 3304–3308

  • Wang X, Wang B, Bai X, Liu W, Tu Z (2013) Max-margin multiple-instance dictionary learning. In: International conference on machine learning, pp 846–854

  • Wang P, Chen P, Yuan Y, Liu D, Huang Z, Hou X, Cottrell G (2018) Understanding convolution for semantic segmentation. In: 2018 IEEE winter conference on applications of computer vision (WACV). IEEE, pp 1451–1460

  • Wang W, Xie E, Li X, Hou W, Lu T, Yu G, Shao S (2019) Shape robust text detection with progressive scale expansion network. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9336–9345

  • Wang S, Liu Y, He Z, Wang Y, Tang Z (2020a) A quadrilateral scene text detector with two-stage network architecture. Pattern Recogn 102:107230

  • Wang T, Zhu Y, Jin L, Luo C, Chen X, Wu Y, Wang Q, Cai M (2020b) Decoupled attention network for text recognition. In: Proceedings of the AAAI conference on artificial intelligence, vol 34, pp 12216–12224

  • Wang X, Zheng S, Zhang C, Li R, Gui L (2021a) R-yolo: a real-time text detector for natural scenes with arbitrary rotation. Sensors 21(3):888

  • Wang P, Li H, Shen C (2021b) Towards end-to-end text spotting in natural scenes. IEEE Trans Pattern Anal Mach Intell

  • Wojna Z, Gorban AN, Lee D-S, Murphy K, Yu Q, Li Y, Ibarz J (2017) Attention-based extraction of structured information from street view imagery. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), vol 1. IEEE, pp 844–850

  • Wolf C, Doermann D (2002) Binarization of low quality text using a markov random field model. In: Object recognition supported by user interaction for service robots, vol 3. IEEE, pp 160–163

  • Wolf C, Jolion J-M (2006) Object count/area graphs for the evaluation of object detection and segmentation algorithms. IJDAR 8(4):280–296

  • Wu H, Zou B, Zhao Y-Q, Chen Z, Zhu C, Guo J (2016) Natural scene text detection by multi-scale adaptive color clustering and non-text filtering. Neurocomputing 214:1011–1025

  • Wu F, Souza A, Zhang T, Fifty C, Yu T, Weinberger K (2019a) Simplifying graph convolutional networks. In: International conference on machine learning. PMLR, pp 6861–6871

  • Wu H, Zhang J, Huang K, Liang K, Yu Y (2019b) Fastfcn: rethinking dilated convolution in the backbone for semantic segmentation. arXiv:1903.11816

  • Xie S, Tu Z (2015) Holistically-nested edge detection. In: Proceedings of the IEEE international conference on computer vision, pp 1395–1403

  • Xu Y, Wang Y, Zhou W, Wang Y, Yang Z, Bai X (2019a) Textfield: learning a deep direction field for irregular scene text detection. IEEE Trans Image Process 28(11):5566–5579

  • Xu H, Su X, Liu T, Guo P, Gao G, Bao F (2019b) A natural scene text extraction approach based on generative adversarial learning. In: International conference on neural information processing. Springer, New York, pp 65–73

  • Yang J, Yu K, Gong Y, Huang T (2009) Linear spatial pyramid matching using sparse coding for image classification. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE, pp 1794–1801

  • Yang X, He D, Zhou Z, Kifer D, Giles CL (2017) Learning to read irregular text with attention mechanisms. In: IJCAI, vol 1, p 3

  • Yang B, Ma AJ, Yuen PC (2018) Learning domain-shared group-sparse representation for unsupervised domain adaptation. Pattern Recogn 81:615–632

  • Yang M, Guan Y, Liao M, He X, Bian K, Bai S, Yao C, Bai X (2019) Symmetry-constrained rectification network for scene text recognition. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9147–9156

  • Yao C, Bai X, Liu W, Ma Y, Tu Z (2012) Detecting texts of arbitrary orientations in natural images. In: 2012 IEEE conference on computer vision and pattern recognition. IEEE, pp 1083–1090

  • Yao C, Bai X, Liu W (2014a) A unified framework for multioriented text detection and recognition. IEEE Trans Image Process 23(11):4737–4749

  • Yao C, Bai X, Shi B, Liu W (2014b) Strokelets: a learned multi-scale representation for scene text recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4042–4049

  • Yao C, Bai X, Shi B, Liu W (2014c) Strokelets: a learned multi-scale representation for scene text recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4042–4049

  • Yao C, Wu J, Zhou X, Zhang C, Zhou S, Cao Z, Yin Q (2015) Incidental scene text understanding: Recent progresses on icdar 2015 robust reading competition challenge 4. arXiv:1511.09207

  • Yao C, Bai X, Sang N, Zhou X, Zhou S, Cao Z (2016) Scene text detection via holistic, multi-channel prediction. arXiv:1606.09002

  • Yi C, Tian Y (2011) Text string detection from natural scenes by structure-based partition and grouping. IEEE Trans Image Process 20(9):2594–2605

  • Yi C, Tian Y (2013) Text extraction from scene images by character appearance and structure modeling. Comput Vis Image Underst 117(2):182–194

  • Yildirim G, Achanta R, Süsstrunk S (2013) Text recognition in natural images using multiclass hough forests. In: Proceedings of the 8th international conference on computer vision theory and applications, vol 1, pp 737–741

  • Yin X, Yin X-C, Hao H-W, Iqbal K (2012) Effective text localization in natural scene images with mser, geometry-based grouping and adaboost. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012). IEEE, pp 725–728

  • Yin X-C, Yin X, Huang K, Hao H-W (2013) Robust text detection in natural scene images. IEEE Trans Pattern Anal Mach Intell 36(5):970–983

  • Yin X-C, Pei W-Y, Zhang J, Hao H-W (2015) Multi-orientation scene text detection with adaptive clustering. IEEE Trans Pattern Anal Mach Intell 37(9):1930–1937

  • Yu F, Koltun V, Funkhouser T (2017) Dilated residual networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 472–480

  • Yu D, Li X, Zhang C, Liu T, Han J, Liu J, Ding E (2020) Towards accurate scene text recognition with semantic reasoning networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12113–12122

  • Yuliang L, Lianwen J, Shuaitao Z, Sheng Z (2017) Detecting curve text in the wild: new dataset and new solution. arXiv:1712.02170

  • Zdenek J, Nakayama H (2017) Bag of local convolutional triplets for script identification in scene text. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), vol 1. IEEE, pp 369–375

  • Zeiler MD (2012) Adadelta: an adaptive learning rate method. arXiv:1212.5701

  • Zhan F, Lu S (2019) Esir: end-to-end scene text recognition via iterative image rectification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2059–2068

  • Zhang C, Yao C, Shi B, Bai X (2015) Automatic discrimination of text and non-text natural images. In: 2015 13th international conference on document analysis and recognition (ICDAR). IEEE, pp 886–890

  • Zhang S, Liu Y, Jin L, Luo C (2018) Feature enhancement network: a refined scene text detector. In: Proceedings of the AAAI conference on artificial intelligence, vol 32

  • Zhang Y, Nie S, Liu W, Xu X, Zhang D, Shen HT (2019) Sequence-to-sequence domain adaptation network for robust text image recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2740–2749

  • Zhang S-X, Zhu X, Yang C, Wang H, Yin X-C (2021a) Adaptive boundary proposal network for arbitrary shape text detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 1305–1314

  • Zhang M, Ma M, Wang P (2021b) Scene text recognition with cascade attention network. In: Proceedings of the 2021 international conference on multimedia retrieval, pp 385–393

  • Zhao D, Shivakumara P, Lu S, Tan CL (2012) New spatial-gradient-features for video script identification. In: 2012 10th IAPR international workshop on document analysis systems. IEEE, pp 38–42

  • Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2881–2890

  • Zharikov I, Nikitin P, Vasiliev I, Dokholyan V (2020) Ddi-100: Dataset for text detection and recognition. In: Proceedings of the 2020 4th international symposium on computer science and intelligent control, pp 1–5

  • Zhou X, Yao C, Wen H, Wang Y, Zhou S, He W, Liang J (2017a) East: an efficient and accurate scene text detector. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5551–5560

  • Zhuo J, Wang S, Zhang W, Huang Q (2017b) Deep unsupervised convolutional domain adaptation. In: Proceedings of the 25th ACM international conference on multimedia, pp 261–269

  • Zhu Y, Du J (2021) Textmountain: accurate scene text detection via instance segmentation. Pattern Recogn 110:107336

  • Zhu X, Zhang Z (2021) Transformer-based end-to-end scene text recognition. In: 2021 IEEE 16th conference on industrial electronics and applications (ICIEA), pp 1691–1695. https://doi.org/10.1109/ICIEA51954.2021.9516154

Author information

Corresponding author

Correspondence to Kaushik Roy.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ghosh, M., Mukherjee, H., Obaidullah, S.M. et al. Scene text understanding: recapitulating the past decade. Artif Intell Rev 56, 15301–15373 (2023). https://doi.org/10.1007/s10462-023-10530-3

  • DOI: https://doi.org/10.1007/s10462-023-10530-3
