Abstract
Scene text recognition has received increasing attention in the research community. Text in the wild is often irregularly arranged: it may be perspectively distorted, curved, or oriented. Most existing methods perform poorly on irregular text, especially when the distortion is severe. In this paper, we propose a novel progressive rectification network (PRN) for irregular scene text recognition. PRN progressively rectifies irregular text to a front-horizontal view, which in turn improves recognition performance. Distortions are removed step by step, based on the observation that an intermediate rectified result provides good guidance for a subsequent, higher-quality rectification. Decomposing the rectification process into multiple steps also makes each individual step considerably easier. We first perform a rough rectification and then apply iterative refinement to gradually approach the optimal rectification. To avoid the boundary damage that arises when images are rectified directly in each iteration, we design an envelope-refinement structure that maintains the integrity of the text throughout the iterative process. Instead of the rectified images, the text line envelope is tracked and continually refined, which implicitly models the transformation information. The original input image is then always the one transformed, based on the refined envelope. In this manner, the original character information is preserved until the final transformation. These designs yield better rectification, which boosts the performance of the subsequent recognition. Extensive experiments on eight challenging datasets demonstrate the superiority of our method, especially on irregular benchmarks.
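The envelope-refinement idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the network or the transformation, so everything here is an assumption made for illustration. The envelope is assumed to be a small set of points on the upper and lower text borders, the hypothetical `toy_predictor` stands in for the learned refinement step, and a simple piecewise-linear, nearest-neighbour sampling grid is swapped in for the real transformation. The point it shows is structural: only the envelope is updated in the loop, and the original image is sampled once at the end.

```python
import numpy as np

def initial_envelope(h, w, k=5):
    """Envelope covering the whole image: k points on the top edge, k on the bottom."""
    xs = np.linspace(0, w - 1, k)
    top = np.stack([xs, np.zeros(k)], axis=1)
    bottom = np.stack([xs, np.full(k, h - 1.0)], axis=1)
    return np.stack([top, bottom])  # shape (2, k, 2), points in (x, y) order

def refine_envelope(envelope, predict_offsets, steps=3):
    """Iteratively refine the envelope; no image is resampled inside the loop."""
    for _ in range(steps):
        envelope = envelope + predict_offsets(envelope)
    return envelope

def _resample(border, n):
    """Piecewise-linear resampling of a (k, 2) border to n points."""
    k = len(border)
    u = np.linspace(0, k - 1, n)
    return np.stack([np.interp(u, np.arange(k), border[:, d]) for d in range(2)], axis=1)

def rectify(image, envelope, out_h=32, out_w=100):
    """Sample the ORIGINAL image once inside the envelope (nearest neighbour)."""
    top = _resample(envelope[0], out_w)      # (out_w, 2)
    bottom = _resample(envelope[1], out_w)   # (out_w, 2)
    v = np.linspace(0, 1, out_h)[:, None, None]
    grid = (1 - v) * top[None] + v * bottom[None]  # (out_h, out_w, 2)
    x = np.clip(np.rint(grid[..., 0]).astype(int), 0, image.shape[1] - 1)
    y = np.clip(np.rint(grid[..., 1]).astype(int), 0, image.shape[0] - 1)
    return image[y, x]

# Toy usage: the "network" is a hypothetical predictor that moves the
# envelope halfway towards a known target envelope at every step.
image = np.arange(8 * 16, dtype=float).reshape(8, 16)
start = initial_envelope(8, 16)
target = start + np.array([1.0, 0.5])
toy_predictor = lambda env: 0.5 * (target - env)

refined = refine_envelope(start, toy_predictor, steps=3)
rectified = rectify(image, refined, out_h=8, out_w=16)
```

Because the pixels are read from the original image only in `rectify`, text that an intermediate step would have pushed outside the frame is not lost, which is the boundary damage problem the abstract refers to.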
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61772527, 61806200).
Cite this article
Gao, Y., Chen, Y., Wang, J. et al. Progressive rectification network for irregular text recognition. Sci. China Inf. Sci. 63, 120101 (2020). https://doi.org/10.1007/s11432-019-2710-7
Keywords
- irregular text recognition
- progressive rectification
- iterative refinement