Class-Balanced Loss for Scene Text Detection

Huang, Randong; Xu, Bo

doi:10.1007/978-3-030-63833-7_21

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12533)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

To address the class imbalance issue in scene text detection, we propose two novel loss functions: Class-Balanced Self Adaption Loss (CBSAL) and Class-Balanced First Power Loss (CBFPL). Specifically, CBSAL reshapes the Cross Entropy (CE) loss to down-weight easy negatives and up-weight positives. However, CBSAL ignores gradient imbalance, namely that CE gives positives and negatives different gradients. Since text detectors need to identify text and background simultaneously, positives and negatives have the same importance and should receive equivalent gradients. CBFPL therefore provides equal but opposite gradients for positives and negatives to eliminate this gradient imbalance. To handle class imbalance, CBFPL then abandons easy negatives by setting their gradients to zero. Both CBSAL and CBFPL focus training on positives and hard negatives. Experimental results show that, with CBSAL and CBFPL, the Efficient and Accurate Scene Text detector (EAST) achieves a higher F-score on the ICDAR2015, MSRA-TD500 and CASIA-10K datasets.
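
The gradient imbalance mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the CBSAL or CBFPL formulas, so the linear loss below is only a hypothetical stand-in that has the two stated properties (equal but opposite gradients, and zero gradient for easy negatives). The function names and the `easy_neg_threshold` parameter are assumptions made for illustration, not the authors' definitions.

```python
import math

def ce_grad(p, y):
    """Gradient of binary cross entropy with respect to the predicted
    text probability p: -1/p for a positive, 1/(1-p) for a negative."""
    return -1.0 / p if y == 1 else 1.0 / (1.0 - p)

def linear_loss(p, y, easy_neg_threshold=0.1):
    """Hypothetical first-power loss (NOT the paper's CBFPL formula).
    Positives pay 1 - p; negatives pay only the part of p above the
    threshold, so easy negatives contribute nothing."""
    if y == 1:
        return 1.0 - p
    return max(p - easy_neg_threshold, 0.0)

def linear_grad(p, y, easy_neg_threshold=0.1):
    """Gradient of linear_loss with respect to p: -1 for positives,
    +1 for hard negatives, 0 for easy negatives."""
    if y == 1:
        return -1.0
    return 1.0 if p > easy_neg_threshold else 0.0

if __name__ == "__main__":
    p = 0.3
    # Under CE, the positive and the negative at the same p receive
    # gradients of different magnitude.
    print("CE grads:", ce_grad(p, 1), ce_grad(p, 0))
    # Under the illustrative linear loss, the magnitudes match.
    print("linear grads:", linear_grad(p, 1), linear_grad(p, 0))
    # A confidently rejected background pixel is ignored entirely.
    print("easy negative grad:", linear_grad(0.05, 0))
```

With cross entropy the gradient magnitude depends on both the label and the prediction, whereas the linear form gives every retained sample the same magnitude. How the paper actually selects easy negatives and weights the classes is not stated in the abstract.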

Acknowledgement

This work was supported by the Major Project for New Generation of AI (Grant No. 2018AAA0100400).

Author information

Authors and Affiliations

Institute of Automation, Chinese Academy of Sciences, Beijing, China
Randong Huang & Bo Xu
University of Chinese Academy of Sciences, Beijing, China
Randong Huang

Authors

Randong Huang
Bo Xu

Corresponding author

Correspondence to Bo Xu.

Editor information

Editors and Affiliations

Department of AI, Ping An Life, Shenzhen, China
Haiqin Yang
Faculty of Information Technology, King Mongkut’s Institute of Technology Ladkrabang, Bangkok, Thailand
Kitsuchart Pasupa
City University of Hong Kong, Kowloon, China
Andrew Chi-Sing Leung
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, Hong Kong
James T. Kwok
School of Information Technology, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
Jonathan H. Chan
The Chinese University of Hong Kong, New Territories, Hong Kong
Irwin King

Cite this paper

Huang, R., Xu, B. (2020). Class-Balanced Loss for Scene Text Detection. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_21

DOI: https://doi.org/10.1007/978-3-030-63833-7_21
Published: 20 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63832-0
Online ISBN: 978-3-030-63833-7
eBook Packages: Computer Science; Computer Science (R0)
