Abstract
To address the class imbalance issue in scene text detection, we propose two novel loss functions: Class-Balanced Self Adaption Loss (CBSAL) and Class-Balanced First Power Loss (CBFPL). CBSAL reshapes the Cross Entropy (CE) loss to down-weight easy negatives and up-weight positives. However, CBSAL ignores a gradient imbalance: CE assigns different gradients to positives and negatives. Since text detectors must identify text and background simultaneously, positives and negatives are equally important and should receive equivalent gradients. CBFPL therefore provides equal but opposite gradients for positives and negatives to eliminate this gradient imbalance. To handle class imbalance, CBFPL additionally discards easy negatives by setting their gradients to zero. Both CBSAL and CBFPL thus focus training on positives and hard negatives. Experimental results show that, when trained with CBSAL and CBFPL, the efficient and accurate scene text detector (EAST) achieves a higher F-score on the ICDAR2015, MSRA-TD500 and CASIA-10K datasets.
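The gradient imbalance mentioned above can be illustrated with a small sketch. The CE functions follow the standard binary cross entropy definition. The "first power" functions are only a hypothetical linear loss chosen to show equal-but-opposite gradients and zeroed easy negatives; the abstract does not give the actual CBFPL formula, and the names `first_power_loss`, `first_power_grad` and the threshold `easy_thresh` are assumptions made for illustration.

```python
import math

def ce_loss(p, y):
    """Standard binary cross entropy for one pixel.
    p: predicted text probability in (0, 1); y: 1 for text, 0 for background."""
    return -math.log(p) if y == 1 else -math.log(1.0 - p)

def ce_grad(p, y):
    """Derivative of the CE loss with respect to p."""
    return -1.0 / p if y == 1 else 1.0 / (1.0 - p)

def first_power_loss(p, y, easy_thresh=0.1):
    """Hypothetical linear (first power) loss, NOT the paper's exact CBFPL.
    Positives pay 1 - p, negatives pay p; negatives predicted below
    easy_thresh (an assumed cutoff) are treated as easy and contribute nothing."""
    if y == 1:
        return 1.0 - p
    return p if p >= easy_thresh else 0.0

def first_power_grad(p, y, easy_thresh=0.1):
    """Derivative of the hypothetical linear loss with respect to p."""
    if y == 1:
        return -1.0
    return 1.0 if p >= easy_thresh else 0.0

if __name__ == "__main__":
    p = 0.2
    # CE: gradient magnitudes differ between a positive and a negative at the same p.
    print("CE grads:", ce_grad(p, 1), ce_grad(p, 0))
    # Linear sketch: gradients are equal in magnitude and opposite in sign.
    print("Linear grads:", first_power_grad(p, 1), first_power_grad(p, 0))
    # An easy negative (p below the assumed threshold) gets zero gradient.
    print("Easy negative grad:", first_power_grad(0.05, 0))
```

Under these assumptions, a positive and a negative at the same predicted probability receive CE gradients of different magnitude, whereas the linear sketch gives them gradients of magnitude one with opposite signs, and easy negatives drop out of training entirely.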
Acknowledgement
This work was supported by the Major Project for New Generation of AI (Grant No. 2018AAA0100400).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Huang, R., Xu, B. (2020). Class-Balanced Loss for Scene Text Detection. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science(), vol 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_21
DOI: https://doi.org/10.1007/978-3-030-63833-7_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63832-0
Online ISBN: 978-3-030-63833-7
eBook Packages: Computer Science, Computer Science (R0)