Class-Balanced Loss for Scene Text Detection

  • Conference paper
Neural Information Processing (ICONIP 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12533)

Abstract

To address the class imbalance issue in scene text detection, we propose two novel loss functions: Class-Balanced Self Adaption Loss (CBSAL) and Class-Balanced First Power Loss (CBFPL). Specifically, CBSAL reshapes the Cross Entropy (CE) loss to down-weight easy negatives and up-weight positives. However, CBSAL ignores the gradient imbalance inherent in CE, which gives positives and negatives different gradients. Since text detectors need to identify text and background simultaneously, positives and negatives are equally important and should receive equivalent gradients. CBFPL therefore provides equal but opposite gradients for positives and negatives to eliminate this gradient imbalance. To handle class imbalance, CBFPL additionally discards easy negatives by setting their gradients to zero. Both CBSAL and CBFPL thus focus training on positives and hard negatives. Experimental results show that, when trained with CBSAL or CBFPL, the Efficient and Accurate Scene Text detector (EAST) achieves a higher F-score on the ICDAR2015, MSRA-TD500 and CASIA-10K datasets.
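
The gradient imbalance described in the abstract can be illustrated with a small sketch. The abstract does not give the formulas for CBSAL or CBFPL, so the code below is not the paper's definition. It assumes a hypothetical linear ("first power") loss, 1 - p for positives and p for negatives, and a hypothetical cutoff `easy_neg_threshold` below which a negative counts as easy. All function and parameter names are illustrative.

```python
def ce_grad(p, y):
    """Gradient of binary cross entropy with respect to the predicted text probability p."""
    # CE is -log(p) for a positive (y == 1) and -log(1 - p) for a negative (y == 0).
    return -1.0 / p if y == 1 else 1.0 / (1.0 - p)


def first_power_loss(p, y, easy_neg_threshold=0.1):
    """Assumed linear loss (not taken from the paper): 1 - p for positives, p for negatives."""
    if y == 1:
        return 1.0 - p
    # Assumption: negatives already scored below the threshold are "easy" and are dropped.
    return 0.0 if p < easy_neg_threshold else p


def first_power_grad(p, y, easy_neg_threshold=0.1):
    """Gradient of the assumed linear loss with respect to p."""
    if y == 1:
        return -1.0
    return 0.0 if p < easy_neg_threshold else 1.0


if __name__ == "__main__":
    for p in (0.05, 0.5, 0.9):
        print(
            f"p={p}: CE grads (pos, neg) = ({ce_grad(p, 1):.2f}, {ce_grad(p, 0):.2f}), "
            f"linear grads (pos, neg) = ({first_power_grad(p, 1)}, {first_power_grad(p, 0)})"
        )
```

Under CE the two gradient magnitudes match only at p = 0.5 and diverge elsewhere, whereas the assumed linear loss yields -1 for every positive and +1 for every retained negative, with easy negatives contributing nothing.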

Acknowledgement

This work was supported by the Major Project for New Generation of AI (Grant No. 2018AAA0100400).

Author information

Corresponding author

Correspondence to Bo Xu.

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this paper

Huang, R., Xu, B. (2020). Class-Balanced Loss for Scene Text Detection. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_21

  • DOI: https://doi.org/10.1007/978-3-030-63833-7_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63832-0

  • Online ISBN: 978-3-030-63833-7

  • eBook Packages: Computer Science, Computer Science (R0)
