An anchor-free region proposal network for Faster R-CNN-based text detection approaches

Abstract

The anchor mechanism of the Faster R-CNN and SSD frameworks is considered not effective enough for scene text detection, which can be attributed to its Intersection-over-Union-based matching criterion between anchors and ground-truth boxes. To better enclose scene text instances of various shapes, anchors of various scales, aspect ratios and even orientations must be designed manually, which makes anchor-based methods complicated and inefficient. In this paper, we propose a novel anchor-free region proposal network (AF-RPN) that replaces the original anchor-based RPN in the Faster R-CNN framework to address this problem. Compared with anchor-based region proposal generation approaches (e.g., RPN, FPN–RPN, RRPN and FPN–RRPN), AF-RPN avoids complicated anchor design and achieves a higher recall rate on both horizontal and multi-oriented text detection benchmark tasks. Owing to the high-quality text proposals, our Faster R-CNN-based two-stage text detection approach achieves state-of-the-art results on the ICDAR-2017 MLT, COCO-Text, ICDAR-2015 and ICDAR-2013 text detection benchmark tasks using only single-scale, single-model testing.
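The IoU-based matching issue mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code: the ground-truth box, the anchor scale and aspect ratios, and the helper names `iou` and `make_anchor` are all assumptions chosen for illustration, and the 0.7 positive-match threshold is the value commonly associated with the original Faster R-CNN RPN. The sketch shows that generic anchors overlap a long, thin text line poorly, whereas a hand-tailored anchor matches it well.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def make_anchor(cx, cy, scale, aspect_ratio):
    """Anchor centred at (cx, cy) with area scale**2 and width/height == aspect_ratio."""
    w = scale * aspect_ratio ** 0.5
    h = scale / aspect_ratio ** 0.5
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# Hypothetical ground-truth text line: 400 px wide, 20 px tall.
gt = (100.0, 100.0, 500.0, 120.0)
cx, cy = 300.0, 110.0  # anchors are placed at the centre of the text line

# Generic object-detection style anchors (assumed scale 128, ratios 1:2, 1:1, 2:1).
generic_anchors = [make_anchor(cx, cy, 128.0, r) for r in (0.5, 1.0, 2.0)]
best_generic_iou = max(iou(a, gt) for a in generic_anchors)

# A manually tailored anchor with the same area and aspect ratio (20:1) as the text line.
tailored_anchor = make_anchor(cx, cy, (400.0 * 20.0) ** 0.5, 20.0)
tailored_iou = iou(tailored_anchor, gt)

POSITIVE_THRESHOLD = 0.7  # assumed positive-match threshold
print(f"best generic anchor IoU: {best_generic_iou:.3f}")
print(f"tailored anchor IoU:     {tailored_iou:.3f}")
print("generic anchors match:", best_generic_iou >= POSITIVE_THRESHOLD)
```

Under these assumptions, even a perfectly centred generic anchor stays far below the threshold, which is why anchor-based detectors need text-specific scales, aspect ratios and orientations.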

Author information

Corresponding author

Correspondence to Zhuoyao Zhong.

Additional information

This work was done while Z. Zhong was an intern in the Speech Group, Microsoft Research Asia, Beijing, China.

About this article

Cite this article

Zhong, Z., Sun, L. & Huo, Q. An anchor-free region proposal network for Faster R-CNN-based text detection approaches. IJDAR 22, 315–327 (2019). https://doi.org/10.1007/s10032-019-00335-y

Keywords

  • Scene text detection
  • Anchor
  • Anchor-free
  • Region proposal generation
  • Faster R-CNN