Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12356)

Abstract

Recent end-to-end trainable methods for scene text spotting, which integrate detection and recognition, have shown much progress. However, most of the current arbitrary-shape scene text spotters use region proposal networks (RPN) to produce proposals. RPN relies heavily on manually designed anchors, and its proposals are represented with axis-aligned rectangles. The former presents difficulties in handling text instances of extreme aspect ratios or irregular shapes, and the latter often includes multiple neighboring instances in a single proposal in cases of densely oriented text. To tackle these problems, we propose Mask TextSpotter v3, an end-to-end trainable scene text spotter that adopts a Segmentation Proposal Network (SPN) instead of an RPN. Our SPN is anchor-free and gives accurate representations of arbitrary-shape proposals. It is therefore superior to RPN in detecting text instances of extreme aspect ratios or irregular shapes. Furthermore, the accurate proposals produced by SPN allow masked RoI features to be used for decoupling neighboring text instances. As a result, our Mask TextSpotter v3 can handle text instances of extreme aspect ratios or irregular shapes, and its recognition accuracy is not affected by nearby text or background noise. Specifically, we outperform state-of-the-art methods by 21.9% on the Rotated ICDAR 2013 dataset (rotation robustness) and by 5.9% on the Total-Text dataset (shape robustness), and achieve state-of-the-art performance on the MSRA-TD500 dataset (aspect ratio robustness). Code is available at: https://github.com/MhLiao/MaskTextSpotterV3
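
The idea of masked RoI features can be illustrated with a small sketch. It assumes that masking means multiplying each RoI feature value by a binary proposal mask, so that values outside the proposal are zeroed. The function name `mask_roi_features`, the nested-list feature layout, and the toy values are hypothetical and chosen only for illustration; they are not taken from the released code.

```python
def mask_roi_features(features, mask):
    """Zero out feature values that fall outside a binary proposal mask.

    features: list of channels, each an H x W list of floats (assumed layout).
    mask:     H x W list of 0/1 values, 1 inside the text proposal.
    """
    height, width = len(mask), len(mask[0])
    masked = []
    for channel in features:
        if len(channel) != height or any(len(row) != width for row in channel):
            raise ValueError("feature channel and mask shapes differ")
        # Element-wise product: values under a 0 in the mask are suppressed.
        masked.append([
            [value * keep for value, keep in zip(row, mask_row)]
            for row, mask_row in zip(channel, mask)
        ])
    return masked


# Toy RoI with one channel of size 2 x 4. The left half belongs to the
# proposal; the right half stands in for a neighboring text instance.
roi_features = [[[0.5, 0.7, 0.9, 0.8],
                 [0.6, 0.4, 0.7, 0.9]]]
proposal_mask = [[1, 1, 0, 0],
                 [1, 1, 0, 0]]

masked_features = mask_roi_features(roi_features, proposal_mask)
```

In this toy case an axis-aligned rectangle would pass all eight values to the recognizer, whereas the masked version keeps only the four values inside the proposal.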

Keywords

Scene text, Detection, Recognition

Supplementary material

Supplementary material 1 (pdf 1438 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Huazhong University of Science and Technology, Wuhan, China
  2. Facebook AI, Menlo Park, USA
