Advertisement

Total-Text: toward orientation robustness in scene text detection

  • Chee-Kheng Ch’ng
  • Chee Seng ChanEmail author
  • Cheng-Lin Liu
Original Paper
  • 37 Downloads

Abstract

At present, text orientation is not diverse enough in the existing scene text datasets. Specifically, curve-orientated text is largely out-numbered by horizontal and multi-oriented text, hence, it has received minimal attention from the community so far. Motivated by this phenomenon, we collected a new scene text dataset, Total-Text, which emphasized on text orientations diversity. It is the first relatively large scale scene text dataset that features three different text orientations: horizontal, multi-oriented, and curve-oriented. In addition, we also study several other important elements such as the practicality and quality of ground truth, evaluation protocol, and the annotation process. We believe that these elements are as important as the images and ground truth to facilitate a new research direction. Secondly, we propose a new scene text detection model as the baseline for Total-Text, namely Polygon-Faster-RCNN, and demonstrated its ability to detect text of all orientations. Images of Total-Text and its annotation are available at https://github.com/cs-chan/Total-Text-Dataset.

Keywords

Curved text Scene text detection 

Notes

Acknowledgements

Funding was provided by Fundamental Research Grant Scheme (FRGS) MoHE (Grant No. FP004-2016) and Postgraduate Research Grant (PPP) (Grant No. PG350-2016A). The authors acknowledge all the authors who provided their results for our experiments. Also, we would like to thank Chun Chet Ng for his contribution in aiding the annotation process of Total-Text.

References

  1. 1.
    Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., i Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: 12th International Conference on Document Analysis and Recognition (ICDAR). 37(7), pp. 1484–1493 (2013)Google Scholar
  2. 2.
    Yao, C., Bai, X., Liu, W., Ma, Y., Tu, Z.: Detecting texts of arbitrary orientations in natural images. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1083–1090 (2012)Google Scholar
  3. 3.
    Ye, Q., Doermann, D.: Text detection and recognition in imagery: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 37(7), 1480–1500 (2015)CrossRefGoogle Scholar
  4. 4.
    Zhang, Z., Shen, W., Yao, C., Bai, X.: Symmetry-based text line detection in natural scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2558–2567 (2015)Google Scholar
  5. 5.
    Huang, W., Lin, Z., Yang, J., Wang, J.: Text localization in natural images using stroke feature transform and text covariance descriptors. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1241–1248 (2013)Google Scholar
  6. 6.
    Neumann, L., Matas, J.: Scene text localization and recognition with oriented stroke detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 97–104 (2013)Google Scholar
  7. 7.
    Huang, W., Qiao, Y., Tang, X.: Robust scene text detection with convolution neural network induced mser trees. In: European Conference on Computer Vision, pp. 497–511 (2014)Google Scholar
  8. 8.
    Pan, Y.-F., Hou, X., Liu, C.-L.: A hybrid approach to detect and localize texts in natural scene images. IEEE Trans. Image Process. 20(3), 800–813 (2011)MathSciNetCrossRefzbMATHGoogle Scholar
  9. 9.
    Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., Shafait, F. (2015) ICDAR 2015 competition on robust reading. In: 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 1156–1160Google Scholar
  10. 10.
    Veit, A., Matera, T., Neumann, L., Matas, J., Belongie, S.: Coco-text: dataset and benchmark for text detection and recognition in natural images (2016). arXiv preprint arXiv:1601.07140
  11. 11.
    Gupta, A., Vedaldi, A., Zisserman, A.: Building a perception based model for reading cursive script. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2315–2324 (2016)Google Scholar
  12. 12.
    Ch’ng, C.K., Chan, C.S.: Total-Text: a comprehensive dataset for scene text detection and recognition. In: IEEE 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 935–942 (2017)Google Scholar
  13. 13.
    Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. In: Pattern Recognition (2019)Google Scholar
  14. 14.
    Risnumawan, A., Shivakumara, P., Chan, C.S., Tan, C.L.: A robust arbitrary text detection system for natural scene images. Expert Syst. Appl. 41(18), 8027–8048 (2014)CrossRefGoogle Scholar
  15. 15.
    Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015) Google Scholar
  16. 16.
    Wolf, C., Jolion, J.M.: Object count/area graphs for the evaluation of object detection and segmentation algorithms. Int. J. Doc. Anal. Recognit. (IJDAR). 8(4), 280–296 (2006)CrossRefGoogle Scholar
  17. 17.
    Karatzas, D., Gómez, L., Nicolaou, A., Rusiñol, M.: The robust reading competition annotation and evaluation platform. In: 13th IAPR International Workshop on Document Analysis Systems (DAS), pp. 61–66 (2018)Google Scholar
  18. 18.
    Yin, X.C., Pei, W.Y., Zhang, J., Hao, H.W.: Multi-orientation scene text detection with adaptive clustering. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1930–1937 (2015)CrossRefGoogle Scholar
  19. 19.
    Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z, et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification-RRC-MLT. In: IEEE 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1 pp. 1454–1459 (2017)Google Scholar
  20. 20.
    Shi, B., Yao, C., Liao, M., Yang, M., Xu, P., Cui, L., Belongie, S., Lu, S., Bai, X.: ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: IEEE 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1 pp. 1429–1434 (2017)Google Scholar
  21. 21.
    He, M., Liu, Y., Yang, Z., Zhang, S., Luo, C., Gao, F., Zheng, Q., Wang, Y., Zhang, X., Jin, L.: ICPR2018 contest on robust reading for multi-type web images. In: 2018 24th International Conference on Pattern Recognition (ICPR), pp. 7–12 (2018)Google Scholar
  22. 22.
    Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2963–2970 (2010)Google Scholar
  23. 23.
    Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide-baseline stereo from maximally stable extremal regions. In: Image and Vision Computing, pp. 761–767 (2004)Google Scholar
  24. 24.
    Wang, T., Wu, D.J., Coates, A., Ng, A.Y.: End-to-end text recognition with convolutional neural networks. In: International Conference in Pattern Recognition, pp. 3304–3308 (2012)Google Scholar
  25. 25.
    Jaderberg, M., Vedaldi, A., Zisserman, A.: Deep features for text spotting. In: European Conference on Computer Vision, pp. 512–528 (2014)Google Scholar
  26. 26.
    He, T., Huang, W., Qiao, Y., Yao, J.: Text-attentional convolutional neural network for scene text detection. IEEE Trans. Image Process. 25(6), 2529–2541 (2016)MathSciNetCrossRefzbMATHGoogle Scholar
  27. 27.
    Zhang, Z., Zhang, C., Shen, W., Yao, C., Liu, W., Bai, X.: Multi-oriented text detection with fully convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4159–4167 (2016)Google Scholar
  28. 28.
    Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440 (2015)Google Scholar
  29. 29.
    He, T., Huang, W., Qiao, Y., Yao, J.: Accurate text localization in natural image with cascaded convolutional text network (2016). arXiv preprint arXiv:1603.09423
  30. 30.
    Tang, Y., Wu, X.: Scene text detection and segmentation based on cascaded convolution neural networks. IEEE Trans. Image Process. 26(3), 1509–1520 (2017)CrossRefGoogle Scholar
  31. 31.
    Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)Google Scholar
  32. 32.
    Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)Google Scholar
  33. 33.
    Jiang, Y., Zhu, X., Wang, X., Yang, S., Li, W., Wang, H., Fu, P., Luo, Z.: R2CNN: rotational region CNN for orientation robust scene text detection (2017). arXiv preprint arXiv:1706.09579
  34. 34.
    Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. Multimed. 20, 3111–3122 (2018)CrossRefGoogle Scholar
  35. 35.
    Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., Berg, A.C.: SSD: Single shot multibox detector. In: European Conference on Computer Vision, pp. 21–37 (2016)Google Scholar
  36. 36.
    Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)Google Scholar
  37. 37.
    Liu, Y., Jin, L.: Deep matching prior network: toward tighter multi-oriented text detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3454–3461 (2017)Google Scholar
  38. 38.
    Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)Google Scholar
  39. 39.
    Liao, M., Shi, B., Bai, X., Wang, X., Liu, W.: TextBoxes: a fast text detector with a single deep neural network. In: AAAI, pp. 4161–4167 (2017)Google Scholar
  40. 40.
    Liao, Minghui, Shi, Baoguang, Bai, Xiang, and and: TextBoxes++: A Single-Shot Oriented Scene Text Detector. IEEE Transactions on Image Processing 27(8), 3676–3690 (2018)MathSciNetCrossRefzbMATHGoogle Scholar
  41. 41.
    He, W., Zhang, X.Y., Yin, F., Liu, C.L.: Deep direct regression for multi-oriented scene text detection. In: Proceedings of the IEEE International Conference on Computer Vision (2017)Google Scholar
  42. 42.
    Adams Jr., R.B., Adams, R.B., Ambady, N., Shimojo, S., Nakayama, K. (eds.): The Science of Social Vision: The Science of Social Vision, vol. 7. Oxford University Press, Oxford (2011)Google Scholar
  43. 43.
    Yao, C., Bai, X., Sang, N., Zhou, X., Zhou, S., Cao, Z.: Scene text detection via holistic, multi-channel prediction (2016). arXiv preprint arXiv:1606.09002
  44. 44.
    Xu, Y., Wang, Y., Zhou, W., Wang, Y., Yang, Z., Bai, X.: TextField: learning a deep direction field for irregular scene text detection. Trans. Image Process. (2019) Google Scholar
  45. 45.
    Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: an end-to-end trainable neural network for spotting text with arbitrary shapes. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 67–83 (2018)Google Scholar
  46. 46.
    Xue, C., Lu, S., Zhang, W.: MSR: multi-scale shape regression for scene text detection (2019). arXiv preprint arXiv:1901.02596
  47. 47.
    Castrejón, L., Kundu, K., Urtasun, R., Fidler, S.: Annotating object instances with a polygon-RNN. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, p. 2 (2017)Google Scholar
  48. 48.
    Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., Fischer, I., Wojna, Z., Song, Y., Guadarrama, S., Murphy, K.: Speed/accuracy trade-offs for modern convolutional object detectors. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 4 (2017)Google Scholar
  49. 49.
    Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, inception-resnet and the impact of residual connections on learning. AAAI 4, 12 (2017)Google Scholar
  50. 50.
    Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: fast oriented text spotting with a unified network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5676–5685 (2018)Google Scholar
  51. 51.
    Sun, Y., Zhang, C., Huang, Z., Liu, J., Han, J., Ding, E.: TextNet: irregular text reading from images with an end-to-end trainable network. In: Asian Conference on Computer Vision (2018)Google Scholar
  52. 52.
    Long, S., Ruan, J., Zhang, W., He, X., Wu, W., Yao, C.: Textsnake: a flexible representation for detecting text of arbitrary shapes In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 20–36 (2018)Google Scholar
  53. 53.
    Dai, Y., Huang, Z., Gao, Y., Xu, Y., Chen, K., Guo, J., Qiu, W.: Fused text segmentation networks for multi-oriented scene text detection. In: Proceedings of the IEEE International Conference on Pattern Recognition (ICPR), pp. 3604–3609 (2018)Google Scholar
  54. 54.
    He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2961–2969 (2018)Google Scholar

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. 1.Faculty of Computer Science and Information Technology, Center of Image and Signal ProcessingUniversity of MalayaKuala LumpurMalaysia
  2. 2.National Laboratory of Pattern Recognition, Institute of AutomationChinese Academy of SciencesBeijingChina

Personalised recommendations