An Image Dataset of Text Patches in Everyday Scenes

  • Ahmed Ibrahim
  • A. Lynn Abbott
  • Mohamed E. Hussein
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10073)


This paper describes a dataset containing small images of text from everyday scenes. The purpose of the dataset is to support the development of new automated systems that can detect and analyze text. Although much research has been devoted to text detection and recognition in scanned documents, relatively little attention has been given to text detection in other types of images, such as photographs that are posted on social-media sites. This new dataset, known as COCO-Text-Patch, contains approximately 354,000 small images that are each labeled as “text” or “non-text”. The dataset specifically addresses the problem of text verification, which is an essential stage in the end-to-end text detection and recognition pipeline. To evaluate the utility of this dataset, it has been used to train two deep convolutional neural networks to distinguish text from non-text. One network is inspired by the GoogLeNet architecture, and the second one is based on CaffeNet. Accuracy levels of 90.2% and 90.9% were obtained using the two networks, respectively. All of the images, source code, and trained deep-learning models described in this paper will be publicly available (


Convolutional Neural Network, Optical Character Recognition, Text Detection, Region Proposal, Convolution Neural Network
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Ahmed Ibrahim (1, 4)
  • A. Lynn Abbott (1)
  • Mohamed E. Hussein (2, 3)
  1. Virginia Polytechnic Institute and State University, Blacksburg, USA
  2. Egypt-Japan University of Science and Technology, Alexandria, Egypt
  3. Alexandria University, Alexandria, Egypt
  4. Benha University, Banha, Egypt
