An Image Dataset of Text Patches in Everyday Scenes

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10073)

Abstract

This paper describes a dataset of small images of text taken from everyday scenes. The dataset is intended to support the development of new automated systems that detect and analyze text. Although much research has been devoted to text detection and recognition in scanned documents, relatively little attention has been given to text detection in other types of images, such as photographs posted on social-media sites. The new dataset, known as COCO-Text-Patch, contains approximately 354,000 small images, each labeled as “text” or “non-text”. It specifically addresses the problem of text verification, an essential stage in the end-to-end text detection and recognition pipeline. To evaluate the utility of the dataset, it was used to train two deep convolutional neural networks to distinguish text from non-text. One network is inspired by the GoogLeNet architecture, and the other is based on CaffeNet. The two networks achieved accuracy levels of 90.2% and 90.9%, respectively. All of the images, source code, and trained deep-learning models described in this paper will be publicly available (https://aicentral.github.io/coco-text-patch/).

Author information

Correspondence to Ahmed Ibrahim.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Ibrahim, A., Abbott, A.L., Hussein, M.E. (2016). An Image Dataset of Text Patches in Everyday Scenes. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2016. Lecture Notes in Computer Science, vol 10073. Springer, Cham. https://doi.org/10.1007/978-3-319-50832-0_28

  • DOI: https://doi.org/10.1007/978-3-319-50832-0_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50831-3

  • Online ISBN: 978-3-319-50832-0

  • eBook Packages: Computer Science, Computer Science (R0)
