An Image Dataset of Text Patches in Everyday Scenes

Ibrahim, Ahmed; Abbott, A. Lynn; Hussein, Mohamed E.

doi:10.1007/978-3-319-50832-0_28

Conference paper
First Online: 10 December 2016

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10073)

Abstract

This paper describes a dataset containing small images of text from everyday scenes. The purpose of the dataset is to support the development of new automated systems that can detect and analyze text. Although much research has been devoted to text detection and recognition in scanned documents, relatively little attention has been given to text detection in other types of images, such as photographs that are posted on social-media sites. This new dataset, known as COCO-Text-Patch, contains approximately 354,000 small images, each labeled as “text” or “non-text”. The dataset specifically addresses the problem of text verification, which is an essential stage in the end-to-end text detection and recognition pipeline. To evaluate the utility of the dataset, it has been used to train two deep convolutional neural networks to distinguish text from non-text. One network is inspired by the GoogLeNet architecture, and the other is based on CaffeNet. The two networks achieved accuracy levels of 90.2% and 90.9%, respectively. All of the images, source code, and trained deep-learning models described in this paper will be publicly available (https://aicentral.github.io/coco-text-patch/).
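
As a rough illustration of the text-verification step and the accuracy figure reported above, the following minimal sketch labels patches as "text" or "non-text" from classifier scores and measures the fraction labeled correctly. The function names, the 0.5 threshold, and the sample scores are assumptions made for illustration; they are not taken from the paper or its released code.

```python
from typing import List, Sequence

TEXT = "text"
NON_TEXT = "non-text"


def verify_patches(scores: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Label each patch from a classifier score in [0, 1].

    The threshold of 0.5 is an illustrative assumption, not a value
    reported in the paper.
    """
    return [TEXT if score >= threshold else NON_TEXT for score in scores]


def accuracy(true_labels: Sequence[str], predicted_labels: Sequence[str]) -> float:
    """Return the fraction of patches whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one label is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical scores for four patches (made-up numbers, not dataset results).
example_scores = [0.9, 0.2, 0.7, 0.4]
example_truth = [TEXT, NON_TEXT, NON_TEXT, NON_TEXT]

example_predictions = verify_patches(example_scores)
example_accuracy = accuracy(example_truth, example_predictions)
```

In this toy example three of the four patches are labeled correctly. Applied to a held-out portion of a labeled patch dataset, the same ratio is what an accuracy figure such as 90.2% summarizes.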

Author information

Authors and Affiliations

Virginia Polytechnic Institute and State University, Blacksburg, USA
Ahmed Ibrahim & A. Lynn Abbott
Egypt-Japan University of Science and Technology, Alexandria, Egypt
Mohamed E. Hussein
Alexandria University, Alexandria, Egypt
Mohamed E. Hussein
Benha University, Banha, Egypt
Ahmed Ibrahim

Corresponding author

Correspondence to Ahmed Ibrahim.

Editor information

Editors and Affiliations

University of Nevada, Reno, Nevada, USA
George Bebis
NASA Ames Research Center, Moffett Field, California, USA
Richard Boyle
Lawrence Berkeley National Laboratory, Berkeley, California, USA
Bahram Parvin
Desert Research Institute, Reno, Nevada, USA
Darko Koracin
The Australian National University, O’Malley, Australian Capital Territory, Australia
Fatih Porikli
Pilot AI Labs, Redwood City, California, USA
Sandra Skaff
University of Florida, Gainesville, Florida, USA
Alireza Entezari
Google Inc., Mountain View, California, USA
Jianyuan Min
Osaka University, Osaka, Japan
Daisuke Iwai
The MOVES Institute, Monterey, California, USA
Amela Sadagic
University of Arizona, Tucson, Arizona, USA
Carlos Scheidegger
Université Paris-Sud, Orsay, France
Tobias Isenberg

About this paper

Cite this paper

Ibrahim, A., Abbott, A.L., Hussein, M.E. (2016). An Image Dataset of Text Patches in Everyday Scenes. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2016. Lecture Notes in Computer Science, vol 10073. Springer, Cham. https://doi.org/10.1007/978-3-319-50832-0_28

DOI: https://doi.org/10.1007/978-3-319-50832-0_28
Published: 10 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50831-3
Online ISBN: 978-3-319-50832-0
eBook Packages: Computer Science, Computer Science (R0)
