An R-CNN Based Method to Localize Speech Balloons in Comics

  • Conference paper
  • In: MultiMedia Modeling (MMM 2016)
  • Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9516)

Abstract

Comic books enjoy great popularity around the world, and more and more people read them on digital devices, especially mobile ones. However, the screens of most mobile devices are too small to display an entire comic page directly. As a consequence, without any reflow or adaptation of the original pages, users often find the text on comic pages hard to read on mobile devices. Because the text on a comic page usually appears inside speech balloons, knowing the balloon positions makes it much easier to process the text and improve its readability on mobile devices. It is therefore important to devise an effective method to localize speech balloons in comics, yet only a few studies have addressed this problem. In this paper, we propose a Regions with Convolutional Neural Network features (R-CNN) based method to localize speech balloons in comics. Experimental results demonstrate that the proposed method localizes speech balloons effectively and accurately.
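Since this page gives only the abstract, the following is a minimal, illustrative sketch (all names hypothetical, not taken from the paper) of how localization accuracy for a balloon detector of this kind is typically measured: a predicted bounding box counts as a hit when its intersection-over-union (IoU) with a ground-truth balloon box passes a threshold.

```python
# Illustrative evaluation sketch for box-based balloon localization.
# Boxes are (x1, y1, x2, y2) tuples in pixel coordinates.

def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area when the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def match_detections(detections, ground_truth, threshold=0.5):
    """Greedily match predicted boxes to ground-truth boxes; return hit count."""
    unmatched = list(ground_truth)
    hits = 0
    for det in detections:
        best = max(unmatched, key=lambda gt: iou(det, gt), default=None)
        if best is not None and iou(det, best) >= threshold:
            unmatched.remove(best)  # each ground-truth box is matched at most once
            hits += 1
    return hits
```

With this convention, a detector's precision and recall follow directly from the hit count, which is the standard way detection methods such as R-CNN report localization accuracy.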



Acknowledgement

This work is supported by the National Natural Science Foundation of China (Grant 61300061) and the Beijing Natural Science Foundation (Grant 4132033).

Author information

Correspondence to Xicheng Liu.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, Y., Liu, X., Tang, Z. (2016). An R-CNN Based Method to Localize Speech Balloons in Comics. In: Tian, Q., Sebe, N., Qi, G.-J., Huet, B., Hong, R., Liu, X. (eds) MultiMedia Modeling. MMM 2016. Lecture Notes in Computer Science, vol 9516. Springer, Cham. https://doi.org/10.1007/978-3-319-27671-7_37

  • DOI: https://doi.org/10.1007/978-3-319-27671-7_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27670-0

  • Online ISBN: 978-3-319-27671-7

  • eBook Packages: Computer Science, Computer Science (R0)
