Abstract
Much of the recent research on comic document images has been driven by the digital reading and distribution of comics enabled by evolving technologies. This work addresses the extraction of narrative text boxes and speech balloons, which convey the conversations among comic characters along with their feelings. Because of the wide variety of drawing styles, these speech balloons have complex shapes, which makes their extraction difficult. We present a shape-aware dual-stream convolutional neural network for segmenting narrative text boxes and speech balloons of various shapes. In our dual-stream architecture, a shape module processes edge information of the speech balloons and narrative texts alongside the main module; the outputs of the two modules are then concatenated to produce more accurate segmentation of speech balloons and narrative text boxes. The proposed method achieves significant improvements in both region accuracy (mIoU) and boundary accuracy (F-measure and Hausdorff distance) over other state-of-the-art methods on several publicly available comic datasets in different languages (namely eBDtheque, DCM, and a subset of Manga109). In addition, we have developed a new dataset (BCBId) of comics in Bangla, the eighth most spoken language in the world, and propose a semiautomatic method for generating the corresponding ground-truth images.
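To make the dual-stream idea concrete, the following is a minimal PyTorch sketch of such an architecture: a main stream encodes the comic page image, a shape stream encodes an edge map of the page, and the two feature maps are fused by channel-wise concatenation before a segmentation head. All module names, channel widths, and the use of a precomputed edge map are illustrative assumptions, not the authors' exact network.

```python
# Hypothetical dual-stream segmentation sketch (assumptions: layer sizes,
# module names, and the externally supplied edge map are illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions with ReLU, a common encoder building block."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )


class DualStreamSegmenter(nn.Module):
    """Main stream encodes the page; shape stream encodes an edge map.
    The two feature maps are concatenated and decoded into a per-pixel
    mask for speech balloons / narrative text boxes."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.main_stream = nn.Sequential(conv_block(3, 32), nn.MaxPool2d(2),
                                         conv_block(32, 64))
        self.shape_stream = nn.Sequential(conv_block(1, 16), nn.MaxPool2d(2),
                                          conv_block(16, 32))
        self.fuse = conv_block(64 + 32, 64)          # fuse concatenated features
        self.head = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, image, edge_map):
        f_main = self.main_stream(image)             # appearance features
        f_shape = self.shape_stream(edge_map)        # boundary/shape features
        fused = torch.cat([f_main, f_shape], dim=1)  # channel-wise concatenation
        logits = self.head(self.fuse(fused))
        # upsample back to the input resolution for a dense mask
        return F.interpolate(logits, size=image.shape[-2:],
                             mode="bilinear", align_corners=False)


# Usage with a dummy page and an edge map (e.g., from a Prewitt/Sobel filter)
model = DualStreamSegmenter(num_classes=2)
page = torch.randn(1, 3, 256, 256)
edges = torch.randn(1, 1, 256, 256)
mask_logits = model(page, edges)                     # -> (1, 2, 256, 256)
```

The concatenation-then-convolution fusion shown here is one simple way to let boundary cues from the shape stream sharpen the balloon contours predicted by the appearance stream; other fusion strategies (e.g., attention-based weighting) are possible.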
Notes
Code and data are available at https://github.com/Arpi07/Arpi07-2/tree/Speech_balloon_segmentation.
References
BCBID: sites.google.com/view/banglacomicbookdataset. Accessed 8 Sept 2020
Christophe Rigaud | GitLab. https://git.univ-lr.fr/u/crigau02. Accessed 7 Jan 2020
Digital Comic Museum. https://digitalcomicmuseum.com/. Accessed 29 May 2019
Arai, K., Tolle, H.: Method for real time text extraction of digital manga comic. Int. J. Image Process. IJIP 4(6), 669–676 (2011)
Augereau, O., Iwata, M., Kise, K.: An overview of comics research in computer science. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 3, pp. 54–59. IEEE (2017)
Augereau, O., Iwata, M., Kise, K.: A survey of comics research in computer science. J. Imaging 4(7), 87 (2018)
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)
Cao, Y., Pang, X., Chan, A.B., Lau, R.W.: Dynamic manga: animating still manga via camera movement. IEEE Trans. Multimedia 19(1), 160–172 (2016)
Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: ECCV, pp. 801–818 (2018)
Dubray, D., Laubrock, J.: Deep CNN-based speech balloon detection and segmentation for comic books. arXiv preprint arXiv:1902.08137 (2019)
Dubuisson, M.P., Jain, A.K.: A modified Hausdorff distance for object matching. In: Proceedings of 12th International Conference on Pattern Recognition, vol. 1, pp. 566–568. IEEE (1994)
Dunst, A., Laubrock, J., Wildfeuer, J.: Empirical Comics Research: Digital, Multimodal, and Cognitive Methods. Routledge, Milton Park (2018)
Dutta, A., Biswas, S.: CNN based extraction of panels/characters from bengali comic book page images. In: 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), vol. 1, pp. 38–43. IEEE (2019)
Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F.A., Brendel, W.: Imagenet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In: International Conference on Learning Representations (2019)
Guérin, C., Rigaud, C., Mercier, A., Ammar-Boudjelal, F., Bertet, K., Bouju, A., Burie, J.C., Louis, G., Ogier, J.M., Revel, A.: eBDtheque: a representative database of comics. In: ICDAR, pp. 1145–1149. IEEE (2013)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Ho, A.K.N., Burie, J.C., Ogier, J.M.: Panel and speech balloon extraction from comic books. In: DAS, 2012, pp. 424–428. IEEE (2012)
Huttenlocher, D.P., Klanderman, G.A., Rucklidge, W.J.: Comparing images using the Hausdorff distance. IEEE Trans. Pattern Anal. Mach. Intell. 15(9), 850–863 (1993)
Kass, M., Witkin, A., Terzopoulos, D.: Snakes: active contour models. Int. J. Comput. Vis. 1(4), 321–331 (1988)
Li, L., Wang, Y., Gao, L., Tang, Z., Suen, C.Y.: Comic2cebx: a system for automatic comic content adaptation. In: IEEE/ACM Joint Conference on Digital Libraries, pp. 299–308. IEEE (2014)
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
Matsui, Y., Ito, K., Aramaki, Y., Fujimoto, A., Ogawa, T., Yamasaki, T., Aizawa, K.: Sketch-based manga retrieval using manga109 dataset. Multimedia Tools Appl. 76(20), 21811–21838 (2017)
Matsui, Y., Yamasaki, T., Aizawa, K.: Interactive manga retargeting. In: SIGGRAPH Posters, p. 35 (2011)
Nguyen, N.V., Rigaud, C., Burie, J.C.: Digital comics image indexing based on deep learning. J. Imaging 4(7), 89 (2018)
Nguyen, N.V., Rigaud, C., Burie, J.C.: Comic MTL: optimized multi-task learning for comic book image analysis. Int. J. Doc. Anal. Recognit. IJDAR 22(3), 265–284 (2019)
Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1520–1528 (2015)
Ogawa, T., Otsubo, A., Narita, R., Matsui, Y., Yamasaki, T., Aizawa, K.: Object detection for comics using manga109 annotations. arXiv:1803.08670 (2018)
Osserman, R., et al.: The isoperimetric inequality. Bull. Am. Math. Soc. 84(6), 1182–1238 (1978)
Prewitt, J.M.: Object enhancement and extraction. Picture Process. Psychopictorics 10(1), 15–19 (1970)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS, pp. 91–99 (2015)
Ribera, J., Guera, D., Chen, Y., Delp, E.J.: Locating objects without bounding boxes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6479–6489 (2019)
Rigaud, C., Burie, J.C., Ogier, J.M.: Text-independent speech balloon segmentation for comics and manga. In: International Workshop on Graphics Recognition, pp. 133–147. Springer (2015)
Rigaud, C., Burie, J.C., Ogier, J.M., Karatzas, D., Van de Weijer, J.: An active contour model for speech balloon detection in comics. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 1240–1244. IEEE (2013)
Rigaud, C., Guérin, C., Karatzas, D., Burie, J.C., Ogier, J.M.: Knowledge-driven understanding of images in comic books. IJDAR 18(3), 199–221 (2015)
Rigaud, C., Le Thanh, N., Burie, J.C., Ogier, J.M., Iwata, M., Imazu, E., Kise, K.: Speech balloon and speaker association for comics and manga understanding. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 351–355. IEEE (2015)
Rigaud, C., Nguyen, V., Burie, J.C.: Confidence criterion for speech balloon segmentation. In: 13th IAPR International Workshop on Graphics Recognition (2019)
Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: Imagenet large scale visual recognition challenge. IJCV 115(3), 211–252 (2015)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Sun, W., Kise, K.: Similar manga retrieval using visual vocabulary based on regions of interest. In: 2011 International Conference on Document Analysis and Recognition, pp. 1075–1079. IEEE (2011)
Theodoridis, S., Koutroumbas, K.: Pattern Recognition, 4th edn. Academic Press, Boca Raton (2008)
Woo, S., Park, J., Lee, J.Y., So Kweon, I.: CBAM: convolutional block attention module. In: ECCV, pp. 3–19 (2018)
Yamada, M., Budiarto, R., Endo, M., Miyazaki, S.: Comic image decomposition for reading comics on cellular phones. IEICE Trans. Inf. Syst. 87(6), 1370–1376 (2004)
Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks? In: NIPS. Curran Associates (2014)
Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. In: 4th International Conference on Learning Representations, ICLR (2016)
Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2881–2890 (2017)
Cite this article
Dutta, A., Biswas, S. & Das, A.K. CNN-based segmentation of speech balloons and narrative text boxes from comic book page images. IJDAR 24, 49–62 (2021). https://doi.org/10.1007/s10032-021-00366-4