Multi-task Model for Comic Book Image Analysis

  • Nhu-Van Nguyen
  • Christophe Rigaud
  • Jean-Christophe Burie
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11296)


Comic book image analysis methods often rely on separate algorithms or models for separate tasks, such as panel and character detection, balloon segmentation, and text recognition. In this work, we aim to reduce the complexity of comic book image analysis by proposing a single model, called Comic MTL, that learns multiple tasks. In addition to the detection and segmentation tasks, we integrate a relation analysis task for balloons and characters into the Comic MTL model. The experiments are carried out on the eBDtheque dataset, which contains annotations for panels, balloons, and characters, as well as balloon-character relations. We show that the Comic MTL model can detect the association between balloons and their speakers (comic characters) and handle the other tasks, namely panel and character detection and balloon segmentation, with promising results.
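As an illustration of what "one model which can learn multiple tasks" usually means in training terms, the sketch below combines one loss per task into a single objective. It is a minimal sketch under stated assumptions, not the Comic MTL implementation: the abstract does not say how the tasks are weighted or combined, so the weighted sum, the `TaskLosses` container, the `combined_loss` function, and the default weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskLosses:
    """Per-task loss values for one training step (hypothetical container)."""
    detection: float     # panel and character detection
    segmentation: float  # balloon segmentation
    relation: float      # balloon-character relation analysis


def combined_loss(losses: TaskLosses, weights=(1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three task losses.

    Assumption: equal default weights; the source does not specify any
    weighting scheme.
    """
    w_det, w_seg, w_rel = weights
    return (w_det * losses.detection
            + w_seg * losses.segmentation
            + w_rel * losses.relation)


step = TaskLosses(detection=0.5, segmentation=0.25, relation=0.25)
total = combined_loss(step)
down_weighted = combined_loss(step, weights=(1.0, 1.0, 0.0))
```

Minimizing a single scalar of this kind is what lets one set of shared parameters be trained for all tasks at once; setting a weight to zero, as in `down_weighted`, removes that task from the objective.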


Comic book image analysis; Association balloon-character; Multi-task learning; CNN; Deep learning



This work is supported by the CPER NUMERIC programme, funded by the Region Nouvelle Aquitaine, CDA, the Charente Maritime French Department, the La Rochelle conurbation authority (CDA), and the European Union through FEDER funding.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Nhu-Van Nguyen (1)
  • Christophe Rigaud (1)
  • Jean-Christophe Burie (1)
  1. Laboratoire L3i, Université de La Rochelle, La Rochelle CEDEX 1, France
