Knowledge-driven understanding of images in comic books

  • Christophe Rigaud
  • Clément Guérin
  • Dimosthenis Karatzas
  • Jean-Christophe Burie
  • Jean-Marc Ogier
Original Paper


Document analysis is an active field of research that aims at a complete understanding of the semantics of a given document. One example of the document understanding process is enabling a computer to identify the key elements of a comic book story and arrange them according to predefined domain knowledge. In this study, we propose a knowledge-driven system that exploits the interaction between bottom-up and top-down information to progressively understand the content of a document. We model the knowledge of both the comic book domain and the image processing domain for information consistency analysis. In addition, different image processing methods are improved or developed to extract panels, balloons, tails, texts, comic characters and their semantic relations in an unsupervised way.


Document understanding · Comics analysis · Expert system



The authors would like to thank Karell Bertet and Arnaud Revel for their help with the high-level processing. This work was supported by a European Doctorate scholarship of the University of La Rochelle, the European Regional Development Fund, the Poitou-Charentes region (France), the General Council of Charente-Maritime (France), the municipality of La Rochelle (France) and the Spanish research projects TIN2011-24631 and RYC-2009-05031. We are grateful to all authors and publishers of comics and manga from the eBDtheque dataset for having allowed us to show (Figs. 124), use and share their works.

Conflict of interest

The authors declare that there is no conflict of interest. This article does not contain any studies with human or animal subjects.



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Christophe Rigaud (1, 2)
  • Clément Guérin (1)
  • Dimosthenis Karatzas (2)
  • Jean-Christophe Burie (1)
  • Jean-Marc Ogier (1)

  1. Laboratoire L3i, Université de La Rochelle, La Rochelle Cedex 1, France
  2. Computer Vision Center, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
