Character segmentation and transcription system for historical Japanese books with a self-proliferating character image database

  • Chulapong Panichkriangkrai
  • Liang Li
  • Takaaki Kaneko
  • Ryo Akama
  • Kozaburo Hachimura
Original Paper

Abstract

This paper describes an interactive system that assists in transcribing digitized historical woodblock-printed Japanese books published from the seventeenth to the nineteenth century. The main functions of the system are layout analysis, character segmentation, transcription, and the generation of a character image database. The system is used in two major phases. In the first phase, the system automatically produces provisional character segmentation data, and users interactively edit the segmentation results and transcribe them into text data. The information obtained in this phase is stored in the character image database. In the second phase, the system performs automatic character segmentation and transcription using the database generated in the first phase. As these two phases are applied repeatedly to a variety of materials, the contents of the character image database grow, and the system's performance in character segmentation and transcription improves accordingly. Because this scheme resembles successive generations, in which parents produce children and the children in turn produce grandchildren, we call the database a self-proliferating database. The experiment showed that transcription accuracy increased as the number of character images in the database increased. When the database reached 37,000 character images, the segmentation accuracy reached 83.7% and the transcription accuracy reached 69.1%.
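The two-phase cycle described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class `CharacterImageDatabase`, the functions `phase_one` and `phase_two`, the two-dimensional feature vectors, the romanized labels, and the nearest-neighbour lookup are all assumptions introduced here for illustration only. The abstract does not specify the features or the matching method.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class CharacterImageDatabase:
    """Toy stand-in for the character image database (hypothetical structure)."""
    entries: list = field(default_factory=list)  # (feature_vector, label) pairs

    def add(self, features, label):
        self.entries.append((tuple(features), label))

    def transcribe(self, features):
        """Return the label of the nearest stored entry, or None if the database is empty."""
        if not self.entries:
            return None
        _, label = min(self.entries, key=lambda e: dist(e[0], features))
        return label


def phase_one(db, confirmed):
    """Phase 1 (assumed form): store user-confirmed (features, label) pairs."""
    for features, label in confirmed:
        db.add(features, label)


def phase_two(db, segments):
    """Phase 2 (assumed form): transcribe segmented characters using the database."""
    return [db.transcribe(f) for f in segments]


db = CharacterImageDatabase()

# First round: the user confirms two characters (made-up feature vectors).
phase_one(db, [((0.0, 0.0), "a"), ((1.0, 1.0), "i")])
first_pass = phase_two(db, [(0.1, 0.0), (0.9, 1.2), (5.0, 5.0)])

# The third segment has no close match, so it is mislabelled; the user
# corrects it, and the correction enlarges the database.
phase_one(db, [((5.0, 5.0), "u")])
second_pass = phase_two(db, [(0.1, 0.0), (0.9, 1.2), (5.1, 4.9)])

print(first_pass)
print(second_pass)
```

In this toy run, the character that was mislabelled in the first pass is transcribed correctly in the second pass once its corrected entry has been added, which mirrors how repeated application of the two phases is meant to improve accuracy as the database grows.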

Notes

Acknowledgements

The authors would like to thank Mr. Hideo Toyama, a researcher in classical Japanese literature, for his valuable comments and advice. We would also like to express our gratitude to Prof. Keiko Suzuki and Dr. Worawat Choensawat for supporting this research.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2017

Authors and Affiliations

  1. Graduate School of Science and Engineering, Ritsumeikan University, Shiga, Japan
  2. College of Information Science and Engineering, Ritsumeikan University, Shiga, Japan
  3. Kinugasa Research Organization, Ritsumeikan University, Kyoto, Japan
  4. College of Letters, Ritsumeikan University, Kyoto, Japan
