Pattern Recognition Approaches to Japanese Character Recognition

  • Soumendu Das
  • Sreeparna Banerjee
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 166)


Optical character recognition is a crucial step in document retrieval and analysis. However, this process can be error prone, especially for the Japanese language, where text is composed of over 3000 characters that can be classified as syllabic characters, or Kana, and ideographic characters, called Kanji. Moreover, Japanese text has no delimiters, such as spaces, separating words. In addition, several characters can be homomorphic, i.e. have similar shapes, which adds to the complexity of the recognition process. This note surveys some of the approaches that have been attempted to address these issues and to devise schemes for Japanese character recognition in text. It also describes our efforts to extract Japanese text using image processing techniques and presents some of the results.


Japanese character recognition; hiragana, katakana and kanji; document retrieval





Copyright information

© Springer-Verlag GmbH Berlin Heidelberg 2012

Authors and Affiliations

  1. School of Engineering Technology, West Bengal University of Technology, Kolkata, India
