Segmentation of Handwritten Characters for Digitalizing Korean Historical Documents

  • Min Soo Kim
  • Kyu Tae Cho
  • Hee Kue Kwag
  • Jin Hyung Kim
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3163)

Abstract

Historical documents are valuable cultural heritage and important sources for studying the history, society, and daily life of their time. The digitalization of historical documents aims to give instant access to the archives to researchers and the public, whose access has so far been limited for maintenance reasons. However, most of these documents are not only handwritten in ancient Chinese characters but also have complex page layouts. As a result, conventional optical character recognition (OCR) systems are not easy to apply to historical documents, even though OCR has for several years received the most attention as a key module in digitalization. We have been developing an OCR-based digitalization system for historical documents for several years. In this paper, we propose dedicated segmentation and rejection methods for OCR of Korean historical documents. The proposed recognition-based segmentation method uses geometric features and context information with the Viterbi algorithm. The rejection method uses the Mahalanobis distance and posterior probability, in particular to address the out-of-class problem. Some promising experimental results are reported.
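
To make the two ideas named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes candidate cut positions along a text line are already available, that a hypothetical `score(i, j)` function returns a finite log-score combining a geometric term and a recognizer confidence for the piece between cuts `i` and `j`, and that each character class is modeled by a Gaussian with diagonal covariance and equal priors. The function names, the thresholds `max_dist_sq` and `min_posterior`, and the way distance and posterior are combined are all illustrative assumptions.

```python
import math

def viterbi_segment(n_cuts, score, max_span=3):
    """Viterbi-style dynamic programming over candidate cut points.

    score(i, j) is a hypothetical, finite log-score for treating the
    image between cuts i and j as one character. Returns the best list
    of (start, end) segments and its total score.
    """
    best = [-math.inf] * n_cuts
    back = [-1] * n_cuts
    best[0] = 0.0
    for j in range(1, n_cuts):
        # A character may span at most max_span consecutive pieces.
        for i in range(max(0, j - max_span), j):
            s = best[i] + score(i, j)
            if s > best[j]:
                best[j], back[j] = s, i
    # Trace the best path back from the last cut to the first.
    path = [n_cuts - 1]
    while path[-1] != 0:
        path.append(back[path[-1]])
    path.reverse()
    return list(zip(path[:-1], path[1:])), best[-1]

def mahalanobis_sq(x, mean, var):
    """Squared Mahalanobis distance, assuming a diagonal covariance."""
    return sum((a - m) ** 2 / v for a, m, v in zip(x, mean, var))

def classify_or_reject(x, classes, max_dist_sq=9.0, min_posterior=0.6):
    """classes maps label -> (mean, var). Returns a label, or None to reject.

    Rejects when the best class is too far away (out-of-class input) or
    when its posterior under equal priors is too low (ambiguous input).
    """
    d2 = {c: mahalanobis_sq(x, m, v) for c, (m, v) in classes.items()}
    loglik = {c: -0.5 * (d2[c] + sum(math.log(t) for t in classes[c][1]))
              for c in classes}
    top = max(loglik, key=loglik.get)
    posterior = 1.0 / sum(math.exp(l - loglik[top]) for l in loglik.values())
    if d2[top] > max_dist_sq or posterior < min_posterior:
        return None
    return top

# Toy usage: cut positions in pixels and made-up recognizer log-confidences.
cuts = [0, 20, 40, 80, 100, 120]
rec = {(0, 2): -0.1, (2, 3): -0.2, (3, 5): -0.1}

def toy_score(i, j, expected=40):
    geometric = -abs((cuts[j] - cuts[i]) - expected) / expected
    return geometric + rec.get((i, j), -2.0)

segments, total = viterbi_segment(len(cuts), toy_score)
```

In this sketch the distance test targets inputs that belong to no trained class, while the posterior test targets inputs that fall between two classes; a real system would tune both thresholds on held-out data.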

Keywords

Linear Discriminant Analysis; Mahalanobis Distance; Chinese Character; Historical Document; Optical Character Recognition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Min Soo Kim (1)
  • Kyu Tae Cho (1)
  • Hee Kue Kwag (2)
  • Jin Hyung Kim (1)

  1. CS Div, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
  2. Dongbang SnC Co., Ltd., Seoul, Korea
