Chinese Handwritten Character Segmentation in Form Documents

  • Jiun-Lin Chen
  • Chi-Hong Wu
  • Hsi-Jian Lee
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1655)

Abstract

This paper presents a projection-based method for segmenting handwritten Chinese characters in form documents with known structures. In the preprocessing phase, a noise removal method is proposed that preserves stroke connections and character edge points. In the character segmentation phase, projection profile analysis is used to segment a text line image into projection blocks. The projection blocks are classified into one of four types: mark, half-word, single-word, and two-word. Large blocks are then split and small blocks are merged. In addition, an OCR system is adopted to eliminate errors resulting from the inappropriate merging of Chinese numerical characters with other characters. In experiments on 1319 Chinese characters, correct segmentation rates of 92.34% and 91.76% were obtained with and without the OCR module, respectively.
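
The projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the width-only classification rule, and the threshold values 0.3, 0.7, and 1.5 are hypothetical choices made for illustration, and the abstract does not state which features or thresholds the paper uses. The sketch assumes a binarized text line image given as rows of 0 (background) and 1 (ink).

```python
from typing import List, Tuple


def projection_blocks(image: List[List[int]]) -> List[Tuple[int, int]]:
    """Split a binary text line into column ranges that contain ink.

    Returns (start, end) pairs with end exclusive. A block is a maximal
    run of columns whose vertical projection (ink count) is non-zero.
    """
    if not image:
        return []
    width = len(image[0])
    # Vertical projection profile: number of ink pixels in each column.
    profile = [sum(row[x] for row in image) for x in range(width)]

    blocks = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                      # a block begins
        elif count == 0 and start is not None:
            blocks.append((start, x))      # a block ends at an empty column
            start = None
    if start is not None:
        blocks.append((start, width))      # block touching the right border
    return blocks


def classify_block(width: int, char_width: float) -> str:
    """Label a block by its width relative to an expected character width.

    The thresholds are hypothetical, not taken from the paper.
    """
    ratio = width / char_width
    if ratio < 0.3:
        return "mark"
    if ratio < 0.7:
        return "half-word"
    if ratio < 1.5:
        return "single-word"
    return "two-word"


# Toy text line: two ink regions separated by empty columns.
line = [
    [0, 1, 1, 0, 0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
]
blocks = projection_blocks(line)
labels = [classify_block(end - start, char_width=4) for start, end in blocks]
print(blocks)  # [(1, 3), (5, 9)]
print(labels)  # ['half-word', 'single-word']
```

In a pipeline of the kind the abstract outlines, blocks labeled "two-word" would be candidates for splitting and "half-word" blocks candidates for merging with a neighbor. The criteria for those steps are not given in the abstract and are left out here.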

Keywords

Noise removal, Projection profile analysis, Form document processing, Character segmentation, Optical character recognition

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Jiun-Lin Chen (1)
  • Chi-Hong Wu (1)
  • Hsi-Jian Lee (1)
  1. Department of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan
