The Segmentation and Identification of Handwriting in Noisy Document Images

  • Yefeng Zheng
  • Huiping Li
  • David Doermann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2423)

Abstract

In this paper we present an approach to segmenting and identifying handwritten annotations in noisy document images. In many types of documents, such as correspondence, handwritten annotations are often added as a note, correction, clarification, or instruction, and a signature may appear as an authentication mark. Segmenting and identifying such handwriting is important for two reasons: 1) so that it can be located, interpreted, and retrieved efficiently in large document databases, and 2) so that the appropriate algorithms can be applied for printed text recognition, handwritten text recognition, and signature verification. Our approach consists of two processes: 1) a segmentation process, which divides the text into regions at an appropriate level (character, word, or zone), and 2) a classification process, which determines whether each segmented region is handwritten. To determine the approximate region size at which classification can be performed reliably, we conducted experiments at the character, word, and zone levels. We found that reliable results can be achieved at the word level, with a classification accuracy of 97.3%. The identified handwritten text is then grouped into zones and verified to reduce false alarms. Experiments show that our approach is promising and robust.
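The abstract outlines a segment, classify, then group-and-verify pipeline without giving implementation details. The Python sketch below shows one possible way to structure those stages on a binary image given as a list of 0/1 rows. It is not the authors' method: the word segmentation rule (merge 8-connected components on the same line whose horizontal gap is at most `word_gap` pixels), the zone grouping rule (merge handwritten words within `zone_gap` pixels), and the verification rule (keep zones containing at least `min_words` handwritten words) are all hypothetical stand-ins. The `is_handwritten` callback is a placeholder for whatever word-level classifier is used; the keywords mention Gabor filters, but the paper's features and classifier are not reproduced here.

```python
from collections import deque
from typing import Callable, List, Tuple

# A bounding box as (top, left, bottom, right), all inclusive pixel indices.
Box = Tuple[int, int, int, int]
Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def connected_components(img: Image) -> List[Box]:
    """Bounding boxes of the 8-connected ink components."""
    if not img:
        return []
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def _gap(a0: int, a1: int, b0: int, b1: int) -> int:
    """Empty pixels between inclusive intervals [a0, a1] and [b0, b1]; negative if they overlap."""
    return max(a0, b0) - min(a1, b1) - 1


def merge_boxes(boxes: List[Box], h_gap: int, v_gap: int) -> List[Box]:
    """Merge boxes until no pair is within h_gap columns and v_gap rows of each other."""
    boxes = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                a, b = boxes[i], boxes[j]
                if _gap(a[1], a[3], b[1], b[3]) <= h_gap and _gap(a[0], a[2], b[0], b[2]) <= v_gap:
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    changed = True
                    break
            if changed:  # restart the scan after every merge
                break
    return sorted(boxes)


def find_handwritten_zones(
    img: Image,
    is_handwritten: Callable[[Image, Box], bool],  # hypothetical word-level classifier
    word_gap: int = 2,
    zone_gap: int = 4,
    min_words: int = 2,
) -> List[Box]:
    # 1) Segmentation: components that share at least one row (v_gap=-1)
    #    and are separated by <= word_gap empty columns form one word.
    words = merge_boxes(connected_components(img), h_gap=word_gap, v_gap=-1)
    # 2) Classification at the word level.
    hw_words = [box for box in words if is_handwritten(img, box)]
    # 3) Group handwritten words into zones, then verify by dropping sparse zones.
    zones = merge_boxes(hw_words, h_gap=zone_gap, v_gap=zone_gap)

    def inside(w: Box, z: Box) -> bool:
        return z[0] <= w[0] and z[1] <= w[1] and w[2] <= z[2] and w[3] <= z[3]

    return [z for z in zones if sum(inside(w, z) for w in hw_words) >= min_words]
```

In a real system the gap thresholds would more plausibly be scaled to the estimated character size of the page than fixed in pixels; the defaults above are only for illustration.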

Keywords

Document Image, Gabor Filter, Text Line, Identification Accuracy, Word Level
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Yefeng Zheng (1)
  • Huiping Li (1)
  • David Doermann (1)
  1. Laboratory for Language and Media Processing, Institute for Advanced Computer Studies, University of Maryland, College Park
