Abstract
In this paper we present an approach to segmenting and identifying handwritten annotations in noisy document images. In many types of documents, such as correspondence, it is not uncommon for handwritten annotations to be added as a note, correction, clarification, or instruction, or for a signature to appear as an authentication mark. Being able to segment and identify such handwriting is important so that we can 1) locate, interpret, and retrieve it efficiently in large document databases, and 2) apply different algorithms for printed text recognition, handwritten text recognition, and signature verification. Our approach consists of two processes: 1) a segmentation process, which divides the text into regions at an appropriate level (character, word, or zone), and 2) a classification process, which identifies the segmented regions as handwritten. To determine the approximate region size at which classification can be performed reliably, we conducted experiments at the character, word, and zone levels. We found that reliable results can be achieved at the word level, with a classification accuracy of 97.3%. The identified handwritten text is further grouped into zones and verified to reduce false alarms. Experiments show that our approach is promising and robust.
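The final step described above, grouping identified handwritten words into zones and verifying them to reduce false alarms, can be illustrated with a minimal sketch. The abstract does not say how grouping or verification is performed, so everything here is an assumption made for illustration: the names `Word`, `gap`, `group_into_zones`, and `verify_zones`, the pixel threshold `max_gap`, and the rule that a zone needs at least `min_words` words are all hypothetical and are not the authors' method.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Word:
    """A word-level bounding box plus the label from a (hypothetical) classifier."""
    x0: int
    y0: int
    x1: int
    y1: int
    handwritten: bool


def gap(a: Word, b: Word) -> int:
    """Largest axis-wise gap between two boxes; 0 if they touch or overlap."""
    dx = max(a.x0 - b.x1, b.x0 - a.x1, 0)
    dy = max(a.y0 - b.y1, b.y0 - a.y1, 0)
    return max(dx, dy)


def group_into_zones(words: List[Word], max_gap: int = 20) -> List[List[Word]]:
    """Single-link grouping: handwritten words within max_gap pixels share a zone.

    max_gap is an assumed threshold, not a value from the paper.
    """
    hw = [w for w in words if w.handwritten]
    seen = [False] * len(hw)
    zones: List[List[Word]] = []
    for start in range(len(hw)):
        if seen[start]:
            continue
        seen[start] = True
        stack, zone = [start], []
        while stack:
            i = stack.pop()
            zone.append(hw[i])
            # Pull in every unvisited word that lies close to the current one.
            for j in range(len(hw)):
                if not seen[j] and gap(hw[i], hw[j]) <= max_gap:
                    seen[j] = True
                    stack.append(j)
        zones.append(zone)
    return zones


def verify_zones(zones: List[List[Word]], min_words: int = 2) -> List[List[Word]]:
    """Assumed verification rule: treat isolated detections as likely false alarms."""
    return [z for z in zones if len(z) >= min_words]


if __name__ == "__main__":
    words = [
        Word(10, 10, 50, 30, True),      # handwritten, next to the word below
        Word(60, 10, 110, 30, True),     # handwritten, 10 px to the right
        Word(400, 300, 430, 320, True),  # isolated detection
        Word(10, 100, 200, 120, False),  # machine-printed, ignored
    ]
    zones = group_into_zones(words)
    print(len(zones), len(verify_zones(zones)))
```

In this sketch the isolated detection forms its own single-word zone and is discarded by the verification rule, while the two neighbouring words survive as one zone. A real system would more plausibly verify zones using classifier confidence or contextual features rather than a bare word count.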
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zheng, Y., Li, H., Doermann, D. (2002). The Segmentation and Identification of Handwriting in Noisy Document Images. In: Lopresti, D., Hu, J., Kashi, R. (eds) Document Analysis Systems V. DAS 2002. Lecture Notes in Computer Science, vol 2423. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45869-7_12
DOI: https://doi.org/10.1007/3-540-45869-7_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44068-0
Online ISBN: 978-3-540-45869-2
eBook Packages: Springer Book Archive