Abstract
This paper describes a two-level classification algorithm that discriminates handwritten elements from printed text in a printed document. The proposed technique is independent of size, slant, orientation, translation and other variations in handwritten text. At the first level of classification, we localize the handwritten text using two classifiers, a nearest neighbour classifier and a Support Vector Machine (SVM) classifier, and present a comparison between them. The features extracted from the document are seven invariant central moments, and based on these features we classify each text element as handwritten or printed. At the second level, we use Delaunay triangulation to reclassify the misclassified elements: the triangulation is applied to the centroids of the connected components, and features derived from the resulting triangles drive the reclassification. Noise components in the document are removed as part of the pre-processing step.
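To make the first-level features concrete, the sketch below computes seven moment invariants for a single connected component and labels it with a nearest neighbour rule. It assumes the "seven invariant central moments" are Hu's moment invariants built from normalised central moments; the abstract does not give the formulas, so this is an illustration and not the authors' implementation. The names `hu_moments` and `nearest_label` and the toy images are hypothetical.

```python
from math import dist

def hu_moments(img):
    """Hu's seven moment invariants of a 2-D image given as a list of rows.

    Assumption: these are the 'seven invariant central moments' of the abstract.
    """
    pts = [(x, y, v) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    m00 = sum(v for _, _, v in pts)
    if m00 == 0:
        raise ValueError("component has no foreground pixels")
    # Centroid: taking moments about it gives translation invariance.
    xc = sum(x * v for x, _, v in pts) / m00
    yc = sum(y * v for _, y, v in pts) / m00

    def eta(p, q):
        # Normalised central moment: dividing by a power of m00 gives
        # (approximate, on a pixel grid) scale invariance.
        mu = sum((x - xc) ** p * (y - yc) ** q * v for x, y, v in pts)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]

def nearest_label(features, training):
    """1-nearest-neighbour rule with Euclidean distance on raw feature vectors.

    `training` is a list of (feature_vector, label) pairs.
    """
    return min(training, key=lambda item: dist(features, item[0]))[1]

# Toy components (hypothetical data): an L-shaped blob and a horizontal bar.
l_shape = [[1, 0, 0],
           [1, 0, 0],
           [1, 1, 0]]
bar = [[1, 1, 1, 1]]

# The same L shape translated, and rotated by 90 degrees.
shifted = [[0] * 5] + [[0, 0] + row for row in l_shape]
rotated = [list(r) for r in zip(*l_shape[::-1])]

training = [(hu_moments(l_shape), "L"), (hu_moments(bar), "bar")]
label = nearest_label(hu_moments(rotated), training)
```

The translated and rotated copies of the L shape yield the same feature vector as the original up to floating-point error, which is the property that lets a classifier ignore position and orientation. In practice the seven values differ by orders of magnitude, so some rescaling before computing distances is a common choice; the sketch omits it for brevity.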
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kandan, R., Reddy, N.K., Arvind, K.R., Ramakrishnan, A.G. (2007). A Robust Two Level Classification Algorithm for Text Localization in Documents. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2007. Lecture Notes in Computer Science, vol 4842. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76856-2_10
DOI: https://doi.org/10.1007/978-3-540-76856-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76855-5
Online ISBN: 978-3-540-76856-2
eBook Packages: Computer Science (R0)