A Robust Two Level Classification Algorithm for Text Localization in Documents

  • Conference paper
Advances in Visual Computing (ISVC 2007)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4842)

Abstract

This paper describes a two-level classification algorithm that discriminates handwritten elements from printed text in a printed document. The proposed technique is independent of size, slant, orientation, translation and other variations in the handwritten text. At the first level of classification, we localize the handwritten text using two classifiers, the nearest neighbour classifier and the Support Vector Machine (SVM) classifier, and present a comparison between them. The features extracted from the document are seven invariant central moments, and based on these features we classify text as handwritten. At the second level, we use Delaunay triangulation to reclassify the misclassified elements: the triangulation is imposed on the centroid points of the connected components, and features extracted from the resulting triangles are used for the reclassification. Noise components in the document are removed as part of the pre-processing step.
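The first-level features can be illustrated with a minimal sketch. The abstract does not say which seven invariant moments are used; the code below assumes Hu's seven moment invariants, computed from normalized central moments of a binary connected component, and pairs them with a plain Euclidean nearest neighbour rule. The names `hu_moments` and `nearest_neighbour`, the list-of-rows image format, and the toy `shape` are illustrative assumptions, not the authors' implementation.

```python
import math


def hu_moments(image):
    """Seven Hu moment invariants of a non-empty binary image (rows of 0/1).

    Assumption: the paper's "seven invariant central moments" are taken
    to be Hu's set; the abstract does not confirm this.
    """
    pixels = [(x, y) for y, row in enumerate(image)
              for x, v in enumerate(row) if v]
    m00 = float(len(pixels))
    cx = sum(x for x, _ in pixels) / m00  # centroid gives translation invariance
    cy = sum(y for _, y in pixels) / m00

    def eta(p, q):
        # Normalized central moment: dividing by m00^(1 + (p+q)/2)
        # gives (approximate) scale invariance.
        mu = sum((x - cx) ** p * (y - cy) ** q for x, y in pixels)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03  # recurring sums in the invariants
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
        + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
        - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2),
    ]


def nearest_neighbour(sample, training):
    """Label of the training vector closest to `sample` (Euclidean distance).

    `training` is a list of (feature_vector, label) pairs.
    """
    return min(training, key=lambda pair: math.dist(sample, pair[0]))[1]


# Hypothetical toy component: an L-shaped blob.
shape = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
features = hu_moments(shape)
```

In such a setup, each connected component would be reduced to its seven-element feature vector and labelled by `nearest_neighbour` against vectors computed from labelled handwritten and printed samples. Because the moments are taken about the centroid and normalized by area, the vector is unchanged when the component is shifted or rotated by 90 degrees, which is the kind of invariance the abstract relies on.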

Editor information

George Bebis, Richard Boyle, Bahram Parvin, Darko Koracin, Nikos Paragios, Syeda-Mahmood Tanveer, Tao Ju, Zicheng Liu, Sabine Coquillart, Carolina Cruz-Neira, Torsten Müller, Tom Malzbender

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kandan, R., Reddy, N.K., Arvind, K.R., Ramakrishnan, A.G. (2007). A Robust Two Level Classification Algorithm for Text Localization in Documents. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2007. Lecture Notes in Computer Science, vol 4842. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76856-2_10

  • DOI: https://doi.org/10.1007/978-3-540-76856-2_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-76855-5

  • Online ISBN: 978-3-540-76856-2

  • eBook Packages: Computer Science, Computer Science (R0)
