Cursive Handwritten Text Document Preprocessing Methodologies

Chapter
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 247)

Abstract

Handwritten text recognition offers a new way of improving the human-computer interface and of integrating computers better into human society. This chapter details the complete preprocessing stage of handwritten text document recognition. The input to such a recognition system is a scanned image of handwritten text, and the output of preprocessing is a set of normalized, segmented text lines. We explain simple and efficient general sub-steps of preprocessing, namely binarization, line separation, skew normalization, and slant correction of handwritten text. For binarization, we propose a fast adaptive approach that gives good results. Line segmentation uses a bottom-up grouping approach: a connectivity strength parameter is used to extract the connected components (strokes) belonging to the same line from the minimum spanning tree of the given connected components. Quantitative analysis shows that this approach separates lines better than other approaches when lines touch or intermingle, and that it also handles hill-and-dale writing styles. For skew normalization, we use an orthogonal projection approach that detects the exact skew angle without calculating the baseline separately. A variety of approaches have been explored for each step. We present a complete preprocessing suite for handling cursive handwritten text documents.
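
The abstract does not spell out the orthogonal projection step, so the following is only a minimal sketch of a generic projection-profile skew search, not the authors' method. It assumes a binarized image in which ink pixels are nonzero, projects every ink pixel onto the axis orthogonal to a candidate text direction, and keeps the angle whose profile is most concentrated. The function name `estimate_skew`, the search range of -15 to 15 degrees, and the 0.5 degree step are illustrative assumptions.

```python
import numpy as np

def estimate_skew(binary, angles=None):
    """Return the candidate angle (degrees) with the sharpest projection profile.

    binary: 2-D array, nonzero where there is ink (assumed input format).
    A text line is assumed to follow y = y0 + x * tan(angle), with y growing downward.
    """
    if angles is None:
        angles = np.arange(-15.0, 15.5, 0.5)  # assumed search range and step
    ys, xs = np.nonzero(binary)
    if ys.size == 0:
        return 0.0  # no ink, nothing to estimate
    best_angle, best_score = 0.0, -1.0
    for angle in angles:
        t = np.deg2rad(angle)
        # Position of each ink pixel along the axis orthogonal to the candidate direction.
        proj = ys * np.cos(t) - xs * np.sin(t)
        # One-pixel-wide bins covering the projected range.
        edges = np.arange(np.floor(proj.min()), np.ceil(proj.max()) + 2.0)
        hist, _ = np.histogram(proj, bins=edges)
        # Sum of squared counts is largest when ink collapses into few bins.
        score = float(np.sum(hist.astype(np.float64) ** 2))
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle
```

Under these assumptions no baseline is fitted explicitly: the angle is read directly from the sharpness of the profile, and rotating the image back by the returned angle would level the text lines. A finer angle step or a coarse-to-fine search trades speed for resolution.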

Keywords

Binarization; Handwritten text recognition; Line segmentation; Normalization; Preprocessing; Skew

Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  1. Malaviya National Institute of Technology Jaipur, Jaipur, India
