An Efficient Technique for Detection and Removal of Lines with Text Stroke Crossings in Document Images

Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 14)

Abstract

Optical Character Recognition (OCR) systems automatically read the characters in a document image. High recognition accuracy depends on effective pre-processing. Recognizing characters in pre-printed document images is particularly challenging because it requires pre-processing methods tailored to the layout of the document. In this paper we propose a pre-processing technique for removing horizontal and vertical lines from pre-printed documents. The major challenge in removing the horizontal lines is retaining the pixels where a line and a character overlap. The proposed algorithm works in two phases. The first phase performs image enhancement and line detection. The second phase applies a convolution with a rectangular structuring element to detect text stroke crossings on the lines detected in the first phase. The output image then undergoes post-enhancement, in which connected component analysis and area features are used to remove broken or dotted line remnants. The experimental results are satisfactory and consistent enough for subsequent processing of the document.
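The pipeline outlined in the abstract can be illustrated with a minimal pure-Python sketch on a binary image stored as a list of rows (1 = ink, 0 = background). This is not the authors' implementation. As assumptions for illustration, line detection is approximated by a run-length test (similar in effect to a morphological opening with a 1 x `min_len` element), the crossing test inspects a small rectangular window just above and below the line instead of using a convolution, and only horizontal lines are handled. The function names and the parameters `min_len`, `half_width`, and `min_area` are hypothetical.

```python
from collections import deque


def horizontal_line_mask(img, min_len):
    """Mark pixels that lie in horizontal ink runs of at least min_len pixels."""
    h, w = len(img), len(img[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        x = 0
        while x < w:
            if img[y][x]:
                start = x
                while x < w and img[y][x]:
                    x += 1
                if x - start >= min_len:  # long run: treat as a line
                    for k in range(start, x):
                        mask[y][k] = 1
            else:
                x += 1
    return mask


def remove_lines_keep_crossings(img, min_len, half_width=0):
    """Erase detected line pixels unless a text stroke crosses the line there.

    Assumed crossing rule: a line pixel is retained when non-line ink is
    found both directly above and directly below the line band, within a
    window of 2 * half_width + 1 columns centred on the pixel.
    """
    h, w = len(img), len(img[0])
    mask = horizontal_line_mask(img, min_len)
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            top = y
            while top >= 0 and mask[top][x]:  # first row above the line band
                top -= 1
            bottom = y
            while bottom < h and mask[bottom][x]:  # first row below the band
                bottom += 1
            cols = range(max(0, x - half_width), min(w, x + half_width + 1))
            above = top >= 0 and any(
                img[top][c] and not mask[top][c] for c in cols)
            below = bottom < h and any(
                img[bottom][c] and not mask[bottom][c] for c in cols)
            if not (above and below):
                out[y][x] = 0
    return out


def remove_small_components(img, min_area):
    """Drop 8-connected ink components whose pixel count is below min_area."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if not img[sy][sx] or seen[sy][sx]:
                continue
            comp, queue = [], deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                comp.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(comp) < min_area:
                for y, x in comp:
                    out[y][x] = 0
    return out
```

A typical call sequence would be `remove_small_components(remove_lines_keep_crossings(img, min_len), min_area)`. Vertical lines could be handled the same way by transposing the image first. Suitable values for `min_len` and `min_area` depend on scan resolution and are not given in the abstract.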

Keywords

Line removal; Structuring elements; Character crossings; Character reconstruction; Pre-printed documents; Connected components

Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Department of Computer Science, Amrita University, Mysore, India
  2. Maharaja Research Foundation, Maharaja Institute of Technology, Mysore, India
