A Robust Thinning Algorithm for Straightening of Curved Text Line

  • Brijmohan Singh
  • Sudhir Goswami
  • Puneet Goyal
  • Ankush Mittal
Conference paper
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 131)

Abstract

Text in stylistic documents may appear in different orientations: text lines may be curved, and they may not be parallel to one another within a page. As a result, extracting and then recognizing individual text lines and words in such documents is difficult. Thinning, which reduces characters to a single-pixel-wide representation, is one of the most crucial phases of text recognition, and its success depends on retaining the original character shape. Existing thinning algorithms struggle with distinct non-isolated boundaries and complex character shapes in different scripts, and they produce unwanted edges. This paper presents an improved thinning algorithm that does not produce unwanted edges; it is used to obtain the path of the text for a curved-text straightening system for Optical Character Recognition (OCR). In experiments on documents with curved English or Hindi text, visual inspection of the results shows that the proposed method yields promising results.
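
To make the notion of thinning concrete, here is a minimal sketch of the classical Zhang-Suen thinning procedure in plain Python. It is not the improved algorithm proposed in this paper, whose details are not given in the abstract; it only illustrates how a binary shape is peeled down to a one-pixel-wide path. The function name `zhang_suen_thin` and the list-of-lists image format are assumptions made for this example.

```python
def zhang_suen_thin(image):
    """Thin a binary image (list of rows of 0/1) with the classical
    Zhang-Suen procedure. Illustrative only; not the paper's method."""
    rows, cols = len(image), len(image[0])
    # Pad with a zero border so every original pixel has eight neighbours.
    img = [[0] * (cols + 2)]
    img += [[0] + list(row) + [0] for row in image]
    img += [[0] * (cols + 2)]

    def neighbours(r, c):
        # P2..P9, clockwise, starting from the pixel directly above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1],
                img[r + 1][c + 1], img[r + 1][c], img[r + 1][c - 1],
                img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows + 1):
                for c in range(1, cols + 1):
                    if img[r][c] == 0:
                        continue
                    p = neighbours(r, c)
                    b = sum(p)  # number of foreground neighbours
                    # Number of 0 -> 1 transitions around the pixel.
                    a = sum(1 for i in range(8)
                            if p[i] == 0 and p[(i + 1) % 8] == 1)
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_clear.append((r, c))
            # Delete only after the whole sub-iteration has been scanned.
            for r, c in to_clear:
                img[r][c] = 0
            if to_clear:
                changed = True
    # Strip the padding before returning.
    return [row[1:-1] for row in img[1:-1]]


# Hypothetical input: a bar three pixels thick inside a 5 x 9 image.
bar = [[0] * 9] + [[0] + [1] * 7 + [0] for _ in range(3)] + [[0] * 9]
for row in zhang_suen_thin(bar):
    print("".join("#" if v else "." for v in row))
```

On real scans, the input would first be binarized (for example with a global threshold) so that text pixels are 1 and background pixels are 0. The two sub-iterations remove boundary pixels from opposite sides in turn, which is what keeps the remaining line roughly centred in the stroke.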

Keywords

OCR; Document analysis and recognition; Curve straightening; Thinning; Stylish documents

Copyright information

© Springer India Pvt. Ltd. 2012

Authors and Affiliations

  • Brijmohan Singh (1)
  • Sudhir Goswami (1)
  • Puneet Goyal (1)
  • Ankush Mittal (2)
  1. Research Cell, College of Engineering Roorkee, Roorkee, India
  2. Graphic Era University, Dehradun, India
