Identification of scripts and orientations of degraded document images

  • Short Paper
  • Pattern Analysis and Applications

Abstract

Document script and document orientation are important information for document digitization. Methods for identifying document script and orientation have been reported, but most of them are very sensitive to document skew and low image resolution. This paper reports a document script and orientation identification method that addresses this issue by converting a document image into a pair of document vectors using the density and distribution of character strokes. Experiments over 3,024 document images of 12 scripts show that the proposed method is accurate and tolerant to various types of document degradation.
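
The preview does not describe how the document vectors are built, so the following is only a rough illustration of the general idea of summarizing stroke density and distribution as a pair of vectors. It is not the authors' method. The band-based features, the function names `band_density`, `document_vectors`, and `rotate_180`, and the `bands` parameter are all assumptions made for this sketch.

```python
from typing import List, Tuple

# A binarized document image: 1 marks an ink (stroke) pixel, 0 marks background.
Image = List[List[int]]


def band_density(rows: Image, bands: int) -> List[float]:
    """Share of all ink pixels that falls into each of `bands` horizontal bands.

    Hypothetical feature chosen for illustration; the paper's actual
    stroke features are not given in the preview.
    """
    total = sum(sum(row) for row in rows)
    if total == 0:
        return [0.0] * bands
    height = len(rows)
    counts = [0] * bands
    for y, row in enumerate(rows):
        # Map row index y to a band index in [0, bands - 1].
        counts[y * bands // height] += sum(row)
    return [count / total for count in counts]


def document_vectors(img: Image, bands: int = 4) -> Tuple[List[float], List[float]]:
    """Return a pair of vectors: ink share per row band and per column band."""
    vertical = band_density(img, bands)
    # Transpose the image so that columns become rows.
    transposed = [list(column) for column in zip(*img)]
    horizontal = band_density(transposed, bands)
    return vertical, horizontal


def rotate_180(img: Image) -> Image:
    """Rotate the image by 180 degrees (an upside-down page)."""
    return [row[::-1] for row in img[::-1]]
```

With these toy features, rotating a page by 180 degrees reverses both vectors when the image height and width are multiples of `bands`, which is the kind of asymmetry an orientation detector could exploit. Because each band aggregates many pixels, coarse features of this sort also change only gradually under small skew or reduced resolution, although this sketch makes no claim about the accuracy reported in the paper.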

Author information

Corresponding author

Correspondence to Shijian Lu.

About this article

Cite this article

Lu, S., Li, L. & Tan, C.L. Identification of scripts and orientations of degraded document images. Pattern Anal Applic 13, 469–475 (2010). https://doi.org/10.1007/s10044-009-0169-7

  • DOI: https://doi.org/10.1007/s10044-009-0169-7
