Document image analysis: A primer

Sadhana

Abstract

Document image analysis refers to algorithms and techniques that are applied to images of documents to obtain a computer-readable description from pixel data. A well-known document image analysis product is the Optical Character Recognition (OCR) software that recognizes characters in a scanned document. OCR makes it possible for the user to edit or search the document’s contents. In this paper we briefly describe various components of a document analysis system. Many of these basic building blocks are found in most document analysis systems, irrespective of the particular domain or language to which they are applied. We hope that this paper will help the reader by providing the background necessary to understand the detailed descriptions of specific techniques presented in other papers in this issue.

Kasturi, R., O’Gorman, L. & Govindaraju, V. Document image analysis: A primer. Sadhana 27, 3–22 (2002). https://doi.org/10.1007/BF02703309

  • DOI: https://doi.org/10.1007/BF02703309
