Document Analysis System for Automating Workflows

  • Steven J. Simske
  • Jordi Arnabat
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3872)

Abstract

When a user places a document in a capture device (copier, multi-functional printer [MFP], or scanner), the user expects good output regardless of the document type. There are several ways to improve output, all of which tune the settings on the copying device to the content characteristics of the document. These settings can be automated across the range of scanned content extremes, from photo documents (blurring, no snapping) to all-text documents (sharpening, aggressive snapping). This procedure, “document auto typing”, relies on a fast and accurate assessment of the content of the captured image. We describe the development of seven distinct systems for document analysis and, by comparing these systems, arrive at an efficient and accurate document analysis system for automating the copying settings. We also discuss the applicability of this method to other automated workflows in document capture.

Keywords

Optical Character Recognition; Black Pixel; Solid Region; Projection Profile; Document Analysis System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Steven J. Simske (1)
  • Jordi Arnabat (2)
  1. Hewlett-Packard Labs, Fort Collins, USA
  2. Hewlett-Packard Espanola, Sant Cugat del Valles, Spain
