DAN: An Automatic Segmentation and Classification Engine for Paper Documents

  • L. Cinque
  • S. Levialdi
  • A. Malizia
  • F. De Rosa
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2423)


Paper document recognition is fundamental to office automation and is becoming an increasingly powerful tool in fields where information still resides on paper. Document recognition follows data acquisition from both journals and entire books, transforming them into digital objects. We present a new system, DAN (Document Analysis on Network), for document recognition that follows Open Source methodologies and uses an XML description for document segmentation and classification, which turns out to be beneficial in terms of classification precision and general-purpose availability.


Keywords: Automatic Segmentation, Document Image, Text Region, Document Image Analysis, Document Recognition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • L. Cinque (1)
  • S. Levialdi (1)
  • A. Malizia (1)
  • F. De Rosa (1)
  1. Dept. of Information Science, Università "La Sapienza", Roma, Italy
