DAN: An Automatic Segmentation and Classification Engine for Paper Documents
Conference paper
Abstract
Recognition of paper documents is fundamental to office automation and is becoming an increasingly powerful tool in fields where information is still kept on paper. Document recognition follows data acquisition, from both journals and entire books, and transforms the acquired pages into digital objects. We present a new system, DAN (Document Analysis on Network), for document recognition. It follows Open Source methodologies and uses an XML description for document segmentation and classification, which turns out to be beneficial in terms of classification precision and general-purpose availability.
Keywords
Automatic Segmentation, Document Image, Text Region, Document Image Analysis, Document Recognition
Copyright information
© Springer-Verlag Berlin Heidelberg 2002