Feature Approach for Printed Document Image Analysis

  • Jean Duong
  • Myriam Côté
  • Hubert Emptoz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2396)

Abstract

This paper presents advances in zone classification for printed document image analysis. It first introduces an entropic heuristic for the text separation problem. It then briefly recalls the texture and geometric discriminant parameters proposed in previous research, several of which are selected and modified to perform statistical pattern recognition. Experiments are conducted for each of these two aspects using a document image database with ground truth, and the results obtained so far are discussed.
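The abstract does not spell out how the entropic heuristic is computed. As a minimal sketch, assuming the heuristic measures the Shannon entropy of the horizontal projection profile of a binarized zone (this formulation is our assumption, not the authors' stated method), the measure could be computed as below. The function names and the toy zones are hypothetical, and any decision threshold on the value would have to be tuned on ground-truth data.

```python
import math
from typing import List, Sequence


def horizontal_projection(zone: Sequence[Sequence[int]]) -> List[int]:
    """Return the number of black pixels (value 1) in each row of a binary zone."""
    return [sum(row) for row in zone]


def projection_entropy(profile: Sequence[int]) -> float:
    """Shannon entropy, in bits, of the profile treated as a distribution over rows."""
    total = sum(profile)
    if total == 0:
        return 0.0  # a blank zone carries no ink, so define its entropy as 0
    entropy = 0.0
    for count in profile:
        if count > 0:  # empty rows contribute 0 (limit of p*log p as p -> 0)
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def normalized_projection_entropy(profile: Sequence[int]) -> float:
    """Entropy divided by its maximum, log2(number of rows), giving a value in [0, 1]."""
    if len(profile) < 2:
        return 0.0
    return projection_entropy(profile) / math.log2(len(profile))


if __name__ == "__main__":
    # Hypothetical 8x10 toy zones: ink rows separated by blank rows,
    # as in lines of text, versus ink spread evenly over every row.
    text_like = [[1] * 5 + [0] * 5 if r % 2 == 0 else [0] * 10 for r in range(8)]
    uniform = [[1] * 5 + [0] * 5 for _ in range(8)]
    print(f"{normalized_projection_entropy(horizontal_projection(text_like)):.3f}")  # 0.667
    print(f"{normalized_projection_entropy(horizontal_projection(uniform)):.3f}")  # 1.000
```

Normalizing by log2 of the row count lets zones of different heights be compared on the same [0, 1] scale. In the toy example the text-like zone concentrates its ink in half of the rows and therefore scores lower than the evenly inked zone; whether the paper's heuristic uses this normalization, or which side of a threshold it assigns to text, is not stated in the abstract.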

Keywords

Support Vector Machine, Linear Discriminant Analysis, Document Image, Radial Basis Function Kernel, Horizontal Projection

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Jean Duong (1, 2)
  • Myriam Côté (1)
  • Hubert Emptoz (2)

  1. Ecole de Technologie Supérieure, Laboratoire d’Imagerie, Vision et Intelligence Artificielle (LIVIA), Montréal, Canada
  2. Institut National des Sciences Appliquées (INSA) de Lyon, Laboratoire de Reconnaissance de Formes et Vision (RFV), Villeurbanne CEDEX, France