Document modeling for form class identification

  • Sébastien Diana
  • Eric Trupin
  • Yves Lecourtier
  • Jacques Labiche
Oral Presentations B. Document Processing and Retrieval
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1339)


This article describes a document analysis system based on document modeling. The system is applied to forms used by the CAF (Caisse d'Allocations Familiales), the French national family allowance department. It is composed of three modules, each handling a different stage of form processing. The first module, low-level processing, is divided into three stages: acquisition, binarisation and skew correction. These stages transform a paper form into an image of adequate quality. The second module, document structuration, processes this image to extract the information contained in the form and arranges it into a tree that represents the hierarchical organisation of the form content. In addition to tree extraction, the document structuration module allows the creation of a form model base. The last module, form class identification, uses the tree and the form model base. It is composed of two pre-classifiers, which extract lists of candidate forms, and a structural classifier. The two pre-classifiers filter the 250 form classes in order to reduce the workload of the structural classifier. This classifier uses graph matching to compare the tree of the form being identified with the candidate forms retained by the two pre-classifiers.
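
The identification module can be illustrated with a minimal sketch. The abstract does not say which features the two pre-classifiers use or how the graph matching is computed, so everything below is an assumption made for illustration: the `Node` class, the block labels, the node-count and depth filters, and the greedy top-down comparison that stands in for the structural classifier. It is not the authors' method.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One block of a form tree (labels such as 'table' are hypothetical)."""
    label: str
    children: list[Node] = field(default_factory=list)


def size(node: Node) -> int:
    """Total number of nodes in the tree."""
    return 1 + sum(size(c) for c in node.children)


def depth(node: Node) -> int:
    """Number of levels in the tree (a single node has depth 1)."""
    return 1 + max((depth(c) for c in node.children), default=0)


def matched_nodes(a: Node, b: Node) -> int:
    """Count nodes matched by a greedy top-down comparison, children in order."""
    if a.label != b.label:
        return 0
    return 1 + sum(matched_nodes(x, y) for x, y in zip(a.children, b.children))


def similarity(a: Node, b: Node) -> float:
    """Dice-style score in [0, 1]; identical trees score 1.0."""
    return 2 * matched_nodes(a, b) / (size(a) + size(b))


def identify(form: Node, models: dict[str, Node], size_tol: int = 1) -> str | None:
    """Return the name of the best-matching model class, or None."""
    # Pre-classifier 1 (assumed feature): keep models with a similar node count.
    candidates = {k: m for k, m in models.items()
                  if abs(size(m) - size(form)) <= size_tol}
    # Pre-classifier 2 (assumed feature): keep models with the same depth.
    candidates = {k: m for k, m in candidates.items()
                  if depth(m) == depth(form)}
    if not candidates:
        return None
    # Structural step: compare trees only for the surviving candidates.
    return max(candidates, key=lambda k: similarity(form, candidates[k]))


# Toy form model base with three hypothetical classes.
models = {
    "A": Node("form", [Node("header"),
                       Node("table", [Node("field"), Node("field")])]),
    "B": Node("form", [Node("header"), Node("text"),
                       Node("table", [Node("field")])]),
    "C": Node("form", [Node("header")]),
}
form = Node("form", [Node("header"),
                     Node("table", [Node("field"), Node("field")])])

print(identify(form, models))  # prints "A"
```

In this toy example the cheap filters discard class C before any tree comparison takes place, which is the role the abstract assigns to the pre-classifiers. The more expensive structural comparison then runs only on classes A and B.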





Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Sébastien Diana (1)
  • Eric Trupin (1)
  • Yves Lecourtier (1)
  • Jacques Labiche (2)
  1. Laboratoire PSI / La3I, Université de Rouen, Mont Saint Aignan Cédex, France
  2. Laboratoire ISMRA / LACP, Université de Caen, Caen Cédex, France
