Multi-view HAC for Semi-supervised Document Image Classification

  • Fabien Carmagnac
  • Pierre Héroux
  • Éric Trupin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3163)


This paper presents a semi-supervised document image classification system intended for integration into commercial document reading software.

The system is designed as an annotation aid. Given a set of unlabeled document images supplied by a human operator, it computes hypotheses that group images sharing the same physical layout and proposes these groups to the operator. The operator can then correct or validate them, with the objective of obtaining homogeneous groups of images. These groups are subsequently used to train the supervised document image classifier. The system relies on N feature spaces, each with its own metric function for computing the similarity between two points of that space. After projecting each image into the N feature spaces, the system builds N hierarchical agglomerative classification (HAC) trees, one per feature space. The grouping proposals produced by the different HAC trees are then compared and merged. Results, evaluated by the number of corrections made by the operator, are reported on several image sets.
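The pipeline described above (one HAC per feature space, followed by a merge of the grouping proposals) can be illustrated with a minimal Python sketch. The abstract does not specify the linkage criterion, the level at which each tree is cut, or the merging rule, so everything here is an assumption made for illustration: single linkage, a fixed distance threshold per view, and a strict consensus in which two images stay together only if every view grouped them together. The names `hac_groups` and `consensus_groups`, the toy feature values, and the thresholds are hypothetical.

```python
import math
from itertools import combinations


def hac_groups(points, metric, threshold):
    """Single-linkage agglomerative clustering, stopped at `threshold`.

    Assumption: the linkage and the cut rule are illustrative choices,
    not taken from the paper. Returns a list of sets of point indices.
    """
    clusters = [{i} for i in range(len(points))]
    while len(clusters) > 1:
        best = None
        # Find the closest pair of clusters (single linkage).
        for a, b in combinations(range(len(clusters)), 2):
            d = min(metric(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # every remaining merge would join dissimilar images
        clusters[a] |= clusters[b]
        del clusters[b]  # safe: combinations guarantees a < b
    return clusters


def consensus_groups(views, n_items):
    """Assumed merge rule: keep items together only if all views agree."""
    signatures = {}
    for item in range(n_items):
        # Signature = index of the item's cluster in each view.
        key = tuple(
            next(k for k, group in enumerate(groups) if item in group)
            for groups in views
        )
        signatures.setdefault(key, set()).add(item)
    return sorted(signatures.values(), key=min)


# Four hypothetical images projected into two toy feature spaces.
view_a = [0.0, 0.1, 5.0, 5.1]              # scalar feature
view_b = [(0, 0), (0, 1), (9, 9), (9, 0)]  # 2-D feature

groups_a = hac_groups(view_a, lambda x, y: abs(x - y), threshold=1.0)
groups_b = hac_groups(view_b, math.dist, threshold=1.5)
proposals = consensus_groups([groups_a, groups_b], n_items=4)
print(proposals)  # [{0, 1}, {2}, {3}]
```

In this toy run the first view groups images 2 and 3 together while the second separates them, so the strict consensus proposes them as separate groups and leaves the final decision to the operator. A more permissive merge rule (for example, majority agreement among views) would trade fewer groups for a higher risk of non-homogeneous proposals.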





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Fabien Carmagnac (1, 2)
  • Pierre Héroux (2)
  • Éric Trupin (1)

  1. A2iA SA, Paris cedex, France
  2. Laboratoire PSI, CNRS FRE 2645, Université de Rouen, Mont-Saint-Aignan cedex, France
