DocMining: A Document Analysis System Builder

  • Sébastien Adam
  • Maurizio Rigamonti
  • Eric Clavier
  • Éric Trupin
  • Jean-Marc Ogier
  • Karl Tombre
  • Joël Gardes
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3163)

Abstract

In this paper, we present DocMining, a general framework for building scenarios dedicated to document image processing. The framework is the result of a collaboration between four academic partners and one industrial partner. DocMining addresses two main issues: the description and the execution of document analysis scenarios. Because scenarios are declared explicitly and the framework follows a plug-in-oriented approach, new Document Processing Units can be integrated easily and new application prototypes can be created. Moreover, this paper highlights the value of the platform for addressing the problem of performance evaluation.

Keywords

Graphical Object, XPath Expression, France Telecom, Processing Library, Page Segmentation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Sébastien Adam (1)
  • Maurizio Rigamonti (2)
  • Eric Clavier (3)
  • Éric Trupin (1)
  • Jean-Marc Ogier (4)
  • Karl Tombre (5)
  • Joël Gardes (3)

  1. Laboratoire PSI – CNRS FRE 2645, Université de Rouen, Mont Saint Aignan CEDEX, France
  2. DIVA Group, DIUF, Université de Fribourg, Fribourg, Switzerland
  3. France Telecom R&D, Lannion CEDEX, France
  4. Laboratoire L3i, Université de la Rochelle, La Rochelle CEDEX, France
  5. LORIA, INRIA, Vandoeuvre-lès-Nancy CEDEX, France
