Document Logical Structure Analysis Based on Perceptive Cycles

  • Yves Rangoni
  • Abdel Belaïd
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3872)

Abstract

This paper describes a Neural Network (NN) approach to logical document structure extraction. In this architecture, called a Transparent Neural Network (TNN), the document structure is spread across the layers, so that interpretation is decomposed from the physical level (NN input) to the logical level (NN output). The intermediate layers represent successive interpretation steps. Each neuron is visible and associated with a logical element. Recognition proceeds by repeated perceptive cycles that propagate information through the layers. When the recognition rate is low, it is improved by error backpropagation, which either corrects the input or selects a better-suited input feature subset. Several feature subsets are created using a modified filter method. The first experiments, performed on scientific documents, are encouraging.
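
The cycle described in the abstract can be illustrated with a minimal sketch. It is not the authors' TNN: a single softmax layer stands in for the network, and the backpropagation-driven choice of a feature subset is replaced by simply trying precomputed subsets in order until one label is confident enough. The feature names, weights, labels, and the 0.8 threshold are all hypothetical values chosen for illustration.

```python
import math


def forward(weights, features):
    """Toy stand-in for one propagation: a linear layer followed by softmax."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def perceptive_cycles(block, subsets, models, labels, threshold=0.8):
    """Run one cycle per feature subset until a label is confident enough.

    block   : dict of feature name -> value (hypothetical physical features)
    subsets : list of feature-name lists (assumed to come from a filter method)
    models  : one weight matrix per subset (rows = labels, columns = features)
    Returns (label, confidence, cycle) for the most confident cycle seen.
    """
    best = (None, 0.0, 0)
    for cycle, (names, weights) in enumerate(zip(subsets, models), start=1):
        probs = forward(weights, [block[name] for name in names])
        confidence = max(probs)
        if confidence > best[1]:
            best = (labels[probs.index(confidence)], confidence, cycle)
        if confidence >= threshold:
            break  # interpretation accepted, no further cycle needed
    return best


# Hypothetical text block and two hypothetical feature subsets.
block = {"font_size": 1.0, "bold": 1.0, "y_position": 0.1}
subsets = [["y_position"], ["font_size", "bold"]]
models = [
    [[0.5], [0.4]],                # weak evidence: nearly uniform output
    [[4.0, 4.0], [-4.0, -4.0]],    # strong evidence for the first label
]
labels = ["title", "paragraph"]

label, confidence, cycle = perceptive_cycles(block, subsets, models, labels)
```

In this example the first subset gives an almost uniform output, so a second cycle is run with the other subset, which labels the block as "title" with confidence above the threshold.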

Keywords

Feature Subset; Multi Layer Perceptron; Plane Space; Logical Element; Text Block
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Yves Rangoni (1)
  • Abdel Belaïd (1)
  1. Loria Research Center – Read Group, Vandœuvre-lès-Nancy, France
