Grammatical formalism for document understanding system: From document towards HTML text

  • S. Tayeb-bey
  • A. S. Saidi
Oral Presentations B. Document Processing and Retrieval
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1339)


This paper deals with the use of grammatical formalisms to recognize the physical and logical structures of a composite document. We propose a new system for document recognition and analysis. The goal of this system is to identify summaries in particular and, as an application, to convert them into machine-readable form. We translate a printed summary into HTML (HyperText Markup Language) text.
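As a rough illustration of the final step (recognized logical structure rendered as HTML), the following minimal sketch may help. It is not the authors' system and does not implement a two-level grammar: it substitutes a single regular-expression rule for the grammar, and it assumes that a "summary" is a table-of-contents page whose entries have the form "title, dot leader, page number". The names `ENTRY`, `parse_entries`, and `to_html`, as well as the sample lines, are hypothetical.

```python
import re
from html import escape

# Hypothetical rule standing in for a grammar production:
# entry -> title, dot leader (two or more dots), page number.
ENTRY = re.compile(r"^(?P<title>.+?)\s*\.{2,}\s*(?P<page>\d+)\s*$")


def parse_entries(lines):
    """Return (title, page) pairs for the lines that match the ENTRY rule."""
    entries = []
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            entries.append((match.group("title"), int(match.group("page"))))
    return entries


def to_html(entries):
    """Render the recognized entries as an HTML unordered list."""
    items = "\n".join(
        f'  <li>{escape(title)} <span class="page">{page}</span></li>'
        for title, page in entries
    )
    return f"<ul>\n{items}\n</ul>"


# Assumed sample input: text lines as they might come out of an OCR step.
sample = [
    "Introduction ........ 1",
    "Methods & Results ... 12",
    "not an entry",
]
entries = parse_entries(sample)
html_text = to_html(entries)
print(html_text)
```

Lines that do not match the rule are simply skipped here; a real grammar-based analyzer would also have to handle the physical layout (blocks, columns, fonts) before any such logical labelling.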

Key Words

document analysis, logical structure, physical structure, two-level grammar, HTML





Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • S. Tayeb-bey (1)
  • A. S. Saidi (2)
  1. Reconnaissance de Forme et Vision, Bât 403, Villeurbanne Cedex
  2. Département Math-Info-Système, Ecole Centrale de Lyon, Ecully
