The T-Recs Table Recognition and Analysis System

  • Thomas Kieninger
  • Andreas Dengel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1655)

Abstract

This paper presents a new approach to table structure recognition and layout analysis. The recognition process differs significantly from existing approaches: it performs a bottom-up clustering of given word segments, whereas conventional table structure recognizers all rely on detecting separators, such as delineation or significant white space, to analyze a page top-down. The subsequent analysis of the recognized layout elements is based on the construction of a tile structure and detects row- and/or column-spanning cells as well as sparse tables with a high degree of confidence. The overall system is completely domain independent and can optionally ignore textual content. It can thus be applied to arbitrary mixed-mode documents (with or without tables) in any language, and it even operates on low-quality OCR documents (e.g. facsimiles).
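To make the idea of bottom-up clustering of word segments concrete, here is a minimal sketch. The abstract does not state the linking rule, so the sketch assumes one for illustration: two words join the same block when they sit on adjacent text lines and their horizontal extents overlap. The `Word` type, `overlaps`, and `cluster_words` are hypothetical names, and this is not presented as the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    line: int    # index of the text line the word sits on
    x0: float    # left edge of the word box
    x1: float    # right edge of the word box


def overlaps(a: Word, b: Word) -> bool:
    """True if the horizontal extents of a and b intersect."""
    return a.x0 < b.x1 and b.x0 < a.x1


def cluster_words(words):
    """Group words into blocks with union-find.

    Assumed rule (not taken from the paper): words on adjacent lines
    that overlap horizontally belong to the same block.
    """
    parent = list(range(len(words)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, a in enumerate(words):
        for j in range(i + 1, len(words)):
            b = words[j]
            if abs(a.line - b.line) == 1 and overlaps(a, b):
                parent[find(i)] = find(j)

    blocks = {}
    for i, w in enumerate(words):
        blocks.setdefault(find(i), []).append(w.text)
    return sorted(blocks.values())


# Invented sample data: a two-column table with three text lines.
words = [
    Word("Item", 0, 0, 40), Word("Price", 0, 100, 150),
    Word("Apple", 1, 0, 50), Word("1.20", 1, 105, 140),
    Word("Pear", 2, 0, 35), Word("0.80", 2, 105, 140),
]
blocks = cluster_words(words)
```

With this sample input, the words fall into two blocks that correspond to the two table columns, without any search for ruling lines or white-space separators. Merging no words within a line is a simplification of this sketch only.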

Keywords

Segmentation Algorithm, Text Line, White Space, Table Column, Layout Analysis
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Thomas Kieninger (1)
  • Andreas Dengel (1)
  1. DFKI-GmbH, Postfach, Kaiserslautern, FRG
