A Theoretical Foundation and a Method for Document Table Structure Extraction and Decomposition

  • Howard Wasserman
  • Keitaro Yukawa
  • Bon Sy
  • Kui-Lam Kwok
  • Ihsin Tsaiyun Phillips
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2423)

Abstract

The algorithm described in this paper is designed to detect potential table regions in the document, to decide whether a potential table region is, in fact, a table, and, when it is, to analyze the table structure. The decision and analysis phases of the algorithm and the resulting system are based primarily on a precise definition of table, and it is such a definition that is discussed in this paper. An adequate definition need not be complete in the sense of encompassing all possible structures that might be deemed to be tables, but it should encompass most such structures, it should include essential features of tables, and it should exclude features never or very rarely possessed by tables.
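The abstract names three phases (detect a candidate region, decide whether it is a table, analyze its structure) but this page does not give the authors' definition of a table or their decision rules. The following is only an illustrative sketch of the decide and analyze steps, under assumptions that are not from the paper: a candidate region arrives as hypothetical word bounding boxes `(x0, y0, x1, y1)`, column and row bands are found by merging overlapping 1-D intervals (in the spirit of looking for empty runs in a projection profile), and a region counts as "table-like" when it has at least two column bands and two row bands. The names `merge_intervals`, `analyze_region`, `min_col_gap`, and `min_row_gap` are invented for illustration.

```python
from typing import List, Tuple

# Hypothetical word bounding box: (x0, y0, x1, y1) in pixel coordinates.
Box = Tuple[int, int, int, int]
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval], min_gap: int) -> List[Interval]:
    """Merge 1-D intervals separated by less than min_gap."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] < min_gap:
            # Close enough to the previous band: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # A gap of at least min_gap starts a new band.
            merged.append((start, end))
    return merged


def analyze_region(
    words: List[Box], min_col_gap: int = 10, min_row_gap: int = 1
) -> Tuple[bool, List[Interval], List[Interval]]:
    """Return (is_table_like, column_bands, row_bands) for a candidate region.

    The two-columns-and-two-rows rule is an assumption made for this sketch,
    not the definition of a table used in the paper.
    """
    columns = merge_intervals([(x0, x1) for x0, _, x1, _ in words], min_col_gap)
    rows = merge_intervals([(y0, y1) for _, y0, _, y1 in words], min_row_gap)
    is_table_like = len(columns) >= 2 and len(rows) >= 2
    return is_table_like, columns, rows


if __name__ == "__main__":
    # Four words laid out on a 2 x 2 grid with a wide gap between columns.
    grid = [(0, 0, 30, 10), (50, 0, 80, 10), (0, 20, 30, 30), (50, 20, 80, 30)]
    print(analyze_region(grid))
```

A region of ordinary running text, where the gaps between words are narrower than `min_col_gap`, collapses into a single column band and is rejected by this rule. A usable threshold would have to be estimated from the document itself, for example from its typical inter-word spacing.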

Keywords

Word Segmentation; Horizontal Projection; Vertical Range; Document Page; Layout Analysis

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Howard Wasserman (1)
  • Keitaro Yukawa (1)
  • Bon Sy (1)
  • Kui-Lam Kwok (1)
  • Ihsin Tsaiyun Phillips (1)

  1. Department of Computer Science, Queens College, the City University of New York, Flushing
