Table Detection via Probability Optimization

  • Yalin Wang
  • Ihsin T. Phillips
  • Robert M. Haralick
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2423)

Abstract

In this paper, we define the table detection problem as a probability optimization problem. We begin, as in our previous algorithm, by finding and validating each detected table candidate. We then compute a set of probability measurements for each table entity. These measurements take into consideration tables, table text separators, and the text blocks neighboring each table. An iterative updating method then optimizes the page segmentation probability to obtain the final result. The new algorithm shows a great improvement over our previous algorithm. The training and testing data set comprises 1,125 document pages containing 518 table entities and a total of 10,934 cell entities. Compared with our previous work, the new algorithm raised the accuracy rates from 90.32% to 95.67% and from 92.04% to 97.05%.
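
The iterative updating step mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the paper's probability model, so everything below is an assumption made for illustration: the two labels `table` and `text`, the per-block probabilities in `unary`, the `neighbors` pairs, the `agree_prob` parameter, and the greedy one-block-at-a-time update are all hypothetical stand-ins, not the authors' method.

```python
import math

def page_log_prob(labels, unary, neighbors, agree_prob=0.7):
    """Log-probability of a page labeling under a toy model (assumed, not the paper's)."""
    # Per-block term: how likely each block is to carry its assigned label.
    total = sum(math.log(unary[i][lab]) for i, lab in enumerate(labels))
    # Pairwise term: neighboring blocks are assumed to prefer the same label.
    for i, j in neighbors:
        p = agree_prob if labels[i] == labels[j] else 1.0 - agree_prob
        total += math.log(p)
    return total

def iterative_update(unary, neighbors, max_iters=20):
    """Greedily relabel one block at a time while the page log-probability improves."""
    # Start from the independently most likely label for each block.
    labels = [max(u, key=u.get) for u in unary]
    best = page_log_prob(labels, unary, neighbors)
    for _ in range(max_iters):
        changed = False
        for i in range(len(labels)):
            for candidate in unary[i]:
                if candidate == labels[i]:
                    continue
                trial = labels[:i] + [candidate] + labels[i + 1:]
                score = page_log_prob(trial, unary, neighbors)
                if score > best:
                    labels, best, changed = trial, score, True
        if not changed:
            break  # No single relabeling improves the page probability.
    return labels, best

# Hypothetical page: three vertically adjacent blocks; the middle one is ambiguous.
unary = [
    {"table": 0.9, "text": 0.1},
    {"table": 0.45, "text": 0.55},
    {"table": 0.9, "text": 0.1},
]
neighbors = [(0, 1), (1, 2)]
labels, score = iterative_update(unary, neighbors)
print(labels)
```

In this toy page the middle block starts out labeled `text` on its own evidence, and the update relabels it as `table` because agreement with its two confident neighbors raises the overall page probability. A greedy update of this kind only guarantees a local optimum.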

Keywords

Probability Optimization, Text Line, Text Block, Document Page, Cell Entity
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Yalin Wang (1)
  • Ihsin T. Phillips (2)
  • Robert M. Haralick (3)
  1. Dept. of Elect. Eng., Univ. of Washington, Seattle, US
  2. Dept. of Comp. Science, Queens College, City Univ. of New York, Flushing, US
  3. The Graduate School, City Univ. of New York, New York, US