Automated Table Understanding Using Stub Patterns

  • Roya Rastan
  • Hye-young Paik
  • John Shepherd
  • Armin Haller
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9642)

Abstract

Tables in documents are a rich source of information, but they are not yet well utilised computationally because their structure and data are difficult to extract automatically. In this paper, we advance the state of the art in automatic table extraction by identifying common patterns in table headers and using them to develop rules and heuristics for determining table structure. We describe and evaluate a table understanding system built on these patterns and rules.

Keywords

Table understanding, Table logical structure, Table stub analysis, Table categories, Category hierarchy

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Roya Rastan (1)
  • Hye-young Paik (1)
  • John Shepherd (1)
  • Armin Haller (2)

  1. The University of New South Wales, Sydney, Australia
  2. Australian National University, Canberra, Australia
