DASFAA 2016: Database Systems for Advanced Applications, pp 533-548
Automated Table Understanding Using Stub Patterns
Conference paper
Abstract
Tables in documents are a rich source of information, but they are not yet well utilised computationally because their structure and data are difficult to extract automatically. In this paper, we advance the state of the art in automatic table extraction by identifying common patterns in table headers and using them to develop rules and heuristics for determining table structure. We describe and evaluate a table understanding system that uses these patterns and rules.
Keywords
Table understanding, Table logical structure, Table stub analysis, Table categories, Category hierarchy
Copyright information
© Springer International Publishing Switzerland 2016