Introduction

One of the significant challenges of document layout analysis is table understanding in document images. Tables appear in a wide variety of documents, such as official documents, bills, scientific articles, reports, and archival documents, among others; hence, techniques for table analysis are instrumental in automatically extracting important information kept in tabular form from numerous sources [1]. Tables enable readers to easily compare, analyze, and understand facts presented in documents [2]. Table detection is therefore an essential task, since accurate table detection improves document analysis and the extraction of important information. Due to the diversity of table styles, table detection and extraction is a popular yet challenging task, and there is no general algorithm that can detect the presence of tables in a document irrespective of their styles.

A conventional optical character recognition (OCR) system consists of three significant steps, i.e., layout analysis, character recognition, and text string generation using a language modeling tool [3]. Since layout analysis is the first step in such a process, all subsequent stages rely on it working correctly. One of the significant difficulties faced by layout analysis is detecting table regions. Tables are formed either by horizontal and vertical lines or by uniform spacing that differentiates the cells within them. This variety of styles makes it difficult to provide a generic algorithm for table detection [2]. Our main contribution in this paper is a generalized algorithm for table detection followed by information extraction. The rest of the paper is organized as follows: we discuss related work in “Related Works and Motivation”. In “Our Proposed Method”, we present our proposed approach, and further details on checking components are given in “Score Computation”. Results of the method are shown in “Results and Discussions”, and we conclude with “Conclusions”.

Related Works and Motivation

Several methods for table detection have been proposed in the literature. Some approaches use purely geometric features extracted from ruled lines, pixel distributions, and white gaps, and then use these features to detect tables with machine learning. Our approach is based on a geometric analysis of the table cell centers. In general, existing methods can be divided into two main categories: text-analysis based and ruling-line based.

Anh et al. [4] proposed a hybrid approach for the detection of table structures, irrespective of style, i.e., ruled or non-ruled tables. They report experimental results on the ICDAR-2013 table competition dataset. Jahan et al. [5] proposed a method where local thresholds for word gaps and line heights are used to locate and extract all categories of tables. The system shows a \(75\%\) overall detection rate, which is not very promising. Bansal et al. [6] presented a learning-based framework which identifies tables in scanned document images. They proposed a scheme for analyzing and labeling different document elements and their contexts, and finally for defining and understanding table boundaries from the context information. Kasar et al. [7] presented a method which works by identifying the column and row line separators. The horizontally and vertically aligned lines are extracted first using run-length thresholds, and these aligned lines are then used for feature generation and subsequent classification into tables and non-tables.

Many recent works on this problem use neural networks or deep learning models. For example, Forczmański et al. [8] presented an object detection approach using a convolutional neural network. They focused on automatic segmentation of elements from documents; the elements considered were stamps, logos, text blocks, tables, and signatures. The authors collected various documents from the internet and created their own dataset. Their method works in two stages: in the first stage, a rough classification of the detected regions of interest is done, and in the second stage, the found elements are verified. They experimented on public datasets and obtained a table detection accuracy of \(97.79\%\). In another recent work, the authors used a convolutional neural network with 28 layers for the detection of tables [9]. Qasim et al. [10] used a graph model for the structural analysis of documents, with table recognition from scanned document images as the main point of interest. The proposed architecture combines a convolutional neural network (CNN) and a graph network, where the CNN extracts visual features and the graph network deals with the problem structure. For experimentation, the authors used the UNLV [11] and ICDAR 2013 [12] datasets; in the absence of ruling lines, the reported figure was \(94.7\%\). Kavasidis et al. [13] addressed automated table and chart detection with a combination of deep convolutional neural networks, graphical models, and saliency concepts. Localization of tables and charts in documents was carried out using a saliency-based fully-convolutional neural network followed by a fully-connected conditional random field (CRF). Performance was tested on the ICDAR 2013 dataset, where they observed a precision of \(97.5\%\) and a recall of \(98.1\%\). Arif et al. [14] suggested a novel data-driven approach for table detection from document images using foreground and background features. Observing that tables normally contain more numeric data, they focused on differentiating numerical from other text data. They obtained a precision of \(86.33\%\) and a recall of \(93.21\%\) on the UNLV dataset. Schreiber et al. [15] presented a deep-learning system for table detection which works by analyzing the cell positions after detecting the rows and columns present in the tables. The accuracy of their method for table detection and structure recognition was \(91.44\%\) on the ICDAR 2013 dataset. Li et al. [16] proposed a convolutional neural network based method which applies some loose heuristic rules to extract meta-information from PDF documents and uses this meta-information for table detection. The crucial limitation of the method is that it only works for PDF documents.

Our Contributions

  • From the current works available, we can see that existing approaches are based only on the geometry of table lines or on gaps between contents. The method in [17] works only for ruled tables. A document may contain ruled, non-ruled, and partially ruled tables, and our proposed method aims to work for all of them. In our proposed method, we do not rely on any horizontal or vertical lines for detecting tables.

  • A hybrid method to detect both ruled and non-ruled tables has been proposed in [4]. However, this method is complicated and time consuming: it categorizes tables as ruled and non-ruled and processes them differently. We neither classify tables into ruled and non-ruled, nor classify text and non-text elements in the documents. Hence our method is simpler and yet useful.

  • The method in [18] relies on graphic lines, which sometimes leads to false detections of tables when there is a line in a paragraph with sparsely populated text. Our proposed method of score computation for the recognition solves this problem to some extent.

  • Mandal et al. [19] have proposed a method based on the assumption that the gaps between fields must be larger than the gaps between words in text lines. However, this may not always be true, as tables can be densely populated. Our proposed method also uses gaps between elements on a page, but relies on a more robust assumption: tables can be recognized from the well-structured point set representing the table cell cores. Cores are understood as the centers of the text blocks in the tabular cells and are represented as a set of points.

Our Proposed Method

Fig. 1 Flow diagram of the proposed method

Our objective is to find tables present in a document image using simple methods. We start with a gray-scale image of the page; it is assumed that the image is already skew corrected. The image is then binarized using adaptive thresholding, and the average character height is estimated. The next step separates the page into regions, each containing a single component such as a paragraph, image, table, or figure. Typically, the gap between components is significantly larger than the gap between text lines, which makes this separation possible. Next, we examine the elements inside each component and try to group them into rows and columns. The relative positions of these elements are further examined to categorize the components into two categories: tables and non-tables. These steps are described in more detail below, and the proposed methodology is shown in Fig. 1.
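
To make the flow concrete before the detailed subsections, the sketch below outlines the pipeline in Python. Every function name here is an illustrative placeholder (each step is fleshed out, as a sketch, in the subsections that follow); the 1.5h run-length limits and the 5.0 score threshold are taken from later sections, while the choice \(\delta = h\) is an assumption of ours.

```python
def detect_tables(gray_page):
    """End-to-end sketch of the proposed pipeline (illustrative names only)."""
    binary, h = binarize_and_estimate_char_height(gray_page)      # pre-processing
    smoothed = rlsa(binary, l_h=1.5 * h, l_v=1.5 * h)             # smearing (RLSA)
    components = build_components(outer_bounding_boxes(smoothed),
                                  inner_bounding_boxes(binary, h))  # placeholder
    tables = []
    for comp in components:
        core = core_points(comp)              # one 2D point per grouped inner element
        if table_score(core, delta=h) > 5.0:  # empirical threshold (see "Results")
            tables.append(comp)
    return tables
```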

Pre-processing

Fig. 2 Binarization

We start with a grayscale image of the document. If the image is skewed, it must be skew corrected for this method to work; we assume that the given image is already skew corrected. The input image is binarized using the method proposed by Sauvola et al. [20], which uses adaptive thresholding and can produce good-quality binarized images even for input images with illumination changes or noise. A sample output image is shown in Fig. 2. We have slightly modified the method to also extract an estimate of the character height in the document image. In the final step of the binarization method, the connected black elements (say, text characters) are plotted on the resulting final image (which was initially taken to be white). While plotting these elements, we keep track of the height of each of them. We then find the mean and median of these heights and estimate the mode using Karl Pearson's empirical formula, shown below in Eq. (1). We take this mode to be the estimated character height h.

$$\begin{aligned} {\text {mode}} = 3\times {\text {median}} - 2\times {\text {mean}} \end{aligned}$$
(1)
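
As a minimal sketch of this pre-processing step (assuming scikit-image's `threshold_sauvola` and `regionprops`; the window size is an arbitrary choice, not taken from the paper):

```python
import numpy as np
from skimage.filters import threshold_sauvola
from skimage.measure import label, regionprops

def binarize_and_estimate_char_height(gray):
    """Binarize a grayscale page with Sauvola thresholding and estimate
    the character height h via Karl Pearson's mode formula (Eq. 1)."""
    thresh = threshold_sauvola(gray, window_size=25)   # window size: arbitrary choice
    binary = gray < thresh                             # True = black (foreground) pixel

    # Heights of the connected black elements (roughly, individual characters)
    heights = [r.bbox[2] - r.bbox[0] for r in regionprops(label(binary))]
    mean, median = np.mean(heights), np.median(heights)

    h = 3 * median - 2 * mean                          # mode = 3*median - 2*mean (Eq. 1)
    return binary, h
```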

Component Extraction

A document consists of a variety of components or regions such as text blocks, paragraphs, images, tables, figures, etc. It is helpful to separate them before further processing. Document structure and layout analysis can be used to decompose these components from a document image. Various such techniques exist and are mentioned in [21]. We use a simple smoothing-based technique.

Component Bounding Box Detection

To detect the bounding boxes, we start by smearing the foreground pixels, i.e., coalescing nearby black pixels into blobs. We use the run-length smoothing algorithm (RLSA) [22] for this. The RLSA can be used for block segmentation and text discrimination. The algorithm converts white pixels in the input image to black if the number of adjacent white pixels is less than or equal to some predefined limit l. We set this limit l to be a multiple of the estimated character height h. RLSA is applied both horizontally and vertically, with parameter values \(l_h\) and \(l_v\) (horizontal and vertical run-length limits), respectively.
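
A rough RLSA sketch is given below. Combining the horizontal and vertical passes with a logical AND is one common variant of [22] and is an assumption here, as is filling only the gaps that lie between two black pixels.

```python
import numpy as np

def smear_row(row, limit):
    """Fill white gaps of length <= limit that lie between two black pixels."""
    black = np.flatnonzero(row)
    for a, b in zip(black[:-1], black[1:]):
        if b - a - 1 <= limit:          # length of the white run between a and b
            row[a:b] = True
    return row

def rlsa(binary, l_h, l_v):
    """Run-length smoothing of a boolean image (True = black pixel).
    l_h / l_v are the horizontal / vertical run-length limits."""
    horiz = binary.copy()
    for row in horiz:                   # rows are views, so smearing is in place
        smear_row(row, l_h)
    vert = binary.T.copy()
    for col in vert:
        smear_row(col, l_v)
    return horiz & vert.T               # combine the two passes with AND (an assumption)
```

With the character height h from the pre-processing step, one would call, e.g., `rlsa(binary, 1.5 * h, 1.5 * h)`; these 1.5h values are the ones reported later in "Challenges".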

Fig. 3 Outer bounding box detection steps

We then traverse the edges of the blobs and find the four extreme points of each blob, namely \(x_{\min }\), \(y_{\min }\), \(x_{\max }\), and \(y_{\max }\). These four points are enough to define a bounding box (see Fig. 5). We call these bounding boxes outer bounding boxes as they represent the outer boundary of each component. The steps are shown in Fig. 3.
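
Instead of explicitly tracing blob edges, an equivalent shortcut (assuming scikit-image) is to take the bounding boxes of the labelled connected components of the smoothed image:

```python
from skimage.measure import label, regionprops

def outer_bounding_boxes(smoothed):
    """Bounding boxes (x_min, y_min, x_max, y_max) of the RLSA blobs."""
    boxes = []
    for region in regionprops(label(smoothed)):
        y_min, x_min, y_max, x_max = region.bbox   # skimage bbox is in (row, col) order
        boxes.append((x_min, y_min, x_max, y_max))
    return boxes
```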

Inner Elements Detection

To detect the elements inside a component boundary, we use an approach similar to the one used to detect components on a page. We start with a copy of the binarized image and combine nearby black pixels into blobs, but this time only horizontally; that is, RLSA is applied only in the horizontal direction. In this way, elements in separate lines do not coalesce into a single blob. We then find the extreme points of each blob and store them as an array of bounding boxes. We call these boxes inner bounding boxes because they are obtained from the elements inside each component. We found that roughly removing the long vertical and horizontal lines (table boundaries or separator lines) before applying RLSA in this step gives better results, because the cell contents in a ruled table are occasionally too close to the table boundary lines. The steps are shown in Fig. 4.

Fig. 4 Inner bounding box detection steps

Combining Inner and Outer Bounding Boxes

We now have a list of outer bounding boxes, one for each component, and a list of inner bounding boxes (Fig. 5). These are now combined into a list of components, where each component has an outer bounding box containing the smaller inner bounding boxes, as shown in Fig. 6. The inner boxes are then processed individually for each outer box, one by one.

Fig. 5 Bounding box

Component Representation

We represent each component with the following attributes, as shown in the example in Fig. 6 (a minimal code sketch follows the list).

  • Outer bounding box: a rectangular box which contains all inner components.

  • An array of inner bounding boxes: the bounding boxes of all the elements inside the outer bounding box. These can be text lines for a paragraph, cells for a table, or arbitrary regions from graphics.
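
A minimal sketch of this representation, assuming Python dataclasses; the containment test used to attach inner boxes to an outer box is a simplification, not the paper's exact procedure:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]          # (x_min, y_min, x_max, y_max)

@dataclass
class Component:
    outer: Box                                        # outer bounding box
    inner: List[Box] = field(default_factory=list)    # inner boxes (lines, cells, ...)

def build_components(outer_boxes: List[Box], inner_boxes: List[Box]) -> List[Component]:
    """Attach each inner box to the outer box containing its top-left corner."""
    components = [Component(o) for o in outer_boxes]
    for ib in inner_boxes:
        for comp in components:
            o = comp.outer
            if o[0] <= ib[0] <= o[2] and o[1] <= ib[1] <= o[3]:
                comp.inner.append(ib)
                break
    return components
```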

Fig. 6 Component representation

Table Detection

Once the components have been extracted, we need to identify the ones that could be tables. We examine the relative positions of all the inner elements to see whether their structure resembles that of table cells. To do so, we need a more straightforward way to represent the inner elements. We therefore attempt to group all the inner elements into rows and columns by testing whether their X- or Y-axis projections overlap. Then, based on their overlapping areas, each inner element that could be successfully grouped into rows and columns is assigned a single 2D point. We call this point the overlapping center. Since a single point now represents each element, comparing their relative positions and examining their layout structure becomes much easier (Fig. 7).

Row-Column Grouping

Fig. 7 Row, column grouping through projection on X and Y axes

In this step, each component's inner bounding boxes are grouped into rows and columns and marked accordingly. We ignore any inner bounding box whose width is more than \(75\%\) of the width of the outer bounding box, as cells are generally not this wide. Such a box may be a header, but we focus only on the cells.

Two inner bounding boxes A and B can be said to be in the same row if their projections on the Y axis overlap. This can be checked easily with the following formula given in Eq. (2).

$$\begin{aligned} F_y(A,B) = \left\{ \begin{array}{ll} {\text {True}} & \quad (A\cdot y_{\min } \ge B\cdot y_{\min } \text { and } A\cdot y_{\min } \le B\cdot y_{\max }) \\ & \quad \text {OR} \\ & \quad (B\cdot y_{\min } \ge A\cdot y_{\min } \text { and } B\cdot y_{\min } \le A\cdot y_{\max }) \\ {\text {False}} & \quad \text {otherwise} \end{array} \right. \end{aligned}$$
(2)

The result \(F_y(A,B)\) indicates whether the two boxes A and B have a Y-axis projection intersection or not. Similarly, two inner bounding boxes can be said to be in the same column if their X axis projections overlap each other. The previous formula can be tweaked a little to use in this case, as shown in Eq. (3).

$$\begin{aligned} F_x(A,B) = \left\{ \begin{array}{ll} {\text {True}} & \quad (A\cdot x_{\min } \ge B\cdot x_{\min } \text { and } A\cdot x_{\min } \le B\cdot x_{\max }) \\ & \quad \text {OR} \\ & \quad (B\cdot x_{\min } \ge A\cdot x_{\min } \text { and } B\cdot x_{\min } \le A\cdot x_{\max }) \\ {\text {False}} & \quad \text {otherwise} \end{array} \right. \end{aligned}$$
(3)

The result \(F_x(A,B)\) indicates whether the two boxes A and B have an X-axis projection intersection or not.
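
A direct translation of Eqs. (2) and (3), together with the 75% width filter described above, could look as follows (boxes are (x_min, y_min, x_max, y_max) tuples as in the earlier sketches; the pairwise grouping is a simplification):

```python
def same_row(a, b):
    """Eq. (2): True if boxes a and b overlap in their Y-axis projections."""
    return (b[1] <= a[1] <= b[3]) or (a[1] <= b[1] <= a[3])

def same_column(a, b):
    """Eq. (3): True if boxes a and b overlap in their X-axis projections."""
    return (b[0] <= a[0] <= b[2]) or (a[0] <= b[0] <= a[2])

def group_rows_and_columns(component):
    """Group a component's inner boxes into rows and columns, ignoring boxes
    wider than 75% of the outer box (likely headers, not cells)."""
    outer_width = component.outer[2] - component.outer[0]
    cells = [b for b in component.inner if (b[2] - b[0]) <= 0.75 * outer_width]
    rows = {i: [b for b in cells if same_row(a, b)] for i, a in enumerate(cells)}
    cols = {i: [b for b in cells if same_column(a, b)] for i, a in enumerate(cells)}
    return cells, rows, cols
```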

Representing Cells by Single Points

Fig. 8 Overlapped region centers

Once the groups are formed, for each element we find the minimum area that overlaps with all the elements in the same row group and the same column group. We then assign the center of this overlapped area to the element, as shown in Fig. 8. More formally, suppose that the inner elements \(I_1, I_2, I_3, \dots I_k\) lie in the same column and the elements \(J_1, J_2, J_3, \ldots J_m\) lie in the same row. Then the coordinates of the center p of the overlapping region are defined as shown in Eqs. (4) and (5).

The collection of all p values from all overlapping regions is referred to as the core C. Hence, C is the well-structured point set of table cell cores, i.e., the centers of the text blocks within the tabular cells.

$$\begin{aligned} p_x = x_1 + (x_2-x_1)/2 \end{aligned}$$
(4)

where, \(x_1 = \text {Max}(I_1\cdot x_{\min }, I_2\cdot x_{\min }, I_3\cdot x_{\min }, \ldots I_k\cdot x_{\min })\),

\(x_2 = \text {Min}(I_1\cdot x_{\max }, I_2\cdot x_{\max }, I_3\cdot x_{\max }, \ldots I_k\cdot x_{\max })\)

and

$$\begin{aligned} p_y = y_1 + (y_2-y_1)/2 \end{aligned}$$
(5)

where,

\(y_1 = \text {Max}(J_1\cdot y_{\min }, J_2\cdot y_{\min }, J_3\cdot y_{\min }, \ldots J_m\cdot y_{\min })\),

\(y_2 = \text {Min}(J_1\cdot y_{\max }, J_2\cdot y_{\max }, J_3\cdot y_{\max }, \ldots J_m\cdot y_{\max })\).
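
In code, Eqs. (4) and (5) amount to intersecting the column group on the X axis and the row group on the Y axis (a small sketch under the same box convention as before):

```python
def overlap_center(col_group, row_group):
    """Eqs. (4) and (5): center of the region overlapped by every box in the
    element's column group (x direction) and row group (y direction)."""
    x1 = max(b[0] for b in col_group)    # max of x_min over the column group
    x2 = min(b[2] for b in col_group)    # min of x_max over the column group
    y1 = max(b[1] for b in row_group)    # max of y_min over the row group
    y2 = min(b[3] for b in row_group)    # min of y_max over the row group
    return (x1 + (x2 - x1) / 2, y1 + (y2 - y1) / 2)
```

Collecting the resulting points for all grouped inner elements of a component yields its core C.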

Score Computation

Now we have a set of points, each representing an inner element of an outer component. In tables, these points represent cells and hence are arranged as the core structure representing the table as a whole. The cells of a table are generally group-wise uniformly spaced, which helps us identify a table even in the absence of ruling lines. We define a score-based examination method using the relative distances of these points to decide whether a given component should be considered a table.

Vertical and Horizontal Relations and Distances

Fig. 9 Example of core point set C

For two given points p and q in C, we say \(pR_hq\) (p is horizontally related to q) if \(\vert p\cdot y-q\cdot y \vert \le \delta \), i.e., the two points lie roughly on the same horizontal line, where \(\delta \) is some distance threshold. Similarly, we say \(pR_vq\) (p is vertically related to q) if \(\vert p\cdot x-q\cdot x\vert \le \delta \). Next, we do the following with the point set to calculate the tabular-structure score:

  1. For every point p find a point q, if any, such that \(pR_hq\) holds, \(p\cdot x > q\cdot x\), and q is closest to p.

  2. For every point p find a point q, if any, such that \(pR_hq\) holds, \(p\cdot x < q\cdot x\), and q is closest to p.

  3. For every point p find a point q, if any, such that \(pR_vq\) holds, \(p\cdot y > q\cdot y\), and q is closest to p.

  4. For every point p find a point q, if any, such that \(pR_vq\) holds, \(p\cdot y < q\cdot y\), and q is closest to p.

Here, q is closest to p in terms of distance, and these distances are stored. The patterns of these distances reveal the nature of the component: in the case of tables, some distance values repeat (or are close enough) many times. We define our score for the recognition of tables using the frequencies of these distances, as shown in Eq. (6).

$$\begin{aligned} \text {Score} =\frac{ \sum {n_{d_i} \times r_{d_i}}}{\max \{ \vert pR_hq \vert + \vert pR_vq \vert , \vert C \vert \}} \end{aligned}$$
(6)

Here, \(n_{d_i}\) denotes the frequency of the distance \(d_i\), \(r_{d_i}\) denotes the number of points involved with distance \(d_i\), \(\vert pR_hq \vert \) and \(\vert pR_vq \vert \) denote, respectively, the number of horizontal and vertical relations, and \(\vert C \vert \) denotes the size of the core C (in terms of the number of points). For example, with respect to the point set shown in Fig. 9, we have \(r_{d_1}=6\), \(n_{d_1}=3\), \(r_{d_2}=6\), \(n_{d_2}=3\), \(r_{d_3}=9\), \(n_{d_3}=6\), \(\vert pR_hq \vert =6\), \(\vert pR_vq \vert =6\), and \(\vert C \vert =9\). Therefore, for this example, Score = 7.5.
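
A sketch of the whole score computation is given below. Two details are assumptions not pinned down in the text: symmetric neighbour pairs are deduplicated (so that the 3×3 example above yields \(\vert pR_hq \vert = \vert pR_vq \vert = 6\)), and "close enough" distances are binned with a small tolerance.

```python
from collections import defaultdict

def table_score(points, delta, tol=2):
    """Compute the tabular-structure score of Eq. (6) for a core point set C.

    points : list of (x, y) overlapping-center points (the core C)
    delta  : alignment threshold for the R_h / R_v relations
    tol    : distances within `tol` pixels are binned together ("close enough")
    """
    relations = set()                       # unordered neighbour pairs tagged 'h'/'v'
    for i, p in enumerate(points):
        for tag, align, move in (('h', 1, 0), ('v', 0, 1)):
            for sign in (-1, 1):            # nearest neighbour in each direction
                best, best_d = None, None
                for j, q in enumerate(points):
                    if j == i or abs(p[align] - q[align]) > delta:
                        continue            # q is not related to p on this axis
                    d = (p[move] - q[move]) * sign
                    if d > 0 and (best_d is None or d < best_d):
                        best, best_d = j, d
                if best is not None:
                    relations.add((tag, min(i, best), max(i, best)))

    freq = defaultdict(int)                 # n_d : how often distance d occurs
    pts = defaultdict(set)                  # r_d : points involved with distance d
    for tag, i, j in relations:
        axis = 0 if tag == 'h' else 1
        d = round(abs(points[i][axis] - points[j][axis]) / tol) * tol
        freq[d] += 1
        pts[d].update((i, j))

    n_h = sum(1 for tag, _, _ in relations if tag == 'h')
    n_v = len(relations) - n_h
    numerator = sum(freq[d] * len(pts[d]) for d in freq)
    return numerator / max(n_h + n_v, len(points))
```

For a 3×3 grid with uniform row spacing and two distinct column spacings, as in Fig. 9, this returns 7.5, matching the worked example; components whose score exceeds 5.00 are then labelled as tables (see "Results and Discussions").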

Results and Discussions

Fig. 10 Scores of the components detected as tables are 11.3333 and 9.71429

Fig. 11 Score of the component detected as table is 13.4761

Table 1 Results of proposed method on ICDAR-2013 dataset in comparison with the state-of-the-art

Our program was tested on a computer with an Intel Core i3-6098P processor with a base frequency of 3.60 GHz. Our method was tested on 80 input document images taken from various scholarly articles; the document pages contain various types of tables along with other graphics elements such as plots, equations, and images. Figure 14 shows the CPU times required to detect tables in some sample documents of different sizes.

The scores computed from the core points for some documents are shown in Figs. 10, 11, 12 and 13. Based on experimental observations, we classify components as tables when the score exceeds 5.00 (Fig. 14).

Fig. 12 Score of the component detected as table is 8.6666

Fig. 13 Scores of the components detected as tables are 13.5714 and 12.2857

Results are shown for various types of document images in Figs. 15, 16 and 17. Here, Figs. 15 and 16 show pages containing ruled tables, whereas Fig. 17 shows document pages with tables where cell contents are not separated by ruled lines.

Fig. 14 Time taken for table localization in milliseconds (the sizes of the circles represent the sizes of the images)

Fig. 15 CPU time taken = 6.41 s, image size = 6.4 megabytes

Fig. 16 CPU time taken = 8.05 s, image size = 6.4 megabytes

Fig. 17 CPU time taken = 4.96 s, image size = 1.90 megabytes

Evaluation Metrics

Fig. 18 Results using the proposed method on some sample document pages from various datasets; a, b: samples from the TableBank dataset; c, d: samples from the Marmot dataset; e, f: samples from the ICDAR 2019 dataset

To evaluate the classification accuracy, four metrics have been used in our work: Precision, Recall, F1 score, and Accuracy. Their definitions are shown in Eq. (7), where TP, FP, FN, and TN represent true positives, false positives, false negatives, and true negatives, respectively. Here, TP is the count of tables correctly predicted as tables; FP is the number of non-tables (plots, graphs, graphics) predicted as tables; FN is the count of tables not detected as tables; and TN is the count of non-tables predicted as non-tables.

$$\begin{aligned} \begin{array}{l} \text {Accuracy} = (\mathrm {TP}+\mathrm {TN})/(\mathrm {TP}+\mathrm {FP}+\mathrm {FN}+\mathrm {TN}) \\ \text {Precision} = \mathrm {TP}/(\mathrm {TP}+\mathrm {FP}) \\ \text {Recall} = \mathrm {TP}/(\mathrm {TP}+\mathrm {FN}) \\ \text {F1 Score} = 2 \times (\text {Recall} \times \text {Precision})/(\text {Recall} + \text {Precision}) \end{array} \end{aligned}$$
(7)
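
These definitions translate directly into code; as a quick check, the counts reported below for our own dataset (TP = 93, FP = 8, FN = 6) reproduce the precision, recall, and F1 values quoted there.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 score as defined in Eq. (7)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * recall * precision / (recall + precision)
    return accuracy, precision, recall, f1

# The counts reported in "Our Results" (TP=93, FP=8, FN=6) give
# precision ~ 0.921, recall ~ 0.939 and F1 ~ 0.930.
```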

Another metric we used for evaluation is Intersection over Union (IoU), which is widely used in object detection benchmarks [11]. It measures the overlap between the covering rectangles or polygons of the predicted and ground-truth tables. The value of IoU lies in the range [0, 1]; a higher IoU indicates a better match between the ground truth and the predicted tables.
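
For axis-aligned covering rectangles, IoU can be computed as in the sketch below (the polygon case needs a general polygon-intersection routine and is not shown):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0
```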

Our Results

Initially, we tested our method on our own dataset. Our dataset contains 80 document images with 99 tables in total. We obtained \(FP=8\), \(FN=6\), and \(TP=93\) thereby giving precision = 0.921, recall = 0.939, and \(F_1\) score equal to 0.93.

We have also tested our method on the ICDAR-2013, Marmot, TableBank, and ICDAR-2019 datasets. ICDAR-2013 [12] is one of the most popular datasets for table detection and structure recognition. It was created from documents obtained from web pages and was made for a competition focused on detecting figures, tables, and mathematical equations in document images. The dataset is composed of PDF files, which we converted to images for our work. It contains 59 PDF files with a total of 117 tables. To give the algorithms ample opportunity to produce false positives, approximately two pages before and after each table are included as excerpts. A comparison of accuracy figures with other methods applied to ICDAR-2013 is shown in Table 1, which clearly shows that the proposed method outperforms the other methods. Our proposed method detects tables well irrespective of the presence of table boundary rule lines, which is the major contribution of our work.

The Marmot dataset [23] comprises English and Chinese documents. It consists of 2000 images, with an almost 1:1 ratio between positive and negative samples. The pages show great variety in language type, page layout, and table styles. Over 1500 conference and journal paper pages were included, covering various fields and spanning from 1970 to 2011. The e-Book pages are primarily in a one-column layout, while the English pages mix one-column and two-column layouts. Our method was tested on the English pages and achieved a precision of 0.960, a recall of 0.984, and an F1 score of 0.972.

The TableBank dataset [24] consists of 417,234 high-quality labeled tables together with their original documents from a variety of domains. Our method achieved a precision of 0.9813, a recall of 0.9482, and an F1 score of 0.9645 on a subset of this dataset. We tested only on Part 1 of this dataset, which consists of 1379 document pages.

ICDAR-2019 is a recent table detection dataset introduced in the table detection competition at ICDAR 2019 [25]. Our approach achieved a precision of 0.9777, a recall of 0.9518, and an F1 score of 0.9645 on this dataset. Sample results of the proposed method on document pages from various datasets are shown in Fig. 18.

For the proposed method, we obtained the IoU values as 0.90, 0.65 and 0.60 with respect to our dataset, the Marmot dataset, and the TableBank (Part 1), respectively (Tables 2, 3, 4).

Table 2 Results of proposed method on Marmot dataset in comparison with the state-of-the-art
Table 3 Results of proposed method on the TableBank dataset in comparison with the state-of-the-art
Table 4 Results of proposed method on ICDAR-2019 dataset in comparison with the state-of-the-art
Fig. 19 Sample document pages from the Marmot dataset where the method fails using both the \(l_h\) and \(l_v\) parameter values equal to 1.5h

Fig. 20 Sample document page from the TableBank dataset where the method fails using both the \(l_h\) and \(l_v\) values equal to 1.5h (left); proper detection using \(l_h\) and \(l_v\) values of 2h and h, respectively

Challenges

Sample document pages from the Marmot dataset where our proposed method fails are shown in Fig. 19. In the image shown in Fig. 19a (Marmot-123), there is no horizontal spacing in the first column, and this lowers the score. In our experimentation, the horizontal (\(l_h\)) and vertical (\(l_v\)) run-length parameters are both set to 1.5h when detecting the outer and inner boxes, where h is the average height of characters on the page. In the case of the image shown in Fig. 19b (Marmot-190), the presence of the vertical separator line and its proximity to the text words are the reasons for failure.

For the sample shown in Fig. 20 (Tablebank-007), the distribution of text within the table cells is not uniform, so a proper selection of the horizontal and vertical smoothing parameters is needed to detect the table correctly. We found that a run-length of 2h for horizontal smoothing (\(l_h\)) and h for vertical smoothing (\(l_v\)) detects the table correctly.

The run-length parameter values can be selected and optimized in terms of the average character height, since the inter-line, inter-paragraph, and table-cell gaps are themselves related to the average height of characters.

Conclusions

This paper presents a novel method for the detection of tables in document images. Table detection in documents is necessary, for instance, to convert tables in a document image into an editable format: once the tables are detected, the table cells can be localized, and finally, using OCR, the cell contents can be extracted. While it is straightforward to detect ruled tables, often by identifying the horizontal and vertical lines of the table borders, it is more challenging to detect unruled or partially ruled tables. We presented a method that can recognize tables irrespective of whether they are ruled, partially ruled, or unruled. We do not look for any lines or boundaries; instead, we rely on the fact that the cells of a table are arranged in a well-structured way. We have shown that this structure can be represented as a set of core points if the cell contents are replaced by representative points. In this work, we have presented the score computation method only for the detection of table structures. The work can be extended to design score formulae for other types of graphics elements such as plots, graphs, and equations. Further, automated selection of the run-length parameters for smoothing would be worth exploring for fine-tuning.