Adaptive Scaling for Archival Table Structure Recognition

Li, Xiao-Hui; Yin, Fei; Zhang, Xu-Yao; Liu, Cheng-Lin

doi:10.1007/978-3-030-86549-8_6

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12821)

Included in the following conference series:

International Conference on Document Analysis and Recognition

Abstract

Table detection and structure recognition from archival document images remain challenging due to diverse table structures, complex document layouts, degraded image quality, and inconsistent table scales. In this paper, we propose an instance-segmentation-based approach for archival table structure recognition that utilizes both foreground cell content and background ruling-line information. To overcome the influence of inconsistent table scales, we design an adaptive image scaling method based on the average cell size and the density of ruling lines in each document image. Unlike previous multi-scale training and testing approaches, which usually slow down the whole system, our adaptive scaling resizes each image to a single optimal size, which not only improves overall model performance but also reduces memory and computing overhead on average. Extensive experiments on the cTDaR 2019 Archival dataset show that our method outperforms the baselines and achieves new state-of-the-art performance, demonstrating the effectiveness of the proposed method.
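
The idea of resizing each image to a single size derived from its average cell size can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the target cell size of 32 pixels, the clipping bounds, and the use of the shorter cell side as the "cell size" are all assumptions made for illustration, and the ruling-line density term mentioned in the abstract is left out because its exact form is not given here.

```python
from typing import Sequence, Tuple

# (x_min, y_min, x_max, y_max) in pixels; a hypothetical cell representation.
Box = Tuple[float, float, float, float]


def adaptive_scale(
    cells: Sequence[Box],
    target_cell_size: float = 32.0,  # assumed value, not taken from the paper
    min_scale: float = 0.25,         # assumed lower clipping bound
    max_scale: float = 2.0,          # assumed upper clipping bound
) -> float:
    """Return one scale factor that maps the average cell size to the target."""
    if not cells:
        return 1.0  # no cell estimate available: leave the image unchanged
    # Use the shorter side of each cell as its size (an illustrative choice).
    sizes = [min(x1 - x0, y1 - y0) for x0, y0, x1, y1 in cells]
    average = sum(sizes) / len(sizes)
    if average <= 0:
        return 1.0  # degenerate boxes: fall back to no scaling
    scale = target_cell_size / average
    # Clip so that extreme estimates cannot blow up memory or erase detail.
    return max(min_scale, min(max_scale, scale))


def scaled_size(width: int, height: int, scale: float) -> Tuple[int, int]:
    """Apply the scale factor to an image size, keeping at least one pixel."""
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    rough_cells = [(0, 0, 100, 64), (0, 0, 200, 64)]
    s = adaptive_scale(rough_cells)
    print(s, scaled_size(2000, 3000, s))
```

Because each image is resized once, inference runs a single forward pass per image, in contrast to multi-scale testing, which evaluates several resized copies.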

Notes

1. https://pytorch.org/get-started/locally/

Acknowledgments

This work has been supported by the National Key Research and Development Program Grant 2020AAA0109702 and the National Natural Science Foundation of China (NSFC) grants 61733007 and 61721004.

Author information

Authors and Affiliations

National Laboratory of Pattern Recognition, Institute of Automation of Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing, 100190, People’s Republic of China
Xiao-Hui Li, Fei Yin, Xu-Yao Zhang & Cheng-Lin Liu
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, People’s Republic of China
Xiao-Hui Li, Xu-Yao Zhang & Cheng-Lin Liu
CAS Center for Excellence of Brain Science and Intelligence Technology, Beijing, People’s Republic of China
Cheng-Lin Liu

Corresponding author

Correspondence to Cheng-Lin Liu.

Editor information

Editors and Affiliations

Universitat Autònoma de Barcelona, Barcelona, Spain
Josep Lladós
Lehigh University, Bethlehem, PA, USA
Daniel Lopresti
Kyushu University, Fukuoka-shi, Japan
Seiichi Uchida

About this paper

Cite this paper

Li, XH., Yin, F., Zhang, XY., Liu, CL. (2021). Adaptive Scaling for Archival Table Structure Recognition. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12821. Springer, Cham. https://doi.org/10.1007/978-3-030-86549-8_6

DOI: https://doi.org/10.1007/978-3-030-86549-8_6
Published: 02 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86548-1
Online ISBN: 978-3-030-86549-8
eBook Packages: Computer Science, Computer Science (R0)
