PyramidTabNet: Transformer-Based Table Recognition in Image-Based Documents

Umer, Muhammad; Mohsin, Muhammad Ahmed; Ul-Hasan, Adnan; Shafait, Faisal

doi:10.1007/978-3-031-41734-4_26

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14191)

Included in the following conference series:

International Conference on Document Analysis and Recognition

Abstract

Table detection and structure recognition are important components of document analysis systems. Deep learning-based transformer models have recently demonstrated significant success in various computer vision and document analysis tasks. In this paper, we introduce PyramidTabNet (PTN), a method that builds upon the convolution-less Pyramid Vision Transformer to detect tables in document images. Furthermore, we present a generative augmentation technique for tabular images to train the architecture effectively. The proposed augmentation process consists of three steps, namely clustering, fusion, and patching, and generates new document images containing tables. Our proposed pipeline demonstrates significant performance improvements for table detection on several standard datasets. Additionally, it achieves performance comparable to state-of-the-art methods on structure recognition tasks.

Author information

Authors and Affiliations

School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad, Pakistan
Muhammad Umer, Muhammad Ahmed Mohsin & Faisal Shafait
Deep Learning Laboratory, National Center of Artificial Intelligence (NCAI), Islamabad, Pakistan
Adnan Ul-Hasan & Faisal Shafait

Corresponding author

Correspondence to Muhammad Ahmed Mohsin.

Editor information

Editors and Affiliations

TU Dortmund University, Dortmund, Germany
Gernot A. Fink
Adobe, College Park, MD, USA
Rajiv Jain
Osaka Metropolitan University, Osaka, Japan
Koichi Kise
Rochester Institute of Technology, Rochester, NY, USA
Richard Zanibbi

About this paper

Cite this paper

Umer, M., Mohsin, M.A., Ul-Hasan, A., Shafait, F. (2023). PyramidTabNet: Transformer-Based Table Recognition in Image-Based Documents. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14191. Springer, Cham. https://doi.org/10.1007/978-3-031-41734-4_26

DOI: https://doi.org/10.1007/978-3-031-41734-4_26
Published: 19 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41733-7
Online ISBN: 978-3-031-41734-4
eBook Packages: Computer Science, Computer Science (R0)
