Optimized Table Tokenization for Table Structure Recognition

Lysak, Maksym; Nassar, Ahmed; Livathinos, Nikolaos; Auer, Christoph; Staar, Peter

doi:10.1007/978-3-031-41679-8_3

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14188)

Included in the following conference series:

International Conference on Document Analysis and Recognition


Abstract

Extracting tables from documents is a crucial task in any document conversion pipeline. Recently, transformer-based models have demonstrated that table structure can be recognized with impressive accuracy using Image-to-Markup-Sequence (Im2Seq) approaches. Taking only the image of a table, such models predict a sequence of tokens (e.g. in HTML or LaTeX) that represents the structure of the table. Since the token representation of the table structure has a significant impact on the accuracy and run-time performance of any Im2Seq model, we investigate in this paper how table-structure representation can be optimized. We propose a new, optimized table-structure language (OTSL) with a minimized vocabulary and specific rules. The benefits of OTSL are that it reduces the vocabulary to 5 tokens (HTML needs 28+) and shortens the sequence length to half that of HTML on average. Consequently, model accuracy improves significantly, inference time is halved compared to HTML-based models, and the predicted table structures are always syntactically correct. This in turn eliminates most post-processing needs. Popular table structure datasets will be published in OTSL format to the community.
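
To make the idea of a five-token structure language concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not list the tokens, so the names and meanings used here are assumptions: `C` (a new cell), `L` (merge with the cell to the left), `U` (merge with the cell above), `X` (merge with both the left and the upper cell) and `NL` (end of row). The helper names `otsl_to_grid` and `grid_to_html` are hypothetical. The sketch converts such a token sequence into HTML structure markup by deriving `colspan` and `rowspan` from the grid.

```python
# Assumed five-token vocabulary; the abstract only states the count.
VOCAB = ("C", "L", "U", "X", "NL")


def otsl_to_grid(tokens):
    """Split a flat token sequence into rows at every NL token."""
    rows, row = [], []
    for tok in tokens:
        if tok not in VOCAB:
            raise ValueError(f"unknown token: {tok!r}")
        if tok == "NL":
            rows.append(row)
            row = []
        else:
            row.append(tok)
    if row:  # tolerate a missing trailing NL
        rows.append(row)
    return rows


def grid_to_html(grid):
    """Emit HTML structure tags; only C tokens open a <td>."""
    out = ["<table>"]
    for r, row in enumerate(grid):
        out.append("<tr>")
        for c, tok in enumerate(row):
            if tok != "C":
                continue  # L, U and X belong to a cell opened earlier
            # Count the L tokens directly to the right: horizontal span.
            colspan = 1
            while c + colspan < len(row) and row[c + colspan] == "L":
                colspan += 1
            # Count the U tokens directly below: vertical span.
            rowspan = 1
            while (r + rowspan < len(grid)
                   and c < len(grid[r + rowspan])
                   and grid[r + rowspan][c] == "U"):
                rowspan += 1
            attrs = ""
            if colspan > 1:
                attrs += f' colspan="{colspan}"'
            if rowspan > 1:
                attrs += f' rowspan="{rowspan}"'
            out.append(f"<td{attrs}></td>")
        out.append("</tr>")
    out.append("</table>")
    return "".join(out)


# A 2x3 table: the first cell spans two columns, the last spans two rows.
tokens = ["C", "L", "C", "NL", "C", "C", "U", "NL"]
html = grid_to_html(otsl_to_grid(tokens))
```

In this example, eight structure tokens stand in for an HTML string that needs separate tags and span attributes for each cell, which illustrates, under the stated assumptions, why a grid-shaped vocabulary can yield shorter sequences than HTML.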


Author information

Authors and Affiliations

IBM Research, Zurich, Switzerland
Maksym Lysak, Ahmed Nassar, Nikolaos Livathinos, Christoph Auer & Peter Staar

Corresponding author

Correspondence to Maksym Lysak.

Editor information

Editors and Affiliations

TU Dortmund University, Dortmund, Germany
Gernot A. Fink
Adobe, College Park, MD, USA
Rajiv Jain
Osaka Metropolitan University, Osaka, Japan
Koichi Kise
Rochester Institute of Technology, Rochester, NY, USA
Richard Zanibbi


Cite this paper

Lysak, M., Nassar, A., Livathinos, N., Auer, C., Staar, P. (2023). Optimized Table Tokenization for Table Structure Recognition. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14188. Springer, Cham. https://doi.org/10.1007/978-3-031-41679-8_3


DOI: https://doi.org/10.1007/978-3-031-41679-8_3
Published: 19 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41678-1
Online ISBN: 978-3-031-41679-8
eBook Packages: Computer Science, Computer Science (R0)
