The Benefits of Close-Domain Fine-Tuning for Table Detection in Document Images

Casado-García, Ángela; Domínguez, César; Heras, Jónathan; Mata, Eloy; Pascual, Vico

doi:10.1007/978-3-030-57058-3_15

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12116)

Included in the following conference series:

International Workshop on Document Analysis Systems


Abstract

Correctly localising tables in a document is instrumental for determining their structure and extracting their contents; table detection is therefore a key step in table understanding. Currently, the most successful methods for table detection in document images employ deep learning algorithms and, in particular, a technique known as fine-tuning. In this context, fine-tuning transfers the knowledge acquired when detecting objects in natural images to the detection of tables in document images. However, natural images and document images are only vaguely related, and fine-tuning works better when the source and target tasks are closely related. In this paper, we show that it is more beneficial to fine-tune from a closer domain. To this end, we train different object detection algorithms (namely, Mask R-CNN, RetinaNet, SSD and YOLO) on the TableBank dataset (a dataset of images of academic documents designed for table detection and recognition), and fine-tune them on several heterogeneous table detection datasets. Using this approach, we considerably improve the accuracy of the detection models fine-tuned from natural images (by 17% on average and by up to 60% in the best case).
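The intuition behind close-domain fine-tuning can be illustrated with a deliberately small toy sketch. It is not the paper's pipeline: a one-parameter linear model stands in for a detector, and the task slopes (-2.0 for a "distant" source, 2.8 for a "close" source, 3.0 for the target), the learning rate, and the step counts are all hypothetical values chosen only for illustration. Under these assumptions, both models get the same small fine-tuning budget on the target data, and the one pretrained on the closer source ends with the lower target error.

```python
# Toy illustration only: a 1-D linear model y = w * x stands in for a detector.
# All slopes, learning rates and step counts below are hypothetical.

def train(w, data, lr=0.05, steps=5):
    """Full-batch gradient descent on mean squared error for y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def mse(w, data):
    """Mean squared error of the model y = w * x on the given data."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def make_task(true_w, xs):
    """Noise-free samples from a task whose true slope is true_w."""
    return [(x, true_w * x) for x in xs]

xs = [0.5, 1.0, 1.5, 2.0]
distant_source = make_task(-2.0, xs)  # stands in for natural images
close_source = make_task(2.8, xs)     # stands in for a table dataset such as TableBank
target = make_task(3.0, xs[:2])       # small target dataset

# Stage 1: pretrain one model on each source domain.
w_distant = train(0.0, distant_source, steps=200)
w_close = train(0.0, close_source, steps=200)

# Stage 2: fine-tune both on the target with the same small budget.
ft_distant = train(w_distant, target, steps=3)
ft_close = train(w_close, target, steps=3)

print("target MSE, fine-tuned from distant source:", mse(ft_distant, target))
print("target MSE, fine-tuned from close source:  ", mse(ft_close, target))
```

In this toy setting each gradient step shrinks the distance to the target slope by a constant factor, so the starting point that is nearer to the target stays nearer after any fixed number of steps. Real detectors are far from this idealised case, which is why the paper evaluates the effect empirically.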

This work was partially supported by Ministerio de Economía y Competitividad [MTM2017-88804-P], Ministerio de Ciencia, Innovación y Universidades [RTC-2017-6640-7], Agencia de Desarrollo Económico de La Rioja [2017-I-IDD-00018], and the computing facilities of Extremadura Research Centre for Advanced Technologies (CETA-CIEMAT), funded by the European Regional Development Fund (ERDF). CETA-CIEMAT belongs to CIEMAT and the Government of Spain.

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of La Rioja, Logroño, Spain
Ángela Casado-García, César Domínguez, Jónathan Heras, Eloy Mata & Vico Pascual

Corresponding author

Correspondence to Ángela Casado-García.

Editor information

Editors and Affiliations

Huazhong University of Science and Technology, Wuhan, China
Xiang Bai
Autonomous University of Barcelona, Barcelona, Spain
Dimosthenis Karatzas
Lehigh University, Bethlehem, PA, USA
Daniel Lopresti

About this paper

Cite this paper

Casado-García, Á., Domínguez, C., Heras, J., Mata, E., Pascual, V. (2020). The Benefits of Close-Domain Fine-Tuning for Table Detection in Document Images. In: Bai, X., Karatzas, D., Lopresti, D. (eds) Document Analysis Systems. DAS 2020. Lecture Notes in Computer Science, vol 12116. Springer, Cham. https://doi.org/10.1007/978-3-030-57058-3_15

DOI: https://doi.org/10.1007/978-3-030-57058-3_15
Published: 14 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57057-6
Online ISBN: 978-3-030-57058-3
eBook Packages: Computer Science, Computer Science (R0)