Abstract
A correct localisation of tables in a document is instrumental for determining their structure and extracting their contents; therefore, table detection is a key step in table understanding. Nowadays, the most successful methods for table detection in document images employ deep learning algorithms and, in particular, a technique known as fine-tuning. In this context, fine-tuning transfers the knowledge acquired for detecting objects in natural images to the task of detecting tables in document images. However, natural images and document images are only loosely related, and fine-tuning works better when the source and target tasks are closely related. In this paper, we show that it is more beneficial to fine-tune from a closer domain. To this end, we train different object detection algorithms (namely, Mask R-CNN, RetinaNet, SSD and YOLO) on the TableBank dataset (a dataset of images of academic documents designed for table detection and recognition), and then fine-tune them on several heterogeneous table detection datasets. With this approach, we considerably improve the accuracy of the detection models fine-tuned directly from natural images (by 17% on average and by up to 60% in the best case).
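To make the mechanics of fine-tuning concrete, the following is a deliberately simplified toy sketch in Python, not the authors' pipeline (which fine-tunes Mask R-CNN, RetinaNet, SSD and YOLO detectors). A linear classifier is first trained on a larger "source" task, and its learned weights are then reused as the starting point for a few training steps on a small, related "target" task. All names (`make_task`, `train`, `log_loss`) and the toy tasks themselves are hypothetical stand-ins; the "closer domain" is modelled only as a shifted decision boundary.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss(w, data):
    """Mean binary cross-entropy of a linear model; w[-1] is the bias."""
    total = 0.0
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def train(data, init=None, epochs=100, lr=0.5):
    """Full-batch gradient descent; passing `init` warm-starts (fine-tunes) from it."""
    dim = len(data[0][0]) + 1  # features plus bias
    w = list(init) if init is not None else [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1]) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * gi / len(data) for wi, gi in zip(w, grad)]
    return w


def make_task(offset, n=200, seed=0):
    """Hypothetical toy 2-D task: label 1 iff x0 + x1 > offset."""
    rng = random.Random(seed)
    pts = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    return [(p, 1 if p[0] + p[1] > offset else 0) for p in pts]


# Stage 1: train on a larger "source" task.
source = make_task(offset=0.0, n=500, seed=1)
source_w = train(source, epochs=200)

# Stage 2: fine-tune on a small, related "target" task, starting from source_w.
target = make_task(offset=0.5, n=40, seed=2)
finetuned_w = train(target, init=source_w, epochs=20)

print(f"target loss before fine-tuning: {log_loss(source_w, target):.3f}")
print(f"target loss after fine-tuning:  {log_loss(finetuned_w, target):.3f}")
```

The only difference between training from scratch and fine-tuning here is the `init` argument: the target run starts from the source weights instead of zeros. In the paper's setting, the analogous choice is whether the detector's starting weights come from natural images or from TableBank.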
This work was partially supported by Ministerio de Economía y Competitividad [MTM2017-88804-P], Ministerio de Ciencia, Innovación y Universidades [RTC-2017-6640-7], Agencia de Desarrollo Económico de La Rioja [2017-I-IDD-00018], and the computing facilities of Extremadura Research Centre for Advanced Technologies (CETA-CIEMAT), funded by the European Regional Development Fund (ERDF). CETA-CIEMAT belongs to CIEMAT and the Government of Spain.
References
Abdulla, W.: Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow (2017). https://github.com/matterport/Mask_RCNN
Alexey, A.B.: YOLO darknet (2018). https://github.com/AlexeyAB/darknet
Cesari, F., et al.: Trainable table location in document images. In: 16th International Conference on Pattern Recognition, ICPR 2002, vol. 3, p. 30236. ACM (2002)
Chen, T., et al.: MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. CoRR abs/1512.01274 (2015). http://arxiv.org/abs/1512.01274
Colaboratory team: Google colaboratory (2017). https://colab.research.google.com
Costa e Silva, A.: Learning rich hidden Markov models in document analysis: table location. In: 10th International Conference on Document Analysis and Recognition, ICDAR 2009, pp. 843–847. IEEE (2009)
Coüasnon, B., Lemaitre, A.: Recognition of tables and forms. In: Doermann, D., Tombre, K. (eds.) Handbook of Document Image Processing and Recognition, pp. 647–677. Springer, London (2014). https://doi.org/10.1007/978-0-85729-859-1_20
Embley, D.W., et al.: Table-processing paradigms: a research survey. Int. J. Doc. Anal. Recogn. 8(2–3), 66–86 (2006)
Everingham, M., et al.: The Pascal visual object classes challenge: a retrospective. Int. J. Comput. Vision 111(1), 98–136 (2015)
Gao, L., Yi, X., Jiang, Z., Hao, L., Tang, Z.: ICDAR2017 competition on page object detection. In: 14th IAPR International Conference on Document Analysis and Recognition, ICDAR 2017, pp. 1417–1422 (2017)
Gilani, A., et al.: Table detection using deep learning. In: 14th International Conference on Document Analysis and Recognition, ICDAR 2017, pp. 771–776. IEEE (2017)
Girshick, R., et al.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: 2014 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2014, pp. 580–587. IEEE (2014)
Göbel, M.C., Hassan, T., Oro, E., Orsi, G.: ICDAR 2013 table competition. In: 12th International Conference on Document Analysis and Recognition, ICDAR 2013, pp. 1449–1453. IEEE (2013)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.deeplearningbook.org
Hao, L., et al.: A table detection method for PDF documents based on convolutional neural networks. In: 12th International Workshop on Document Analysis Systems, DAS 2016, pp. 287–292. IEEE (2016)
Hirayama, Y.: A method for table structure analysis using DP matching. In: 3rd International Conference on Document Analysis and Recognition, ICDAR 1995, pp. 583–586. IEEE (1995)
Huang, Y., et al.: A YOLO-based table detection method. In: 15th International Conference on Document Analysis and Recognition, ICDAR 2019 (2019)
Institute of Computer Science and Technology of Peking University and Institute of Digital Publishing of Founder R&D Center, China: Marmot dataset for table recognition (2011). http://www.icst.pku.edu.cn/cpdp/sjzy/index.htm
Hu, J., et al.: Medium-independent table detection. In: Document Recognition and Retrieval VII, vol. 3967, pp. 583–586. International Society for Optics and Photonics (1999)
Kasar, T., et al.: Learning to detect tables in scanned document images using line information. In: 12th International Conference on Document Analysis and Recognition, ICDAR 2013, pp. 1185–1189. IEEE (2013)
Kerwat, M., George, R., Shujaee, K.: Detecting knowledge artifacts in scientific document images - comparing deep learning architectures. In: 5th International Conference on Social Networks Analysis, Management and Security, SNAMS 2018, pp. 147–152. IEEE (2018)
Kluyver, T., et al.: Jupyter notebooks – a publishing format for reproducible computational workflows. In: 20th International Conference on Electronic Publishing, pp. 87–90. IOS Press (2016)
Li, M., et al.: TableBank: Table Benchmark for Image-based Table Detection and Recognition. CoRR abs/1903.01949 (2019). http://arxiv.org/abs/1903.01949
Lin, T., Goyal, P., Girshick, R., He, K., Dollár, P.: Keras RetinaNet (2017). https://github.com/fizyr/keras-retinanet
Lin, T.Y., et al.: Focal loss for dense object detection. In: 16th International Conference on Computer Vision, ICCV 2017, pp. 2999–3007 (2017)
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Oliveira, D.A.B., Viana, M.P.: Fast CNN-based document layout analysis. In: 14th International Conference on Computer Vision Workshops, ICCVW 2017, pp. 1173–1180. IEEE (2017)
Oro, E., Ruffolo, M.: PDF-TREX: an approach for recognizing and extracting tables from PDF documents. In: 10th International Conference on Document Analysis and Recognition, ICDAR 2009, pp. 906–910. IEEE (2009)
Razavian, A.S., et al.: CNN features off-the-shelf: an astounding baseline for recognition. In: 27th Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2014, pp. 512–519 (2014)
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. CoRR abs/1804.02767 (2018). http://arxiv.org/abs/1804.02767
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, vol. 28, pp. 91–99 (2015)
Rosebrock, A.: Deep Learning for Computer Vision with Python. PyImageSearch (2018). https://www.pyimagesearch.com/
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y
Schreiber, S., et al.: DeepDeSRT: deep learning for detection and structure recognition of tables in document images. In: 14th International Conference on Document Analysis and Recognition, ICDAR 2017, pp. 1162–1167. IEEE (2017)
Shahab, A., Shafait, F., Kieninger, T., Dengel, A.: An open approach towards the benchmarking of table structure recognition systems. In: 9th IAPR International Workshop on Document Analysis Systems, DAS 2010, pp. 113–120 (2010)
Siddiqui, S.A., et al.: DeCNT: deep deformable CNN for table detection. IEEE Access 6, 74151–74161 (2018)
Suen, C.Y., et al.: ICDAR2019 Table Competition (2019). http://icdar2019.org/
Zanibbi, R., Blostein, D., Cordy, J.R.: A survey of table recognition. Int. J. Doc. Anal. Recogn. 7(1), 1–16 (2004)
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Casado-García, Á., Domínguez, C., Heras, J., Mata, E., Pascual, V. (2020). The Benefits of Close-Domain Fine-Tuning for Table Detection in Document Images. In: Bai, X., Karatzas, D., Lopresti, D. (eds) Document Analysis Systems. DAS 2020. Lecture Notes in Computer Science, vol 12116. Springer, Cham. https://doi.org/10.1007/978-3-030-57058-3_15
DOI: https://doi.org/10.1007/978-3-030-57058-3_15
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57057-6
Online ISBN: 978-3-030-57058-3
eBook Packages: Computer Science, Computer Science (R0)