The Benefits of Close-Domain Fine-Tuning for Table Detection in Document Images

  • Conference paper
Document Analysis Systems (DAS 2020)

Abstract

Correctly localising tables in a document is instrumental for determining their structure and extracting their contents; table detection is therefore a key step in table understanding. Currently, the most successful methods for table detection in document images employ deep learning algorithms and, in particular, a technique known as fine-tuning. In this context, fine-tuning transfers the knowledge acquired when detecting objects in natural images to the detection of tables in document images. However, natural images and document images are only loosely related, and fine-tuning works better when the source and target tasks are closely related. In this paper, we show that it is more beneficial to fine-tune from a closer domain. To this end, we train several object detection algorithms (namely, Mask R-CNN, RetinaNet, SSD and YOLO) on the TableBank dataset (a dataset of images of academic documents designed for table detection and recognition), and fine-tune them on several heterogeneous table detection datasets. With this approach, we considerably improve the accuracy of detection models fine-tuned from natural images (by 17% on average and by up to 60% in the best case).
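The intuition behind close-domain fine-tuning can be illustrated with a toy sketch. The code below is not the paper's experimental setup: it replaces the object detectors with a one-dimensional linear model, and every dataset, parameter value and function name in it is a hypothetical stand-in chosen for illustration. It pretrains the same model on a "far" and on a "close" synthetic source task, then fine-tunes both on a small target task with the same small step budget.

```python
import random


def make_dataset(true_w, true_b, n, seed):
    """Draw n noisy points from y = true_w * x + true_b (synthetic stand-in for a dataset)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = true_w * x + true_b + rng.gauss(0.0, 0.05)
        data.append((x, y))
    return data


def mse(params, data):
    """Mean squared error of the linear model (w, b) on data."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(params, data, steps, lr=0.1):
    """Full-batch gradient descent starting from params; returns the updated (w, b)."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


# Hypothetical tasks: the "close" source resembles the target, the "far" one does not.
far_source = make_dataset(-3.0, 2.0, 200, seed=0)   # stand-in for natural images
close_source = make_dataset(1.8, 0.4, 200, seed=1)  # stand-in for a TableBank-like source
target = make_dataset(2.0, 0.5, 20, seed=2)         # small target dataset

# Pretraining: learn each source task from scratch.
pretrained_far = train((0.0, 0.0), far_source, steps=500)
pretrained_close = train((0.0, 0.0), close_source, steps=500)

# Fine-tuning: continue training on the target with the same small budget.
budget = 5
tuned_far = train(pretrained_far, target, steps=budget)
tuned_close = train(pretrained_close, target, steps=budget)

print("target MSE, fine-tuned from far source:  ", mse(tuned_far, target))
print("target MSE, fine-tuned from close source:", mse(tuned_close, target))
```

In this toy setting the model initialised from the close source starts nearer to the target solution, so the same fine-tuning budget leaves it with a lower target error. The sketch only illustrates the idea of choosing a closer initialisation; it says nothing about the magnitude of the gains reported for the actual detectors.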

This work was partially supported by Ministerio de Economía y Competitividad [MTM2017-88804-P], Ministerio de Ciencia, Innovación y Universidades [RTC-2017-6640-7], Agencia de Desarrollo Económico de La Rioja [2017-I-IDD-00018], and the computing facilities of Extremadura Research Centre for Advanced Technologies (CETA-CIEMAT), funded by the European Regional Development Fund (ERDF). CETA-CIEMAT belongs to CIEMAT and the Government of Spain.

Author information

Corresponding author

Correspondence to Ángela Casado-García.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Casado-García, Á., Domínguez, C., Heras, J., Mata, E., Pascual, V. (2020). The Benefits of Close-Domain Fine-Tuning for Table Detection in Document Images. In: Bai, X., Karatzas, D., Lopresti, D. (eds) Document Analysis Systems. DAS 2020. Lecture Notes in Computer Science, vol 12116. Springer, Cham. https://doi.org/10.1007/978-3-030-57058-3_15

  • DOI: https://doi.org/10.1007/978-3-030-57058-3_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-57057-6

  • Online ISBN: 978-3-030-57058-3

  • eBook Packages: Computer Science, Computer Science (R0)
