YOLO-table: disclosure document table detection with involution

Abstract

As financial document automation becomes more widespread, table detection, an important part of document automation, is receiving increasing attention. Disclosure documents contain both bordered and borderless tables of varying lengths, and currently no model performs well on this type of document. To address this problem, we propose YOLO-table, a table detection model. We introduce involution into the backbone of the network to improve its ability to learn the spatial layout features of tables, and we design a simple Feature Pyramid Network (FPN) to improve model effectiveness. In addition, we propose a table-based data augmentation method. Experiments on a disclosure document dataset show that YOLO-table reaches an F1-measure of 97.3%. Compared with YOLOv3, our method improves accuracy by 2.8% and speed by a factor of 1.25. We also evaluate the model on the ICDAR2013 and ICDAR2019 Table Competition datasets, where it achieves state-of-the-art performance.
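Involution (Li et al., reference 29) inverts two properties of convolution: its kernel is spatial-specific, with a separate K×K kernel generated at every pixel from that pixel's own feature vector, and channel-agnostic, with one kernel shared by all channels in a group. The NumPy sketch below illustrates the operator as described in the involution paper, under stated assumptions: stride 1, zero padding, an odd kernel size, and a kernel generator made of two linear maps with a ReLU between them (the batch normalization used in the paper is omitted). The function and weight names are hypothetical, and this is not YOLO-table's backbone code; the abstract does not say where or how the authors insert involution.

```python
import numpy as np


def involution(x, w0, w1, kernel_size=3, groups=1):
    """Minimal involution sketch (stride 1, zero padding, odd kernel_size assumed).

    x:  input feature map, shape (C, H, W)
    w0: hypothetical channel-reduction weights, shape (C // r, C)
    w1: hypothetical kernel-generation weights, shape (K*K*groups, C // r)
    Returns an output feature map of shape (C, H, W).
    """
    c, h, w = x.shape
    k = kernel_size
    if c % groups != 0:
        raise ValueError("channels must split evenly into groups")
    if w1.shape[0] != k * k * groups:
        raise ValueError("w1 must produce K*K*groups kernel weights per pixel")
    pad = k // 2

    # Kernel generation: H_ij = W1 ReLU(W0 X_ij), evaluated at every pixel.
    # This is a 1x1 bottleneck over channels, so each pixel gets its own kernel.
    hidden = np.maximum(np.einsum("rc,chw->rhw", w0, x), 0.0)
    kernels = np.einsum("or,rhw->ohw", w1, hidden).reshape(groups, k, k, h, w)

    # Zero-pad spatially and split channels into groups that share a kernel.
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    xg = xp.reshape(groups, c // groups, h + 2 * pad, w + 2 * pad)

    out = np.zeros((groups, c // groups, h, w), dtype=np.result_type(x, w0, w1))
    for u in range(k):
        for v in range(k):
            # Weight at offset (u, v) of each pixel's kernel, broadcast over the
            # channels of its group, times the correspondingly shifted input.
            out += kernels[:, None, u, v] * xg[:, :, u:u + h, v:v + w]
    return out.reshape(c, h, w)


# Usage: C=8 channels, reduction ratio r=4, K=3, groups=2 on a 16x16 map.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))
w0 = rng.standard_normal((2, 8))
w1 = rng.standard_normal((3 * 3 * 2, 2))
y = involution(x, w0, w1, kernel_size=3, groups=2)
print(y.shape)  # (8, 16, 16)
```

The generator's weight count, C·C/r + (C/r)·K²·G, does not depend on the feature-map size, whereas a standard K×K convolution mapping C channels to C channels needs C²K² weights; with C = 8, r = 4, K = 3 and G = 2 as in the usage lines, that is 16 + 36 = 52 weights against 576 (biases ignored).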

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12

References

  1. Li, H., Yang, Q., Cao, Y., Yao, J., et al.: Cracking tabular presentation diversity for automatic cross-checking over numerical facts. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2599–2607 (2020)

  2. Hu, J., Kashi, R.S., Lopresti, D., et al.: Evaluating the performance of table processing algorithms. Int. J. Doc. Anal. Recogn. 4(3), 140–153 (2002)

  3. Dai, J., Li, Y., He, K., et al.: R-FCN: Object detection via region-based fully convolutional networks. In: Advances in Neural Information Processing Systems, pp. 379–387 (2016)

  4. Ren, S., He, K., Girshick, R., et al.: Faster R-CNN: towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 28, 91–99 (2015)

  5. He, K., Gkioxari, G., Dollár, P., et al.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)

  6. Redmon, J., Divvala, S., Girshick, R., et al.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)

  7. Liu, W., Anguelov, D., Erhan, D., et al.: SSD: Single shot multibox detector. In: European Conference on Computer Vision, pp. 21–37 (2016)

  8. Lin, T.Y., Goyal, P., Girshick, R., et al.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)

  9. Göbel, M., Hassan, T., Oro, E., et al.: ICDAR 2013 table competition. In: 2013 12th International Conference on Document Analysis and Recognition (ICDAR), pp. 1449–1453 (2013)

  10. Gao, L., Huang, Y., Déjean, H., et al.: ICDAR 2019 competition on table detection and recognition (cTDaR). In: 2019 15th International Conference on Document Analysis and Recognition (ICDAR), pp. 1510–1515 (2019)

  11. Cesarini, F., Marinai, S., Sarti, L., et al.: Trainable table location in document images. In: Object Recognition Supported by User Interaction for Service Robots, vol. 3, pp. 236–240 (2002)

  12. Yildiz, B., Kaiser, K., Miksch, S.: pdf2table: A method to extract table information from PDF files. In: IICAI, pp. 1773–1785 (2005)

  13. Silva, A.C.: Learning rich hidden Markov models in document analysis: Table location. In: 2009 10th International Conference on Document Analysis and Recognition, pp. 843–847 (2009)

  14. Melinda, L., Bhagvati, C.: Parameter-free table detection method. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 454–460 (2019)

  15. He, D., Cohen, S., Price, B., et al.: Multi-scale multi-task FCN for semantic page segmentation and table detection. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), pp. 254–261 (2017)

  16. Fang, J., Tao, X., Tang, Z., et al.: Dataset, ground-truth and performance metrics for table detection evaluation. In: 2012 10th IAPR International Workshop on Document Analysis Systems (DAS), pp. 445–449 (2012)

  17. Kavasidis, I., Palazzo, S., Spampinato, C., et al.: A saliency-based convolutional neural network for table and chart detection in digitized documents. arXiv preprint arXiv:1804.06236 (2018)

  18. Gilani, A., Qasim, S.R., Malik, I., et al.: Table detection using deep learning. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 771–776 (2017)

  19. Shafait, F., Smith, R.: Table detection in heterogeneous documents. In: Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, pp. 65–72 (2010)

  20. Shahab, A., Shafait, F., Kieninger, T., et al.: An open approach towards the benchmarking of table structure recognition systems. In: Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, pp. 113–120 (2010)

  21. Sun, N., Zhu, Y., Hu, X.: Faster R-CNN based table detection combining corner locating. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1314–1319 (2019)

  22. Gao, L., Yi, X., Jiang, Z., et al.: ICDAR 2017 Competition on Page Object Detection. In: 2017 14th International Conference on Document Analysis and Recognition (ICDAR), pp. 1417–1422 (2017)

  23. Huang, Y., Yan, Q., Li, Y., et al.: A YOLO-based table detection method. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 813–818 (2019)

  24. Redmon, J., Farhadi, A.: YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018)

  25. Zhang, X., Bai, Y., Wei, N., et al.: Cloud computer research on table detection model based on the DC-LSTM model. J. Phys. Conf. Ser. 1927(1), 012004 (2021)

  26. Li, M., Cui, L., Huang, S., et al.: TableBank: Table Benchmark for Image-based Table Detection and Recognition. In: Proceedings of the 12th Language Resources and Evaluation Conference (LREC) (2020)

  27. Riba, P., Dutta, A., Goldmann, L., et al.: Table detection in invoice documents by graph neural networks. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 122–127 (2019)

  28. Harley, A.W., Ufkes, A., Derpanis, K.G.: Evaluation of deep convolutional nets for document image classification and retrieval. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 991–995 (2015)

  29. Li, D., Hu, J., Wang, C., et al.: Involution: Inverting the inherence of convolution for visual recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12321–12330 (2021)

  30. Vaswani, A., Shazeer, N., Parmar, N., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)

  31. Lin, T.Y., Dollár, P., Girshick, R., et al.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)

  32. Chen, Q., Wang, Y., Yang, T., et al.: You only look one-level feature. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13039–13048 (2021)

  33. Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. In: ICLR (2016)

  34. He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  35. Zhang, S., Chi, C., Yao, Y., et al.: Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9759–9768 (2020)

  36. Rezatofighi, H., Tsoi, N., Gwak, J., et al.: Generalized intersection over union: a metric and a loss for bounding box regression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 658–666 (2019)

  37. Khan, U., Zahid, S., Ali, M.A., et al.: TabAug: data driven augmentation for enhanced table structure recognition. In: International Conference on Document Analysis and Recognition, pp. 585–601 (2021)

  38. Shepley, A., Falzon, G., Kwan, P.: Confluence: A robust non-IoU alternative to non-maxima suppression in object detection. arXiv preprint arXiv:2012.00257 (2020)

  39. Neubeck, A., Van Gool, L.: Efficient non-maximum suppression. In: 18th International Conference on Pattern Recognition (ICPR’06), vol. 3, pp. 850–855 (2006)

  40. Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)

  41. Schreiber, S., Agne, S., Wolf, I., et al.: DeepDeSRT: Deep learning for detection and structure recognition of tables in document images. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 1162–1167 (2017)

  42. Tran, D.N., Tran, T.A., Oh, A., et al.: Table detection from document image using vertical arrangement of text blocks. Int. J. Contents 77–85 (2015)

  43. Hao, L., Gao, L., Yi, X., et al.: A table detection method for PDF documents based on convolutional neural networks. In: 2016 12th IAPR Workshop on Document Analysis Systems (DAS), pp. 287–292 (2016)

  44. Prasad, D., Gadpal, A., Kapadni, K., et al.: CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 572–573 (2020)

  45. Nazir, D., Hashmi, K.A., Pagani, A., et al.: HybridTabNet: Towards better table detection in scanned document images. Appl. Sci. 11(18), 8396 (2021)

  46. Zheng, X., Burdick, D., Popa, L., et al.: Global table extractor (GTE): A framework for joint table identification and cell structure recognition using visual context. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 697–706 (2021)

  47. Li, J., Xu, Y., Lv, T., et al.: DiT: Self-supervised Pre-training for Document Image Transformer. arXiv preprint arXiv:2203.02378 (2022)

Author information

Corresponding author

Correspondence to Daqian Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, D., Mao, R., Guo, R. et al. YOLO-table: disclosure document table detection with involution. IJDAR (2022). https://doi.org/10.1007/s10032-022-00400-z

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s10032-022-00400-z

Keywords

  • Table detection
  • Involution
  • FPNs
  • Deep learning