IIIT-AR-13K: A New Dataset for Graphical Object Detection in Documents

Mondal, Ajoy; Lipps, Peter; Jawahar, C. V.

doi:10.1007/978-3-030-57058-3_16

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12116)

Included in the following conference series:

International Workshop on Document Analysis Systems

Abstract

We introduce a new dataset for graphical object detection in business documents, more specifically annual reports. This dataset, IIIT-AR-13K, is created by manually annotating the bounding boxes of graphical or page objects in publicly available annual reports. It contains a total of 13k annotated page images with objects in five popular categories: table, figure, natural image, logo, and signature. It is the largest manually annotated dataset for graphical object detection. Annual reports created in multiple languages over several years by various companies bring high diversity into this dataset. We benchmark the IIIT-AR-13K dataset with two state-of-the-art graphical object detection techniques, using Faster R-CNN [20] and Mask R-CNN [11], and establish high baselines for further research. Our dataset is highly effective as training data for developing practical solutions for graphical object detection in both business documents and technical articles. By training with IIIT-AR-13K, we demonstrate the feasibility of a single solution for table detection that reports superior performance compared to equivalent ones trained with a much larger amount of data. We hope that our dataset helps advance research on detecting various types of graphical objects in business documents (http://cvit.iiit.ac.in/usodi/iiitar13k.php).
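
As an illustration of what a bounding-box annotation over the five categories might look like, the following is a minimal sketch, not the dataset's actual file format or the authors' evaluation code. The `Box` class, its field names, and the use of intersection-over-union (IoU) to compare a predicted box with an annotated one are assumptions made for this example; the abstract only states that bounding boxes were annotated for the five categories.

```python
from dataclasses import dataclass

# The five object categories named in the abstract.
CATEGORIES = ("table", "figure", "natural image", "logo", "signature")


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical layout)."""
    category: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def __post_init__(self):
        # Reject labels outside the five categories and degenerate boxes.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if self.xmax <= self.xmin or self.ymax <= self.ymin:
            raise ValueError("box must have positive width and height")

    @property
    def area(self):
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)


def iou(a, b):
    """Intersection-over-union of two boxes; 0.0 when they do not overlap."""
    inter_w = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    inter_h = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    return inter / (a.area + b.area - inter)


if __name__ == "__main__":
    annotated = Box("table", 0, 0, 10, 10)
    predicted = Box("table", 5, 0, 15, 10)
    print(f"IoU = {iou(annotated, predicted):.3f}")
```

A detector's prediction would typically be counted as correct when its category matches the annotation and the IoU exceeds a chosen threshold; the threshold itself is an evaluation choice not specified in this abstract.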

References

1. Abdulla, W.: Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow. GitHub repository (2017)
2. Chi, Z., Huang, H., Xu, H.D., Yu, H., Yin, W., Mao, X.L.: Complicated table structure recognition. arXiv (2019)
3. Deng, Y., Rosenberg, D., Mann, G.: Challenges in end-to-end neural scientific table recognition. In: ICDAR (2019)
4. Fang, J., Tao, X., Tang, Z., Qiu, R., Liu, Y.: Dataset, ground-truth and performance metrics for table detection evaluation. In: WDAS (2012)
5. Gao, L., Yi, X., Jiang, Z., Hao, L., Tang, Z.: ICDAR 2017 competition on page object detection. In: ICDAR (2017)
6. Gao, L., et al.: ICDAR 2019 competition on table detection and recognition (cTDaR). In: ICDAR (2019)
7. Gilani, A., Qasim, S.R., Malik, I., Shafait, F.: Table detection using deep learning. In: ICDAR (2017)
8. Göbel, M., Hassan, T., Oro, E., Orsi, G.: ICDAR 2013 table competition. In: ICDAR (2013)
9. Goodfellow, I., et al.: Generative adversarial nets. In: NIPS (2014)
10. Hao, L., Gao, L., Yi, X., Tang, Z.: A table detection method for PDF documents based on convolutional neural networks. In: Workshop on DAS (2016)
11. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV (2017)
12. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
13. Huang, Y., et al.: A YOLO-based table detection method. In: ICDAR (2019)
14. Kavasidis, I., et al.: A saliency-based convolutional neural network for table and chart detection in digitized documents. In: Ricci, E., Rota Bulò, S., Snoek, C., Lanz, O., Messelodi, S., Sebe, N. (eds.) ICIAP 2019. LNCS, vol. 11752, pp. 292–302. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30645-8_27
15. Li, M., Cui, L., Huang, S., Wei, F., Zhou, M., Li, Z.: TableBank: table benchmark for image-based table detection and recognition. In: ICDAR (2019)
16. Li, Y., Yan, Q., Huang, Y., Gao, L., Tang, Z.: A GAN-based feature generator for table detection. In: ICDAR (2019)
17. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
18. Melinda, L., Bhagvati, C.: Parameter-free table detection method. In: ICDAR (2019)
19. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: CVPR (2016)
20. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS (2015)
21. Saha, R., Mondal, A., Jawahar, C.V.: Graphical object detection in document images. In: ICDAR (2019)
22. Schreiber, S., Agne, S., Wolf, I., Dengel, A., Ahmed, S.: DeepDeSRT: deep learning for detection and structure recognition of tables in document images. In: ICDAR (2017)
23. Shahab, A., Shafait, F., Kieninger, T., Dengel, A.: An open approach towards the benchmarking of table structure recognition systems. In: DAS (2010)
24. Siddiqui, S.A., Malik, M.I., Agne, S., Dengel, A., Ahmed, S.: DeCNT: deep deformable CNN for table detection. IEEE Access 6, 74151–74161 (2018)
25. Siegel, N., Lourie, N., Power, R., Ammar, W.: Extracting scientific figures with distantly supervised neural networks. In: ACM/IEEE on Joint Conference on Digital Libraries (2018)
26. Sun, N., Zhu, Y., Hu, X.: Faster R-CNN based table detection combining corner locating. In: ICDAR (2019)
27. Tran, D.N., Tran, T.A., Oh, A., Kim, S.H., Na, I.S.: Table detection from document image using vertical arrangement of text blocks. Int. J. Contents 11, 77–85 (2015)
28. Yang, J., Lu, J., Batra, D., Parikh, D.: A faster PyTorch implementation of Faster R-CNN (2017)
29. Zhong, X., ShafieiBavani, E., Yepes, A.J.: Image-based table recognition: data, model, and evaluation. arXiv (2019)
30. Zhong, X., Tang, J., Yepes, A.J.: PubLayNet: largest dataset ever for document layout analysis. In: ICDAR (2019)

Author information

Authors and Affiliations

Centre for Visual Information Technology, International Institute of Information Technology, Hyderabad, India
Ajoy Mondal & C. V. Jawahar
Open Text Software GmbH, Grasbrunn/Munich, Germany
Peter Lipps

Corresponding author

Correspondence to Ajoy Mondal.

Editor information

Editors and Affiliations

Huazhong University of Science and Technology, Wuhan, China
Xiang Bai
Autonomous University of Barcelona, Barcelona, Spain
Dimosthenis Karatzas
Lehigh University, Bethlehem, PA, USA
Daniel Lopresti

Cite this paper

Mondal, A., Lipps, P., Jawahar, C.V. (2020). IIIT-AR-13K: A New Dataset for Graphical Object Detection in Documents. In: Bai, X., Karatzas, D., Lopresti, D. (eds) Document Analysis Systems. DAS 2020. Lecture Notes in Computer Science, vol 12116. Springer, Cham. https://doi.org/10.1007/978-3-030-57058-3_16

DOI: https://doi.org/10.1007/978-3-030-57058-3_16
Published: 14 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57057-6
Online ISBN: 978-3-030-57058-3
eBook Packages: Computer Science, Computer Science (R0)
