Line Graphics Digitization: A Step Towards Full Automation

Moured, Omar; Zhang, Jiaming; Roitberg, Alina; Schwarz, Thorsten; Stiefelhagen, Rainer

doi:10.1007/978-3-031-41734-4_27

Omar Moured^11,12,
Jiaming Zhang¹¹,
Alina Roitberg¹¹,
Thorsten Schwarz¹² &
…
Rainer Stiefelhagen^11,12

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14191))

Included in the following conference series:

International Conference on Document Analysis and Recognition

734 Accesses

Abstract

The digitization of documents allows for wider accessibility and reproducibility. While automatic digitization of document layout and text content has been a long-standing focus of research, this problem in regard to graphical elements, such as statistical plots, has been under-explored. In this paper, we introduce the task of fine-grained visual understanding of mathematical graphics and present the Line Graphics (LG) dataset, which includes pixel-wise annotations of 5 coarse and 10 fine-grained categories. Our dataset covers 520 images of mathematical graphics collected from 450 documents from different disciplines. Our proposed dataset can support two different computer vision tasks, i.e., semantic segmentation and object detection. To benchmark our LG dataset, we explore 7 state-of-the-art models. To foster further research on the digitization of statistical graphs, we will make the dataset, code and models publicly available to the community.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 119.00; Price excludes VAT (USA)

Softcover Book: USD 159.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

References

Amin, A., Shiu, R.: Page segmentation and classification utilizing bottom-up approach. Int. J. Image Graph. 1(02), 345–361 (2001)
Article Google Scholar
Bajić, F., Orel, O., Habijan, M.: A multi-purpose shallow convolutional neural network for chart images. Sensors 22(20), 7695 (2022)
Article Google Scholar
Breuel, T.M.: Robust, simple page segmentation using hybrid convolutional MDLSTM networks. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 733–740. IEEE (2017)
Google Scholar
Chen, K., Seuret, M., Hennebert, J., Ingold, R.: Convolutional neural networks for page segmentation of historical document images. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 965–970. IEEE (2017)
Google Scholar
Chen, K., Seuret, M., Liwicki, M., Hennebert, J., Ingold, R.: Page segmentation of historical document images with convolutional autoencoders. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 1011–1015. IEEE (2015)
Google Scholar
Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with Atrous separable convolution for semantic image segmentation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 833–851. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_49
Chapter Google Scholar
Chintalapati, S., Bragg, J., Wang, L.L.: A dataset of alt texts from HCI publications: analyses and uses towards producing more descriptive alt texts of data visualizations in scientific papers. arXiv preprint arXiv:2209.13718 (2022)
Choi, J., Jung, S., Park, D.G., Choo, J., Elmqvist, N.: Visualizing for the non-visual: enabling the visually impaired to use visualization. In: Computer Graphics Forum, pp. 249–260. Wiley Online Library (2019)
Google Scholar
Clark, C., Divvala, S.: PDFFigures 2.0: mining figures from research papers. In: Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries, pp. 143–152 (2016)
Google Scholar
Clausner, C., Antonacopoulos, A., Pletschacher, S.: ICDAR 2017 competition on recognition of documents with complex layouts-RDCL2017. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 1404–1410. IEEE (2017)
Google Scholar
Dai, W., Wang, M., Niu, Z., Zhang, J.: Chart decoder: generating textual and numeric information from chart images automatically. J. Vis. Lang. Comput. 48, 101–109 (2018)
Article Google Scholar
Davila, K., et al.: ICDAR 2019 competition on harvesting raw tables from infographics (chart-infographics). In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1594–1599. IEEE (2019)
Google Scholar
Davila, K., Setlur, S., Doermann, D., Kota, B.U., Govindaraju, V.: Chart mining: a survey of methods for automated chart analysis. IEEE Trans. Pattern Anal. Mach. Intell. 43(11), 3799–3819 (2020)
Article Google Scholar
Davila, K., Tensmeyer, C., Shekhar, S., Singh, H., Setlur, S., Govindaraju, V.: ICPR 2020 - competition on harvesting raw tables from infographics. In: Del Bimbo, A., et al. (eds.) ICPR 2021. LNCS, vol. 12668, pp. 361–380. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-68793-9_27
Chapter Google Scholar
Drivas, D., Amin, A.: Page segmentation and classification utilising a bottom-up approach. In: Proceedings of 3rd International Conference on Document Analysis and Recognition, vol. 2, pp. 610–614. IEEE (1995)
Google Scholar
Guo, M.H., Lu, C.Z., Hou, Q., Liu, Z., Cheng, M.M., Hu, S.M.: SegNeXt: rethinking convolutional attention design for semantic segmentation. arXiv preprint arXiv:2209.08575 (2022)
Ha, J., Haralick, R.M., Phillips, I.T.: Document page decomposition by the bounding-box project. In: Proceedings of 3rd International Conference on Document Analysis and Recognition, vol. 2, pp. 1119–1122. IEEE (1995)
Google Scholar
Haurilet, M., Al-Halah, Z., Stiefelhagen, R.: SPaSe-multi-label page segmentation for presentation slides. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 726–734. IEEE (2019)
Google Scholar
Haurilet, M., Roitberg, A., Martinez, M., Stiefelhagen, R.: Wise-slide segmentation in the wild. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 343–348. IEEE (2019)
Google Scholar
Howard, A., et al.: Searching for MobileNetV3. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1314–1324 (2019)
Google Scholar
Huang, Z., et al.: ICDAR 2019 Competition On Scanned Receipt OCR and information extraction. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1516–1520. IEEE (2019)
Google Scholar
Jobin, K., Mondal, A., Jawahar, C.: DocFigure: a dataset for scientific document figure classification. In: 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), vol. 1, pp. 74–79. IEEE (2019)
Google Scholar
Keefer, R., Bourbakis, N.: From image to XML: monitoring a page layout analysis approach for the visually impaired. Int. J. Monit. Surveill. Technol. Res. (IJMSTR) 2(1), 22–43 (2014)
Google Scholar
Li, P., Jiang, X., Shatkay, H.: Figure and caption extraction from biomedical documents. Bioinformatics 35(21), 4381–4388 (2019)
Article Google Scholar
Liu, X., Klabjan, D., NBless, P.: Data extraction from charts via single deep neural network. arXiv preprint arXiv:1906.11906 (2019)
Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: ICCV (2021)
Google Scholar
Long, S., Ruan, J., Zhang, W., He, X., Wu, W., Yao, C.: TextSnake: a flexible representation for detecting text of arbitrary shapes. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11206, pp. 19–35. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01216-8_2
Chapter Google Scholar
Methani, N., Ganguly, P., Khapra, M.M., Kumar, P.: PlotQA: reasoning over scientific plots. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1527–1536 (2020)
Google Scholar
Poco, J., Heer, J.: Reverse-engineering visualizations: recovering visual encodings from chart images. In: Computer Graphics Forum, pp. 353–363. Wiley Online Library (2017)
Google Scholar
Seweryn, K., Lorenc, K., Wróblewska, A., Sysko-Romańczuk, S.: What will you tell me about the chart? – automated description of charts. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds.) ICONIP 2021. CCIS, vol. 1516, pp. 12–19. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-92307-5_2
Chapter Google Scholar
Siegel, N., Lourie, N., Power, R., Ammar, W.: Extracting scientific figures with distantly supervised neural networks. In: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries, pp. 223–232 (2018)
Google Scholar
Wang, J., et al.: Deep high-resolution representation learning for visual recognition. TPAMI 43, 3349–3364 (2021)
Google Scholar
Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J.M., Luo, P.: SegFormer: simple and efficient design for semantic segmentation with transformers. In: NeurIPS (2021)
Google Scholar
Yang, X., Yumer, E., Asente, P., Kraley, M., Kifer, D., Lee Giles, C.: Learning to extract semantic structure from documents using multimodal fully convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5315–5324 (2017)
Google Scholar
Yoshitake, M., Kono, T., Kadohira, T.: Program for automatic numerical conversion of a line graph (line plot). J. Comput. Chem. Jpn. 19(2), 25–35 (2020)
Article Google Scholar
Zhang, J., Ma, C., Yang, K., Roitberg, A., Peng, K., Stiefelhagen, R.: Transfer beyond the field of view: dense panoramic semantic segmentation via unsupervised domain adaptation. IEEE Trans. Intell. Transp. Syst. 23(7), 9478–9491 (2021)
Article Google Scholar
Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: CVPR (2017)
Google Scholar

Download references

Acknowledgments

This work was supported in part by the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie Grant No. 861166, in part by the Ministry of Science, Research and the Arts of Baden-Württemberg (MWK) through the Cooperative Graduate School Accessibility through AI-based Assistive Technology (KATE) under Grant BW6-03, and in part by the Federal Ministry of Education and Research (BMBF) through a fellowship within the IFI programme of the German Academic Exchange Service (DAAD). This work was partially performed on the HoreKa supercomputer funded by the MWK and by the Federal Ministry of Education and Research.

Author information

Authors and Affiliations

CV:HCI Lab, Karlsruhe Institute of Technology, Karlsruhe, Germany
Omar Moured, Jiaming Zhang, Alina Roitberg & Rainer Stiefelhagen
ACCESS@KIT, Karlsruhe Institute of Technology, Karlsruhe, Germany
Omar Moured, Thorsten Schwarz & Rainer Stiefelhagen

Authors

Omar Moured
View author publications
You can also search for this author in PubMed Google Scholar
Jiaming Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Alina Roitberg
View author publications
You can also search for this author in PubMed Google Scholar
Thorsten Schwarz
View author publications
You can also search for this author in PubMed Google Scholar
Rainer Stiefelhagen
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Omar Moured .

Editor information

Editors and Affiliations

TU Dortmund University, Dortmund, Germany
Gernot A. Fink
Adobe, College Park, MN, USA
Rajiv Jain
Osaka Metropolitan University, Osaka, Japan
Koichi Kise
Rochester Institute of Technology, Rochester, NY, USA
Richard Zanibbi

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Moured, O., Zhang, J., Roitberg, A., Schwarz, T., Stiefelhagen, R. (2023). Line Graphics Digitization: A Step Towards Full Automation. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14191. Springer, Cham. https://doi.org/10.1007/978-3-031-41734-4_27

Download citation

DOI: https://doi.org/10.1007/978-3-031-41734-4_27
Published: 19 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41733-7
Online ISBN: 978-3-031-41734-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The International Association for Pattern Recognition (opens in a new tab)