Abstract
Data extraction from line-chart images is an essential component of the automated document understanding process, as line charts are a ubiquitous data visualization format. However, the range of visual and structural variation in multi-line graphs makes them particularly challenging to parse automatically. Existing works are not robust to all of these variations: they either take a unified approach for all chart types or rely on auxiliary information such as legends to extract line data. In this work, we propose LineFormer, a robust approach to line data extraction using instance segmentation. We achieve state-of-the-art performance on several benchmark synthetic and real chart datasets. Our implementation is available at https://github.com/TheJaeLal/LineFormer.
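The abstract does not describe how a segmentation result becomes a data series. As a rough illustration only (this is not the authors' implementation; their repository is the reference for that), assume an instance segmentation model has already produced one binary mask per line. Each mask could then be collapsed into pixel-space points by averaging the mask's row indices in every column it covers. The function name `mask_to_points` and the toy mask below are hypothetical.

```python
from typing import List, Sequence, Tuple


def mask_to_points(mask: Sequence[Sequence[int]]) -> List[Tuple[int, float]]:
    """Collapse one line's binary mask into (x, y) pixel points, one per column.

    `mask` is a row-major grid of 0/1 values (hypothetical model output
    for a single line instance). Columns without mask pixels are skipped.
    """
    if not mask:
        return []
    height, width = len(mask), len(mask[0])
    points: List[Tuple[int, float]] = []
    for x in range(width):
        # Row indices where this line's mask is set in column x.
        rows = [y for y in range(height) if mask[y][x]]
        if rows:
            # Use the mean row as the line's y position in this column.
            points.append((x, sum(rows) / len(rows)))
    return points


# Toy 4x5 mask for a single line that slopes downward in image coordinates.
line_a = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1],
]

points_a = mask_to_points(line_a)
```

Because each line has its own mask, crossing or overlapping lines can be handled independently in a sketch like this. Mapping the pixel points to data values would additionally require the axis scale, which is outside the scope of the sketch.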
A. Mitkari and Mahesh Bhosale—Joint Second Authorship.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lal, J., Mitkari, A., Bhosale, M., Doermann, D. (2023). LineFormer: Line Chart Data Extraction Using Instance Segmentation. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14191. Springer, Cham. https://doi.org/10.1007/978-3-031-41734-4_24
DOI: https://doi.org/10.1007/978-3-031-41734-4_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41733-7
Online ISBN: 978-3-031-41734-4
eBook Packages: Computer Science, Computer Science (R0)