Chart-Type Classification Using Convolutional Neural Network for Scholarly Figures

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12047)


Text-to-speech conversion by smart speakers is expected to help visually impaired people, including those with near-total blindness, read documents. This research assumes a situation where such text-to-speech conversion is applied to scholarly documents. A page in a scholarly document usually consists of multiple regions, i.e. ordinary text, mathematical expressions, tables, and figures. In this paper, we propose a method that classifies the chart type of scholarly figures using a convolutional neural network. The method classifies an input figure image as a line chart or other. We evaluated the accuracy of the method on a dataset of scholarly figures collected from actual academic papers. The proposed method achieved a classification accuracy of 97%. We also compared its performance with that of hand-crafted features combined with a support vector machine. The results suggest that the proposed CNN classifier outperforms the conventional approach.


Chart · Image recognition · Image classification · Document recognition
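The abstract describes classifying figure images into line charts versus others with a convolutional network. The paper itself does not specify the architecture here, so the following is only a minimal NumPy sketch of the core CNN building blocks (convolution, ReLU, max-pooling, and a sigmoid binary output); the kernel, the toy 8×8 "figure", and the decision threshold are illustrative assumptions, not the authors' model.

```python
import numpy as np

def conv2d(image, kernel):
    # Valid (no-padding) 2-D convolution of a single-channel image.
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0)

def max_pool(x, size=2):
    # Non-overlapping max-pooling; crops edges that do not fill a window.
    h, w = x.shape
    h, w = h - h % size, w - w % size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy 8x8 binary "figure" containing one horizontal line (a line-chart cue).
img = np.zeros((8, 8))
img[4, :] = 1.0

# A 3x1 vertical second-difference kernel that responds to horizontal strokes.
kernel = np.array([[-1.0], [2.0], [-1.0]])

feat = max_pool(relu(conv2d(img, kernel)))   # feature map after conv+pool
score = sigmoid(feat.sum() - 4.0)            # toy "line chart" probability
```

A real pipeline would stack several such conv/pool layers, learn the kernels by backpropagation, and end with a fully connected layer; here `score > 0.5` simply marks the strong horizontal-line response in the pooled feature map.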



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Mie University, Tsu, Japan
  2. Saitama Institute of Technology, Fukaya, Japan
