Semi-automatic Ground Truth Generation for Chart Image Recognition

  • Li Yang
  • Weihua Huang
  • Chew Lim Tan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3872)


While research on scientific chart recognition is ongoing, there is no suitable standard for evaluating the overall performance of chart recognition results. In this paper, a system for semi-automatic generation of chart ground truth is introduced. Using the system, the user can extract multiple levels of ground truth data. The user's role is to verify and correct the results and to input values where necessary, while the system carries out automatic tasks such as text block detection and line detection. Compared with fully manual processing, the system can effectively reduce the time needed to generate ground truth data. We tested the system on 115 images. The images and the generated ground truth data are available to the public.
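The division of labour described above (automatic detection proposes items, the user verifies, corrects, or fills in values) can be illustrated with a minimal sketch. This is not the authors' implementation or data format, which the abstract does not specify; the class names `Item` and `ChartGroundTruth`, the fields, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Item:
    """One ground truth entry (hypothetical layout, not from the paper)."""
    kind: str                         # e.g. "text_block" or "line"
    bbox: Tuple[int, int, int, int]   # x, y, width, height in pixels
    value: str = ""                   # recognized or user-entered content
    verified: bool = False            # set once the user has checked it


@dataclass
class ChartGroundTruth:
    """Ground truth record for a single chart image."""
    image_name: str
    items: List[Item] = field(default_factory=list)

    def propose(self, kind: str, bbox: Tuple[int, int, int, int],
                value: str = "") -> Item:
        # Called by an automatic detector; the result is still unverified.
        item = Item(kind, bbox, value)
        self.items.append(item)
        return item

    def confirm(self, index: int, corrected_value: Optional[str] = None) -> None:
        # Called by the user: optionally correct the value, then mark verified.
        item = self.items[index]
        if corrected_value is not None:
            item.value = corrected_value
        item.verified = True

    def pending(self) -> List[Item]:
        # Items that still need user verification.
        return [item for item in self.items if not item.verified]


# Usage with made-up values: two automatic proposals, one user correction.
gt = ChartGroundTruth("chart_001.png")
gt.propose("text_block", (10, 5, 120, 18), "Sa1es by year")
gt.propose("line", (30, 40, 200, 1))
gt.confirm(0, corrected_value="Sales by year")
```

Keeping a `verified` flag on every item makes it easy to list what still needs attention, which is where the time saving over fully manual entry would come from in a workflow like this.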


Keywords (added by machine, not by the authors): Ground Truth; Feature Point; Text Component; Ground Truth Data; Line Detection



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Li Yang (1)
  • Weihua Huang (1)
  • Chew Lim Tan (1)
  1. School of Computing, National University of Singapore, Singapore
