Introduction to a Large-Scale General Purpose Ground Truth Database: Methodology, Annotation Tool and Benchmarks

  • Benjamin Yao
  • Xiong Yang
  • Song-Chun Zhu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4679)


This paper presents a large-scale, general-purpose image database with human-annotated ground truth. First, an all-in-all labeling framework is proposed to group visual knowledge at three levels: scene level (global geometric description), object level (segmentation, sketch representation, hierarchical decomposition), and low-mid level (2.1D layered representation, object boundary attributes, curve completion, etc.). Much of this data has not appeared in previous databases. In addition, an And-Or Graph is used to organize visual elements and facilitate top-down labeling. An annotation tool has been developed to realize and integrate all of these tasks. With this tool, we have created a database of more than 636,748 annotated images and video frames. Lastly, the data is organized into 13 common subsets that serve as benchmarks for diverse evaluation tasks.
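
To illustrate how an And-Or Graph can drive top-down labeling, the following is a minimal sketch and not the authors' implementation. It assumes the usual reading of the structure: an And-node decomposes an element into parts that must all be labeled, while an Or-node selects one of several alternative configurations. The names `Node`, `expand`, and the toy "car" graph are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One visual element in a toy And-Or graph (hypothetical structure)."""
    name: str
    kind: str = "leaf"  # one of "and", "or", "leaf"
    children: List["Node"] = field(default_factory=list)


def expand(node: Node, choose: Callable[[Node], Node]) -> List[str]:
    """Walk the graph top-down and return the leaf parts to annotate.

    `choose` is called on every Or-node and returns the selected child,
    standing in for the annotator's choice of configuration.
    """
    if node.kind == "leaf":
        return [node.name]
    if node.kind == "or":
        # An Or-node contributes exactly one of its alternatives.
        return expand(choose(node), choose)
    # An And-node contributes all of its parts, in order.
    parts: List[str] = []
    for child in node.children:
        parts.extend(expand(child, choose))
    return parts


# Toy example (assumed, not taken from the database itself).
car = Node("car", "and", [
    Node("body"),
    Node("wheels", "or", [
        Node("two visible wheels"),
        Node("four visible wheels"),
    ]),
])

# Always pick the first alternative at each Or-node.
print(expand(car, lambda n: n.children[0]))
# -> ['body', 'two visible wheels']
```

Passing a different `choose` function yields a different configuration of the same object, which is the sense in which a single graph can guide the labeling of many instances.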


Ground truth, Annotation, Image database, Benchmark, Sketch representation, Top-down/Bottom-up Labeling





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Benjamin Yao (1)
  • Xiong Yang (1)
  • Song-Chun Zhu (1)
  1. Lotus Hill Institute of Computer Vision and Information Sciences, EZhou City, HuBei Province, P.R. China
