Abstract
This paper presents a large-scale, general-purpose image database with human-annotated ground truth. First, an all-in-all labeling framework is proposed to group visual knowledge at three levels: scene level (global geometric description), object level (segmentation, sketch representation, hierarchical decomposition), and low-mid level (2.1D layered representation, object boundary attributes, curve completion, etc.). Much of this data has not appeared in previous databases. In addition, an And-Or Graph is used to organize visual elements and facilitate top-down labeling. An annotation tool has been developed to carry out and integrate all of these tasks. With this tool, we have created a database of more than 636,748 annotated images and video frames. Finally, the data is organized into 13 common subsets that serve as benchmarks for diverse evaluation tasks.
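To make the And-Or Graph idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the usual reading of the structure: an And-node decomposes an element into parts that must all be present, while an Or-node picks exactly one of several alternatives. The `Node` class, the `enumerate_parses` function, and the toy "car" vocabulary are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List


@dataclass
class Node:
    """Hypothetical And-Or graph node; kind is 'and', 'or', or 'leaf'."""
    name: str
    kind: str = "leaf"
    children: List["Node"] = field(default_factory=list)


def enumerate_parses(node: Node) -> List[List[str]]:
    """Return every list of leaf names that this node can expand to."""
    if node.kind == "leaf":
        return [[node.name]]
    if node.kind == "or":
        # An Or-node selects exactly one of its alternatives.
        return [p for child in node.children for p in enumerate_parses(child)]
    if node.kind == "and":
        # An And-node needs all of its parts: take one parse per child
        # and concatenate them in child order.
        child_parses = [enumerate_parses(c) for c in node.children]
        return [sum(combo, []) for combo in product(*child_parses)]
    raise ValueError(f"unknown node kind: {node.kind!r}")


# Toy vocabulary (an assumption, not taken from the database):
# a car is a body AND a wheel, where the wheel is one of two styles.
wheel = Node("wheel", "or", [Node("spoked_wheel"), Node("disc_wheel")])
car = Node("car", "and", [Node("body"), wheel])

parses = enumerate_parses(car)
for parse in parses:
    print(parse)
```

Under these assumptions, a top-down labeling pass would start at the root, decide which alternative applies at each Or-node, and then annotate every part listed under each And-node.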
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yao, B., Yang, X., Zhu, SC. (2007). Introduction to a Large-Scale General Purpose Ground Truth Database: Methodology, Annotation Tool and Benchmarks. In: Yuille, A.L., Zhu, SC., Cremers, D., Wang, Y. (eds) Energy Minimization Methods in Computer Vision and Pattern Recognition. EMMCVPR 2007. Lecture Notes in Computer Science, vol 4679. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74198-5_14
DOI: https://doi.org/10.1007/978-3-540-74198-5_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74195-4
Online ISBN: 978-3-540-74198-5
eBook Packages: Computer Science (R0)