Abstract
We seek to build a large collection of images with ground truth labels to be used for object detection and recognition research. Such data is useful for supervised learning and quantitative evaluation. To achieve this, we developed a web-based tool that allows easy image annotation and instant sharing of such annotations. Using this annotation tool, we have collected a large dataset that spans many object categories, often containing multiple instances over a wide variety of images. We quantify the contents of the dataset and compare it against existing state-of-the-art datasets used for object recognition and detection. We also show how to extend the dataset to automatically enhance object labels with WordNet, discover object parts, recover a depth ordering of objects in a scene, and increase the number of labels using minimal user supervision and images from the web.
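The idea of enhancing free-text object labels with a lexical hierarchy can be illustrated with a toy sketch. It assumes a small, hypothetical hand-written hypernym table (`HYPERNYM`) and synonym table (`SYNONYMS`) standing in for WordNet; the function names `normalize`, `expand`, and `query` are likewise illustrative and are not the authors' implementation. What it shows: once a raw label is mapped to a canonical noun, walking its hypernym chain lets a query for a broader category (for example "vehicle") retrieve annotations that never used that word.

```python
# Hypothetical stand-in for WordNet: each canonical noun maps to its parent
# (hypernym). The real hierarchy is far larger; this table must stay acyclic.
HYPERNYM = {
    "car": "motor vehicle",
    "motor vehicle": "vehicle",
    "vehicle": "artifact",
    "dog": "canine",
    "canine": "animal",
    "animal": "organism",
}

# Hypothetical synonym table mapping raw annotator labels to a canonical noun.
SYNONYMS = {"automobile": "car", "cars": "car", "puppy": "dog"}


def normalize(raw: str) -> str:
    """Lower-case and trim a free-text label, then map it to its canonical noun."""
    label = raw.strip().lower()
    return SYNONYMS.get(label, label)


def expand(raw: str) -> list[str]:
    """Return the canonical label followed by all of its ancestors in the table."""
    chain = [normalize(raw)]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def query(annotations: list[str], category: str) -> list[str]:
    """Return the raw labels whose expanded hierarchy contains the category."""
    target = normalize(category)
    return [a for a in annotations if target in expand(a)]


if __name__ == "__main__":
    labels = ["Car", "automobile", "puppy", "tree"]
    print(expand("Car"))
    print(query(labels, "vehicle"))
    print(query(labels, "animal"))
```

With a real lexical database, the two tables would be replaced by sense lookups, and ambiguous labels would still need a sense-disambiguation step that this sketch ignores.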
Additional information
The first two authors (B.C. Russell and A. Torralba) contributed equally to this work.
About this article
Cite this article
Russell, B.C., Torralba, A., Murphy, K.P. et al. LabelMe: A Database and Web-Based Tool for Image Annotation. Int J Comput Vis 77, 157–173 (2008). https://doi.org/10.1007/s11263-007-0090-8
DOI: https://doi.org/10.1007/s11263-007-0090-8