
LabelMe: A Database and Web-Based Tool for Image Annotation

Published in: International Journal of Computer Vision

Abstract

We seek to build a large collection of images with ground truth labels to be used for object detection and recognition research. Such data is useful for supervised learning and quantitative evaluation. To achieve this, we developed a web-based tool that allows easy image annotation and instant sharing of such annotations. Using this annotation tool, we have collected a large dataset that spans many object categories, often containing multiple instances over a wide variety of images. We quantify the contents of the dataset and compare it against existing state-of-the-art datasets used for object recognition and detection. We also show how to extend the dataset to automatically enhance object labels with WordNet, discover object parts, recover a depth ordering of objects in a scene, and increase the number of labels using minimal user supervision and images from the web.
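As a rough illustration of the WordNet-based label enhancement mentioned in the abstract, the sketch below (not the authors' implementation) maps a free-text object label to its WordNet hypernyms, so that a query for a general category such as "vehicle" can also retrieve objects annotated as "car". It assumes Python with NLTK and its WordNet corpus installed, and it simply takes the first noun sense of the label; the paper resolves sense ambiguity with minimal user supervision, which is omitted here.

    # Hedged sketch (illustrative only, not the authors' code): enrich a raw
    # object label with its WordNet hypernyms, in the spirit of the label
    # enhancement described in the abstract. Assumes NLTK with the WordNet
    # corpus fetched via nltk.download('wordnet').
    from nltk.corpus import wordnet as wn

    def enrich_label(label):
        """Return the label plus all of its WordNet noun hypernyms."""
        senses = wn.synsets(label, pos=wn.NOUN)
        if not senses:
            return [label]  # label not in WordNet: keep the raw annotation
        sense = senses[0]   # first noun sense; no disambiguation in this sketch
        hypernyms = sense.closure(lambda s: s.hypernyms())  # transitive hypernyms
        return [label] + [s.name().split(".")[0] for s in hypernyms]

    # Example: enrich_label("car") contains 'motor_vehicle', 'vehicle', ...,
    # 'entity', so a query for "vehicle" can also retrieve objects labeled "car".
    print(enrich_label("car"))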



Author information


Correspondence to Bryan C. Russell.

Additional information

The first two authors (B.C. Russell and A. Torralba) contributed equally to this work.



Cite this article

Russell, B.C., Torralba, A., Murphy, K.P. et al. LabelMe: A Database and Web-Based Tool for Image Annotation. Int J Comput Vis 77, 157–173 (2008). https://doi.org/10.1007/s11263-007-0090-8
