
LabelMe: A Database and Web-Based Tool for Image Annotation

Published in: International Journal of Computer Vision

Abstract

We seek to build a large collection of images with ground truth labels to be used for object detection and recognition research. Such data is useful for supervised learning and quantitative evaluation. To achieve this, we developed a web-based tool that allows easy image annotation and instant sharing of such annotations. Using this annotation tool, we have collected a large dataset that spans many object categories, often containing multiple instances over a wide variety of images. We quantify the contents of the dataset and compare it against existing state-of-the-art datasets used for object recognition and detection. We also show how to extend the dataset to automatically enhance object labels with WordNet, discover object parts, recover a depth ordering of objects in a scene, and increase the number of labels using minimal user supervision and images from the web.
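As a rough illustration of the WordNet-based label enhancement mentioned in the abstract, the sketch below (not the authors' implementation) maps a free-text object label to its WordNet hypernyms, so that a query for a general category such as "vehicle" can also retrieve objects annotated as "car". It assumes Python with NLTK and its WordNet corpus installed, and it simply takes the first noun sense of the label; the paper resolves sense ambiguity with minimal user supervision, which is omitted here.

    # Hedged sketch (illustrative only, not the authors' code): enrich a raw
    # object label with its WordNet hypernyms, in the spirit of the label
    # enhancement described in the abstract. Assumes NLTK with the WordNet
    # corpus fetched via nltk.download('wordnet').
    from nltk.corpus import wordnet as wn

    def enrich_label(label):
        """Return the label plus all of its WordNet noun hypernyms."""
        senses = wn.synsets(label, pos=wn.NOUN)
        if not senses:
            return [label]  # label not in WordNet: keep the raw annotation
        sense = senses[0]   # first noun sense; no disambiguation in this sketch
        hypernyms = sense.closure(lambda s: s.hypernyms())  # transitive hypernyms
        return [label] + [s.name().split(".")[0] for s in hypernyms]

    # Example: enrich_label("car") contains 'motor_vehicle', 'vehicle', ...,
    # 'entity', so a query for "vehicle" can also retrieve objects labeled "car".
    print(enrich_label("car"))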



Author information


Correspondence to Bryan C. Russell.

Additional information

The first two authors (B.C. Russell and A. Torralba) contributed equally to this work.



Cite this article

Russell, B.C., Torralba, A., Murphy, K.P. et al. LabelMe: A Database and Web-Based Tool for Image Annotation. Int J Comput Vis 77, 157–173 (2008). https://doi.org/10.1007/s11263-007-0090-8
