LabelMe: A Database and Web-Based Tool for Image Annotation

  • Bryan C. Russell
  • Antonio Torralba
  • Kevin P. Murphy
  • William T. Freeman

Abstract

We seek to build a large collection of images with ground-truth labels to be used for object detection and recognition research. Such data is useful for supervised learning and quantitative evaluation. To achieve this, we developed a web-based tool that allows easy image annotation and instant sharing of such annotations. Using this annotation tool, we have collected a large dataset that spans many object categories, often containing multiple instances over a wide variety of images. We quantify the contents of the dataset and compare it against existing state-of-the-art datasets used for object recognition and detection. We also show how to extend the dataset to automatically enhance object labels with WordNet, discover object parts, recover a depth ordering of objects in a scene, and increase the number of labels using minimal user supervision and images from the web.
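
To make the WordNet extension mentioned above concrete, the following is a minimal sketch (in Python, using NLTK's WordNet interface) of how a free-text annotation label could be mapped to a WordNet synset and its hypernyms, so that a query for a broad category such as "vehicle" also retrieves objects labelled with more specific terms such as "car" or "taxi". The label strings, the head-noun heuristic, and the function names are illustrative assumptions, not the authors' implementation.

    # Minimal sketch (not the authors' code): map free-text annotation labels
    # to WordNet synsets and their hypernyms, so that a query for a broad
    # category such as "vehicle" also matches objects labelled "car" or "taxi".
    # Requires: pip install nltk; then nltk.download("wordnet")
    from nltk.corpus import wordnet as wn

    def head_noun(label):
        # Crude illustrative heuristic: use the first word of the label that
        # WordNet can lemmatize as a noun ("car side" -> "car",
        # "person walking" -> "person").
        for word in label.lower().split():
            lemma = wn.morphy(word, wn.NOUN)
            if lemma:
                return lemma
        return label.lower()

    def extended_labels(label):
        # Return the lemmas of the label's most common noun sense together
        # with the lemmas of all of its hypernyms.
        synsets = wn.synsets(head_noun(label), pos=wn.NOUN)
        if not synsets:
            return set()
        names, frontier = set(), [synsets[0]]
        while frontier:
            s = frontier.pop()
            names.update(s.lemma_names())
            frontier.extend(s.hypernyms())
        return names

    # Both labels match a query for "vehicle" through their hypernym chains.
    for raw in ["car side", "taxi"]:
        print(raw, "->", "vehicle" in extended_labels(raw))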

Keywords

Database · Annotation tool · Object recognition · Object detection



Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Bryan C. Russell (1)
  • Antonio Torralba (1)
  • Kevin P. Murphy (2)
  • William T. Freeman (1)
  1. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, USA
  2. Departments of Computer Science and Statistics, University of British Columbia, Vancouver, Canada
