What Does Classifying More Than 10,000 Image Categories Tell Us?

  • Jia Deng
  • Alexander C. Berg
  • Kai Li
  • Li Fei-Fei
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6315)

Abstract

Image classification is a critical task for both humans and computers. One of the challenges lies in the large scale of the semantic space. In particular, humans can recognize tens of thousands of object classes and scenes. No computer vision algorithm today has been tested at this scale. This paper presents a study of large-scale categorization, including a series of challenging experiments on classification with more than 10,000 image classes. We find that (a) computational issues become crucial in algorithm design; (b) conventional wisdom from a couple of hundred image categories on the relative performance of different classifiers does not necessarily hold when the number of categories increases; (c) there is a surprisingly strong relationship between the structure of WordNet (developed for studying language) and the difficulty of visual categorization; and (d) classification can be improved by exploiting the semantic hierarchy. Toward the future goal of developing automatic vision algorithms to recognize tens of thousands or even millions of image categories, we make a series of observations and arguments about dataset scale, category density, and image hierarchy.
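
To make the idea of exploiting a semantic hierarchy concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hierarchy-aware misclassification cost defined as the height of the lowest common ancestor of the true and predicted classes, so that confusing two sibling categories costs less than confusing two distant ones. The tree, the class names, and all function names are hypothetical stand-ins for a WordNet-like hierarchy.

```python
# Toy hierarchy (hypothetical, not taken from WordNet): child -> parent.
PARENT = {
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
    "husky": "dog",
    "beagle": "dog",
}


def ancestors(node):
    """Return the path from node up to the root, starting with node itself."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def lowest_common_ancestor(a, b):
    """Return the deepest node that lies on both root paths."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("nodes do not share a root")


def height(node):
    """Length of the longest downward path from node to a leaf."""
    children = [child for child, parent in PARENT.items() if parent == node]
    if not children:
        return 0
    return 1 + max(height(child) for child in children)


def hierarchical_cost(true_label, predicted_label):
    """Assumed cost: height of the lowest common ancestor (0 if correct)."""
    return height(lowest_common_ancestor(true_label, predicted_label))


if __name__ == "__main__":
    for predicted in ["husky", "beagle", "cat", "car"]:
        print(predicted, hierarchical_cost("husky", predicted))
```

Under this assumed cost, a classifier would be evaluated, or its predictions chosen, to minimize the expected cost rather than the plain 0/1 error, so that mistakes between closely related categories are penalized less than mistakes between unrelated ones.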

Keywords

Image Category; Query Image; Semantic Space; Spatial Pyramid; Lower Common Ancestor
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Jia Deng (1, 3)
  • Alexander C. Berg (2)
  • Kai Li (1)
  • Li Fei-Fei (3)

  1. Princeton University
  2. Columbia University
  3. Stanford University
