Where Next in Object Recognition and How Much Supervision Do We Need?

  • Sandra Ebert
  • Bernt Schiele
Part of the Advances in Computer Vision and Pattern Recognition book series (ACVPR)


Object class recognition is an active research topic in computer vision that still presents many challenges. Most approaches address this task with supervised learning algorithms that need a large number of labels to perform well. This leads either to small datasets (fewer than 10,000 images) that are labeled through a controlled and verified procedure but capture only a subset of the real-world class distribution, or to large datasets that are more representative but contain more label noise. Semi-supervised learning has therefore emerged as a promising direction for object recognition: it requires only a few labels while making use of the vast number of images available today. In this chapter, we outline the main challenges of semi-supervised object recognition, review existing approaches, and highlight open issues that should be addressed next to advance this research topic.
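To make the semi-supervised setting concrete, here is a minimal pure-Python sketch of graph-based label propagation in the style of Zhou et al.'s "learning with local and global consistency" (NIPS 2004). A few labeled points seed class scores, which then spread to unlabeled neighbors through the iteration F ← αSF + (1 − α)Y, where S = D^{-1/2} W D^{-1/2}. This is an illustration under stated assumptions, not the approach developed in this chapter. The helper name `label_propagation`, the Gaussian kernel width `sigma`, the weight `alpha`, the iteration count, and the 2-D toy points standing in for image features are all hypothetical choices.

```python
import math


def label_propagation(points, labels, n_classes, sigma=1.0, alpha=0.99, n_iter=200):
    """Hypothetical helper for illustration: Zhou et al. (2004)-style propagation.

    points: list of equal-length numeric tuples (stand-ins for image features).
    labels: a class index for each labeled point, None for each unlabeled point.
    Returns a predicted class index for every point.
    """
    n = len(points)

    # Dense graph with Gaussian affinities w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    # and no self-loops (w_ii = 0).
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))

    # Symmetric normalization S = D^{-1/2} W D^{-1/2}, with D = diag(row sums of W).
    deg = [sum(row) for row in w]
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

    # Seed matrix Y: one-hot rows for labeled points, all-zero rows for unlabeled ones.
    y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)] for i in range(n)]

    # Fixed-point iteration F <- alpha * S F + (1 - alpha) * Y.
    f = [row[:] for row in y]
    for _ in range(n_iter):
        f = [
            [
                alpha * sum(s[i][k] * f[k][c] for k in range(n)) + (1.0 - alpha) * y[i][c]
                for c in range(n_classes)
            ]
            for i in range(n)
        ]

    # Each point takes the class with the largest propagated score.
    return [max(range(n_classes), key=lambda c: f[i][c]) for i in range(n)]


# Toy data (hypothetical): two well-separated clusters, one labeled point per cluster.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5),
          (5.0, 5.0), (5.5, 5.0), (5.0, 5.5), (5.5, 5.5)]
labels = [0, None, None, None, 1, None, None, None]
print(label_propagation(points, labels, n_classes=2))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

With only two labels, all eight points are assigned to their cluster's class, because the Gaussian affinities between the clusters are vanishingly small. The dense affinity matrix costs O(n²) memory, so this sketch would not scale directly to the large image collections mentioned above; it trades scalability for readability.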



Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  1. Max Planck Institute for Informatics, Saarbrücken, Germany
