Categorical Perception

  • Mario Fritz
  • Mykhaylo Andriluka
  • Sanja Fidler
  • Michael Stark
  • Aleš Leonardis
  • Bernt Schiele
Part of the Cognitive Systems Monographs book series (COSMOS, volume 8)


The ability to recognize and categorize entities in its environment is a vital competence of any cognitive system. Reasoning about the current state of the world, assessing the consequences of possible actions, and planning future episodes all require a concept of the roles that objects and places may play. For example, objects afford being used in specific ways, and places are usually devoted to certain activities. The ability to represent and infer these roles, or, more generally, categories, from sensory observations of the world is an important constituent of a cognitive system’s perceptual processing (Section 1.3 elaborates on this with a very visual example).


Object Class; Object Representation; Topic Model; Categorical Perception; Topic Distribution





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Mario Fritz (1)
  • Mykhaylo Andriluka (1)
  • Sanja Fidler (2)
  • Michael Stark (1)
  • Aleš Leonardis (2)
  • Bernt Schiele (1)

  1. Technische Universität Darmstadt, Darmstadt, Germany
  2. VICOS Lab, University of Ljubljana, Slovenia
