Psychonomic Bulletin & Review, Volume 15, Issue 2, pp 256–271

Generalization and similarity in exemplar models of categorization: Insights from machine learning

  • Frank Jäkel
  • Bernhard Schölkopf
  • Felix A. Wichmann
Theoretical and Review Articles

Abstract

Exemplar theories of categorization depend on similarity for explaining subjects’ ability to generalize to new stimuli. A major criticism of exemplar theories concerns their lack of abstraction mechanisms and thus, seemingly, of generalization ability. Here, we use insights from machine learning to demonstrate that exemplar models can actually generalize very well. Kernel methods in machine learning are akin to exemplar models and are very successful in real-world applications. Their generalization performance depends crucially on the chosen similarity measure. Although similarity plays an important role in describing generalization behavior, it is not the only factor that controls generalization performance. In machine learning, kernel methods are often combined with regularization techniques in order to ensure good generalization. These same techniques are easily incorporated in exemplar models. We show that the generalized context model (Nosofsky, 1986) and ALCOVE (Kruschke, 1992) are closely related to a statistical model called kernel logistic regression. We argue that generalization is central to the enterprise of understanding categorization behavior, and we suggest some ways in which insights from machine learning can offer guidance.
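The relationship described above, between exemplar models and kernel logistic regression with regularization, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an exponential similarity function over Euclidean distance, a ridge-type penalty on the exemplar weights, and plain gradient descent. The function names and the parameters `c`, `lam`, `lr`, and `steps` are hypothetical and chosen only for illustration.

```python
import numpy as np

def similarity(X, Y, c=1.0):
    """Exponential similarity exp(-c * Euclidean distance) between rows of X and Y.
    Assumption: this particular similarity function is chosen for illustration."""
    d = np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))
    return np.exp(-c * d)

def fit_klr(X, y, c=1.0, lam=0.1, lr=0.1, steps=2000):
    """Fit one weight per stored exemplar by gradient descent on the
    logistic loss plus the penalty (lam / 2) * alpha^T K alpha. Labels y are 0 or 1."""
    K = similarity(X, X, c)
    alpha = np.zeros(len(X))
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(K @ alpha + b)))  # predicted P(category 1)
        alpha -= lr * (K @ (p - y) / len(X) + lam * (K @ alpha))
        b -= lr * np.mean(p - y)
    return alpha, b

def predict_proba(X_train, alpha, b, X_new, c=1.0):
    """P(category 1) for new stimuli: a weighted sum of similarities to the
    stored exemplars, passed through a logistic function."""
    f = similarity(X_new, X_train, c) @ alpha + b
    return 1.0 / (1.0 + np.exp(-f))

# Two small categories in a two-dimensional stimulus space (made-up data).
X = np.array([[0, 0], [0, 1], [1, 0], [4, 4], [4, 5], [5, 4]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1], dtype=float)
alpha, b = fit_klr(X, y)
probs = predict_proba(X, alpha, b, np.array([[0.5, 0.5], [4.5, 4.5]]))
```

In this sketch, `lam` controls how strongly the exemplar weights are shrunk and `c` controls how quickly similarity falls off with distance, so generalization to new stimuli depends on both the similarity measure and the amount of regularization.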


References

  1. Aizerman, M. A., Braverman, E. M., & Rozonoer, L. I. (1964). The probability problem of pattern recognition learning and the method of potential functions. Automation & Remote Control, 25, 1175–1190.
  2. Alfonso-Reese, L. A., Ashby, F. G., & Brainard, D. H. (2002). What makes a categorization task difficult? Perception & Psychophysics, 64, 570–583.
  3. Ashby, F. G., & Alfonso-Reese, L. A. (1995). Categorization as probability density estimation. Journal of Mathematical Psychology, 39, 216–233.
  4. Ashby, F. G., & Gott, R. E. (1988). Decision rules in the perception and categorization of multidimensional stimuli. Journal of Experimental Psychology: Learning, Memory, & Cognition, 14, 33–53.
  5. Ashby, F. G., & Maddox, W. T. (1992). Complex decision rules in categorization: Contrasting novice and experienced performance. Journal of Experimental Psychology: Human Perception & Performance, 18, 50–71.
  6. Ashby, F. G., & Maddox, W. T. (1993). Relations between prototype, exemplar, and decision bound models of categorization. Journal of Mathematical Psychology, 37, 372–400.
  7. Ashby, F. G., Waldron, E. M., Lee, W. W., & Berkman, A. (2001). Suboptimality in human categorization and identification. Journal of Experimental Psychology: General, 130, 77–96.
  8. Beals, R., Krantz, D. H., & Tversky, A. (1968). Foundations of multidimensional scaling. Psychological Review, 75, 127–142.
  9. Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford: Oxford University Press, Clarendon Press.
  10. Bousquet, O., & Elisseeff, A. (2002). Stability and generalization. Journal of Machine Learning Research, 2, 499–526.
  11. Bradley, R. A. (1976). Science, statistics, and paired comparisons. Biometrics, 32, 213–239.
  12. Briscoe, E., & Feldman, J. (2006). Conceptual complexity and the bias-variance tradeoff. In R. Sun, N. Miyake, & C. Schunn (Eds.), Proceedings of the 28th Annual Conference of the Cognitive Science Society (pp. 1038–1043). Mahwah, NJ: Erlbaum.
  13. Brown, J. S. (1965). Generalization and discrimination. In D. I. Mostofsky (Ed.), Stimulus generalization (pp. 7–23). Stanford: Stanford University Press.
  14. Bülthoff, H. H., & Edelman, S. (1992). Psychophysical support for a two-dimensional view interpolation theory of object recognition. Proceedings of the National Academy of Sciences, 89, 60–64.
  15. Bush, R. R., & Mosteller, F. (1951). A model for stimulus generalization and discrimination. Psychological Review, 58, 413–423.
  16. Chater, N., & Vitányi, P. M. B. (2003). The generalized universal law of generalization. Journal of Mathematical Psychology, 47, 346–369.
  17. Cristianini, N., & Schölkopf, B. (2002). Support vector machines and kernel methods: The new generation of learning machines. AI Magazine, 23(3), 31–42.
  18. David, H. A. (1988). The method of paired comparisons (2nd ed.). London: Griffin.
  19. Fass, D., & Feldman, J. (2003). Categorization under complexity: A unified MDL account of human learning of regular and irregular categories. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in neural information processing systems 15 (pp. 35–42). Cambridge, MA: MIT Press.
  20. Feldman, J. (2000). Minimization of Boolean complexity in human concept learning. Nature, 407, 630–633.
  21. Fried, L. S., & Holyoak, K. J. (1984). Induction of category distributions: A framework for classification learning. Journal of Experimental Psychology: Learning, Memory, & Cognition, 10, 234–257.
  22. Garner, W. R. (1974). The processing of information and structure. Potomac, MD: Erlbaum.
  23. Ghirlanda, S., & Enquist, M. (2003). A century of generalization. Animal Behaviour, 66, 15–36.
  24. Graf, A. B. A., & Wichmann, F. A. (2004). Insights from machine learning applied to human visual classification. In S. Thrun, L. K. Saul, & B. Schölkopf (Eds.), Advances in neural information processing systems 16 (pp. 905–912). Cambridge, MA: MIT Press.
  25. Graf, A. B. A., Wichmann, F. A., Bülthoff, H. H., & Schölkopf, B. (2006). Classification of faces in man and machine. Neural Computation, 18, 143–165.
  26. Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning: Data mining, inference, and prediction. New York: Springer.
  27. Jäkel, F., Schölkopf, B., & Wichmann, F. A. (2007). A tutorial on kernel methods for categorization. Journal of Mathematical Psychology, 51, 343–358.
  28. Jäkel, F., Schölkopf, B., & Wichmann, F. A. (2008). Similarity, kernels, and the triangle inequality. Manuscript submitted for publication.
  29. Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22–44.
  30. Lamberts, K. (1994). Flexible tuning of similarity in exemplar-based categorization. Journal of Experimental Psychology: Learning, Memory, & Cognition, 20, 1003–1021.
  31. Logothetis, N. K., Pauls, J., Bülthoff, H. H., & Poggio, T. (1994). View-dependent object recognition by monkeys. Current Biology, 4, 401–414.
  32. Logothetis, N. K., Pauls, J., & Poggio, T. (1995). Shape representation in the inferior temporal cortex of monkeys. Current Biology, 5, 552–563.
  33. Love, B. C., Medin, D. L., & Gureckis, T. M. (2004). SUSTAIN: A network model of category learning. Psychological Review, 111, 309–332.
  34. Luce, R. D. (1959). Individual choice behavior: A theoretical analysis. New York: Wiley.
  35. Luce, R. D. (1963). Detection and recognition. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (Vol. 1, pp. 103–189). New York: Wiley.
  36. Luce, R. D. (1977). The choice axiom after twenty years. Journal of Mathematical Psychology, 15, 215–233.
  37. McKinley, S. C., & Nosofsky, R. M. (1995). Investigations of exemplar and decision bound models in large, ill-defined category structures. Journal of Experimental Psychology: Human Perception & Performance, 21, 128–148.
  38. McKinley, S. C., & Nosofsky, R. M. (1996). Selective attention and the formation of linear decision boundaries. Journal of Experimental Psychology: Human Perception & Performance, 22, 294–317.
  39. Medin, D. L., Goldstone, R. L., & Gentner, D. (1993). Respects for similarity. Psychological Review, 100, 254–278.
  40. Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207–238.
  41. Mostofsky, D. I. (Ed.) (1965). Stimulus generalization. Stanford: Stanford University Press.
  42. Navarro, D. J. (2002). Representing stimulus similarity. Unpublished doctoral dissertation, University of Adelaide, Adelaide, Australia.
  43. Navarro, D. J. (2007). On the interaction between exemplar-based concepts and a response scaling process. Journal of Mathematical Psychology, 51, 85–98.
  44. Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39–57.
  45. Nosofsky, R. M. (1987). Attention and learning processes in the identification and categorization of integral stimuli. Journal of Experimental Psychology: Learning, Memory, & Cognition, 13, 87–108.
  46. Nosofsky, R. M. (1990). Relations between exemplar-similarity and likelihood models of classification. Journal of Mathematical Psychology, 34, 393–418.
  47. Nosofsky, R. M. (1991a). Tests of an exemplar model for relating perceptual classification and recognition memory. Journal of Experimental Psychology: Human Perception & Performance, 17, 3–27.
  48. Nosofsky, R. M. (1991b). Typicality in logically defined categories: Exemplar-similarity versus rule instantiation. Memory & Cognition, 19, 131–150.
  49. Nosofsky, R. M. (1992). Exemplar-based approach to relating categorization, identification, and recognition. In F. G. Ashby (Ed.), Multidimensional models of perception and cognition (pp. 363–393). Hillsdale, NJ: Erlbaum.
  50. Nosofsky, R. M., & Zaki, S. R. (2002). Exemplar and prototype models revisited: Response strategies, selective attention, and stimulus generalization. Journal of Experimental Psychology: Learning, Memory, & Cognition, 28, 924–940.
  51. Ohl, F. W., Scheich, H., & Freeman, W. J. (2001). Change in the pattern of ongoing cortical activity with auditory category learning. Nature, 412, 733–736.
  52. Op de Beeck, H., Wagemans, J., & Vogels, R. (2001). Inferotemporal neurons represent low-dimensional configurations of parameterized shapes. Nature Neuroscience, 4, 1244–1252.
  53. Op de Beeck, H., Wagemans, J., & Vogels, R. (2004). A diverse stimulus representation underlies shape categorization by primates (Abstract). Journal of Vision, 4(8), 518a.
  54. Orr, G. B., & Müller, K.-R. (Eds.) (1998). Neural networks: Tricks of the trade. Berlin: Springer.
  55. Palmeri, T. J., & Gauthier, I. (2004). Visual object understanding. Nature Reviews Neuroscience, 5, 291–303.
  56. Pitt, M. A., Myung, I. J., & Zhang, S. (2002). Toward a method of selecting among computational models of cognition. Psychological Review, 109, 472–491.
  57. Poggio, T. (1990). A theory of how the brain might work. Cold Spring Harbor Symposia on Quantitative Biology, 55, 899–910.
  58. Poggio, T., & Bizzi, E. (2004). Generalization in vision and motor control. Nature, 431, 768–774.
  59. Poggio, T., & Edelman, S. (1990). A network that learns to recognize three-dimensional objects. Nature, 343, 263–266.
  60. Poggio, T., & Girosi, F. (1989). A theory of networks for approximation and learning (Tech. Rep. No. A. I. Memo No. 1140). Cambridge, MA: MIT AI LAB & Center for Biological Information Processing Whitaker College.
  61. Poggio, T., Rifkin, R., Mukherjee, S., & Niyogi, P. (2004). General conditions for predictivity in learning theory. Nature, 428, 419–422.
  62. Poggio, T., & Smale, S. (2003). The mathematics of learning: Dealing with data. Notices of the American Mathematical Society, 50, 537–544.
  63. Posner, M. I., & Keele, S. W. (1968). On the genesis of abstract ideas. Journal of Experimental Psychology, 77, 353–363.
  64. Reed, S. K. (1972). Pattern recognition and categorization. Cognitive Psychology, 3, 382–407.
  65. Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2, 1019–1025.
  66. Rosch, E., & Mervis, C. B. (1975). Family resemblances: Studies in the internal structure of categories. Cognitive Psychology, 7, 573–605.
  67. Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382–439.
  68. Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.
  69. Rosseel, Y. (2002). Mixture models of categorization. Journal of Mathematical Psychology, 46, 178–210.
  70. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart, J. L. McClelland, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1, pp. 318–362). Cambridge, MA: MIT Press, Bradford Books.
  71. Schoenberg, I. J. (1938). Metric spaces and positive definite functions. Transactions of the American Mathematical Society, 44, 522–536.
  72. Schölkopf, B., & Smola, A. J. (2002). Learning with kernels: Support vector machines, regularization, optimization, and beyond. Cambridge, MA: MIT Press.
  73. Shepard, R. N. (1957). Stimulus and response generalization: A stochastic model relating generalization to distance in psychological space. Psychometrika, 22, 325–345.
  74. Shepard, R. N. (1958). Stimulus and response generalization: Deduction of the generalization gradient from a trace model. Psychological Review, 65, 242–256.
  75. Shepard, R. N. (1962). The analysis of proximities: Multidimensional scaling with an unknown distance function. Part I. Psychometrika, 27, 125–140.
  76. Shepard, R. N. (1964). Attention and the metric structure of the stimulus space. Journal of Mathematical Psychology, 1, 54–87.
  77. Shepard, R. N. (1965). Approximation to uniform gradients of generalization by monotone transformations of scale. In D. I. Mostofsky (Ed.), Stimulus generalization (pp. 94–110). Stanford: Stanford University Press.
  78. Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science, 237, 1317–1323.
  79. Shepard, R. N., & Chang, J.-J. (1963). Stimulus generalization in the learning of classifications. Journal of Experimental Psychology, 65, 94–102.
  80. Shepard, R. N., Hovland, C. I., & Jenkins, H. M. (1961). Learning and memorization of classifications. Psychological Monographs, 75(13, Whole No. 517), 1–42.
  81. Sigala, N., Gabbiani, F., & Logothetis, N. K. (2002). Visual categorization and object representation in monkeys and humans. Journal of Cognitive Neuroscience, 14, 187–198.
  82. Sigala, N., & Logothetis, N. K. (2002). Visual categorization shapes feature selectivity in the primate temporal cortex. Nature, 415, 318–320.
  83. Smith, J. D., & Minda, J. P. (1998). Prototypes in the mist: The early epochs of category learning. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1411–1436.
  84. Smith, J. D., & Minda, J. P. (2000). Thirty categorization results in search of a model. Journal of Experimental Psychology: Learning, Memory, & Cognition, 26, 3–27.
  85. Spence, K. W. (1937). The differential response in animals to stimuli varying within a single dimension. Psychological Review, 44, 430–444.
  86. Tenenbaum, J. B., & Griffiths, T. L. (2001). Generalization, similarity and Bayesian inference. Behavioral & Brain Sciences, 24, 629–640.
  87. Train, K. E. (2003). Discrete choice methods with simulation. Cambridge: Cambridge University Press.
  88. Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327–352.
  89. Tversky, A., & Gati, I. (1982). Similarity, separability, and the triangle inequality. Psychological Review, 89, 123–154.
  90. Vapnik, V. N. (2000). The nature of statistical learning theory (2nd ed.). New York: Springer.
  91. Verguts, T., Ameel, E., & Storms, G. (2004). Measures of similarity in models of categorization. Memory & Cognition, 32, 379–389.
  92. Wichmann, F. A., Graf, A. B. A., Simoncelli, E. P., Bülthoff, H. H., & Schölkopf, B. (2005). Machine learning applied to perception: Decision images for gender classification. In L. K. Saul, Y. Weiss, & L. Bottou (Eds.), Advances in neural information processing systems 17 (pp. 1489–1496). Cambridge, MA: MIT Press.

Copyright information

© Psychonomic Society, Inc. 2008

Authors and Affiliations

  • Frank Jäkel (1, 3)
  • Bernhard Schölkopf (2)
  • Felix A. Wichmann (1, 3)
  1. Bernstein Center for Computational Neuroscience, Berlin, Germany
  2. Max Planck Institute for Biological Cybernetics, Tübingen, Germany
  3. Technische Universität Berlin, Berlin, Germany
