Generalization and similarity in exemplar models of categorization: Insights from machine learning

Abstract

Exemplar theories of categorization depend on similarity for explaining subjects’ ability to generalize to new stimuli. A major criticism of exemplar theories concerns their lack of abstraction mechanisms and thus, seemingly, of generalization ability. Here, we use insights from machine learning to demonstrate that exemplar models can actually generalize very well. Kernel methods in machine learning are akin to exemplar models and are very successful in real-world applications. Their generalization performance depends crucially on the chosen similarity measure. Although similarity plays an important role in describing generalization behavior, it is not the only factor that controls generalization performance. In machine learning, kernel methods are often combined with regularization techniques in order to ensure good generalization. These same techniques are easily incorporated in exemplar models. We show that the generalized context model (Nosofsky, 1986) and ALCOVE (Kruschke, 1992) are closely related to a statistical model called kernel logistic regression. We argue that generalization is central to the enterprise of understanding categorization behavior, and we suggest some ways in which insights from machine learning can offer guidance.
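The relationship described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy stimuli, the exponential similarity with a single sensitivity parameter `c`, the omission of attention weights and response biases, and the plain gradient-descent fit are all assumptions made for illustration. The first function applies a GCM-style summed-similarity choice rule; the second fits a kernel logistic regression on the same similarity function, with a penalty `lam * w^T K w` standing in for the regularization techniques mentioned above.

```python
import numpy as np

def exp_similarity(X, Y, c=1.0):
    """Similarity exp(-c * Euclidean distance) between rows of X and rows of Y."""
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return np.exp(-c * d)

def gcm_prob_a(stimuli, exemplars, labels, c=1.0):
    """Simplified GCM-style rule: summed similarity to category-A exemplars
    divided by summed similarity to all exemplars (no attention or bias terms)."""
    s = exp_similarity(stimuli, exemplars, c)
    return s[:, labels == 1].sum(axis=1) / s.sum(axis=1)

def fit_klr(exemplars, labels, c=1.0, lam=0.1, lr=0.1, steps=5000):
    """Kernel logistic regression without a bias term, fit by gradient descent.
    Objective: mean logistic loss + lam * w^T K w (the regularizer)."""
    K = exp_similarity(exemplars, exemplars, c)
    n = len(labels)
    w = np.zeros(n)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(K @ w)))               # predicted P(A) at each exemplar
        grad = K @ (p - labels) / n + 2.0 * lam * (K @ w)  # loss gradient + penalty gradient
        w -= lr * grad
    return w

def klr_prob_a(stimuli, exemplars, w, c=1.0):
    """P(A | stimulus) as a logistic function of weighted similarities to exemplars."""
    return 1.0 / (1.0 + np.exp(-(exp_similarity(stimuli, exemplars, c) @ w)))

# Hypothetical toy data: three exemplars per category in a 2-D stimulus space.
exemplars = np.array([[0., 0.], [0., 1.], [1., 0.],
                      [3., 3.], [3., 4.], [4., 3.]])
labels = np.array([1., 1., 1., 0., 0., 0.])   # 1 = category A, 0 = category B

# Two new stimuli that were never seen during training.
stimuli = np.array([[0.5, 0.5], [3.5, 3.5]])

p_gcm = gcm_prob_a(stimuli, exemplars, labels)
w_weak = fit_klr(exemplars, labels, lam=0.01)    # weak regularization
w_strong = fit_klr(exemplars, labels, lam=1.0)   # strong regularization
p_klr = klr_prob_a(stimuli, exemplars, w_weak)

print("GCM-style P(A):", p_gcm)
print("KLR P(A):      ", p_klr)
print("weight norms (weak, strong):", np.linalg.norm(w_weak), np.linalg.norm(w_strong))
```

In both functions the response to a new stimulus is computed from its similarities to stored exemplars. The difference in this sketch is that the kernel logistic regression learns one weight per exemplar, and the regularization strength `lam` controls how large those weights may grow.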

References

  1. Aizerman, M. A., Braverman, E. M., & Rozonoer, L. I. (1964). The probability problem of pattern recognition learning and the method of potential functions. Automation & Remote Control, 25, 1175–1190.

  2. Alfonso-Reese, L. A., Ashby, F. G., & Brainard, D. H. (2002). What makes a categorization task difficult? Perception & Psychophysics, 64, 570–583.

  3. Ashby, F. G., & Alfonso-Reese, L. A. (1995). Categorization as probability density estimation. Journal of Mathematical Psychology, 39, 216–233.

  4. Ashby, F. G., & Gott, R. E. (1988). Decision rules in the perception and categorization of multidimensional stimuli. Journal of Experimental Psychology: Learning, Memory, & Cognition, 14, 33–53.

  5. Ashby, F. G., & Maddox, W. T. (1992). Complex decision rules in categorization: Contrasting novice and experienced performance. Journal of Experimental Psychology: Human Perception & Performance, 18, 50–71.

  6. Ashby, F. G., & Maddox, W. T. (1993). Relations between prototype, exemplar, and decision bound models of categorization. Journal of Mathematical Psychology, 37, 372–400.

  7. Ashby, F. G., Waldron, E. M., Lee, W. W., & Berkman, A. (2001). Suboptimality in human categorization and identification. Journal of Experimental Psychology: General, 130, 77–96.

  8. Beals, R., Krantz, D. H., & Tversky, A. (1968). Foundations of multidimensional scaling. Psychological Review, 75, 127–142.

  9. Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford: Oxford University Press, Clarendon Press.

  10. Bousquet, O., & Elisseeff, A. (2002). Stability and generalization. Journal of Machine Learning Research, 2, 499–526.

  11. Bradley, R. A. (1976). Science, statistics, and paired comparisons. Biometrics, 32, 213–239.

  12. Briscoe, E., & Feldman, J. (2006). Conceptual complexity and the bias-variance tradeoff. In R. Sun, N. Miyake, & C. Schunn (Eds.), Proceedings of the 28th Annual Conference of the Cognitive Science Society (pp. 1038–1043). Mahwah, NJ: Erlbaum.

  13. Brown, J. S. (1965). Generalization and discrimination. In D. I. Mostofsky (Ed.), Stimulus generalization (pp. 7–23). Stanford: Stanford University Press.

  14. Bülthoff, H. H., & Edelman, S. (1992). Psychophysical support for a two-dimensional view interpolation theory of object recognition. Proceedings of the National Academy of Sciences, 89, 60–64.

  15. Bush, R. R., & Mosteller, F. (1951). A model for stimulus generalization and discrimination. Psychological Review, 58, 413–423.

  16. Chater, N., & Vitányi, P. M. B. (2003). The generalized universal law of generalization. Journal of Mathematical Psychology, 47, 346–369.

  17. Cristianini, N., & Schölkopf, B. (2002). Support vector machines and kernel methods: The new generation of learning machines. AI Magazine, 23(3), 31–42.

  18. David, H. A. (1988). The method of paired comparisons (2nd ed.). London: Griffin.

  19. Fass, D., & Feldman, J. (2003). Categorization under complexity: A unified MDL account of human learning of regular and irregular categories. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in neural information processing systems 15 (pp. 35–42). Cambridge, MA: MIT Press.

  20. Feldman, J. (2000). Minimization of Boolean complexity in human concept learning. Nature, 407, 630–633.

  21. Fried, L. S., & Holyoak, K. J. (1984). Induction of category distributions: A framework for classification learning. Journal of Experimental Psychology: Learning, Memory, & Cognition, 10, 234–257.

  22. Garner, W. R. (1974). The processing of information and structure. Potomac, MD: Erlbaum.

  23. Ghirlanda, S., & Enquist, M. (2003). A century of generalization. Animal Behaviour, 66, 15–36.

  24. Graf, A. B. A., & Wichmann, F. A. (2004). Insights from machine learning applied to human visual classification. In S. Thrun, L. K. Saul, & B. Schölkopf (Eds.), Advances in neural information processing systems 16 (pp. 905–912). Cambridge, MA: MIT Press.

  25. Graf, A. B. A., Wichmann, F. A., Bülthoff, H. H., & Schölkopf, B. (2006). Classification of faces in man and machine. Neural Computation, 18, 143–165.

  26. Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning: Data mining, inference, and prediction. New York: Springer.

  27. Jäkel, F., Schölkopf, B., & Wichmann, F. A. (2007). A tutorial on kernel methods for categorization. Journal of Mathematical Psychology, 51, 343–358.

  28. Jäkel, F., Schölkopf, B., & Wichmann, F. A. (2008). Similarity, kernels, and the triangle inequality. Manuscript submitted for publication.

  29. Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22–44.

  30. Lamberts, K. (1994). Flexible tuning of similarity in exemplar-based categorization. Journal of Experimental Psychology: Learning, Memory, & Cognition, 20, 1003–1021.

  31. Logothetis, N. K., Pauls, J., Bülthoff, H. H., & Poggio, T. (1994). View-dependent object recognition by monkeys. Current Biology, 4, 401–414.

  32. Logothetis, N. K., Pauls, J., & Poggio, T. (1995). Shape representation in the inferior temporal cortex of monkeys. Current Biology, 5, 552–563.

  33. Love, B. C., Medin, D. L., & Gureckis, T. M. (2004). SUSTAIN: A network model of category learning. Psychological Review, 111, 309–332.

  34. Luce, R. D. (1959). Individual choice behavior: A theoretical analysis. New York: Wiley.

  35. Luce, R. D. (1963). Detection and recognition. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (Vol. 1, pp. 103–189). New York: Wiley.

  36. Luce, R. D. (1977). The choice axiom after twenty years. Journal of Mathematical Psychology, 15, 215–233.

  37. McKinley, S. C., & Nosofsky, R. M. (1995). Investigations of exemplar and decision bound models in large, ill-defined category structures. Journal of Experimental Psychology: Human Perception & Performance, 21, 128–148.

  38. McKinley, S. C., & Nosofsky, R. M. (1996). Selective attention and the formation of linear decision boundaries. Journal of Experimental Psychology: Human Perception & Performance, 22, 294–317.

  39. Medin, D. L., Goldstone, R. L., & Gentner, D. (1993). Respects for similarity. Psychological Review, 100, 254–278.

  40. Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207–238.

  41. Mostofsky, D. I. (Ed.) (1965). Stimulus generalization. Stanford: Stanford University Press.

  42. Navarro, D. J. (2002). Representing stimulus similarity. Unpublished doctoral dissertation, University of Adelaide, Adelaide, Australia.

  43. Navarro, D. J. (2007). On the interaction between exemplar-based concepts and a response scaling process. Journal of Mathematical Psychology, 51, 85–98.

  44. Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39–57.

  45. Nosofsky, R. M. (1987). Attention and learning processes in the identification and categorization of integral stimuli. Journal of Experimental Psychology: Learning, Memory, & Cognition, 13, 87–108.

  46. Nosofsky, R. M. (1990). Relations between exemplar-similarity and likelihood models of classification. Journal of Mathematical Psychology, 34, 393–418.

  47. Nosofsky, R. M. (1991a). Tests of an exemplar model for relating perceptual classification and recognition memory. Journal of Experimental Psychology: Human Perception & Performance, 17, 3–27.

  48. Nosofsky, R. M. (1991b). Typicality in logically defined categories: Exemplar-similarity versus rule instantiation. Memory & Cognition, 19, 131–150.

  49. Nosofsky, R. M. (1992). Exemplar-based approach to relating categorization, identification, and recognition. In F. G. Ashby (Ed.), Multidimensional models of perception and cognition (pp. 363–393). Hillsdale, NJ: Erlbaum.

  50. Nosofsky, R. M., & Zaki, S. R. (2002). Exemplar and prototype models revisited: Response strategies, selective attention, and stimulus generalization. Journal of Experimental Psychology: Learning, Memory, & Cognition, 28, 924–940.

  51. Ohl, F. W., Scheich, H., & Freeman, W. J. (2001). Change in the pattern of ongoing cortical activity with auditory category learning. Nature, 412, 733–736.

  52. Op de Beeck, H., Wagemans, J., & Vogels, R. (2001). Inferotemporal neurons represent low-dimensional configurations of parameterized shapes. Nature Neuroscience, 4, 1244–1252.

  53. Op de Beeck, H., Wagemans, J., & Vogels, R. (2004). A diverse stimulus representation underlies shape categorization by primates (Abstract). Journal of Vision, 4(8), 518a.

  54. Orr, G. B., & Müller, K.-R. (Eds.) (1998). Neural networks: Tricks of the trade. Berlin: Springer.

  55. Palmeri, T. J., & Gauthier, I. (2004). Visual object understanding. Nature Reviews Neuroscience, 5, 291–303.

  56. Pitt, M. A., Myung, I. J., & Zhang, S. (2002). Toward a method of selecting among computational models of cognition. Psychological Review, 109, 472–491.

  57. Poggio, T. (1990). A theory of how the brain might work. Cold Spring Harbor Symposia on Quantitative Biology, 55, 899–910.

  58. Poggio, T., & Bizzi, E. (2004). Generalization in vision and motor control. Nature, 431, 768–774.

  59. Poggio, T., & Edelman, S. (1990). A network that learns to recognize three-dimensional objects. Nature, 343, 263–266.

  60. Poggio, T., & Girosi, F. (1989). A theory of networks for approximation and learning (Tech. Rep. No. A. I. Memo No. 1140). Cambridge, MA: MIT AI LAB & Center for Biological Information Processing Whitaker College.

  61. Poggio, T., Rifkin, R., Mukherjee, S., & Niyogi, P. (2004). General conditions for predictivity in learning theory. Nature, 428, 419–422.

  62. Poggio, T., & Smale, S. (2003). The mathematics of learning: Dealing with data. Notices of the American Mathematical Society, 50, 537–544.

  63. Posner, M. I., & Keele, S. W. (1968). On the genesis of abstract ideas. Journal of Experimental Psychology, 77, 353–363.

  64. Reed, S. K. (1972). Pattern recognition and categorization. Cognitive Psychology, 3, 382–407.

  65. Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2, 1019–1025.

  66. Rosch, E., & Mervis, C. B. (1975). Family resemblances: Studies in the internal structure of categories. Cognitive Psychology, 7, 573–605.

  67. Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382–439.

  68. Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

  69. Rosseel, Y. (2002). Mixture models of categorization. Journal of Mathematical Psychology, 46, 178–210.

  70. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart, J. L. McClelland, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1, pp. 318–362). Cambridge, MA: MIT Press, Bradford Books.

  71. Schoenberg, I. J. (1938). Metric spaces and positive definite functions. Transactions of the American Mathematical Society, 44, 522–536.

  72. Schölkopf, B., & Smola, A. J. (2002). Learning with kernels: Support vector machines, regularization, optimization, and beyond. Cambridge, MA: MIT Press.

  73. Shepard, R. N. (1957). Stimulus and response generalization: A stochastic model relating generalization to distance in psychological space. Psychometrika, 22, 325–345.

  74. Shepard, R. N. (1958). Stimulus and response generalization: Deduction of the generalization gradient from a trace model. Psychological Review, 65, 242–256.

  75. Shepard, R. N. (1962). The analysis of proximities: Multidimensional scaling with an unknown distance function. Part I. Psychometrika, 27, 125–140.

  76. Shepard, R. N. (1964). Attention and the metric structure of the stimulus space. Journal of Mathematical Psychology, 1, 54–87.

  77. Shepard, R. N. (1965). Approximation to uniform gradients of generalization by monotone transformations of scale. In D. I. Mostofsky (Ed.), Stimulus generalization (pp. 94–110). Stanford: Stanford University Press.

  78. Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science, 237, 1317–1323.

  79. Shepard, R. N., & Chang, J.-J. (1963). Stimulus generalization in the learning of classifications. Journal of Experimental Psychology, 65, 94–102.

  80. Shepard, R. N., Hovland, C. I., & Jenkins, H. M. (1961). Learning and memorization of classifications. Psychological Monographs, 75(13, Whole No. 517), 1–42.

  81. Sigala, N., Gabbiani, F., & Logothetis, N. K. (2002). Visual categorization and object representation in monkeys and humans. Journal of Cognitive Neuroscience, 14, 187–198.

  82. Sigala, N., & Logothetis, N. K. (2002). Visual categorization shapes feature selectivity in the primate temporal cortex. Nature, 415, 318–320.

  83. Smith, J. D., & Minda, J. P. (1998). Prototypes in the mist: The early epochs of category learning. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1411–1436.

  84. Smith, J. D., & Minda, J. P. (2000). Thirty categorization results in search of a model. Journal of Experimental Psychology: Learning, Memory, & Cognition, 26, 3–27.

  85. Spence, K. W. (1937). The differential response in animals to stimuli varying within a single dimension. Psychological Review, 44, 430–444.

  86. Tenenbaum, J. B., & Griffiths, T. L. (2001). Generalization, similarity and Bayesian inference. Behavioral & Brain Sciences, 24, 629–640.

  87. Train, K. E. (2003). Discrete choice methods with simulation. Cambridge: Cambridge University Press.

  88. Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327–352.

  89. Tversky, A., & Gati, I. (1982). Similarity, separability, and the triangle inequality. Psychological Review, 89, 123–154.

  90. Vapnik, V. N. (2000). The nature of statistical learning theory (2nd ed.). New York: Springer.

  91. Verguts, T., Ameel, E., & Storms, G. (2004). Measures of similarity in models of categorization. Memory & Cognition, 32, 379–389.

  92. Wichmann, F. A., Graf, A. B. A., Simoncelli, E. P., Bülthoff, H. H., & Schölkopf, B. (2005). Machine learning applied to perception: Decision images for gender classification. In L. K. Saul, Y. Weiss, & L. Bottou (Eds.), Advances in neural information processing systems 17 (pp. 1489–1496). Cambridge, MA: MIT Press.

Corresponding author

Correspondence to Frank Jäkel.

About this article

Cite this article

Jäkel, F., Schölkopf, B. & Wichmann, F.A. Generalization and similarity in exemplar models of categorization: Insights from machine learning. Psychonomic Bulletin & Review 15, 256–271 (2008). https://doi.org/10.3758/PBR.15.2.256

Keywords

  • Machine Learning
  • Category Structure
  • Kernel Method
  • Decision Boundary
  • Generalization Performance