Training Deep Networks to Construct a Psychological Feature Space for a Natural-Object Category Domain


Many successful formal models of human categorization have been developed, but these models have been tested almost exclusively using artificial categories, because deriving psychological representations of large sets of natural stimuli using traditional methods such as multidimensional scaling (MDS) has been an intractable task. Here, we propose a novel integration in which MDS representations are used to train deep convolutional neural networks (CNNs) to automatically derive psychological representations for unlimited numbers of natural stimuli. In an example application, we train an ensemble of CNNs to produce the MDS coordinates of images of rocks, and we show that the ensemble can predict the MDS coordinates of new sets of rocks, even those not part of the original MDS space. We then show that the CNN-predicted MDS representations, unlike off-the-shelf CNN representations, can be used in conjunction with a formal psychological model to predict human categorization behavior. We further show that the CNNs can be trained to produce additional dimensions that extend the original MDS space and provide even better model fits to human category-learning data. Our integrated method provides a promising approach that can be instrumental in allowing researchers to extend traditional psychological-scaling and category-learning models to the complex, high-dimensional domains that exist in the natural world.


  1. By a “pure machine-learning model,” we mean that we are concerned only with the outputs that the CNNs produce, regardless of whether or not those outputs are achieved through human-like learning.

  2. We do not claim that there is anything special about the ResNet50 architecture; it simply yielded somewhat better model fits compared to the other architectures we tried, which included InceptionV3 (Szegedy et al. 2016) and VGG16/VGG19 (Simonyan and Zisserman 2014). For simplicity, we report the results from only the best-fitting network architecture among those that we tried. We emphasize as well that other more recently developed architectures such as InceptionResNet (Szegedy et al. 2017) or DenseNet (Huang et al. 2017) may yield even better results.

  3. This is the default image resolution assumed by ResNet50. Reducing the resolution to this size helps keep the training of the network computationally tractable, but it also obscures fine-grained details, which may have affected the networks’ ability to learn some of the MDS dimensions.

  4. Whereas earlier in the article, we focused on out-of-sample predictions for the deep networks, here we focus on within-sample predictions of the GCM to be consistent with our previous work. We describe the predictions as “within-sample” because a free parameter c is being estimated to fit the data. As described in a later section, we used the BIC statistic (Schwarz 1978) to address the issue of overfitting for cases of models involving different numbers of free parameters.

  5. As we discuss in detail in our “General Discussion,” the hidden-layer-activation approach may still be viable if transfer learning were performed in which the CNNs were trained directly on the rock categories, with newly derived hidden-layer activations then being used as inputs to the psychological models. In this section, our focus is on only true “off-the-shelf” representations that do not require further training of the networks.

  6. Whereas we modeled dissimilarities using distances between feature vectors, Peterson et al. (2018) and Battleday et al. (2017) modeled similarities using dot-products between feature vectors. We found that our approach led to better model fits.

  7. The “physical layers” dimension is partially captured by the “organization” MDS dimension, but the MDS space does not make a distinction between actual physical layers and stripes of different colors.

  8. Again, the aim of the present article is not to provide tests of the GCM against alternative models. Here, we simply use it as a tool for helping to evaluate the utility of alternative stimulus representations for predicting independent sets of classification-learning data. Nevertheless, one might argue that the need to expand the original MDS space with supplemental dimensions provides a challenge to the GCM, because typical applications make reference to only dimensions derived from independent sets of similarity-judgment data. In our view, this argument treats the exemplar-similarity model in a manner that is too constrained. People may classify objects based on their similarity to stored examples; whether the similarity comparisons are made in reference to “pre-existing” dimensions or to dimensions that are “discovered” in the service of categorization can be treated as a separate question. Yet another question is whether one needs to make use of the original similarity-judgment-derived MDS space at all: Why not simply create an entire researcher-defined set of features and collect direct ratings on all such features? Nosofsky et al. (2018b, 2018c) conducted extensive analyses to test such an approach, but found that the similarity-judgment-derived MDS space yielded far better accounts of both similarity-judgment data and independent sets of classification-learning data than did an approach that relied solely on participants’ ratings of individual researcher-defined features. Understanding the detailed basis for those previous findings remains a topic for future research. Some possibilities are that it is difficult for participants to provide accurate ratings for individual dimensions when they are highly interacting with other dimensions, and that the psychological scales of the dimensions are highly nonlinear transforms of the direct dimension ratings provided by participants. MDS spaces derived from analysis of similarity-judgment data do not suffer from those problems.

  9. Another specific example of deriving high-dimensional scaling solutions for complex, real-world categories is provided by the work of Getty, Swets, and their colleagues (Getty et al. 1988; Swets et al. 1991). Using a combination of MDS analyses of similarity judgments and direct ratings of individually specified dimensions, these investigators derived a 12-dimensional scaling solution for 24 instances of radiographs of benign versus malignant tumors in the domain of mammography. The derived dimensions corresponded to attributes such as roughness/smoothness of the border, the extent to which the tumor is invading neighboring tissue, the extent to which calcifications (small calcium deposits) are clustered, and so forth. Whereas Getty et al.’s MDS solution was limited to 24 instances of the radiographs of the benign and malignant tumors, with the present approach one could embed an unlimited number of such radiographs in the psychological scaling solution.
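Notes 4 and 6 refer to the GCM's free sensitivity parameter c, to distances between feature vectors, and to the BIC statistic. As a rough illustration only, the following Python sketch shows a bare-bones exemplar model with an exponential similarity gradient, together with the standard BIC formula. The function names, the equal attention weights, the Euclidean metric, and the two toy categories are assumptions made for the example; the version fitted in the article may include further parameters.

```python
import math


def gcm_probabilities(probe, exemplars, c, weights=None):
    """Bare-bones exemplar-model choice probabilities (illustrative only).

    probe     : coordinates of the item to classify
    exemplars : dict mapping category label -> list of stored coordinate vectors
    c         : sensitivity parameter scaling distance into similarity
    weights   : attention weights per dimension (assumed equal if omitted)
    """
    n_dims = len(probe)
    weights = weights or [1.0 / n_dims] * n_dims
    summed_similarity = {}
    for category, items in exemplars.items():
        total = 0.0
        for item in items:
            # Weighted Euclidean distance between probe and stored exemplar.
            distance = math.sqrt(
                sum(w * (p - x) ** 2 for w, p, x in zip(weights, probe, item))
            )
            # Exponential decay of similarity with distance.
            total += math.exp(-c * distance)
        summed_similarity[category] = total
    norm = sum(summed_similarity.values())
    return {cat: s / norm for cat, s in summed_similarity.items()}


def bic(log_likelihood, n_free_params, n_observations):
    """Schwarz (1978) criterion: -2 ln L + k ln N (lower is better)."""
    return -2.0 * log_likelihood + n_free_params * math.log(n_observations)


# Hypothetical two-category example in a 2-D space.
toy_exemplars = {"igneous": [[0.0, 0.0]], "sedimentary": [[2.0, 0.0]]}
midpoint_probs = gcm_probabilities([1.0, 0.0], toy_exemplars, c=1.5)
near_probs = gcm_probabilities([0.2, 0.0], toy_exemplars, c=1.5)
print(midpoint_probs, near_probs)
```

A probe equidistant from both stored exemplars is classified at chance, whereas a probe close to one exemplar is assigned mostly to that exemplar's category; larger values of c make this gradient steeper.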


  • Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., et al. (2016). Tensorflow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.

  • Anderson, J. R. (1991). The adaptive nature of human categorization. Psychological Review, 98(3), 409.

  • Austerweil, J. L., & Griffiths, T. L. (2011). A rational model of the effects of distributional information on feature learning. Cognitive Psychology, 63(4), 173–209.

  • Austerweil, J. L., & Griffiths, T. L. (2013). A nonparametric Bayesian framework for constructing flexible feature representations. Psychological Review, 120(4), 817–851.

  • Barsalou, L. W. (1985). Ideals, central tendency, and frequency of instantiation as determinants of graded structure in categories. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11(4), 629.

  • Bashivan, P., Kar, K., & DiCarlo, J. J. (2019). Neural population control via deep image synthesis. Science, 364(6439), eaav9436.

  • Battleday, R. M., Peterson, J. C., & Griffiths, T. L. (2017). Modeling human categorization of natural images using deep feature representations. arXiv preprint arXiv:1711.04855.

  • Battleday, R. M., Peterson, J. C., & Griffiths, T. L. (2019). Capturing human categorization of natural images at scale by combining deep networks and cognitive models. arXiv preprint arXiv:1904.12690.

  • Bergstra, J., Yamins, D., & Cox, D. (2013). Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures (pp. 115–123). Presented at the International Conference on Machine Learning.

  • Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., & Shah, R. (1994). Signature verification using a “Siamese” time delay neural network. In Advances in neural information processing systems (pp. 737–744).

  • Chollet, F., et al. (2015). Keras.

  • Chopra, S., Hadsell, R., & LeCun, Y. (2005). Learning a similarity metric discriminatively, with application to face verification. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05) (Vol. 1, pp. 539–546 vol. 1).

  • Eckstein, M. P., Koehler, K., Welbourne, L. E., & Akbas, E. (2017). Humans, but not deep neural networks, often miss giant targets in scenes. Current Biology, 27(18), 2827–2832.e3.

  • Erickson, M. A., & Kruschke, J. K. (1998). Rules and exemplars in category learning. Journal of Experimental Psychology. General, 127(2), 107–140.

  • Elsayed, G. F., Shankar, S., Cheung, B., Papernot, N., Kurakin, A., Goodfellow, I., & Sohl-Dickstein, J. (2018). Adversarial examples that fool both human and computer vision. arXiv preprint arXiv:1802.08195.

  • Geirhos, R., Janssen, D. H., Schütt, H. H., Rauber, J., Bethge, M., & Wichmann, F. A. (2017). Comparing deep neural networks against humans: object recognition when the signal gets weaker. arXiv preprint arXiv:1706.06969.

  • Getty, D. J., Pickett, R. M., D’Orsi, C. J., & Swets, J. A. (1988). Enhanced interpretation of diagnostic images. Investigative Radiology, 23(4), 240–252.

  • Guest, O., & Love, B. C. (2017). What the success of brain imaging implies about the neural code. Elife, 6, e21397.

  • Hansen, L. K., & Salamon, P. (1990). Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(10), 993–1001.

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition (pp. 770–778). Presented at the Proceedings of the IEEE conference on computer vision and pattern recognition.

  • Holmes, W. R., O’Daniels, P., & Trueblood, J. S. (2019). A joint deep neural network and evidence accumulation modeling approach to human decision-making with naturalistic images. Computational Brain & Behavior, 1–12.

  • Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4700–4708).

  • Ioffe, S., & Szegedy, C. (2015). Batch normalization: accelerating deep network training by reducing internal covariate shift (pp. 448–456). Presented at the International Conference on Machine Learning.

  • Jacobs, R. A., & Bates, C. J. (2019). Comparing the visual representations and performance of human and deep neural networks. Current Directions in Psychological Science, 28, 34–39.

  • Jones, M., & Goldstone, R. L. (2013). The structure of integral dimensions: contrasting topological and Cartesian representations. Journal of Experimental Psychology: Human Perception and Performance, 39(1), 111–132.

  • Khaligh-Razavi, S.-M., & Kriegeskorte, N. (2014). Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Computational Biology, 10(11), e1003915.

  • Kingma, D., & Ba, J. (2014). Adam: a method for stochastic optimization. ArXiv Preprint ArXiv:1412.6980

  • Kruschke, J. K. (1992). ALCOVE: an exemplar-based connectionist model of category learning. Psychological Review, 99(1), 22.

  • Kruskal, J. B., & Wish, M. (1978). Multidimensional scaling. Beverly Hills: Sage.

  • Lake, B. M., Zaremba, W., Fergus, R., & Gureckis, T. M. (2015). Deep neural networks predict category typicality ratings for images. Presented at the CogSci.

  • LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.

  • Lee, M. D. (2001). Determining the dimensionality of multidimensional scaling representations for cognitive modeling. Journal of Mathematical Psychology, 45(1), 149–166.

  • Love, B. C., Medin, D. L., & Gureckis, T. M. (2004). SUSTAIN: a network model of category learning. Psychological Review, 111(2), 309–332.

  • Meagher, B. J., Cataldo, K., Douglas, B. J., McDaniel, M. A., & Nosofsky, R. M. (2018). Training of rock classifications: the use of computer images versus physical rock samples. Journal of Geoscience Education, 66(3), 221–230.

  • Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted boltzmann machines (pp. 807–814). Presented at the Proceedings of the 27th international conference on machine learning (ICML-10).

  • Nasr, K., Viswanathan, P., & Nieder, A. (2019). Number detectors spontaneously emerge in a deep neural network designed for visual object recognition. Science Advances, 5(5), eaav7903.

  • Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology. General, 115(1), 39–57.

  • Nosofsky, R. M. (1992). Similarity scaling and cognitive process models. Annual Review of Psychology, 43(1), 25–53.

  • Nosofsky, R. M. (2011). The generalized context model: an exemplar model of classification. In Pothos, E. M. and Wills, A. J. (Eds.), Formal approaches in categorization, 18–39. Cambridge University Press.

  • Nosofsky, R. M., Sanders, C. A., Gerdom, A., Douglas, B. J., & McDaniel, M. A. (2017). On learning natural-science categories that violate the family-resemblance principle. Psychological Science, 28(1), 104–114.

  • Nosofsky, R. M., Sanders, C. A., & McDaniel, M. A. (2018a). A formal psychological model of classification applied to natural-science category learning. Current Directions in Psychological Science, 27(2), 129–135.

  • Nosofsky, R. M., Sanders, C. A., & McDaniel, M. A. (2018b). Tests of an exemplar-memory model of classification learning in a high-dimensional natural-science category domain. Journal of Experimental Psychology: General, 147(3), 328–353.

  • Nosofsky, R. M., Sanders, C. A., Meagher, B. J., & Douglas, B. J. (2018c). Toward the development of a feature-space representation for a complex natural category domain. Behavior Research Methods, 50(2), 530–556.

  • Nosofsky, R. M., Sanders, C. A., Meagher, B. J., & Douglas, B. J. (2019a). Search for the missing dimensions: building a feature-space representation for a natural-science category domain. Computational Brain & Behavior, 1–21.

  • Nosofsky, R. M., Sanders, C. A., Zhu, X., & McDaniel, M. A. (2019b). Model-guided search for optimal natural-science-category training exemplars: a work in progress. Psychonomic Bulletin & Review, 26(1), 48–76.

  • Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … Dubourg, V. (2011). Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12(Oct), 2825–2830.

  • Peterson, J. C., Abbott, J. T., & Griffiths, T. L. (2018). Evaluating (and improving) the correspondence between deep neural networks and human representations. Cognitive Science, 42(8), 2648–2669.

  • Pothos, E. M., & Bailey, T. M. (2009). Predicting category intuitiveness with the rational model, the simplicity model, and the generalized context model. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35(4), 1062.

  • Pothos, E. M., & Wills, A. J. (2011). Formal approaches in categorization. Cambridge University Press.

  • Rajalingham, R., Issa, E. B., Bashivan, P., Kar, K., Schmidt, K., & DiCarlo, J. J. (2018). Large-scale, high-resolution comparison of the core visual object recognition behavior of humans, monkeys, and state-of-the-art deep artificial neural networks. BioRxiv, 240614.

  • Razavian, A. S., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). CNN features off-the-shelf: an astounding baseline for recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops (pp. 806–813).

  • Roads, B. D., & Mozer, M. C. (2017). Improving human-machine cooperative classification via cognitive theories of similarity. Cognitive Science, 41(5), 1394–1411.

  • Roads, B. D., & Mozer, M. C. (2019). Obtaining psychological embeddings through joint kernel and metric learning. Behavior Research Methods, 51, 2180–2193.

  • Rosch, E. H. (1973). On the internal structure of perceptual and semantic categories. In Cognitive development and acquisition of language (pp. 111–144). Academic Press.

  • Rumelhart, D. E., & Todd, P. M. (1993). Learning and connectionist representations. Attention and performance XIV: synergies in experimental psychology, artificial intelligence, and cognitive neuroscience, 3–30.

  • Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., … Bernstein, M. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.

  • Sanborn, A. N., Griffiths, T. L., & Navarro, D. J. (2010). Rational approximations to rational models: alternative algorithms for category learning. Psychological Review, 117(4), 1144–1167.

  • Sanders, C. A. (2018). Using deep learning to automatically extract psychological representations of complex natural stimuli. Unpublished Ph.D. dissertation, Indiana University.

  • Sanders, C. A., & Nosofsky, R. M. (2018). Using deep learning representations of complex natural stimuli as input to psychological models of classification. Madison: Proceedings of the 2018 Conference of the Cognitive Science Society.

  • Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464.

  • Schyns, P. G., Goldstone, R. L., & Thibaut, J. P. (1998). The development of features in object concepts. Behavioral and Brain Sciences, 21(1), 1–17.

  • Shepard, R. N. (1980). Multidimensional scaling, tree-fitting, and clustering. Science, 210(4468), 390–398.

  • Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science, 237(4820), 1317–1323.

  • Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

  • Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929–1958.

  • Steyvers, M., & Busey, T. (2000). Predicting similarity ratings to faces using physical descriptions. Computational, geometric, and process perspectives on facial cognition: contexts and challenges, 115–146.

  • Swets, J. A., Getty, D. J., Pickett, R. M., D'Orsi, C. J., Seltzer, S. E., & McNeil, B. J. (1991). Enhancing and evaluating diagnostic accuracy. Medical Decision Making, 11(1), 9–17.

  • Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. A. (2017). Inception-v4, inception-resnet and the impact of residual connections on learning. In Thirty-first AAAI conference on artificial intelligence.

  • Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2818–2826).

  • Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2013). Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.

  • Tamuz, O., Liu, C., Belongie, S., Shamir, O., & Kalai, A. T. (2011). Adaptively learning the crowd kernel. arXiv preprint arXiv:1105.1033.

  • Tarbuck, E. J., & Lutgens, F. K. (2015). Earth science (14th ed.). Boston: Pearson.

  • Vanpaemel, W., & Storms, G. (2008). In search of abstraction: the varying abstraction model of categorization. Psychonomic Bulletin & Review, 15(4), 732–749.

  • Voorspoels, W., Vanpaemel, W., & Storms, G. (2008). Exemplars and prototypes in natural language concepts: a typicality-based evaluation. Psychonomic Bulletin & Review, 15(3), 630–637.

  • Yamins, D. L. K., Hong, H., Cadieu, C. F., Solomon, E. A., Seibert, D., & DiCarlo, J. J. (2014). Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23), 8619–8624.

  • Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? (pp. 3320–3328). Presented at the Advances in neural information processing systems.

  • Zhou, Z., & Firestone, C. (2019). Humans can decipher adversarial images. Nature Communications, 10(1), 1334.

Author information


Corresponding author

Correspondence to Robert M. Nosofsky.

Additional information

This article is based on a PhD dissertation submitted by the first author to Indiana University (Sanders 2018). A report of preliminary versions of some of the work reported in this article was published in the 2018 Proceedings of the Cognitive-Science Society (Sanders and Nosofsky 2018). This research was supported by NSF grant 1534014 (EHR Core Research) to Robert Nosofsky.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Appendix 1

Details of Deep Learning Models

Our deep learning models were implemented using the Keras Python package (Chollet et al. 2015), the Scikit-learn Python package (Pedregosa et al. 2011), and the Tensorflow computational framework (Abadi et al. 2016). As mentioned in the main text, we took a transfer-learning approach (Yosinski et al. 2014), using a pretrained implementation of ResNet50 (He et al. 2016) as the base network. More specifically, we kept each layer from ResNet50 up to the final pooling layer, and then used global average pooling to convert the activation of the pooling layer into a vector that could be used as input into a series of fully connected layers. For each of these layers, dropout (Srivastava et al. 2014) and batch normalization (Ioffe and Szegedy 2015) were used to improve generalization and accelerate learning. Rectified linear units (Nair and Hinton 2010) were used as the activation functions. The dropout rate was set to 0.5, and the hyperparameters for batch normalization were left at their default values. These layers fed into a final output layer consisting of 8 linear units corresponding to the 8 MDS dimensions.

We minimized the mean squared error (MSE) between the network’s output and the MDS coordinates of the rocks in the training set, using Kingma and Ba’s (2014) “Adam” as the optimization algorithm, with all of its hyperparameters left at their default values except for the learning rate. The network was trained until the validation error had failed to decrease for 20 consecutive epochs, or for a maximum of 500 epochs. Only the newly added fully connected layers were trained at this stage. We used the hyperopt Python package (Bergstra et al. 2013) to optimize the following hyperparameters: the number of hidden layers added to the base CNN, the number of units in each hidden layer, the training batch size, and the initial learning rate. The optimal values were found to be 2, 256, 90, and 10^(−2.22), respectively. This model achieved an MSE of 1.494 on the validation set. For comparison, the lowest validation error we could achieve without using transfer learning was 1.856.
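The stopping rule just described can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the code used for the reported results: the function name `stopping_epoch` and the toy validation curves are hypothetical, and the rule is interpreted as "stop once 20 consecutive epochs pass without a new best validation error."

```python
def stopping_epoch(val_errors, patience=20, max_epochs=500):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation error has failed to improve on its
    best value for `patience` consecutive epochs, or after `max_epochs`
    epochs, whichever comes first.
    """
    best = float("inf")
    epochs_since_best = 0
    for epoch, err in enumerate(val_errors[:max_epochs], start=1):
        if err < best:
            best = err
            epochs_since_best = 0
        else:
            epochs_since_best += 1
        if epochs_since_best >= patience:
            return epoch
    return min(len(val_errors), max_epochs)


# Reported initial learning rate for the 8-dimension networks (about 0.006).
initial_lr = 10 ** -2.22

# Hypothetical curve: error improves for 30 epochs, then plateaus.
plateau_curve = [2.0 - 0.01 * e for e in range(30)] + [1.75] * 100
# Hypothetical curve that keeps improving, so the epoch cap applies.
improving_curve = [1.0 / (e + 1) for e in range(600)]

print(stopping_epoch(plateau_curve), stopping_epoch(improving_curve))
```

In the first toy curve the best error occurs at epoch 30, so training halts 20 epochs later; in the second, the 500-epoch cap is what ends training.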

To further reduce validation error, the transfer-learning network was trained for another 500 epochs, using a fine-tuning procedure (Yosinski et al. 2014). This time all layers were trained. Because the parameters in the base CNN were expected to already be close to their optimal values, stochastic gradient descent with a low learning rate and high momentum (0.0001 and 0.9, respectively) was chosen as the optimization algorithm. After fine-tuning, the network achieved an MSE of 1.330 on the validation set. We repeated this entire procedure 9 more times to produce an ensemble of 10 CNNs. Final predictions were produced by averaging the output of all 10 networks. Each network in the ensemble had the same hyperparameter values. Code for training this ensemble can be found in the online repository. This ensemble achieved MSE = 1.298 on the validation set and MSE = 1.355 on the test set.
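As a minimal sketch of the ensemble step, the following plain-Python example averages the outputs of several networks and scores the result with MSE. The helper names and the two-network, two-dimension toy data are hypothetical, and the MSE here is averaged over all items and output dimensions, which is an assumption about how the reported values were computed.

```python
def mse(predicted, target):
    """Mean squared error over all items and all output dimensions."""
    squared = [
        (p - t) ** 2
        for p_row, t_row in zip(predicted, target)
        for p, t in zip(p_row, t_row)
    ]
    return sum(squared) / len(squared)


def ensemble_average(network_outputs):
    """Average predictions across networks.

    network_outputs[n][i][d] is network n's prediction for item i on
    output dimension d.
    """
    n_networks = len(network_outputs)
    n_items = len(network_outputs[0])
    n_dims = len(network_outputs[0][0])
    return [
        [
            sum(net[i][d] for net in network_outputs) / n_networks
            for d in range(n_dims)
        ]
        for i in range(n_items)
    ]


# Toy data: two items, two dimensions, two networks with opposite errors.
target = [[0.0, 1.0], [2.0, -1.0]]
net_a = [[0.5, 1.0], [2.0, -1.5]]
net_b = [[-0.5, 1.0], [2.0, -0.5]]

ensemble = ensemble_average([net_a, net_b])
print(mse(net_a, target), mse(net_b, target), mse(ensemble, target))
```

The toy networks err in opposite directions, so their average cancels the errors exactly. Real networks are correlated, which is consistent with the more modest ensemble gain reported above.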

A reviewer of an earlier version of the article was interested in the extent to which there was variability across different runs of the network and the degree of improvement achieved through using the ensemble-based predictions. Unfortunately, we did not record the individual network fits in conducting the original versions of these massive deep-learning investigations. However, to provide a sense of the issue, we repeated the training procedures using a smaller number of total training epochs (200) than was used for the results reported in the main text. The MSEs and R² values obtained for the validation and test sets for these reduced-training runs are reported for each individual network run and for the ensemble predictions in Appendix Table 4. As can be seen, the variability in fits across the individual network runs is relatively small, with a modest improvement in overall fit achieved by making use of the ensemble-based predictions.

Finally, to predict the supplementary dimensions, we created a new ensemble using the exact same procedure, but the networks were trained to predict both the 8 MDS dimensions and the 5 supplemental dimensions. The optimal parameter values this time were 3, 512, 30, and 10^(−2.05) for the number of hidden layers added to the base CNN, the number of units in each hidden layer, the training batch size, and the initial learning rate, respectively. This ensemble achieved an MSE of 1.326 on the validation set and 1.404 on the test set.

Appendix 2

Method for Collecting Similarity Judgments and Dimension Ratings

We closely followed the procedures for collecting similarity judgments and dimension ratings described in Nosofsky et al. (2018c). These data are available in the online repository.


The participants were 174 students from the Indiana University, Bloomington community. Data from 11 participants were removed because their responses had low correlations with the averaged responses. Some participants received credit toward a course requirement, while others received $12 as compensation. All participants reported normal or corrected-to-normal vision and no expertise in geology. Of these participants, 85 provided similarity judgments; 20 provided ratings for the lightness/darkness of color, average grain size, and smoothness/roughness dimensions; 20 provided ratings for the shininess, organization, and chromaticity dimensions; 20 provided ratings for the porphyritic texture, conchoidal fractures, holes, and layers dimensions; and 29 provided ratings for the pegmatitic texture dimension.


The stimuli were the 120 rock images used in the categorization experiment described in the main text.

Similarity-Judgments Procedure

Participants were shown pairs of rock pictures and were instructed to judge the similarity of the rocks on a scale from 1 (most dissimilar) to 9 (most similar). On each trial, two subtypes were randomly selected, and then one token was randomly selected as a representative within each subtype (the same token could not be selected twice when the subtypes were the same). One token was placed on the left side of the screen, and the other was placed on the right. The participants gave their judgment for the pair using the computer keyboard. This procedure was repeated for all 435 unique pairs of the 30 rock subtypes, as well as all 30 within-subtype comparisons, for a total of 465 trials. Participants first completed 5 practice trials to get a sense of the types of stimuli they would see. (Because we removed the data of 6 participants due to low correlations with the averaged data, the data from a total of 79 participants—a total of 36,735 similarity-judgment trials—were included in the MDS analysis.)
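The trial counts above can be checked with a short calculation, and the pairing scheme can be sketched as code. In the sketch below, the figure of 4 tokens per subtype is inferred from 120 images across 30 subtypes, and the function name `make_trials` and the shuffled trial order are assumptions made for illustration.

```python
import random
from itertools import combinations_with_replacement
from math import comb

N_SUBTYPES = 30
between_pairs = comb(N_SUBTYPES, 2)          # unique between-subtype pairs
within_pairs = N_SUBTYPES                    # one within-subtype pair each
trials_per_participant = between_pairs + within_pairs
included_participants = 85 - 6               # after exclusions
total_trials = trials_per_participant * included_participants


def make_trials(n_subtypes=N_SUBTYPES, tokens_per_subtype=4, rng=None):
    """Build one participant's trial list as ((subtype, token), (subtype, token))."""
    rng = rng or random.Random()
    trials = []
    for a, b in combinations_with_replacement(range(n_subtypes), 2):
        if a == b:
            # Within-subtype trial: the two tokens must differ.
            tok_a, tok_b = rng.sample(range(tokens_per_subtype), 2)
        else:
            tok_a = rng.randrange(tokens_per_subtype)
            tok_b = rng.randrange(tokens_per_subtype)
        trials.append(((a, tok_a), (b, tok_b)))
    rng.shuffle(trials)  # assumed: trials presented in random order
    return trials


trials = make_trials(rng=random.Random(0))
print(between_pairs, trials_per_participant, total_trials, len(trials))
```

The calculation gives 435 + 30 = 465 trials per participant and 465 × 79 = 36,735 trials in total, matching the figures in the text.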

Dimension-Ratings Procedure

Participants gave ratings for one dimension at a time. First, instructions explaining the dimension and its rating scale were shown. Then, on each trial, participants were shown one of the 120 rocks and were asked to provide a rating on a 1–9 scale along the dimension, with the exception of the holes and layers dimensions. For these two dimensions, participants indicated whether each rock had holes, layers, or neither (no rock had both). Responses were entered using the computer keyboard. To promote a consistent scale across participants for each dimension, the scale was shown at the bottom of the screen with labeled anchor pictures at the middle and extreme ends of the scale. See the online repository for each dimension’s instructions and anchor pictures.

Table 4 Fit results from individual network runs and for the ensemble-based predictions for the MDS dimensions in the original 360-rock study

Cite this article

Sanders, C.A., Nosofsky, R.M. Training Deep Networks to Construct a Psychological Feature Space for a Natural-Object Category Domain. Comput Brain Behav 3, 229–251 (2020).
