Bayesian Inference for Multidimensional Scaling Representations with Psychologically Interpretable Metrics

Multidimensional scaling (MDS) models represent stimuli as points in a space consisting of a number of psychological dimensions, such that the distance between pairs of points corresponds to the dissimilarity between the stimuli. Two fundamental challenges in inferring MDS representations from data involve inferring the appropriate number of dimensions and the metric structure of the space used to measure distance. We approach both challenges as Bayesian model-selection problems. Treating MDS as a generative model, we define priors needed for model identifiability under metrics corresponding to psychologically separable and psychologically integral stimulus domains. We then apply a differential evolution Markov-chain Monte Carlo (DE-MCMC) method for parameter inference, and a Warp-III method for model selection. We apply these methods to five previous data sets, which collectively test the ability of the methods to infer an appropriate dimensionality and to infer whether stimuli are psychologically separable or integral. We demonstrate that our methods produce sensible results, but note a number of remaining technical challenges that need to be solved before the method can easily and generally be applied. We also note the theoretical promise of the generative modeling perspective, discussing new and extended models of MDS representation that could be developed.


Introduction
Multidimensional scaling (MDS) was developed in the 1950s in cognitive psychology as a statistical method for making inferences about human mental representations (Shepard 1957(Shepard , 1962Kruskal 1964) MDS models the similarities or psychological proximities between pairs of stimuli, representing each stimulus as a point in a multidimensional space, such that more similar stimuli are nearer each other. The core psychological motivation is that the similarities reflect the basic cognitive process of generalization. Generalization can be thought of as the ability to treat two stimuli as being the same, and has been argued to serve as a basis for the mental organization of knowledge, and the capability of the mind to make adaptive predictions about properties and consequences (Shepard 1987). For these reasons, mental representations found via MDS methods have been and remain widely used in cognitive process models of identification, categorization, and decision making (e.g., Nosofsky 1992).
Soon after its development in cognitive psychology, however, MDS algorithms found application as a statistical method that produces a low-dimensional representation of a set of objects, based on a measure of the similarities between them. As a data-reduction or visualization method, MDS has been applied in the natural, biological, and human sciences, with application areas as diverse as representing the similarities of skulls in archaeology, the tastes of colas in marketing, and the voting patterns of senators in politics (e.g., Borg and Groenen 1997;Cox and Cox 1994;Schiffman et al. 1981).
Whether viewed as a model of psychological representation or a data-reduction method, a foundational challenge in MDS modeling is determining the dimensionality M of the representational space. In his 1974 Presidential Address to the Psychometric Society, Roger Shepard identified six basic challenges for MDS, the third of which was "The problem of determining the proper number of dimensions for the coordinate embedding space" (Shepard 1974, p. 377). A number of methods for solving the problem of MDS dimensionality have been developed in both statistics and psychology. The most common approach is a scree test that aims to identify an "elbow" in the goodness-of-fit as dimensionality increases (Cox and Cox 1994;Kruskal 1964;Schiffman et al. 1981). Steyvers (2006) suggests the use of cross-validation methods, although this approach does not seem to be widely used.
Since choosing the correct dimensionality of an MDS is naturally regarded as a model-selection problemthat is, choosing between a one-dimensional versus twodimensional versus three-dimensional representation, and so on-the statistically principled approach offered by Bayes factors should provide a solution (Kass and Raftery 1995). Along these lines, Lee (2001) implements an approach based on the Bayesian Information Criterion (BIC). The difference between BIC values for representations with different dimensionality provides a crude approximation to the Bayes factor. Oh and Raftery (2001) provide a different approach to approximation by computing the marginal likelihoods of different representations using plugin point estimates for the stimulus locations. This is an approximation because the exact Bayes factor requires an integration across the stimulus location parameters. Oh (2012) developed a method based on spike-and-slab priors, in which the dimensionality is determined by the marginal posterior probabilities for each dimension that the coordinate locations are not zero for all stimuli.
From the perspective of MDS as psychological models however, none of these approaches qualifies as being principled and complete. The key issue is that the theory of mental representation developed by Shepard (1957Shepard ( , 1987Shepard ( , 1991 emphasizes the role that the metric structure of the space plays in capturing key psychological properties of the stimuli. In particular, the idea is that different metrics capture the theoretical and empirical distinction between separable and integral stimuli (Attneave 1950;Garner 1974). Separable stimuli are those for which the component dimensions can be attended to separately. An example is different shapes of different sizes, since it is possible for people to attend selectively to either shape or the size. Integral stimuli, by contrast, are those for which the component dimensions cannot be attended to independently. The standard example is color, since it is typically not possible for people to attend selectively to the underlying hue, saturation, and brightness components. Figure 1 shows how different metric structures are used to represent integral and separable stimuli. In the left panel, there are four stimuli, represented by the points p 1 , . . . , p 4 . The pairwise distances between these points, such as d 12 between the first point and the second point, are modeled using the Euclidean metric, and so correspond to standard straight lines. In the right panel, there are three stimuli, and the pairwise distances between them are modeled according to the city-block metric. Intuitively, this corresponds to comparing the stimuli on each underlying dimension independently, then adding those dissimarilities to get an overall measure of dissimilarity.
Admittedly, this account of integrality and separability is a theoretical and empirical caricature, and much more nuanced and detailed accounts are possible (Shepard 1991;Tversky and Gati 1982). The point is that psychological representations based on MDS need to make assumptions about the metric structure of the space, and use metrics other than the Euclidean metric. As Jäkel et al. (2008, p. 2) point out, from the origins of MDS as a psychological model, "There was no a priori reason to believe that mental representations should be Euclidean." Previous methods for determining the dimensionality of MDS representations using Bayesian model selection, however, have either been insensitive to the metric structure of the representation (Lee 2001) or have focused on the Euclidean metric (Oh 2012;Oh and Raftery 2001).
The use of non-Euclidean metrics raises another challenge, related to inferring MDS representations themselves. There is evidence that it can be computationally difficult to find multidimensional city-block MDS representations (Groenen et al. 1998;Hubert et al. 1992), as well as finding unidimensional MDS representations (Mair and Leeuw 2014). Given that these difficulties stem from basic geometric properties of the MDS representations, it seems likely they will continue to present an issue for Bayesian methods of inference.
Finally, there is the challenge of inferring the appropriate metric structure for an MDS representation. Shepard (1991) reviews the original statistical approach to this problem, which involved applying non-metric MDS algorithms for a large number of different metrics, and choosing the one with the best goodness-of-fit. As Lee (2008) pointed out, this approach neglects to account for the component of model complexity that arises from the functional form of parameter interaction (Pitt et al. 2006), which is often the only difference between MDS models using different psychologically interpretable metrics. Lee (2008) developed a Bayesian approach in which the possible metrics correspond to a parameter that is inferred jointly with the coordinate location parameters that represent the stimuli. Okada and Shigemasu (2010) developed and tested this approach further, and showed it is capable of recovering the correct metric in simulation studies. Both the Lee (2008) and Okada and Shigemasu (2010) methods, however, failed to resolve basic challenges in model identifiability that arise from treating the choice of metric structure as a parameter inference problem. It is possible these identifiability issues could be addressed by considering the choice as a modelselection problem, and restricting the set of possibilities to a few interpretable metrics.
Accordingly, the goals of this article are to examine the implementation of MDS models that use psychologically interpretable metrics, including both the Euclidean and a non-Euclidean metric, and explore the possibility of inferring the appropriate dimensionality and metric structure of these representations using Bayesian model-selection methods. The structure of the remainder of the article is as follows. In the next section, we define MDS models, and address the issue of model identifiability under different metrics. Consistent with previous literature, we argue that the city-block metric presents fundamental problems in making MDS representations identifiable. This leads to the development of joint prior distributions on the stimulus location parameters for the Euclidean metric, and non-Euclidean metrics other than the city-block metric. With these priors established, we apply an approach to Bayesian inference using differential evolution Markov-chain Monte Carlo (DE-MCMC) computational sampling methods. The DE-MCMC method helps address the difficulties inherent in inferring MDS representations, which are especially evident in non-Euclidean cases. We then use the Warp-III bridge sampling method to approximate the marginal densities needed to determine Bayes factors. We apply the method to five previously studied data sets, differing in the type of stimuli and expected dimensionality of their MDS representation. For all five applications, the method makes sensible inferences about dimensionality, and produces interpretable stimulus representations. We conclude with a discussion of remaining statistical and computational challenges, and potential directions for refining and extending the approach.

MDS Model Identifiability
The Identifiability Problem Formally, suppose there are N stimuli to be represented, based on observed proximity data from P participants, with d ij k measuring the proximity between the ith and j th stimulus provided by the kth participant. We assume these observed proximities are normalized to lie between 0 and 1. The point representing the ith stimulus in a M-dimensional space is p i = (p i1 , . . . , p iM ) and the distance between points p i and p j is measured by the Minkowski metric with metric parameter r, so that ( 1 ) The Minkowski metric has special cases of the city-block metric when r = 1 and the Euclidean metric when r = 2. Values of r between 1 and 2 can potentially be interpreted as intermediate assumptions about the independence of stimulus dimensions between the end point of complete separability and complete integrality. The goal of MDS is for the modeled distancesd ij k to correspond to the observed proximities d ij k . We use the probabilistic model where σ is the standard deviation with which the observed proximities are measured. 1 It is assumed to be the same for all of the proximities, and is given a prior σ ∼ TruncatedGaussian 0.15, 1 0.2 2 T (0, ) , where the T (0, ) indicates the sampled value is truncated to be a positive real number. This is an informative prior (Lee and Vanpaemel 2018), consistent with previous data and modeling. Intuitively, σ corresponds to the average standard deviation of different individual ratings of the same pair of stimuli. Empirical estimates of this standard deviation in previous data tend to range from about 0.1 to about 0.2 (Lee 2001;Lee and Pope 2003). 2 Accordingly, the prior is centered on 0.15, but allows a wide range of possibilities. We note that this MDS model does not incorporate individual differences. It is assumed that the same point p i represents the ith stimulus for all participants. We also emphasize, however, that individual-level proximity data d ij k are modeled, rather than averaged or aggregated data across participants. The problems inherent in averaging data have long been understood (Estes 1956), and have been studied in the specific cognitive modeling context provided by MDS representations (Lee and Pope 2003). Our approach is to require the same underlying MDS representation to provide an account of each individual proximity matrix.
To complete the generative model, a straightforward approach would be to give all of the coordinate locations for the representational points uniform priors p im ∼ Uniform −1, 1 . These priors, however, made the model non-identifiable, because the distances between points are invariant under transformations (Borg and Groenen 1997, Ch. 2). The distances between points are preserved under translation, reflection, axes permutation (for non-Euclidean metrics), and rotation (for the Euclidean metric). A principled Bayesian approach for controlling these invariances to ensure model identifiability constrains the coordinate location parameters through a joint prior distribution that depends on the assumed metric.

Previous Approaches
Existing MDS modeling methods that use Bayesian inference almost always rely on post-processing to address the issue of identifiability. The method developed by Lee (2008) post-processes posterior samples of the coordinate location parameters to control for translation, reflection, and permutation. For example, to control for translation, the method zero centers every posterior sample of the sets of coordinate location. The Lee (2008) method does not control for rotation, which is problematic, because the method also attempts to infer the r metric parameter, and so the inferred representational space can have a Euclidean metric, which requires rotational invariance.
Most other methods, in contrast, assume the MDS space is Euclidean. The post-processing of the coordinate location parameters used by both Oh and Raftery (2001) and Oh (2012) assumes a Euclidean space and controls for translation, reflection, and rotation. Okada and Mayekawa (2018) extend the approach developed by Okada (2012), which relies on Procrustes analysis. Their post-processing uses a loss function to align posterior samples of the coordinate location, but again assumes a Euclidean space.
Besides the lack of flexibility in the nature of the distance metric, post-processing methods have the effect of implementing modeling assumptions without explicitly specifying those assumptions as part of the model. While this is often practical, it is theoretically inelegant, and contrary to the goals of generative modeling. Ideally, the constraints required for model identifiability should be part of the model itself. In the case of MDS models, these constraints are naturally imposed through the specification of a joint prior over the coordinate location parameters that addresses the transformational invariances, removes the need for post-processing, and makes bridge sampling feasible.
This generative approach is used by the "parameter fixing" method considered by Okada and Mayekawa (2018), who evaluate it as a contrast with the Procrustes methods that are their focus. Parameter fixing corresponds to setting a structured joint prior over the coordinate location parameters. Okada and Mayekawa (2018) define the appropriate prior for a Euclidean space using results provided by Bakker and Poole (2013), which were derived using an analytic method based on matrix properties.
Our goal is to extend this approach to include non-Euclidean representations. We start by considering onedimensional MDS representations, before considering multidimensional representations in both Euclidean and non-Euclidean metric spaces. We take a geometric approach to identifying the required joint priors for invariance constraints, complementing the non-geometric approach of Bakker and Poole (2013) for the Euclidean metric.

One-dimensional Representation
For a one-dimensional representation, all of the psychologically interpretable metrics we consider give the same distances. The required constraints on the points are shown in Fig. 2, with one point fixed at the origin to control translation, and second point restricted to be positive to control reflection.
These constraints can be formalized by a joint prior with Euclidean Multidimensional Representations Figure 3 shows the constraints needed to identify Euclidean MDS representations in two and three dimensions. In the two-dimensional case, the first point p 1 is fixed at the origin, to control translation, the second point p 2 is constrained to the positive x-axis, to control reflection in the y-axis and rotation, and the third point p 3 is constrained to have a positive y-value to control reflection in the xaxis. The same logic is applied in the three-dimensional case, with p 1 controlling translation, p 2 and p 3 controlling reflection and rotation in successive axes, and p 4 controlling the final reflection. These are the first two cases of a general pattern, clear by induction, that applies to a M-dimensional representation, and corresponds to the matrix result provided by Bakker and Poole (2013). An intuitive presentation of the inductive pattern is shown below, where "0" denotes fixing a coordinate location to zero, "+" denotes constraining it to be positive, and "R" denotes imposing no constraint.
Formally, these constraints in D dimensions correspond to the joint prior The nature of iso-distance curves and the identifiability of mid-points for the three Minkowski metrics corresponding to r = 2 (Euclidean), r = 1 (city-block), and r = 1.5

Non-Euclidean Multidimensional Representations
Finding constraints for invariance in non-Euclidean metrics is more complicated, and is especially difficult for the city-block metric. The basic geometric problem was noted as early as Arnold (1971), and discussed in Shepard's (1974) presidential address. A simple demonstration of the fundamental problem is provided by Fig. 4. The three panels correspond to Euclidean (r = 2), city-block (r = 1), and a general non-Euclidean (r = 1.5) metric, and show unit iso-distance contours around the same two points in each metric, shown as black dots. These iso-distance contours are the "unit circles" of each metric, showing all the points in the space that are the same distance from the two points. For the Euclidean metric, these contours are familiar circles, and coincide at only one point, shown by the white dot. This means that there is a unique point in the space that is equally distant from the two points shown by black dots. In the context of an MDS representation, a stimulus that is equally different to both of the points can be uniquely identified.
For the city-block case, however, the iso-distance contours are diamonds, and there are infinitely many points that are equally different. Three specific possibilities are shown by white dots, but clearly any point along the line where the iso-distance contours coincide is possible. In the context of an MDS representation, this means that there is a fundamental difficulty in identifying a stimulus that is equally different to both of the points. This basic problem is not, in general, solved by the introduction of additional stimuli that provide additional constraints. Indeed, the problem compounds for potential city-block representations with many stimuli. Bortz (1974, see, especially, Figures 2 and 3) provides compelling examples, and the same point is emphasized in the seminal text by Borg and Groenen (1997, pp. 369-372). Figure 5 provides a concrete example, based on the more general configuration examined by Borg and Groenen (1997, Figure 17.6). Each panel shows a representation of six fictitious people in terms of two underlying dimensions. The city-block distance between each pair of people is identical in both configurations. This means, of course, that this proximity matrix is equally consistent with both representations, and either could be inferred from the data. But, the two representations are substantively different, in non-trivial ways. The representations do not differ simply by changing the axes, and have basic structural differences. For example: Cedric, Dingbats, and Ethelred are co-linear in the first representation, but not in the second, where Dingbats, Ethelred and Fiona become co-linear; the ordering of Albert and Beowulf changes on both dimensions between the configurations; and so on. In fact, once the lack of invariance revealed by the Borg and Groenen (1997, Figure 17.6) analysis is understood, it is clear that many additional representations for the proximity between the six people could be constructed, supporting a wide range of different meaningful interpretations.
A practical approach for identifying city-block representations, used by Nosofsky (1985), relies on determining the values of some stimuli on some dimensions, by means external to the MDS modeling. Ultimately, this strategy can solve the problem, if it is possible to find the values of every stimulus on every dimension. But, Fig. 5 suggests the strategy may not be effective in situations where the identification of just a few stimuli is possible. In both representations, Dingbats is at the same location, In addition, if, for example, Albert was additionally identified as being located in the position shown in the first representation, that would constrain the inference about Beowulf and Cedric, but would not constrain Ethelred and Fiona, who could still be inferred to be at either of the possibilities shown in the two representations. Thus, while the addition of stimuli, or the identification of dimension values for some stimuli, may work in some specific circumstances, we do not believe either represents a general approach to making city-block MDS representations identifiable.
We do not know how to solve the problem of MDS model invariance for the city-block metric. As the rightmost panel of Fig. 4 makes clear, however, the problem does not occur for Minkowski-metric parameters r > 1. For the r = 1.5 metric, the iso-distance contours again coincide at only one point. The asymmetry of these contours makes clear they do not have the rotational invariance of the Euclidean r = 2 metric. In this way, general non-Euclidean metrics, such as r = 1.5, capture the psychological idea that the dimensions in an MDS representation have meaning and allow selective attention, while avoiding the degenerate lack of identifiability inherent in the city-block metric. Figure 6 shows the constraints needed to identify these sort of non-Euclidean MDS representations in two and three dimensions. In the two-dimensional case, the first point p 1 is once again fixed at the origin, to control translation, the second point p 2 is constrained to the positive quadrant to control reflection. In addition, the constraint that p 22 ≤ p 21 is imposed, requiring the value of the second stimulus on the y-axis not to be larger than its value on the x-axis. This constraint controls for axis permutation, preventing the two dimensions from being swapped, and so allocates a specific underlying stimulus dimension to each axis. The three-dimensional case extends this logic by requiring that the z-axis value of the second point be positive, to prevent reflection, and be less than the value of the second point on the y-axis, to prevent permutation.
These first two cases once again make clear a general pattern, in which the coordinate values of the second point are positive and order constrained. 3 Formally, the constraints for non-city-block but non-Euclidean D dimensions are

Bayesian MDS Inference via DE-MCMC
When posterior samples for MDS models are obtained using conventional Markov-chain Monte Carlo algorithms (MCMC; e.g., Gamerman & Lopes, 2006), it can occur that chains get stuck in local maxima. In our experience, the reason is typically that the stimuli that are constrained are similar to each other. To prevent local maxima, we implemented a heuristic to order the stimuli in a way that those defining the constraints are dissimilar. We motivate and describe this heuristic in detail in Appendix 1. In addition, to improve sampling, we used the differential evolution Markov-chain Monte Carlo algorithm (DE-MCMC; e.g., Heathcote et al. in press;Turner et al. 2013) that helps to guide the chains to regions of high posterior density. DE-MCMC is a population-based MCMC algorithm that generates efficient proposals via a population of interacting chains (Turner et al. 2013). One strength of the algorithm is that it works well for highly correlated target distributions. However, we used DE-MCMC primarily for the reason that the interacting chains can guide each other to regions of high posterior density which helps to avoid the issue of chains getting stuck in local maxima. Specifically, during burn-in, we used a migration step that remedies the problem of outlier chains in an effective manner (for details, see Turner et al. 2013, Appendix 2). We found that the combination of the ordering heuristic and DE-MCMC provides effective sampling consistently for the Euclidean metric, and is partially effective for non-Euclidean metrics.

Marginal Likelihood
Comparing MDS models with different dimensions and metrics via Bayes factors and posterior model probabilities requires the computation of the marginal likelihood for all of the models, M m,r , being considered where m denotes the dimensionality and r the metric. Let D denote the observed data (i.e., the pairwise dissimilarity ratings d ij k ) and P denote the N × m matrix with the latent stimulus coordinates for each stimulus. The marginal likelihood for model M m,r corresponds to the normalizing constant of the joint posterior distribution for θ = (P , σ ): Prior on Imprecision dP dσ, where q(θ | D, M m,r ) denotes the unnormalized joint posterior density.

Bridge Sampling
Since the marginal likelihood in Eq. 7 is not available analytically, we use Warp-III bridge sampling (Meng and Schilling 2002) to estimate this potentially highdimensional integral. Bridge sampling (Meng and Wong 1996; for a recent tutorial, see Gronau et al. 2017) is based on the following identity: where the numerator is an expected value with respect to a proposal distribution g(θ), the denominator is an expected value with respect to the parameter posterior distribution p(θ | D, M m,r ), and h(θ ) is a function such The bridge sampling estimate is obtained by sampling from the proposal distribution g(θ ) and the posterior distribution p(θ | D, M m,r ) to approximate the two expected values. Meng and Wong (1996) showed that the optimal choice for h(θ) is given by where s i = n i /(n 1 + n 2 ), i ∈ {1, 2}, n 1 denotes the number of samples from the posterior p(θ | D, M m,r ), and n 2 denotes the number of samples from the proposal g(θ ). The optimal choice for h(θ ) depends on the marginal likelihood of interest. Therefore, in practice, the bridge sampling estimate is obtained via an iterative scheme, presented below, that updates an initial guess of the marginal likelihood until convergence. The variability of the bridge sampling estimate is governed not only by the number of samples but also, by the overlap between the proposal and the posterior distribution. To obtain estimates with low variability, it is therefore prudent to maximize the overlap between these two distributions. The Warp-III approach attempts to create a large overlap by fixing the proposal to a standard multivariate Gaussian distribution and then manipulating (i.e., "warping") the posterior in a way that matches the first three moments of the two distributions. 4 Crucially, the warping procedure retains the normalizing constant of the posterior (i.e., the marginal likelihood of interest).
A prerequisite for the warping procedure is that all elements of the parameter vector are allowed to range across the entire real line. This can be achieved via a change-of-variables of the form ζ = f (θ ), where f is a suitable 5 vector-valued function that transforms the constrained elements of θ so that all elements of ζ are unconstrained. 6 The Warp-III procedure is based on the following stochastic transformation of the unconstrained parameter vector ζ : where b ∼ Bernoulli (0.5) on {−1, 1}, μ denotes the expected value vector of the posterior samples, and = CC denotes the posterior covariance matrix (i.e., C is obtained via a Cholesky decomposition). Figure 7 illustrates the warping approach for the univariate case. In the upper-left panel, the solid line corresponds to the standard Gaussian proposal distribution and the gray histogram depicts synthetic posterior samples. Subtracting the posterior mean from all samples matches the 4 Note that other proposal distributions are conceivable. The only constraints are that the proposal has a zero mean vector, an identity covariance matrix, and exhibits no skewness. 5 The function f needs to be one-to-one and its inverse f −1 needs to have a well-defined Jacobian. 6 We use a function f that applies a log transformation to σ and (scaled) probit transformations to the non-zero elements of P . The transformation for the ordered coordinates of the second stimulus for the non-Euclidean case is described in Appendix 2. Note that it is irrelevant whether the coordinates are ordered as decreasing, as shown in Fig. 6 for easier visualization, or increasing, as implemented in our code. The transformation described in the appendix assumes the latter. These transformations can be applied after having obtained posterior samples for θ. Furthermore, where necessary, the expressions are adjusted by the relevant Jacobian term |det J f −1 (ζ )|. first moment of the proposal and the posterior distribution, as shown in the upper-right panel. Dividing all samples by the posterior standard deviation matches the second moment of the two distributions, as shown in the lower-right panel. Finally, attaching a minus sign with probability 0.5 to the posterior samples achieves symmetry and thus matches the third moment of the proposal and the posterior distribution, as shown in the lower-left panel.
The Warp-III bridge sampling estimate based on h o (θ) is computed via an iterative scheme where the value of the estimate at iteration t is given by (for more details see, Gronau et al. 2019): with and In Eqs. 12-13, q(· | D, M m,r ) denotes the unnormalized posterior density with respect to the unconstrained parameter vector ζ , {ζ * 1 , ζ * 2 , . . . , ζ * n 1 } denote n 1 posterior samples, and {η 1 ,η 2 , . . . ,η n 2 } denote n 2 samples from the standard multivariate Gaussian proposal distribution. To compute the Warp-III estimate, one obtains 2n 1 posterior samples: the first half of these samples is used to approximate μ and C with their sample versionsμ andĈ, the second half of the posterior samples is used in the iterative scheme (i.e., Eq. 11). We use the bridgesampling R package (Gronau et al. in press) to compute the bridge sampling estimate in Eq. 11.

Applications
In this section, we present applications of our method to five existing data sets. For each application, we describe the stimuli and the nature of the data, as well as make clear our expectations about the MDS representation that will be inferred. In particular, we state our expectations about both the dimensionality and metric structure of the representation whenever possible. The results we present are based on considering MDS models up to and beyond this expected dimensionality, so that the inference our method makes is clear. Where possible, we apply our method under the assumption that the metric space is both Euclidean (r = 2) and non-Euclidean (r = 1.5) so that an inference can also be made about the integrality or separability of the stimulus domain. For some applications, we were unable to generate samples with acceptable convergence for the r = 1.5 metric. In those cases, we only report results assuming the r = 2 metric.

Line Length
Our first application involves the similarity judgments between nine lines of equally increasing length provided by 27 participants, as reported in Cohen et al. (2001). We expect these stimuli to have a one-dimensional MDS representation, corresponding to line length. Because the Minkowski metrics are all equivalent in a one-dimensional space, we do not have any expectations about the metric structure. Thus, we applied our method to these data by assuming a Euclidean metric. 7 As for all of our applications, we used 15 chains and 500 burn-in samples. During burnin, the probability of a migration step was set to 0.05. After burn-in, migration was switched off, and the algorithm was run for 9000 iterations. We only retained every third sample so that we ended up with 3000 samples per chain for further use (i.e., a total of 45,000 samples collapsed across chains).
The left panel of Fig. 8 shows posterior model probabilities, assuming equal prior probabilities, for one-, two-, and three-dimensional MDS representations. To assess the stability of the posterior model probability estimates, we ran the Warp-III procedure five times based on new samples from the proposal distribution (we always used The key result is that the expected one-dimensional representation is inferred, with a posterior probability near one. The right panel of Fig. 8 shows the inferred onedimensional MDS representation. The black lines show the stimuli in terms of their physical line lengths, located at the posterior mean of their location in the psychological space. The blue histograms show the marginal posterior distributions for each line stimulus. The MDS representation arranges the line stimuli in order of their length, but they are not evenly spaced, despite the lines increasing in constant physical increments. Instead, the psychological representation shows compression for the longer lines, consistent with basic psychophysics (Fechner 1966). This compression is large enough that the posterior distributions begin to overlap for the longest line stimuli.

Colors
Our second application considers classic data reported by Helm (1964), involving the similarities between ten colors. The experimental procedure involved trials in which participants were presented with physical tiles of three different colors, and moved one of the tiles to reflect their perceived overall similarity of the color of this tile to the colors of the other two tiles. Based on these responses, Helm (1964) calculated measures of pairwise similarities between the colors, and the resulting proximity data have previously been considered in the MDS literature (e.g., Borg and Groenen 1997;Carrol and Wish 1974). We consider only the data from the ten participants with normal color vision.
We expect the MDS representation to use the Euclidean metric, consistent with the integral nature of the color stimulus domain. We also expect a two-dimensional representation, following the color circle found by previous MDS analyses of these and other color similarity data, such as the Shepard (1962) original MDS analysis of data reported by Ekman (1954). Figure 9 shows the results of applying our method, assuming a Euclidean metric. This was a case in which we were unable to generate samples with acceptable convergence for the r = 1.5 metric. For the Euclidean metric, there is uncertainty regarding the dimensionality, with a three-dimensional representation having probability a little over 0.6 and a two-dimensional representation having almost all of the remaining probability. The inferred threedimensional representation is shown by pairing the first two dimensions as a two-dimensional plot in the center of Fig. 9, and showing the remaining third dimension separately to the right along an axis. Because of our ordering heuristic, the yellow and purple-blue stimuli were fixed at the origin and on the first axis. These assignments mean that the first two dimensions effectively represent the expected color circle that "bends" the visible physical spectrum from red to purple colors into a circle that reflects the psychological similarity between the end points. The third dimension, which we did not expect, could correspond to something like luminance, since low luminance purple-like colors are generally located at one end of the dimension and high luminance yellow-like colors are generally located at the other end.

Rectangles with Line Segments
Our third application involves data reported by Kruschke (1993) involving the similarity between eight geometric stimuli. These stimuli consisted of rectangles with interior line segments, and varied in terms of the height of the rectangle and the horizontal location of the line segment. A total of 50 participants provided similarity ratings on a ninepoint scale for all 28 stimulus pairs. Based on the original (Kruschke 1993) and subsequent (e.g., Lee 2001Lee , 2008 analyses of these data, we expect a two-dimensional MDS representation. We also expect the two stimulus dimensions to be psychologically separable. Figure 10 shows the results of applying our method assuming both the r = 1.5 and r = 2 metrics. It is clear that a two-dimensional representation with the separable r = 1.5 metric is inferred. It has essentially all of the posterior probability, with one-and three-dimensional r = 1.5 representations, and all of the r = 2 representations having essentially no posterior probability. The inferred

Shepard Circles
Our fourth application involves data collected by Treat et al. (2001), involving the similarity between nine geometric stimuli known as "Shepard circles." These stimuli consist of a closed semi-circle with an interior ray from the center to the perimeter. The nine stimuli are constructed by exhaustively varying three different radius lengths and three different angles for the internal ray. As for the rectangles with line segments, we expect a separable twodimensional MDS representation. For these stimuli, we expect the dimensions to correspond to the radius and angle dimensions. Figure 11 shows the results of applying our method assuming both the r = 1.5 and r = 2 metrics. 8 It is clear, once again, that a two-dimensional representation with the separable r = 1.5 metric is inferred. The inferred representation also again closely matches the ways in which the stimuli physically vary, with the horizontal axis corresponding to the radius of the semi-circle and the vertical axis corresponding to the angle of the ray.

Colored Shapes
Our final application considers similarity data for nine colored shape stimuli collected by Lee and Navarro (2002). The stimuli were circles, squares, and triangles that were colored red, green, and blue. The data were collected from 20 participants, each of whom rated the similarity of each pair of stimuli on a five-point scale.
Following the previous analysis in Lee and Navarro (2002), we expect a four-dimensional representation. This representation is best understood as being the product of a pair of two-dimensional representations, with one representing the similarities between the shapes, and the other representing the similarities between the colors. There are only three shapes and three colors, and neither set of three has a natural ordering. Instead, the circle, square, and triangle are all approximately equally different from one another, and the same is true of the red, green, and blue colors. These equal similarities are naturally represented by two-dimensional approximately equilateral triangles. The four-dimensional representation we expect is simply the independent combination of these two two-dimensional subspaces.
Our expectations for the metric structure of the MDS representations are less straightforward. Theoretically, the interaction between the shape and color dimensions is a classic example of a separable relationship. The metric structure within the color subspace, however, is theoretically integral, as for the previous application. Countering these theoretical expectations is the fact that there are only three values for the color and shape dimensions present in the Results for colored shapes data reported by Lee and Navarro (2002). The left panel shows the posterior probabilities for onethrough five-dimensional MDS representations for the Euclidean metric. The middle and right panels show the inferred four-dimensional representation, with two dimensions shown in each panel. The colored shapes show the inferred locations of each stimulus and error bars show 95% credible intervals for the marginal posterior distribution for each dimension stimulus set. The corresponding approximately equilateral triangles could be equally well accommodated by any of the Minkowski metrics we are considering. Thus, from a statistical perspective-without regard to the theory of separable and integral stimuli-we expect the simplest metric to be inferred. Since all metrics should be able to fit the data, the one with the smallest functional form complexity should be preferred. We found that this was a third case in which we were unable to generate samples with acceptable convergence for the r = 1.5 metric. Accordingly, Fig. 12 shows the results of applying our method assuming the Euclidean metric. A four-dimensional representation is clearly favored. This representation is shown in terms of two two-dimensional subspaces, and has the expected structure. The middle panel of Fig. 12 shows a subspace that captures the similarity relationships between the red, green, and blue colors. The right panel shows a subspace that captures the similarity relationships between the circle, square, and triangle shapes. These subspaces were found using an orthogonal Procrustes method (Borg and Groenen 1997, p. 162). In particular, we solved for the orthogonal transformation matrix that most closely mapped the inferred coordinate locations to the expected representational structure, defined as the product of two subspaces each with an equilateral triangle configuration.

Discussion
Collectively, the five applications demonstrate that our method is able to make reasonable inferences about MDS representations. The inferred number of dimensions, and the inferred stimulus locations, generally matched theoretical expectations, with the exception of the color application. In addition, where inferences about whether a Euclidean or non-Euclidean metric structure were made, they matched theoretical expectations. It is interesting to note that all of the applications for which non-Euclidean metrics made inference difficult involved stimulus domains for which the expectation was that the Euclidean metric was appropriate.
We also think that the five applications serve to demonstrate the usefulness of our approach to determining dimensionality and metric structure. Our approach is to treat these determinations as Bayesian model-selection problems and use Bayesian posterior probabilities to make inferences. Complete Bayes posterior probabilities have not been used in this way previously to determine either dimensionality or metric structure, and our introduction of the Warp-III method to solve the difficult computational approximation problems involved represents progress on these long-standing challenges in MDS modeling.
Despite this progress, we think the greatest contribution of the current work is to highlight fundamental challenges in MDS models of mental representation, and suggest new avenues for theoretical development. The challenges largely stem from our insistence on fully Bayesian inference, which has enormous advantages in terms of reaching complete, coherent, and principled conclusions, but also raises technical hurdles. The opportunities largely stem from our adoption of a generative modeling approach (Lee 2018). In particular, we think there are many remaining possibilities relating to the use of different metrics in MDS representations, and that there is an opportunity to extend the generative approach to develop more complete cognitive process models for inferring MDS representations. We conclude by discussing some of these challenges and opportunities.

Technical Challenges
Developing a generative MDS model in a Bayesian setting required the key issue of identifiability and invariance to be solved in terms of prior information, rather than more heuristically through post-processing. We used an existing solution to this challenge for the Euclidean metric, and proposed a solution for psychologically interpretable non-Euclidean metrics with 1 < r < 2. We also highlighted, however, the fundamental intractability of MDS representations using the city-block metric. This intractability has been documented before (Bortz 1974;Frank 2006, Figure 5.4;Shepard 1974, Figure 11), but has not prevented the use of MDS representations inferred based on the city-block metric in the cognitive modeling literature (e.g., Kruschke 1993;Lee and Wetzels 2010).
Our current approach to determining the appropriate metric treats this inference as a model-selection problem, and only considers the possibilities r = 1.5 and r = 2. Allowing for other metrics is theoretically interesting, but computationally difficult. One obvious cost is the need to generate posterior probabilities across a larger set of candidate models. But it also seems likely that some models will be difficult to make inferences about. We tried our DE-MCMC approach for r = 1.1 on a number of data sets, and were not able to achieve satisfactory convergence. Furthermore, as explained above, for a few of the applications, we were also not able to achieve satisfactory convergence for r = 1.5. These challenging cases involved stimulus domains for which the expectation was that the Euclidean metric was appropriate, which leads to a speculative suggestion that failure is related to model mis-specification. This is a potential example of a general aspect of Bayesian model comparison that can be computationally challenging: in order to rule out models that are likely mis-specified, one needs to be able to infer them well enough that they can be part of the model comparison. Although we believe that DE-MCMC is a powerful sampling algorithm which substantially helps alleviate the issue of non-converging chains, future research should explore different sampling algorithms that may perform better, particularly for non-Euclidean metrics.
Collectively, these technical challenges mean that our approach cannot currently be applied to large naturalistic stimulus domains. For example, Nosofsky et al. (2018) consider MDS representations based on sparse matrices of pairwise similarity judgments for a set of 360 images of rocks, and Hebart et al. (2020) report extensive crowd-sourced triadic comparison similarity data for 1854 images of real-world objects. Being able to determine the dimensionality, metric structure, and psychological representations of MDS representations of these domains using the Bayesian framework would potentially offer deep insight into how people represent the real-world stimuli. The successful applications we presented-in which there were clear expectations about dimensionality, metric, and representational structure-provide a basis for believing the Bayesian framework can provide this insight to situations where, because there are no clear theoretical expectations, answers must be inferred from data, if and when the computational technical hurdles are overcome.

Other Representations
We did not consider Minkowski metrics with r < 1. This possibility has been proposed as a way of representing stimulus domains in which the component dimensions compete for attention (Shepard 1987(Shepard , 1991Tversky and Gati 1982). The identifiability constraints for this metric present an open research challenge, and it is not clear how well DE-MCMC sampling methods will perform in inferring representations.
There is also the possibility of moving beyond the Minkowski family of metrics. In his presidential address, Shepard (1974, Figure 11) presented a taxonomy of metric spaces, each of which makes different fundamental representational assumptions that could be appropriate for at least some stimulus domains. There has been relatively little work in exploring these possibilities. Lindman and Caelli (1978) investigated MDS representations using Riemannian spaces with constant curvature, and Cox and Cox (1991) presented compelling applications for a special case of this approach involving MDS representations on a sphere.
A new idea raised by our application to the colored shape stimuli involves the possibility of different metric structures within the same representation. These stimuli involved two sorts of stimulus dimensions: those representing color, which are usually considered to be integral, and those representing qualitatively different shapes, which seems more separable. Certainly the interaction between the color dimensions and the shape dimensions would be expected to be separable, since it seems likely people can selectively attend to either the color or the shape of a stimulus, depending upon the cognitive context. This suggests a generalization of the MDS models in which each pair of dimensions is associated with a metric.
Finally, there are alternative representational models, which do not assume stimuli are represented by values on dimensions, that can compete with or complement MDS models. These alternatives include feature-based representations (Tversky 1977), such as those found by additive clustering and related methods (Shepard and Arabie 1979) and special cases such as tree-based models (Corter 1996;Shepard 1980). One attraction of the Warp-III approach we used is that it could estimate Bayes factors between fundamentally different sorts of representations-such as comparing dimensional and featural representations-since it operates directly on posterior samples for each model applied independently to the data. Even further, Navarro and Lee (2003) proposed a hybrid model of stimulus representation that combined both dimensions and features, and it would be conceptually elegant to choose between all of the candidate models, with various combinations of dimensions and features, using our methods. Navarro and Lee (2003) used an approximate analytic approach for this purpose, which would be significantly improved by an approach based on Bayes factors.

MDS Cognitive Process Models
Our modeling approach is generative, but is based on an extremely simple cognitive model. In essence, we assume that all participants have the same MDS representation, and produce dissimilarity judgments for pairs of stimuli that directly reflect the distances between those stimuli in the representation. It is likely that much better generative models can be developed by considering more realistic processing assumptions, and especially by including individual differences.
One example, involving the line-length application, was presented in a preliminary form by Lee (2014). A simple plot of the raw behavioral data suggests that one of the 27 participants appears to have reversed the scale that was used to judge similarity. This means that their judgments contaminate the inference of the MDS representation. Lee (2014) used a simple latent-mixture model extension of the basic MDS generative model, in which either the scale was used correctly or reversed. One participant was inferred to have reversed the scale, as expected. Perhaps more importantly, however, the resulting inference about the onedimensional MDS representation was shown to have less uncertainty than the one shown in Fig. 8. In this way, the introduction of individual differences in the cognitive process of similarity judgment helped decontaminate the inference about the representation of stimuli.
The same basic generative approach could support much more general cognitive process modeling using MDS representations. The hierarchical, latent mixture, and common cause model structures advocated by Lee (2018) could allow for rich accounts of individual differences in judgment processes or stimulus representations, and allow for models that extend beyond the judgment of similarity to other cognitive capabilities like categorization and inference. As one example, Ennis (1992) considers extended assumptions about MDS representations that allow for the noisy representation of perceptual stimuli, which could be incorporated by adding hierarchical structure to the coordinate locations. As another example, there are extensions of the basic MDS model we considered that allow for structured individual differences, such as INDSCAL (Carroll and Chang 1970;Carroll 1972). These would be easy to implement within our generative modeling framework. A model like INDSCAL, which assumes individuals weight the latent stimulus dimensions differently, relies on the appropriate number of dimensions being inferred, and evidence that the stimulus domain is separable. In this way, the potential of our method to make these inferences is especially important. As a final example, the rectangle and line segment stimuli are used by Kruschke (1993) to study category learning, but the similarity data and category learning data are analyzed independently. In effect, the similarity data are used to generate the MDS representation, and that representation is then assumed to provide the fixed basis for category learning. An alternative approach would be to infer the MDS representation jointly from both the similarity judgments and the category learning choices. This sort of flexibility raises the possibility of tackling more complicated cognitive phenomena, such as the ability to adapt representations in response to changes in the external environment, or the current context or goals.

Conclusion
We adopted a Bayesian model-selection approach to the problem of determining the dimensionality and metric structure of MDS representations, while considering psychologically interpretable Euclidean and non-Euclidean metrics. Our methods for inferring the representations and choosing their dimensionality and metric structure show the promise of the approach, but computational challenges remain a barrier in terms of an easy-to-use general capability. Our methods and applications also show the promise of placing MDS representations in a generative cognitive modeling framework, offering the possibility of new models of how people represent stimuli, and how those representations help guide behavior.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/. Figure 13 provides a concrete example to motivate the need for the ordering heuristic. It is clear this is an inferior representation to the one presented in Fig. 8. In Fig. 13, the first-and second-line stimuli, which are the two shortest, are located at almost the same point, rather than being appropriately spaced to reflect their psychological dissimilarity. Consistent with this intuition, the posterior density is worse for the representation in Fig. 13 than the representation in Fig. 8. This suboptimality is caused by the naive application of the constraints identified in Fig. 2 for a one-dimensional representation. The first stimulus is fixed at the origin, and the second stimulus is constrained to be positive. It is clear from Fig. 13 that the second stimulus is indeed inferred to be positive, but is extremely close to zero, with the remaining longer line stimuli "flipping" to negative values in the MDS space. This configuration still satisfies the proximity data reasonably well, because the required distance between the first two stimuli is small, and the distances from the first and second stimuli to all of the others is approximately conserved. Thus, it is the choice of the two similar stimuli as those that are constrained that leads to this potential for a local maximum and suboptimal representation.

Appendix 1. The ordering heuristic
Accordingly, we developed an ordering heuristic to try and assign the constraints for the various dimensionalities and metrics to stimuli that are sufficiently dissimilar. Because higher dimensionalities place constraints on more than two stimuli, the general approach is to order all of the stimuli. Our heuristic for doing this is based on the across participants averaged pairwise dissimilarity ratings. The first two stimuli are chosen to be the ones with the largest averaged pairwise dissimilarity. The remaining stimuli are chosen, one at a time, by considering the minimum averaged pairwise dissimilarity to the already selected stimuli. Specifically, the next stimulus is always chosen to be the one with the maximum value for the minimum averaged pairwise dissimilarity to the already selected stimuli.
We used this ordering heuristic for the colors and colored shapes applications. For the line-length application, we used the heuristic as described but then, in an additional step, switched the first stimulus with the second stimulus. This switch helped prevent the posterior for the ninth stimulus, corresponding to the longest line, push against the upper bound of 1. For the rectangles with interior line segments and Shepard circles applications, we used the heuristic as a starting point, but we then reordered some of the stimuli manually since it seemed to help with convergence.

Appendix 2. Transformation ordered vector (0-1 bounded)
The constrained vector x, 0 ≤ x 1 ≤ x 2 ≤ . . . ≤ x K ≤ 1, can be transformed to an unconstrained vector y ∈ K as follows: where −1 (·) denotes the inverse of the normal CDF. The inverse transformation is given by: where (·) denotes the normal CDF. Note that x k is a function of y 1 , y 2 , . . . , y k (the dependence on y 1 , y 2 , . . . , y k−1 is "hidden" in x k−1 ). Crucially, x k does not depend on y k+1 , y k+2 , . . . , y K . Consequently, the Jacobian matrix J of the transformation is lower triangular so that its determinant |J | is obtained by multiplying its diagonal entries. The diagonal entries are given by: where φ(·) denotes the normal PDF. Hence, the determinant of the Jacobian matrix is given by: (1 − x k−1 ) φ (y k ) .