Introduction

Most psychometric models for the analysis of multivariate discrete choice data are based on the assumption that choices among options are determined by their underlying utilities for the decision-makers. Because of significant computational advances in estimating the parameters of the latent utility distribution, the analysis of multivariate choice data is now well on its way to becoming a routine matter (Böckenholt & Tsai, 2006; Caffo & Griswold, 2006). Even the joint modeling of discrete and continuous choice outcomes is starting to be commonplace (Hanemann, 1984; Gueorguieva & Agresti, 2001). It is therefore appropriate to revisit this methodology and to assess its past, current, and future value in modeling choice data.

Because this paper reflects my personal journey and views on the analysis of choice data, many important subjects as well as contributors are left out, for which I apologize. In particular, this paper will not get into the extensive literature on estimation and inference; careful discussions of these topics can be found in the books by Hensher, Rose, and Greene (2005), Train (2003), and Louviere, Hensher, and Swait (2000). The papers presented at the 2004 Invitational Choice Symposium (Chakravarti, Sinha, & Kim, 2005) provide valuable overviews of many interesting topics in modeling choice behavior. In addition, the addresses of two Nobel Prize laureates in economics (Kahneman, 2003; McFadden, 2001) contain excellent reviews of this field.

My journey is linked to two papers by Thurstone (1927, 1931) that started the era of psychometric choice modeling. The next section will briefly review the contributions of these papers, which is followed by a discussion of current versions of Thurstonian-based choice models. Subsequently, I will point to new research avenues in extending these models to account for seemingly inconsistent choice behavior. The paper concludes by pointing to the interdisciplinary frontiers of studying choice behavior and the resulting future challenges for psychometricians.

Utility Theories and Choice Data

Most psychometric choice models are based on the notion of decision utility. This approach infers utility from observed choices and in turn uses utility to explain choices. For example, if a person chooses a and rejects b, this behavior is interpreted as indicating that a has a higher utility than b to this person. A number of consistency conditions have been proposed in the literature to characterize choices based on decision utilities. Transitivity is a well-known consistency condition stating that a preference for a over b and b over c implies a preference for a over c.

In psychometrics, Thurstone’s (1927, 1931) work proved to be highly influential in promoting an extended version of the decision-utility concept that allowed for stochastic variability in choices. Invoking Fechner’s (1860) psychophysical concept of a sensory continuum, Thurstone (1927) argued in his “Law of Comparative Judgment” that choice options can be represented along a utility continuum by random variables that describe the options’ effects on a person’s cognitive apparatus. Thurstone used randomness as a device to represent factors that determine the formation of preferences but are unknown to the observer. This specification was driven by pragmatic considerations since Thurstone was well aware that he could not distinguish between the positions of whether a choice process is inherently random or determined by a multitude of different factors that may not be measurable. By not specifying the “discriminal” process by which a person “identifies, distinguishes, discriminates, or reacts to stimuli” (1927, p. 369) and defining it to be exogenous to all the factors that can affect a choice process, he broadened tremendously the applicability of this approach to the modeling of choice data. Importantly, he also utilized these ideas for modeling individual differences in choice behavior by arguing that heterogeneity in individual assessments of the same option can be described as realizations of the normal distribution. Introducing Thurstone’s work to economics, Marschak (1960) coined the term Random Utility Maximization (RUM) when referring to choice probabilities that result from the maximization of utilities with random elements.

Not unlike the notion of personality traits, a controversial corollary of decision utility is that the measured utilities are predetermined, decomposable, and stable over both time and across situations. Consistency conditions defined for stochastic preferences include stochastic transitivity and contraction and expansion consistency (Block & Marschak, 1960; Falmagne, 1985). Under contraction consistency, if a choice set T is narrowed to U and the options chosen from T are still in U, then no unchosen options should be chosen and no previously chosen options should be unchosen. Under expansion consistency, if a choice set T is expanded to U, then the probability of choosing an option from U should not exceed the probability of choosing the same option from T.

In addition to providing a probabilistic version of decision utility, Thurstone (1931) also promoted the benefits of experimental methods in collecting choice data. By asking respondents to state directly their preferences for a single or multiple sets of options in controlled experimental settings, he showed that it is possible to obtain information about a person’s preferences that may not be available by observing choices in the person’s daily context. These two data types became known later as “stated preference” and “revealed preference” data.

Thurstone (1931) showed that stated preference data can be used to test for indifference curves, which at that time played a pivotal role in consumer demand theory. Initially, economists conceptualized the unit of utility as the “just perceivable increment of pleasure” (Edgeworth, 1881, p. 99) but moved later towards the concept of decision utility and, in this context, adopted Pareto’s (1900) work on indifference curves that depict the different bundle combinations among which a person is indifferent. Interestingly, Thurstone’s (1931) experimental investigation of indifference curves was met initially with severe criticism by economists (Georgescu-Roegen, 1936; Wallis & Friedman, 1942) and had little impact on subsequent developments of demand theory. Reasons for the negative reactions included the hypothetical nature of the choice situation, the apparent empirical difficulties in detecting indifference, and the experimental lack of control for the effect of income and prices. Looking back, however, Thurstone’s (1931) study was rather significant because it is the first reported experiment in behavioral economics and also prompted much future work on conjoint measurement and related methods (Luce & Tukey, 1964).

Thurstone’s contributions to experimental paradigms for collecting choice data and to the psychometric foundation of choice models with latent variables, in combination with parallel developments in biometrics, sociometrics, and econometrics, led to a rich set of choice models (Ashford & Sowdon, 1970; McFadden, 1984; Hausman & Wise, 1978). The next section presents a short discussion of these Thurstonian-based models for stated preference data and also points to related developments.

Thurstonian Random Utility Models

Typically, stated choice data are collected in the form of incomplete and/or partial rankings. Incomplete ranking data are obtained when a decision-maker considers only a subset of the choice options. For example, in the method of paired comparison, two choice options are presented at a time, and the decision-maker is asked to select the more preferred one (David, 1988). In contrast, in a partial ranking task, a decision-maker is confronted with all choice options and asked to provide a ranking for a subset of the options. For instance, in the best-worst method, a decision-maker is instructed to select the best and worst options out of the offered set of choice options. Both partial and incomplete approaches can be combined by offering multiple distinct subsets of the choice options and obtaining partial or complete rankings for each of them. Pick-any, constant-sum, and ordinal versions of paired comparisons and rankings are alternative methods for collecting stated preference data (Böckenholt, 1992, 2001a, 2001b; Böckenholt & Dillon, 1997a; Yao & Böckenholt, 1999).

The Response Model

Consider a set of J choice alternatives (j = 1, …, J) and n decision-makers (i = 1, …, n). Under Thurstone’s (1927) random utility approach, a choice of an option j is determined by an unobserved utility assessment, υ j , that can be decomposed into a systematic and a random part:

$${\upsilon _j} = {{\rm{\mu }}_j} + {\varepsilon _j}$$

, where the option mean μ j is assumed to stay the same in repeated evaluations of the option, but the random contribution ε j varies from evaluation to evaluation according to some distribution. The stochastic component represents unobserved attributes affecting choice, interindividual differences in the utility assessments, measurement errors, and functional misspecification (Manski, 1977). Clearly, this decomposition is fragile and not easy to perform in empirical applications, especially involving revealed choice data. Initially, (1) was applied to the analysis of paired comparison data obtained from a group of judges who were asked to choose their preferred option for all possible pairs of stimuli. Bock (1958) noted that in this case a two-level representation is needed and extended (1) to include person-specific effects:

$${\upsilon _{ij}} = {{\rm{\mu }}_{ij}} + {\varepsilon _{ij}}$$

, where μ ij represents person i’s mean evaluation of option j. Takane (1987) further extended this work by presenting a general class of covariance structures to represent individual differences in the assessments of the choice options.

In general, the choices made by person i for a single choice set of J options can be summarized by an ordering vector r i . For instance, r i = (h, j, …, l, k) indicates that choice option h is judged superior to option j which in turn is judged superior to the remaining options, with the least preferred option being k. The probability of observing this ordering vector for person i can be written as

$$\Pr ({r_i} = \left( {h,j, \ldots,l,k} \right)) = \Pr \left[ {\left( {{\upsilon _{ih}} - {\upsilon _{ij}} > 0} \right) \cap \ldots \cap ({\upsilon _{il}} - {\upsilon _{ik}} > 0)} \right]$$

. More compactly, a (J − 1) × J contrast matrix, C i , can be defined to indicate the sign of the differences among the ranked options. For example, for the ordering vector r i = (j, l, k) with J = 3, the corresponding contrast matrix takes on the form

$$ C_i = \left[ {\begin{array}{rrr} 1 & 0 & { - 1} \\ 0 & { - 1} & 1 \\ \end{array}} \right] $$

, where the three columns of the contrast matrix correspond to the options j, k, and l, respectively.
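The construction of the contrast matrix from an ordering vector can be sketched in a few lines of Python. This is only an illustration: the function name `contrast_matrix` and the use of string labels for the options are assumptions made here, not part of the original formulation.

```python
import numpy as np

def contrast_matrix(ordering, options):
    """Build the (J-1) x J contrast matrix for one ranking.

    ordering: options listed from most to least preferred, e.g. ("j", "l", "k")
    options:  labels that fix the column order of the matrix, e.g. ("j", "k", "l")
    """
    col = {opt: idx for idx, opt in enumerate(options)}
    C = np.zeros((len(ordering) - 1, len(options)), dtype=int)
    # Each row encodes one adjacent comparison: utility(better) - utility(worse) > 0.
    for row, (better, worse) in enumerate(zip(ordering[:-1], ordering[1:])):
        C[row, col[better]] = 1
        C[row, col[worse]] = -1
    return C

# Ordering r_i = (j, l, k) with columns ordered as j, k, l.
C = contrast_matrix(("j", "l", "k"), ("j", "k", "l"))
print(C)
```

For the ordering (j, l, k), the first row encodes the difference between j and l and the second row the difference between l and k.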

The distributional assumptions on the ε ij ’s determine the probability for each response pattern. For example, a multivariate probit model is obtained when assuming that latent judgments υ i = (υ i1, …, υ iJ ) of the J options for judge i are multivariate normal with mean vector μ i and covariance matrix Σ. Ansari and Iyengar (in press) proposed a more general random-effects representation by specifying the ε ij ’s semiparametrically using Dirichlet process priors. In the following, the popular normality representation is adopted for ease of presentation. In this case, the probability of observing the ordering vector r i is obtained by evaluating a (J − 1)-variate normal distribution,

$$\Pr \left( {{r_i} \mid {\mu _i},\Sigma } \right) = \frac{{{{\left| {{\Gamma _i}} \right|}^{ - 1/2}}}}{{{{\left( {2\pi } \right)}^{(J - 1)/2}}}}\int_0^\infty \cdots \int_0^\infty {\exp \left\{ { - \frac{1}{2}{{\left( {x - {\delta _i}} \right)}^\prime }\Gamma _i^{ - 1}\left( {x - {\delta _i}} \right)} \right\}dx} $$

, where δ i = C i μ i and Γ i = C i Σ C i ′.
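Because the ranking probability is the mass that a normal distribution with mean δ i and covariance C i Σ C i ′ places on the positive orthant, it can be approximated by simple simulation. The following sketch uses plain Monte Carlo, which is a stand-in chosen for brevity and not the estimation approach of any of the cited papers; the function name, the number of draws, and the example values of μ i and Σ are assumptions.

```python
import numpy as np

def ranking_probability(C, mu, Sigma, n_draws=200_000, seed=0):
    """Monte Carlo estimate of Pr(r_i | mu_i, Sigma).

    Draws x ~ N(delta, Gamma) with delta = C mu and Gamma = C Sigma C',
    then returns the share of draws in which every contrast is positive.
    """
    rng = np.random.default_rng(seed)
    delta = C @ mu
    Gamma = C @ Sigma @ C.T
    x = rng.multivariate_normal(delta, Gamma, size=n_draws)
    return float(np.mean(np.all(x > 0, axis=1)))

# Contrast matrix for the ordering (j, l, k) with columns j, k, l.
C = np.array([[1, 0, -1],
              [0, -1, 1]])

# Hypothetical example: equal means and independent unit-variance utilities,
# so that all 3! = 6 orderings are equally likely.
p_equal = ranking_probability(C, mu=np.zeros(3), Sigma=np.eye(3))

# Hypothetical example: option j is clearly preferred and k is least preferred.
p_favoured = ranking_probability(C, mu=np.array([1.0, -1.0, 0.0]), Sigma=np.eye(3))
print(p_equal, p_favoured)
```

In the first example the estimate should be close to 1/6; in the second, the ordering agrees with the mean utilities and therefore receives a larger probability.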

Typically, choice options are presented to judges in multiple blocks, especially when the set of choice options is large. This approach simplifies the judgmental task since only a few options need to be considered at a time. It also facilitates studying whether judges are consistent in their evaluations of the choice options. Consider, for example, a paired comparison task in which a person is asked to choose the preferred option for a subset or all of the pairwise choice sets. Extending (2), the utilities of judge i that underlie the pairwise comparison of options j and k, respectively, can be written as

$$\eqalign{ & {\upsilon _{ij(k)}} = {{\rm{\mu }}_{ij}} + {\varepsilon _{ij(k)}}, \cr & {\upsilon _{ik(j)}} = {{\rm{\mu }}_{ik}} + {\varepsilon _{ik(j)}} \cr} $$

. The terms ε ij(k) and ε ik(j) capture random variations in the judgmental process, where ε ij(k) denotes the random variation of option j when compared to option k. Thus, according to (3), the choice between j and k is determined by the sign of the latent judgment y ijk where

$${y_{ijk}} = {\upsilon _{ij(k)}} - {\upsilon _{ik(j)}} = {{\rm{\mu }}_{ij}} - {{\rm{\mu }}_{ik}} + {\varepsilon _{ij(k)}} - {\varepsilon _{ik(j)}}$$

. One can assume that judgments across choice sets are either independent or dependent at the individual level. Under independence, one obtains

$${y_{ijk}} = {{\rm{\mu }}_{ij}} - {{\rm{\mu }}_{ik}} + {\varepsilon _{ijk}}$$

, where \({\varepsilon _{ijk}} = {\varepsilon _{ij(k)}} - {\varepsilon _{ik(j)}}\) and the variability in the judgments comes solely from the ε’s which are independently distributed given the person-specific parameters. For multiple comparisons of J options, the latent pairwise judgments of judge i can then be written conveniently as a linear model. For example, in a comparison of the options j, k, l, and m, we obtain

$${y_i} = \left( {\matrix{ {{y_{ijk}}} \cr {{y_{ijl}}} \cr {{y_{ijm}}} \cr {{y_{ikl}}} \cr {{y_{ikm}}} \cr {{y_{ilm}}} \cr } } \right) = \left( {\matrix{ 1 & { - 1} & 0 & 0 \cr 1 & 0 & { - 1} & 0 \cr 1 & 0 & 0 & { - 1} \cr 0 & 1 & { - 1} & 0 \cr 0 & 1 & 0 & { - 1} \cr 0 & 0 & 1 & { - 1} \cr } } \right)\left( {\matrix{ {{{\rm{\mu }}_{ij}}} \cr {{{\rm{\mu }}_{ik}}} \cr {{{\rm{\mu }}_{il}}} \cr {{{\rm{\mu }}_{im}}} \cr } } \right) + \left( {\matrix{ {{\varepsilon _{ijk}}} \cr {{\varepsilon _{ijl}}} \cr {{\varepsilon _{ijm}}} \cr {{\varepsilon _{ikl}}} \cr {{\varepsilon _{ikm}}} \cr {{\varepsilon _{ilm}}} \cr } } \right) = A{{\rm{\mu }}_i} + {\varepsilon _i}$$

, where A is the paired-comparison design matrix.
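The design matrix A can be generated for any number of options by enumerating the pairs in lexicographic order. The sketch below is illustrative only; the function name and the example mean evaluations are hypothetical.

```python
import numpy as np
from itertools import combinations

def paired_comparison_design(J):
    """Return the J(J-1)/2 x J design matrix A for all pairs (j, k) with j < k."""
    pairs = list(combinations(range(J), 2))
    A = np.zeros((len(pairs), J), dtype=int)
    for row, (j, k) in enumerate(pairs):
        A[row, j] = 1    # first option of the pair enters positively
        A[row, k] = -1   # second option of the pair enters negatively
    return A

A = paired_comparison_design(4)
print(A)

# Hypothetical mean evaluations of the options j, k, l, m for one judge.
mu_i = np.array([1.0, 0.5, 0.0, -0.5])
systematic_part = A @ mu_i   # pairwise mean differences mu_ij - mu_ik, ...
print(systematic_part)
```

Each row of A sums to zero, which reflects the comparative nature of the data: adding a constant to all option means leaves the pairwise differences unchanged.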

There are important links between multivariate, non-linear mixed IRT models for nominal and binomial responses and random-utility models (Rijmen, Tuerlinckx, De Boeck, & Kuppens, 2003). Many of these models are based on random-effects versions of Luce’s (1959) choice model which is consistent with a RUM model if and only if the random component in (2) follows a Gumbel distribution (Holman & Marley reported in Luce & Suppes, 1965). Examples include Bock’s (1969, 1972) multinomial logit and nominal models, McFadden’s (1974) conditional logit model, McFadden and Train’s (2000) mixed multinomial logit model, Böckenholt’s (2001b) ranking model, and Skrondal and Rabe-Hesketh’s (2003) multilevel logit model.

The interpretation of the RUM model parameters is limited by the comparative and discrete nature of choice data. Most importantly, it is not possible to identify the origin and scale of the individual utility scales. One option may be preferred to another but this result does not allow any conclusions about the attractiveness level of the options or about interpersonal differences in the utilities (Guttman, 1946). Tsai (2000, 2003) discusses the identifiability of parameter estimates for Thurstonian ranking and paired comparison models (see also Tsai & Böckenholt, 2002). An important implication of this work is that the Case distinctions that were originally proposed by Thurstone (1927) are misleading and have to be interpreted as equivalence classes of covariance structures. However, as noted already by Thurstone and Jones (1957), it is possible to identify the origin of the utility scale by extending the choice task, for example, by including comparisons between pairs of options. Böckenholt (2004) provides a recent review and discussion of different methods that can be used for this purpose. In general, methods for identifying a utility scale origin are not only instrumental for avoiding difficulties in the interpretation of the estimated parameters of a choice model, they also provide useful insights about the underlying judgmental process.

Current Uses of Thurstonian Random Utility Models

This section discusses three major current psychometric research streams in the analysis of choice data. First, much work is underway to enhance our current tool set for modeling individual differences both at a particular point in time and over time. Second, there is strong interest in going beyond modeling the relationship between choice options and decision-makers and in considering a wider range of influences, including social context and social influences. A third research stream focuses on ways to supplement and complement choice data because choice outcomes alone provide only limited information about the choice process and its determinants.

Individual Differences

The analysis of individual difference effects is one of the most fruitful aspects of Thurstonian random utility models. Past work considered nested and crossed random effects, person-specific regression weights, and factor structures to capture dependencies among options under flexible distributional assumptions that include mixtures of the extreme value distribution, multivariate normality, scale mixtures of multivariate normal distributions, and Dirichlet processes. For example, the following decomposition of an individual utility vector allows for the effects of judge-specific covariates x i as well as observed and unobserved attributes of the options, given by M and Λ, respectively,

$${{\rm{\mu }}_i} = {\rm{\mu }} + K{x_i} + M{\beta _i} + \Lambda {f_i} + {\nu _i}$$

, where K, β i , and f i contain regression coefficients. Effects that are not accounted for by the covariates are represented by ν i . Under normality assumptions, ν i ∼ N(0, Σ ν ), β i ∼ N(β, Σ β ), and f i ∼ N(0, Σ f ). Although the interpretation of individual taste differences is much simplified under the assumption that decision-makers use the same set of attributes in assessing the choice options, some care must be taken in relating these attributes to the option utilities. For example, an ideal-point model may prove superior to the linear representation provided by the factor-analytic model in (9) if individuals choose the option that is closest to their “ideal” or most preferred option (Böckenholt, 1998; Coombs, 1964; MacKay, Easley, & Zinnes, 1995). Applications of ranking and paired comparison models with unobserved attributes are reported by Brady (1989), Chan and Bentler (1998), Maydeu-Olivares and Böckenholt (2005), Takane (1987), Tsai and Böckenholt (2001), and Yu, Lam, and Lo (1998). Examining preferences for multi-attributed options, conjoint analysis (Bock & Jones, 1968; Marshall & Bradlow, 2002) uses (9) for the development of new products with optimal combinations of attributes.
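To make the dimensions in (9) concrete, the following sketch computes the mean and covariance of μ i that the decomposition implies for a judge with covariates x i . It assumes, in addition to the normality assumptions stated above, that β i , f i , and ν i are mutually independent; that assumption, the function name, and all numerical values are hypothetical choices for illustration.

```python
import numpy as np

def implied_moments(mu, K, x_i, M, beta, Sigma_beta, Lam, Sigma_f, Sigma_nu):
    """Mean and covariance of mu_i = mu + K x_i + M beta_i + Lam f_i + nu_i.

    Assumes beta_i ~ N(beta, Sigma_beta), f_i ~ N(0, Sigma_f), nu_i ~ N(0, Sigma_nu),
    with the three random terms mutually independent (an assumption of this sketch).
    """
    mean = mu + K @ x_i + M @ beta
    cov = M @ Sigma_beta @ M.T + Lam @ Sigma_f @ Lam.T + Sigma_nu
    return mean, cov

# Hypothetical setup: J = 4 options, 1 covariate, 2 observed attributes, 1 latent factor.
mu = np.array([0.0, 0.2, -0.1, 0.4])
K = np.array([[0.5], [0.0], [-0.5], [0.0]])          # J x 1 covariate effects
x_i = np.array([1.0])                                 # judge-specific covariate
M = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])        # J x 2 observed attributes
beta = np.array([0.3, -0.2])                          # mean attribute weights
Lam = np.array([[0.8], [0.4], [-0.4], [-0.8]])        # J x 1 factor loadings

mean_i, cov_i = implied_moments(
    mu, K, x_i, M, beta,
    Sigma_beta=0.25 * np.eye(2), Lam=Lam, Sigma_f=np.eye(1), Sigma_nu=0.1 * np.eye(4),
)
print(mean_i)
print(cov_i)
```

The covariance matrix shows how shared observed attributes (through M) and shared latent attributes (through Λ) induce dependencies among the option utilities.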

When preferences for only a single choice set are elicited, the effects of between- and within-judge variability are confounded. For example, if judges are asked to rank J options only once, it is not possible to identify separately the effects of ε ij and μ ij in (2). Fortunately, with multiple choice sets or when choices are observed over time, it becomes possible to distinguish explicitly these sources of variability. Specifically, in the context of time-dependent choices, parameter-driven dynamic approaches have proven useful to capture dependencies on the within- and between-subject levels (Böckenholt, 2002; Allenby & Lenk, 1994). For example, in the context of modeling voting data collected before and after the 1992 presidential election, Böckenholt (2002) applied the following vector-autoregressive model with

$${{\rm{\mu }}_{it}} = \Upsilon {{\rm{\mu }}_{i,t - 1}} + {\xi _{it}}$$

, where μ it are the options’ mean evaluations of person i at time t, and the (J × J) coefficient matrix Υ contains the multivariate regression effects between the evaluations of the options at two adjacent time points. The diagonal elements of Υ contain each option’s autoregressive term and the off-diagonal elements contain the lagged effects of the J − 1 remaining random utilities. The J-dimensional random effect ξ it captures the part of μ it that cannot be predicted on the basis of past evaluations of the options. There are many interesting variations on studying temporal effects on choice including continuous-time choice models (Böckenholt, 2005) and latent change models (Böckenholt & Dillon, 1997b).
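A short simulation illustrates how the vector-autoregressive recursion propagates a judge's mean evaluations over time. The coefficient matrix, starting values, and innovation covariance below are hypothetical and are not estimates from the voting study; the normal innovations are likewise an assumption of this sketch.

```python
import numpy as np

def simulate_mean_evaluations(coef, mu_start, innovation_cov, n_periods, seed=0):
    """Simulate mu_t = coef @ mu_{t-1} + innovation_t for one judge.

    coef is the J x J autoregressive coefficient matrix; the innovations are
    drawn from N(0, innovation_cov). Returns an (n_periods + 1) x J array.
    """
    rng = np.random.default_rng(seed)
    J = len(mu_start)
    path = np.empty((n_periods + 1, J))
    path[0] = mu_start
    for t in range(1, n_periods + 1):
        innovation = rng.multivariate_normal(np.zeros(J), innovation_cov)
        path[t] = coef @ path[t - 1] + innovation
    return path

# Hypothetical coefficients: diagonal = autoregressive terms, off-diagonal = lagged cross effects.
coef = np.array([[0.6, 0.1, 0.0],
                 [0.0, 0.5, 0.2],
                 [0.1, 0.0, 0.7]])
path = simulate_mean_evaluations(coef, np.array([1.0, 0.0, -1.0]), 0.1 * np.eye(3), n_periods=5)

# A spectral radius below 1 indicates that the simulated process is stable.
spectral_radius = max(abs(np.linalg.eigvals(coef)))
print(path)
print(spectral_radius)
```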

Aside from temporal dependencies, there is also strong interest in modeling proximity-based dependencies that are introduced by known or unknown (latent) relationships among individuals (Anselin, 2002; Bradlow et al., 2005). This work goes beyond the standard assumption that individuals make choices in isolation by allowing explicitly for the possibility that choices of individuals are correlated or influenced by each other.

Social Dependencies

During the last decade the study of the effect of social interactions on choice has become an important area of research. Although it has long been recognized (e.g., Veblen, 1899; Leibenstein, 1950) that choices of individuals may depend on the choice behavior of their social network, empirical research on this topic has been slow, mainly because of methodological problems (Manski, 2000; Brock & Durlauf, 2001). Modeling issues that need to be addressed in empirical work on social interactions are:

  (a) the identification of the reference group for which social interaction effects are sought to be established;

  (b) self-selection processes of peer or group members;

  (c) controls for correlated effects that affect all group members in a similar way; and

  (d) controls for contextual effects such as exogenous social background characteristics of group members.

Because these issues are difficult to tackle in any observational study, progress is most likely to be made by combining observational with quasi-experimental or laboratory studies.

A basic version of an interaction-based model is obtained by extending (2) to include the influence of the choice behavior of others (Soetevent & Kooreman, 2007):

$${\upsilon _{ij}} = {{\rm{\mu }}_{ij}} + {\varepsilon _{ij}} + {\lambda _{ij}}$$

, where λ ij = γ w ij Σ k≠i w kj and w ij = 1 if option j is chosen by person i and −1 otherwise. Thus, for γ > 0, the utility of choosing option j when another person chooses the same option as well is larger than the utility of choosing this option when it is not chosen by another person. For γ = 0, the model reduces to the standard Thurstonian choice model. Even for the simple model (11), with fixed effects for the social interaction component, multiple equilibria exist that complicate parameter estimation.
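The interaction term can be computed directly from a matrix of choice indicators. In the sketch below, the sum in λ ij is taken over all persons other than i, which is how the garbled index in the definition is read here; the function name and the small example are hypothetical.

```python
import numpy as np

def social_interaction(W, gamma):
    """Compute lambda_ij = gamma * w_ij * sum over k != i of w_kj.

    W is an n x J matrix with w_ij = +1 if person i chooses option j and -1 otherwise.
    """
    W = np.asarray(W, dtype=float)
    # Column totals minus each person's own indicator gives the sum over the others.
    others = W.sum(axis=0, keepdims=True) - W
    return gamma * W * others

# Hypothetical example: persons 0 and 1 choose option 0, person 2 chooses option 1.
W = np.array([[ 1, -1],
              [ 1, -1],
              [-1,  1]])
lam = social_interaction(W, gamma=0.5)
print(lam)
```

In this example the two agreeing persons have one peer who agrees and one who disagrees, so their interaction terms cancel, whereas the person who deviates from both peers receives a negative term for both options. Setting gamma to zero removes the interaction component entirely.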

In general, the modeling of interaction structures has been based mainly on mathematical models originating in the area of statistical mechanics (Yeomans, 1992). Although the physical interpretation of these models may be of little interest to social scientists, their mathematical properties are intriguing and deserve to be explored in experiments on social decision-making.

Beyond Choice Data

Because choice data contain little information about the underlying choice process and its determinants, there have been many attempts both to supplement them by considering other data sources such as reaction times or process-tracing data (Böckenholt & Hynan, 1994; Johnson & Busemeyer, 2005) and to complement them by combining revealed and stated preference data (Ben-Akiva et al., 1997), or by collecting comparative judgment data on perceived attributes of the choice options. Below I discuss examples of both approaches by considering risky and multiattribute choices.

Uncertain Choice Outcomes

Many, if not most, real-life decisions are based on a mix of information and subjective expectations about the choice options under consideration. For example, the selection of a job may be based on a job description as well as on expectations about the career path. Purchases of over-the-counter drugs may be influenced by the drugs’ ingredients but also by expectations about the ingredients’ effectiveness and quality perceptions of the manufacturers. If these subjective expectations are rational or well-calibrated, it is possible to infer both expectations and utilities from choices alone. However, there is much evidence to suggest that this assumption is difficult to justify in general (see Kahneman, 2003, for a recent review). As a possible solution to this dilemma, Manski (2004) proposed measuring expectations separately in the form of subjective probabilities and combining them with the choice outcomes. Although much care needs to be taken in the elicitation of subjective probabilities, this approach is likely to mitigate problems arising from assuming that expectations are rational.

Consider option j with M outcomes (e j1, …, e jM ). By specifying the utility function to be additively separable in the outcomes and the outcomes to be binary, (2) can be extended to

$${\upsilon _{ij}} = \sum\limits_{m = 1}^M {\left[ {{P_{ij}}({e_m} = 1){{\rm{\mu }}_{im({e_j} = 1)}} + (1 - {P_{ij}}({e_m} = 1)){{\rm{\mu }}_{im({e_j} = 0)}}} \right] + {\varepsilon _{ij}}} $$

, where ε ij denotes the unsystematic response variability in the evaluation of option j and P ij (e m = 1) is the subjective probability that outcome m occurs when option j is selected. A simpler version of (12) was proposed by Shuford, Jones, and Bock (1960) who asked respondents to choose between lotteries with known outcome values and probabilities (see also Böckenholt, 2004). Although (12) is a useful starting point for studies on decision-making under uncertainty, it may require further extensions that take into account nonbinary outcomes and rank dependence in the valuation of the option outcomes and their subjective probabilities (Luce, 2000). However, this call for additional research should not detract from the observation that combining preference data with subjective probabilities is a promising avenue for psychometric analyses of decision-making when only partial information about the choice options is available.
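The systematic part of (12) is a probability-weighted sum over the M binary outcomes, which can be written out as follows. The function name and the numerical values are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def systematic_utility(p_occurs, mu_if_occurs, mu_if_absent):
    """Systematic part of the utility of one option for one person.

    p_occurs[m]      subjective probability that outcome m occurs
    mu_if_occurs[m]  evaluation attached to outcome m occurring
    mu_if_absent[m]  evaluation attached to outcome m not occurring
    """
    p = np.asarray(p_occurs, dtype=float)
    mu1 = np.asarray(mu_if_occurs, dtype=float)
    mu0 = np.asarray(mu_if_absent, dtype=float)
    return float(np.sum(p * mu1 + (1.0 - p) * mu0))

# Hypothetical option with M = 2 outcomes.
u = systematic_utility(p_occurs=[0.8, 0.25],
                       mu_if_occurs=[2.0, 4.0],
                       mu_if_absent=[0.0, -1.0])
print(u)  # 0.8*2.0 + 0.2*0.0 + 0.25*4.0 + 0.75*(-1.0) = 1.85
```

Adding the random term ε ij to this quantity gives the utility that enters the comparison with the other options.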

Multiattribute Choices

In many applications, it is useful to collect choice data with respect to several attributes to understand how they relate to each other. In this case, one needs to distinguish between two sources of response dependencies introduced by the comparison of the different options with respect to the same attribute (within-attribute dependence), and by the evaluation of the same options on different attributes (between-attribute dependence). Böckenholt (1996) suggested the following decomposition of the covariance matrix of the multivariate judgments (r ia , r ib , …, r iq ) with respect to q attributes:

$${\Sigma _{\rm{\mu }}} = {\Sigma _C} \otimes {\Sigma _O}$$

, where Σ C represents the associations among the q attributes, and Σ O represents the associations among the J options. For example, in the two-attribute case, when the covariance matrix of the options is represented by an identity matrix, we obtain

$${\Sigma _{\rm{\mu }}} = \left( {\matrix{ 1 & {{\rho _{ab}}} \cr {{\rho _{ab}}} & 1 \cr } } \right) \otimes I = \left( {\matrix{ I & {{\rho _{ab}}I} \cr {{\rho _{ab}}I} & I \cr } } \right)$$

, where ρ ab is the correlation between the attributes a and b.
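The Kronecker structure of the two-attribute case is easy to verify numerically. The sketch below forms Σ C ⊗ Σ O with `numpy.kron`; the values chosen for ρ ab and J are hypothetical.

```python
import numpy as np

rho_ab = 0.4   # hypothetical correlation between attributes a and b
J = 3          # hypothetical number of options

Sigma_C = np.array([[1.0, rho_ab],
                    [rho_ab, 1.0]])   # associations among the q = 2 attributes
Sigma_O = np.eye(J)                   # associations among the J options (identity case)

Sigma_mu = np.kron(Sigma_C, Sigma_O)  # (2J x 2J) covariance of the stacked judgments
print(Sigma_mu)
```

The diagonal blocks equal the option covariance matrix, and the off-diagonal blocks equal ρ ab times that matrix, so the same option evaluated on two attributes is correlated while different options remain uncorrelated.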

Future Uses of Thurstonian-Based Analyses

Parallel to these expansions of random utility models, a growing literature has developed over the years that questions the descriptive value of RUMs. This work showed that the consideration of unsystematic error in (2) is insufficient to account for inconsistent choice behavior. Already May (1954) presented convincing evidence that strategies for comparing alternatives can lead to systematic violations of such internal consistency criteria as weak stochastic transitivity which is defined for the three options j, k, and l as

$$\Pr (j \succ k) \ge .50\;{\rm{ and }}\;\Pr (k \succ l) \ge .50\; \Rightarrow \;\Pr (j \succ l) \ge .50$$

. More generally, in the same way as the meaning of a word is clarified by the context of the sentence in which it appears, the valuation of a choice option seems to depend frequently on the context of the choice set. The “asymmetric dominance” and the “compromise” effects are two well-known examples of systematic violations of expansion consistency (Simonson & Tversky, 1992). In the asymmetric dominance case, the dominance relation between two options affects their evaluations. This effect is not present when each of the options is presented in combination with an option that is neither dominating nor dominated. The compromise effect occurs when a two-attribute option is presented as falling between two extreme options that are strong on one but weak on the other attribute. In this case, the “compromise” option seems to receive a boost for having medium values on both attributes. Such relational effects cannot be captured by models that consider context-independent evaluations of choice options.
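Weak stochastic transitivity, as defined above, can be checked mechanically once the pairwise choice probabilities are available. The following sketch lists all ordered triples that violate the condition; the function name and the two example probability matrices are hypothetical.

```python
from itertools import permutations

def wst_violations(P):
    """Return all triples (j, k, l) that violate weak stochastic transitivity.

    P[j][k] is the probability that option j is preferred to option k,
    with P[k][j] = 1 - P[j][k]. Diagonal entries are ignored.
    """
    n = len(P)
    return [
        (j, k, l)
        for j, k, l in permutations(range(n), 3)
        if P[j][k] >= 0.5 and P[k][l] >= 0.5 and P[j][l] < 0.5
    ]

# Hypothetical cyclic pattern: 0 beats 1, 1 beats 2, and 2 beats 0.
P_cyclic = [[0.5, 0.6, 0.4],
            [0.4, 0.5, 0.6],
            [0.6, 0.4, 0.5]]

# Hypothetical pattern that respects the ordering 0 > 1 > 2.
P_ordered = [[0.5, 0.7, 0.8],
             [0.3, 0.5, 0.6],
             [0.2, 0.4, 0.5]]

print(wst_violations(P_cyclic))
print(wst_violations(P_ordered))
```

The cyclic pattern produces one violation for each rotation of the cycle, while the ordered pattern produces none.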

Shafir and LeBoeuf (2002) review a long list of factors that have been shown to affect decision processes as well as possible explanations that can account for these effects. This list includes contextual effects (e.g., relational features such as dominance among choice options, temporal features of the choice situation), choice processes (e.g., the evaluability of options, decision strategies, information search strategies), presentation formats, frames, cultural and social norms, as well as characteristics of the decision-maker (e.g., emotional state, general intelligence, numeracy). In view of this list, it is perhaps surprising how well (2) can work in providing satisfactory accounts of choice data.

Because of the experimental nature of studies investigating systematic violations of RUMs, the literature contains only a few quantitative accounts of the degree to which observed choice behavior is consistent or inconsistent with a random utility representation (Ryan, Netten, Skatun, & Smith, 2006). This lack of information contributes to the ongoing controversy about the status of RUM models: whether they should be abandoned or extended in specific ways to increase their applicability. To resolve this debate, both meta-analyses and new modeling approaches that allow identifying factors that characterize consistent and inconsistent choosers are needed. For example, the π* model (Rudas, Clogg, & Lindsay, 1994) could be employed usefully by including one mixture component based on (2) and a second one that captures deviations from this model to estimate the proportion of judges whose choices are inconsistent with a random utility representation. Alternatively, multiheuristic mixture models could be developed where each mixture component represents a different choice heuristic (Mellers & Biagini, 1994; Gonzalez-Vallejo, 2002; Wedel & Kamakura, 1999) that may lead to both consistent as well as inconsistent choices. A third avenue towards a more descriptive account of choice behavior is to replace the notion that utilities are predetermined and stable by a more dynamic view of utilities (Liechty, Fong, & DeSarbo, 2005). This approach was taken by Tsai and Böckenholt (2006) who introduced a more general framework for the analysis of pairwise judgments that does not require the two assumptions of independent judgments on the individual level and fixed utilities across choice sets. Instead, they decomposed the latent judgments as

$$\eqalign{ & {\upsilon _{ij(k)}} = {{\rm{\mu }}_{ij(k)}} + {\varepsilon _{ij(k)}}, \cr & {\upsilon _{ik(j)}} = {{\rm{\mu }}_{ik(j)}} + {\varepsilon _{ik(j)}} \cr} $$

, where \(\varepsilon_{ij(k)}\) and \(\mu_{ij(k)}\) denote, respectively, the random variation and the person-specific evaluation of option j when compared to option k. As a result, (6) changes to

$${y_{ijk}} = {\upsilon _{ij(k)}} - {\upsilon _{ik(j)}} = {{\rm{\mu }}_{ij(k)}} - {{\rm{\mu }}_{ik(j)}} + {\varepsilon _{ij(k)}} - {\varepsilon _{ik(j)}}$$

.

For multiple comparisons with J items, the paired comparison judgments can again be written conveniently as a linear model. Let \(\varepsilon = (\varepsilon_{1(2)}, \varepsilon_{1(3)}, \ldots, \varepsilon_{1(J)}, \varepsilon_{2(1)}, \varepsilon_{2(3)}, \ldots, \varepsilon_{J(J-1)})'\) be a J(J − 1)-dimensional random vector, and let B be a \(\left( {\matrix{ J \cr 2 \cr } } \right) \times J(J - 1)\) matrix where each row of B corresponds to one of the \(\left( {\matrix{ J \cr 2 \cr } } \right)\) paired comparisons and each column to one of the J(J − 1) terms in \(\varepsilon\). For example, when J = 3, \(\varepsilon = (\varepsilon_{1(2)}, \varepsilon_{1(3)}, \varepsilon_{2(1)}, \varepsilon_{2(3)}, \varepsilon_{3(1)}, \varepsilon_{3(2)})'\) and

$$B = \left( {\matrix{ 1 & 0 & { - 1} & 0 & 0 & 0 \cr 0 & 1 & 0 & 0 & { - 1} & 0 \cr 0 & 0 & 0 & 1 & 0 & { - 1} \cr } } \right)$$

. For example, in a comparison of the items j, k, l, and m with context-free mean evaluations of the choice options, we obtain

$${y_i} = \left( {\matrix{ {y_{ijk}} \cr {y_{ijl}} \cr {y_{ijm}} \cr {y_{ikl}} \cr {y_{ikm}} \cr {y_{ilm}} \cr } } \right) = \left( {\matrix{ 1 & { - 1} & 0 & 0 \cr 1 & 0 & { - 1} & 0 \cr 1 & 0 & 0 & { - 1} \cr 0 & 1 & { - 1} & 0 \cr 0 & 1 & 0 & { - 1} \cr 0 & 0 & 1 & { - 1} \cr } } \right)\left( {\matrix{ {{\rm{\mu }}_{ij}} \cr {{\rm{\mu }}_{ik}} \cr {{\rm{\mu }}_{il}} \cr {{\rm{\mu }}_{im}} \cr } } \right) + \left( {\matrix{ {\varepsilon _{ij(k)} - \varepsilon _{ik(j)}} \cr {\varepsilon _{ij(l)} - \varepsilon _{il(j)}} \cr {\varepsilon _{ij(m)} - \varepsilon _{im(j)}} \cr {\varepsilon _{ik(l)} - \varepsilon _{il(k)}} \cr {\varepsilon _{ik(m)} - \varepsilon _{im(k)}} \cr {\varepsilon _{il(m)} - \varepsilon _{im(l)}} \cr } } \right) = A{{\rm{\mu }}_i} + B{\varepsilon _i}$$

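, where \(\varepsilon_i = (\varepsilon_{ij(k)}, \varepsilon_{ij(l)}, \varepsilon_{ij(m)}, \varepsilon_{ik(j)}, \varepsilon_{ik(l)}, \varepsilon_{ik(m)}, \ldots, \varepsilon_{im(j)}, \varepsilon_{im(k)}, \varepsilon_{im(l)})'\).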
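The construction of A and B for an arbitrary number of items can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `design_matrices` and the zero-based item indices are hypothetical, pairs are enumerated in the order (1, 2), (1, 3), …, (J − 1, J), and the error terms follow the ordering of ε given above.

```python
from itertools import combinations

def design_matrices(J):
    """Return (A, B) as lists of rows, one row per paired comparison."""
    pairs = list(combinations(range(J), 2))  # (j, k) with j < k, zero-based

    def eps_index(j, k):
        # Position of eps_{j(k)} in (eps_{1(2)}, ..., eps_{1(J)}, eps_{2(1)}, ...).
        return j * (J - 1) + (k if k < j else k - 1)

    A = [[0] * J for _ in pairs]
    B = [[0] * (J * (J - 1)) for _ in pairs]
    for row, (j, k) in enumerate(pairs):
        A[row][j], A[row][k] = 1, -1   # mu_j - mu_k
        B[row][eps_index(j, k)] = 1    # + eps_{j(k)}
        B[row][eps_index(k, j)] = -1   # - eps_{k(j)}
    return A, B

A3, B3 = design_matrices(3)
A4, B4 = design_matrices(4)
```

Each row of A and of B contains exactly one +1 and one −1, which mirrors the definition of each paired comparison as a difference between two option-specific evaluations.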

Decomposing \({{\rm{\mu }}_i} = {\rm{\mu }} + {\nu _i}\) and assuming multivariate normality, with \(\varepsilon_i \sim N(0, \Sigma_\varepsilon)\), \(\nu_i \sim N(0, \Sigma_\nu)\), and \(\varepsilon_i\) independent of \(\nu_i\), we obtain for the joint distribution of \(\nu_i\), \(\varepsilon_i\), and \(y_i\),

$$ \left[ {\begin{array}{c} {\nu _i } \\ {\varepsilon _i } \\ {y_i } \\ \end{array} } \right] \sim N\left( {\left[ {\begin{array}{c} {0_J } \\ {0_{J\left( {J - 1} \right)} } \\ {A\mu } \\ \end{array} } \right],\left[ {\begin{array}{ccc} {\Sigma _\nu } & {0'} & {\Sigma _\nu A'} \\ 0 & {\Sigma _\varepsilon } & {\Sigma _\varepsilon B'} \\ {A\Sigma _\nu } & {B\Sigma _\varepsilon } & {\Sigma _y } \\ \end{array} } \right]} \right) $$

, where \( \Sigma _y = A\Sigma _\nu A' + B\Sigma _\varepsilon B' \).
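To make the covariance of the paired comparison judgments concrete, the following sketch evaluates AΣA′ + BΣB′ for J = 3 with plain Python lists. All parameter values are hypothetical and chosen only for illustration. The final step additionally assumes that the first option of a pair is chosen whenever the latent difference is positive, which is a common convention but is not stated in this section.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Design matrices for J = 3 (comparisons 1-2, 1-3, 2-3).
A = [[1, -1, 0], [1, 0, -1], [0, 1, -1]]
B = [[1, 0, -1, 0, 0, 0], [0, 1, 0, 0, -1, 0], [0, 0, 0, 1, 0, -1]]

# Hypothetical parameter values, not estimates from any study.
mu = [0.5, 0.0, -0.5]
Sigma_nu = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
Sigma_eps = [[0.5 if r == c else 0.0 for c in range(6)] for r in range(6)]

# Sigma_y = A Sigma_nu A' + B Sigma_eps B'
Sigma_y = add(matmul(matmul(A, Sigma_nu), transpose(A)),
              matmul(matmul(B, Sigma_eps), transpose(B)))
mean_y = [sum(a * m for a, m in zip(row, mu)) for row in A]

# Marginal probability that the first option of each pair is preferred,
# assuming it is chosen whenever the latent difference y is positive.
choice_prob = [normal_cdf(m / math.sqrt(Sigma_y[r][r]))
               for r, m in enumerate(mean_y)]
```

With these values the off-diagonal elements of the resulting matrix are nonzero only because comparisons share an option through the between-judge component; a non-diagonal within-judge covariance matrix would add further dependencies.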

The covariance matrix Σ ε can provide useful insights about the dependencies between the options’ utilities over different choice sets. For example, a high covariance between the utilities of option j across different option pairs indicates strong consistency in the choices involving this option and hence a low incidence of intransitive choices. A parsimonious, and easy to interpret, representation of Σ ε is obtained when decomposing it as

$${\Sigma _\varepsilon } = \sum\limits_{j = 1}^J {({D_j} \otimes {S_j})} $$

, where \(D_j\) is a J × J matrix whose (j, j) element \(d_j\) is positive and whose other elements are zero, and \(S_j\) is a (J − 1) × (J − 1) covariance matrix for option j.
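The Kronecker-sum decomposition above can be illustrated with a short Python sketch. The helper names `kron` and `within_judge_covariance`, as well as the numerical values, are hypothetical. Because each D_j has a single nonzero diagonal element, the sum is block diagonal with blocks d_j S_j, so that only evaluations of the same option in different pairs can covary.

```python
def kron(X, Y):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[x * y for x in row_x for y in row_y] for row_x in X for row_y in Y]

def within_judge_covariance(d, S):
    """Sum over j of D_j (x) S_j, where D_j has d[j] at (j, j) and zeros elsewhere."""
    J = len(d)
    size = J * (J - 1)
    total = [[0.0] * size for _ in range(size)]
    for j in range(J):
        D_j = [[d[j] if r == c == j else 0.0 for c in range(J)] for r in range(J)]
        term = kron(D_j, S[j])
        total = [[t + u for t, u in zip(rt, ru)] for rt, ru in zip(total, term)]
    return total

# Hypothetical values for J = 3: a scale d_j and a 2 x 2 correlation matrix S_j per option.
d = [1.0, 0.5, 2.0]
rho = [0.3, -0.2, 0.6]
S = [[[1.0, r], [r, 1.0]] for r in rho]

Sigma_eps = within_judge_covariance(d, S)  # 6 x 6, block diagonal
```

In this parameterization the number of free parameters grows with the number of options rather than with the number of pairs, which is one way to read the claim that the representation is parsimonious.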

Tversky’s (1969) Gambling Study

Tsai and Böckenholt (2006) applied (16) in a replication of Tversky’s (1969) intransitivity study. In this experiment, subjects were presented with four lotteries of the form ($x,p; 0) where the amount x can be won with probability p and nothing otherwise. The winning probabilities of the four lotteries were \({7 \over {24}}\), \({8 \over {24}}\), \({9 \over {24}}\), and \({{10} \over {24}}\) and the corresponding payoffs were $20, $19, $18, and $17, respectively. The gambles were constructed in such a way that payoffs are negatively correlated with expected values. Probabilities were presented in the form of difficult-to-compare pie sector diagrams, with the expectation that some participants would ignore small probability differences and focus on the payoffs instead. For large differences between the probabilities, it was expected that participants would consider probabilities in combination with the payoff in their decision to select the more attractive lottery. To test this hypothesis formally, Tsai and Böckenholt (2006) specified the following covariance matrix for the lottery ($20, \({7 \over {24}}\); 0) when compared to the other three lotteries:

$${S_a} = \left( {\matrix{ 1 & { - \tau } & { - \tau } \cr { - \tau } & 1 & \tau \cr { - \tau } & \tau & 1 \cr } }\right)$$

. Because of the expected shift in the decision strategy, evaluations of lottery ($20, \({7 \over {24}}\); 0) when compared to lottery ($19, \({8 \over {24}}\); 0) were expected to be negatively correlated with its evaluations when compared to the other two lotteries. In contrast, evaluations of lottery ($20, \({7 \over {24}}\); 0) when compared with lotteries ($18, \({9 \over {24}}\); 0) and ($17, \({{10} \over {24}}\); 0) were expected to be positively correlated since they are based on similar judgmental processes in this case. The correlational pattern for the other three lotteries can be derived by applying the same reasoning.
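Two features of this design can be verified with a few lines of Python. The sketch first computes the exact expected values of the four lotteries and then checks, for an illustrative value τ = 1/2 (an assumption, not an estimate), that the specified matrix is positive definite by inspecting its leading principal minors. The function names are hypothetical.

```python
from fractions import Fraction

# (payoff in dollars, winning probability) for the four lotteries.
lotteries = [(20, Fraction(7, 24)), (19, Fraction(8, 24)),
             (18, Fraction(9, 24)), (17, Fraction(10, 24))]
expected_values = [payoff * p for payoff, p in lotteries]

def S_a(tau):
    """Covariance pattern for the first lottery's three comparisons."""
    return [[1, -tau, -tau], [-tau, 1, tau], [-tau, tau, 1]]

def leading_principal_minors(S):
    """Determinants of the upper-left 1x1, 2x2, and 3x3 submatrices."""
    (a, b, c), (d, e, f), (g, h, i) = S
    return [a,
            a * e - b * d,
            a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)]

# tau = 1/2 is an illustrative value only.
minors = leading_principal_minors(S_a(Fraction(1, 2)))
```

The expected values rise from about $5.83 to about $7.08 as the payoff falls from $20 to $17, which confirms the negative relation between payoffs and expected values. All three minors are positive at τ = 1/2, so the matrix is a valid covariance matrix for that value.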

The covariance structure (16) in combination with (17) was estimated for single-subject data. Table 1 displays the observed binary and trinary marginal probabilities as well as the model predictions for one of the subjects. It shows that (16) captures the dependencies between the repeated evaluations of the lotteries well (with \(\hat \tau = .63\) (.06)) and, equally important, that it can account for the observed violations of weak stochastic transitivity. Thus, in contrast to the standard formulation of random utility models, which assumes independent judgments on the individual level and fixed utilities across choice sets, (16) can represent both consistent and inconsistent choices satisfactorily.
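Weak stochastic transitivity requires that whenever x is chosen over y and y over z with probability of at least .5, x is also chosen over z with probability of at least .5. The following Python sketch checks this condition for a set of binary choice probabilities. The probabilities are hypothetical and are not the values reported in Table 1; they merely mimic a pattern in which adjacent lotteries are chosen by payoff while the two extreme lotteries are chosen by probability of winning.

```python
from itertools import permutations

def wst_violations(P, options):
    """Return ordered triples (x, y, z) that violate weak stochastic transitivity.

    P[(x, y)] is the probability that x is chosen over y.
    """
    return [(x, y, z) for x, y, z in permutations(options, 3)
            if P[(x, y)] >= 0.5 and P[(y, z)] >= 0.5 and P[(x, z)] < 0.5]

# Hypothetical binary choice probabilities for four lotteries a, b, c, d.
upper = {("a", "b"): 0.70, ("b", "c"): 0.70, ("c", "d"): 0.70,
         ("a", "c"): 0.55, ("b", "d"): 0.55, ("a", "d"): 0.30}
P = dict(upper)
P.update({(y, x): 1.0 - p for (x, y), p in upper.items()})

violations = wst_violations(P, "abcd")
cycles = {frozenset(v) for v in violations}  # distinct intransitive triples
```

Each intransitive cycle is reported three times in `violations`, once per rotation, so collapsing the triples into sets gives the number of distinct cycles.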

Table 1 Observed and expected probabilities for RUM models with stable and correlated utilities

Tsai and Böckenholt (2007) illustrate further the usefulness of distinguishing explicitly between within- and between-judge variability based on (15) in combination with (16) in a reanalysis of the intertemporal choice data reported by Roelofsma and Read (2000). As in the replication of the Tversky (1969) study, (16) proved to be well-suited to account for systematic transitivity violations which were caused in this study by inconsistent trade-offs between “time” and “money” attributes characterizing the choice options. Extensions of the model framework to analyze other choice anomalies are currently underway. These studies show that both the compromise and the attraction effect can be modeled parsimoniously using the decomposition (16) of the within-judge covariance matrix (Böckenholt & Tsai, 2007).

Relaxing the assumption of fixed and predetermined utilities appears to be a promising approach to describe seemingly inconsistent choice behavior. By allowing the individual-level utilities to vary in repeated evaluations of the same options in different choice sets, this approach can accommodate a considerably wider range of inconsistent choice behavior than has been possible so far. Equally important, with this framework it becomes possible to relate the utilities’ reliability estimates to both context- and person-specific covariates, with the result that one can test rigorously determinants of variable and possibly inconsistent utility assessments.

Conclusion

Do people choose the options they enjoy most? There is much evidence to suggest that the answer is negative: People do not always know what they like and their ability to forecast future utilities of potential choices appears to be systematically biased (Kahneman & Thaler, 2006). The subsequent question about necessary and/or sufficient (environmental) conditions that facilitate utility maximization has received less attention so far. Notable exceptions include the notion of “libertarian paternalism” (Sunstein & Thaler, 2003) which sets up default options in such a way as to help people in their utility maximization. Along the same lines, a recent large-scale study by McFadden (2006) on choices among Medicare-approved plans demonstrates the importance of aiding consumers in helping themselves instead of relying on their self-interest in making optimal choices (see also Lynch & Wood, 2006).

Although behavioral research on choice behavior points convincingly to limitations of random utility models, few would dispute the usefulness of these models in rendering parsimonious descriptions of how individuals perceive and evaluate choice options. Because they are based on a limited explanatory framework for how people make choices, random utility models can provide both a parsimonious quantitative description of choice outcomes and a flexible framework for modeling individual and contextual differences in choice behavior at a particular point in time and over time. However, generalizations of RUM results to different choice situations and options require care and need to be based on additional validation studies. The consideration of random response errors and unstable preferences alone is not sufficient to account for deviations from utility maximization.

There are many research opportunities on the horizon for psychometricians. Better measures and indicators of utility are needed that capture the hedonic experience associated with a choice outcome. To infer utility from choices alone without taking into account anticipated or experienced hedonic reactions no longer seems sufficient (Kahneman, 2003). Measures of brain activity and brain electrochemistry in combination with experimental treatments are starting to become available to provide much-needed insights on the links between choice and sensations of pleasure and pain, but statistical methods are lacking for effective analyses of these links (Montague, King-Casas, & Cohen, 2006). The development of new measures and indicators is facilitated further by expanding their connections to psychological concepts and processes. For example, affective and motivational mechanisms are becoming integrated in theories of individual choice behavior and have led to new concepts such as “irrational wanting” and “subrational liking” (Winkielman & Berridge, 2003), pointing to obvious limitations in current choice modeling frameworks. Recently, Mourali, Böckenholt, and Laroche (in press) showed that the compromise and asymmetric dominance effects can be weakened or strengthened depending on whether a person has a promotion or prevention focus (Higgins, 1997), demonstrating that motivational factors need to be taken into account when modeling choice data. In general, the search for choice models that are behaviorally more realistic but still tractable is complicated greatly by identifiability and endogeneity issues. Current choice models for unstable preferences, uncertain outcomes, or social interactions suffer from difficulties both in separating different sources of heterogeneity and in identifying multiple equilibria, all of which need to be studied carefully with the help of well-designed empirical studies. Clearly, challenges for modeling and predicting choice behavior are abundant, but they also assure the future well-being of psychometrics.