6.1 WTP-Space Versus Preference Space

The estimation of WTP is one of the main outcomes of DCE studies. WTP measures are crucial, as they inform policy makers about the values people attach to goods and/or services, which, in turn, can help them tailor pricing (Hanley et al. 2003), and they are typically used in cost–benefit analyses.

Computation of WTP values, however, is not always straightforward (see also Sects. 5.4 and 5.5). This is because WTP estimates are typically calculated as the ratio of two coefficients (with the cost coefficient as the denominator), which, in the case of models with random parameters, leads to a ratio of two distributions. WTP estimates derived from a RP-MXL thus depend on the distributional assumptions the researcher imposes on each of the coefficients. Conventional utility specifications often imply implausible distributions of welfare estimates, given that the typical WTP estimate is retrieved as the ratio of two randomly distributed coefficient estimates. The problem is that values of the denominator close to zero (likely under standard distributions such as the normal or log-normal) cause the ratio to be extremely large, thereby implying an unrealistic derived distribution of WTP with a long upper tail (Train and Weeks 2005; Scarpa et al. 2008).
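The long-tail problem can be illustrated with a minimal simulation (all parameter values here are illustrative, not taken from any of the cited studies): when both coefficients are drawn from normal distributions, denominator draws near zero inflate the derived WTP ratio.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Preference-space draws (illustrative values): attribute coefficient and
# (negative of the) cost coefficient, both normally distributed
beta_attr = rng.normal(1.0, 0.5, n)   # marginal utility of the attribute
alpha = rng.normal(1.0, 0.5, n)       # negative of the cost coefficient

# WTP as a ratio of two random coefficients: denominator draws close to
# zero blow the ratio up, producing an extremely long-tailed distribution
wtp = beta_attr / alpha

print(np.median(np.abs(wtp)))  # moderate
print(np.max(np.abs(wtp)))     # extreme
```

The median of the simulated WTP distribution is of the same order as the ratio of the coefficient means, while the largest draws are orders of magnitude bigger, which is exactly the implausible upper tail described above.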

Some attempts to address this problem rely on strong and generally unrealistic assumptions, such as imposing a fixed cost coefficient: when the coefficient of an attribute is normally distributed, the WTP for that attribute is then also normally distributed, as the two distributions take the same form. This assumption cannot be recommended for general application, although it may be appropriate in some cases (Hole and Kolstad 2012). As Train and Weeks (2005) stressed, assuming a fixed price coefficient implies that the standard deviation of unobserved utility (i.e. the scale parameter) is the same for all observations, thereby implying that the marginal utility of money is the same for each respondent. In fact, the scale parameter can, and in many situations clearly does, vary randomly over observations. Hence, an increasingly adopted alternative is the estimation of models in which the distribution of the welfare measure is modelled directly. In the traditional framework, models parameterised in terms of coefficient distributions are called “models in preference space”, whereas models parameterised in terms of WTP distributions are “models in WTP space”. In the latter, the parameters are the (marginal) WTP for each attribute rather than the utility coefficient of each attribute. The first examples of models in WTP space are provided by Cameron and James (1987) and Cameron (1988).

For random coefficient models, the issue of which parameterisation to use is more complex and potentially more important, as noted by Train and Weeks (2005), the seminal paper on parametrisation in WTP space; Scarpa et al. (2008) provided the first application in environmental economics. Mutually compatible distributions for coefficients and WTPs can be specified either way, but the two parameterisations differ in their convenience for assigning parameter distributions and imposing constraints on these distributions.

According to (1.4) and to (1.5), the utility of respondent n for alternative j in choice occasion t is specified in the WTP-space (Train and Weeks 2005) as:

$$U_{njt}^{*} = \alpha_{n}^{*} \left( \omega_{n}^{\prime} x_{njt} - p_{njt} \right) + \varepsilon_{njt}$$

where \(\alpha_{n}^{*}\) is the price coefficient/scale parameter, \(\omega_{n}\) is a conformable vector of marginal WTPs for each non-monetary attribute and \(p_{njt}\) is the cost attribute.
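The link between the two parameterisations can be checked numerically: with \(\alpha^{*}\omega\) as the attribute coefficients and \(-\alpha^{*}\) as the cost coefficient, the preference-space utility of (1.4) reproduces the WTP-space utility above. A minimal sketch (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

alpha = 0.8                       # price/scale parameter alpha*
omega = np.array([2.0, -1.5])     # marginal WTPs of two non-monetary attributes
x = rng.normal(size=(5, 2))       # attribute levels for 5 alternatives
p = rng.uniform(1, 10, size=5)    # cost attribute

# WTP-space systematic utility: alpha* (omega'x - p)
u_wtp_space = alpha * (x @ omega - p)

# Equivalent preference-space systematic utility with beta = alpha* omega
# and cost coefficient -alpha*
beta = alpha * omega
u_pref_space = x @ beta - alpha * p

print(np.allclose(u_wtp_space, u_pref_space))  # True
```

The two specifications are thus algebraically equivalent at the level of a single individual; the difference, as discussed below, lies in what the random distributions are placed on.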

So the question then becomes: is it better to use preference space or WTP space? Train and Weeks (2005) compared models using normal and log-normal distributions in preference space with models using normal and log-normal distributions in WTP space. They found that models in preference space fit the within-sample data better than models in WTP space, but that the distributions of WTP derived from estimated models in preference space have unreasonably large variances: such models imply that large proportions of respondents will pay unreasonably large sums to obtain/avoid extra units of non-price attributes. Models in WTP space exhibited smaller variances for WTP, implying smaller proportions of very large WTP values. Similar results were obtained by subsequent studies using different data sets (e.g. Hole and Kolstad 2012). Some studies, however, found models in WTP space that outperform models in preference space also in terms of goodness of fit (Scarpa et al. 2008; Bae and Rishi 2018; Waldman and Richardson 2018). Overall, the accumulated empirical evidence suggests that models in WTP space yield more reliable WTP estimates, while there is no conclusive evidence about which parametrisation performs better in terms of data fit. The adoption of models in preference or WTP space should therefore be evaluated according to the empirical case and the purpose of the study (Hess and Rose 2012). Sensitivity testing is also recommended when choosing between preference and WTP space models (Hole and Kolstad 2012). Hypotheses regarding the WTP distributions can be tested directly by imposing restrictions on the distribution (i.e. on the mean or the standard deviation) in the estimation, followed by a likelihood-ratio test of the unrestricted against the restricted model (Thiene and Scarpa 2009).
Importantly, when the goal is to obtain WTP measures to be used for policy purposes, parametrisation in WTP space seems to be the best option.

Models in WTP space, which might present convergence issues, can be estimated with several software packages commonly used for choice models, such as Biogeme (Bierlaire 2020), R (Sarrias and Daziano 2017; Hess and Palma 2019), Stata by means of the codes provided by Hole (2020), and the DCE package for Matlab (Czajkowski 2020).

6.2 Scale Heterogeneity

The RUM assumes that latent utility consists of a deterministic part and a random error term (see Sect. 1.2). According to Eq. (1.3), the utility \(U_{njt}\) is decomposed into \(V_{njt} + \varepsilon_{njt}\), where \(V_{njt}\) is the deterministic, quantifiable portion of utility, including observables of both the alternatives (the choice attributes) and the individuals (age, gender, income, etc.), and \(\varepsilon_{njt}\) is the unobservable or random part of utility. A RUM model requires an assumption about the distribution of these random effects. The scale parameter, which is by definition part of a RUM model, expresses the relationship between the observable and the random component of the overall latent utility. It is inversely related to the variance of the random component and cannot be separately identified from the taste parameters, generally denoted as \(\beta\).

If the utility of all alternatives is multiplied by a constant, the alternative with the highest utility does not change. To take this fact into account, the scale of the utility must be normalised. As mentioned in Sect. 1.2, the model (1.4)

$$U_{njt} = V_{njt} + \varepsilon_{njt} = x_{njt}^{\prime} \beta + \varepsilon_{njt}$$

is equivalent to (1.5)

$$U_{njt}^{*} = \lambda V_{njt} + \lambda \varepsilon_{njt} = x_{njt}^{\prime} \left( \lambda \beta \right) + \lambda \varepsilon_{njt},$$

where the scale parameter is denoted as \(\lambda\). Scale then describes the relationship between the observable and the random component of utility. If scale is equal across individuals and thus the same within a given data set, it does not impact on the relationships between utilities.
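The equivalence of (1.4) and (1.5) can be illustrated with a short simulation (all values illustrative): scaling both utility components by a positive \(\lambda\) leaves the highest-utility, and hence chosen, alternative unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)

beta = np.array([0.5, -1.0])   # taste parameters
x = rng.normal(size=(3, 2))    # attribute levels of 3 alternatives
eps = rng.gumbel(size=3)       # random utility components
lam = 4.0                      # a positive scale parameter lambda

u = x @ beta + eps                        # model (1.4)
u_scaled = x @ (lam * beta) + lam * eps   # model (1.5)

# Scaling all utilities by a positive constant does not change which
# alternative has the highest utility
print(np.argmax(u) == np.argmax(u_scaled))  # True
```

This is precisely why scale must be normalised: only the ratio of the observable to the random component is identified, not its absolute magnitude.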

Normalising the scale of utility is usually achieved through normalising the variance of the error terms (see Sect. 1.2). The error variance, however, can differ across respondents and data sources, for example. Combining different data sets therefore requires us to control for scale differences, as identical utility specifications from different data sources with unequal variance will differ in magnitude (Swait and Louviere 1993). Scale heterogeneity, and thus the variance of the error term, might not only differ among data sets but also among respondents within a data set. Given that scale describes the relationship between factors included in the model and factors not included in the model, it might be interpreted as an indicator of choice consistency. The variance of the random error component could indicate whether respondents made more deterministic or more random choices, with a higher error variance indicating less consistent choices. Examples of applications using this approach include, among many others, studies concerned with the effects of choice task complexity (e.g. Carlsson and Martinsson 2008b), of preference uncertainty (e.g. Uggeldahl et al. 2016), of learning effects as respondents move through the sequence of choice tasks (e.g. Carlsson et al. 2012; Czajkowski et al. 2014), and of anchoring and learning effects due to instructional choice sets (Ladenburg and Olsen 2008; Meyerhoff and Glenk 2015).

All these approaches are based on the specific assumption that the observed simultaneous correlation between all attributes is a result of scale heterogeneity. However, as noted by Hess and Rose (2012) and Hess and Train (2017), correlations among coefficients can have various sources, and scale heterogeneity is only one of them. If models are constrained in such a way that scale heterogeneity comes out as a separate parameter, this parameter is also likely to pick up other forms of correlation, which cannot be separated from scale heterogeneity.

Two conclusions can be drawn from this. Firstly, it is important to check carefully whether scale heterogeneity is a valid indicator of what the analyst wishes to investigate, given that it may pick up more than one source of correlation. Comparing scale parameters across exogenously determined groups might indicate differences in the choice behaviour of the groups, but the interpretation has to account for the restrictions the analyst imposed on his or her model. Secondly, after running basic MNL models to become familiar with the data, the next step could be to proceed with the analysis of the most flexible model specification (see Sect. 5.3). Depending on the model specification, a RP-MXL with a full random utility coefficient covariance matrix can account for correlations due to behavioural phenomena as well as scale heterogeneity (Hess and Train 2017). Given the results of this model, the analyst might subsequently impose restrictions.

An important question in applied research is often whether the restrictions lead to different welfare measures. Thus, testing welfare measures resulting from models with full and restricted covariance can be informative. Mariel and Meyerhoff (2018), for example, did not find significant differences between an unrestricted (accounting for scale heterogeneity) and a restricted (not accounting for scale heterogeneity) RP-MXL. This finding, of course, cannot be generalised as it might be data specific.

6.3 Information Processing Strategies

DCE is based on the economic theory of consumer behaviour (McFadden 1974), which posits three axioms about an individual’s preferences: they are complete, monotonic and continuous. Continuity of preferences implies that individuals use compensatory decision-making processes, that is, they take into account all the available information to make their decisions. Typically, in a DCE, this implies that respondents make trade-offs between the levels of each attribute to choose their preferred alternative. However, in practice individuals often lack both the ability and the cognitive resources to evaluate all the information provided to them (Cameron and DeShazo 2010). For this reason, it has been argued that individuals behave in a rationally adaptive manner by seeking to minimise their cognitive efforts while at the same time aiming to maximise their benefits when making choices (DeShazo and Fermo 2004). For instance, people may not have well-defined preferences, but may construct them at the moment of the choice occasion. Moreover, rather than using a fixed decision strategy in all choice occasions, individuals may adopt different strategies in different situations. Often such strategies imply selective use of information and avoidance of trade-offs (Chater et al. 2003). Accounting for these aspects is important in DCE applications, as incongruence between DCE modelling assumptions and actual choice behaviour can lead to biased results and inaccurate forecasts (Hensher 2006). For this reason, a rapidly increasing body of literature deals with the investigation of the information processing strategies that individuals adopt when making choices.

The study of information processing strategies is rooted in psychological theories of choice that assume a dual-phase model of the decision-making process (Kahneman and Tversky 1979; Thaler 1999). The first phase relates to the editing of the problem, whereas the second relates to the evaluation of the edited problem. The main function of the editing operations is to organise and reformulate the alternatives in order to reduce the amount of information to be processed and thus simplify choices (Kahneman and Tversky 1979). The main function of the evaluation operations is to select the preferred alternative (Hess and Hensher 2010). The editing phase often involves the adoption of heuristic strategies that determine the way in which information is processed to produce the choice outcome. A heuristic is a strategy that mainly, although not exclusively, consists of ignoring part of the information, with the purpose of making decisions more quickly and with less cognitive effort (Gigerenzer and Gaissmaier 2011). The adoption of heuristic strategies can be influenced by the cognitive ability of the respondent, his/her attitudes and beliefs, his/her knowledge of the item to be evaluated and socio-demographic characteristics (DeShazo and Fermo 2002). Yet, processing strategies, while important for setting up correct econometric models, are not specific to DCE or SP surveys, as people use them quite often in real life.

One of the information processing strategies most commonly investigated in the DCE literature is the so-called attribute non-attendance (ANA), which refers to respondents ignoring certain attributes when making their choices (Hensher et al. 2006). ANA will be described in detail in Sect. 6.5. Other strategies that have received attention in the DCE literature are: (i) lexicographic preferences; (ii) elimination-by-aspects or selection-by-aspects; (iii) majority of confirming dimensions.

Lexicographic preferences have been commonly investigated in DCE studies (Sælensminde 2006; Campbell et al. 2008; Rose et al. 2013). Individuals who adopt such a strategy rank the attributes from the most to the least important and make their choices based solely on the levels of the most important one(s) (Foster and Mourato 2002). Lexicographic preferences violate the continuity axiom of the neoclassical framework, and empirical studies suggest that this strategy has a significant impact on the results obtained from discrete choice models, e.g. biased WTP values (Campbell et al. 2008; Rose et al. 2013). The adoption of this strategy seems to be influenced by the design of the study and by the respondents’ characteristics. For example, Sælensminde (2006) found that individuals with a relatively high level of education are less likely to adopt lexicographic behaviour.

Another common information processing strategy is the elimination-by-aspects heuristic. When this heuristic is applied, individuals gradually reduce the number of alternatives in a choice set by eliminating those that include an undesirable aspect. For example, respondents may eliminate alternatives that are deemed too expensive, as for some of them cost is a key attribute (Campbell et al. 2014). Alternatives are evaluated one at a time until a limited number remain in the choice task and the choice requires lower cognitive effort. Individual motivations and/or goals (also known as antecedent volitions, as they antecede and direct the decision-making process) may lead respondents to reduce and select choice sets (Thiene et al. 2017). Several DCE studies (Campbell et al. 2014; Erdem et al. 2014; Daniel et al. 2018) found evidence of the adoption of such a heuristic in empirical applications and highlighted the importance of accounting for it in the econometric analysis. Daniel et al. (2018), for example, found that WTP values are overestimated when the adoption of this strategy is not taken into account.

Selection-by-aspects is a heuristic akin to elimination-by-aspects: in this case, rather than excluding alternatives with undesirable traits, individuals form choice tasks which include only alternatives with desirable ones (e.g. high level for an important attribute). Finally, under the majority of confirming dimensions heuristics (Russo and Dosher 1983), respondents compare alternatives in pairs, rather than evaluating all of them simultaneously. The “winning alternative” is then compared to another one until the overall preferred alternative is identified. Hensher and Collins (2011) and Leong and Hensher (2012) provide corroborating evidence of the adoption of such a strategy in DCE studies.

While the adoption of heuristics depends on an individual’s characteristics and cognitive ability, there is corroborating evidence that their use also depends on the design of the DCE (Mørkbak et al. 2014; Campbell et al. 2018). Heuristics have been found more likely to be adopted when the DCE exercise requires substantial cognitive effort, for example, due to a large number of attributes, levels and alternatives (Collins and Hensher 2015). As such, accounting for heuristics is particularly advisable in empirical applications that require complex scenarios in order to accurately describe the good/service under evaluation. From a practical viewpoint: (i) the application of the above information processing strategies is closely related to the definition of the choice tasks, so careful work with a focus group, rigorous preparation of the choice tasks and piloting can partially help avoid them; (ii) as the list of heuristic strategies is quite long, we will focus on one example, ANA, which is particularly relevant for DCE (see Sect. 6.5).

6.4 Random Regret Minimisation—An Alternative to Utility Maximisation

RRM was introduced by Chorus et al. (2008) as an alternative to the RUM paradigm commonly used in discrete choice studies. RRM assumes that individuals choose between alternatives with the goal of avoiding the situation in which a non-chosen alternative turns out to be better than the chosen one, i.e. a choice that the individual would regret. Hence, the individual is assumed to minimise regret rather than to maximise utility (Chorus 2012). From an analytical point of view, the level of anticipated random regret is composed of an i.i.d. random error, which represents unobserved heterogeneity in regret, and a systematic regret component. Systematic regret is given by the sum of all binary regrets that arise from the comparison between an alternative and each of the other alternatives.
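As a sketch of this structure, the following computes systematic regret under a commonly used specification in which each binary, attribute-level regret takes the form \(\ln(1 + \exp(\beta_m (x_{jm} - x_{im})))\); the attribute levels and taste parameters below are purely illustrative.

```python
import numpy as np

def systematic_regret(X, beta):
    """Systematic regret of each alternative i: the sum, over all competing
    alternatives j and attributes m, of the binary attribute-level regrets
    ln(1 + exp(beta_m * (x_jm - x_im)))."""
    J = X.shape[0]
    R = np.zeros(J)
    for i in range(J):
        for j in range(J):
            if j != i:
                R[i] += np.log1p(np.exp(beta * (X[j] - X[i]))).sum()
    return R

X = np.array([[3.0, 10.0],    # (quality, price) of each alternative
              [2.0,  6.0],
              [1.0,  4.0]])
beta = np.array([0.8, -0.3])  # illustrative taste parameters

R = systematic_regret(X, beta)
# Regret is minimised rather than utility maximised: choice probabilities
# follow a logit on the negative of regret
P = np.exp(-R) / np.exp(-R).sum()
print(np.argmin(R), P.round(3))
```

Note the key behavioural difference from RUM: regret depends on pairwise attribute comparisons across alternatives, so an alternative's attractiveness is not separable from the composition of the choice set.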

A central question in the RRM literature has been whether and under which conditions RRM can be used as an alternative to RUM. Chorus et al. (2014) carried out a meta-analysis of 21 studies to explore “to what extent, when and how RRM can form a viable addition to the consumer choice modeller’s toolkit”. Their analysis highlights that neither of the two paradigms performs consistently better than the other, and that the differences in goodness of fit are in most cases small. Interestingly, the authors note that there are some specific empirical contexts in which one paradigm frequently performs better than the other. They highlight that regret plays an important role when the choice is considered difficult and/or important by the individuals, when they feel they will need to justify their choice to other people and when they are not familiar with the good or service under analysis. For example, RRM has been found to perform better than RUM in contexts such as car type and energy choices (Boeri and Longo 2017), whereas choices are often more consistent with RUM in other contexts such as leisure time activities (Thiene et al. 2012). It is also important to note that while the two paradigms often perform similarly in terms of model fit, several studies found that they differ substantially in terms of forecasting and prediction of market shares for products or services (Chorus et al. 2014; van Cranenburgh and Chorus 2017). For this reason, the choice of paradigm may have a substantial practical impact. Some studies also found evidence that the relative performance of the two paradigms is influenced by experimental design features. For example, the effect of the opt-out option on RRM and RUM performance has received attention in the literature. Chorus (2012) and Thiene et al. 
(2012) suggest RRM may be less suitable for the analysis of choices where an opt-out alternative is presented, since this alternative cannot be compared to the other alternatives at the attribute level and, as such, regret cannot be experienced. Hess et al. (2014) further investigated such effects and concluded that not only does the presence of an opt-out option affect the model’s performance, but so does the way in which the option is presented. In particular, their results suggest that RRM performs worse when the opt-out option is framed as a “none of these” option, while RUM performs worse when it is framed as an “indifferent” option. Moreover, van Cranenburgh et al. (2018) found that RUM-efficient designs can be statistically highly inefficient in cases where RRM better represents an individual’s choice behaviour, and vice versa. A possible alternative to choosing between a RUM- and an RRM-based choice model is the adoption of hybrid models in which both decision processes co-exist in the same population. Hess et al. (2012) proposed the use of a latent class approach where different decision rules are used in different classes. Boeri et al. (2014) added to this by allowing for random taste heterogeneity within each behavioural class. Chorus et al. (2013) proposed instead a MXL where, rather than distinguishing sub-groups of respondents, a subset of attributes is subject to RUM and another subset to RRM. Kim et al. (2017) incorporated a hybrid RUM–RRM model into an HCM framework. In most cases, these hybrid models perform better than models that assume the same rule of behaviour (RRM or RUM) for each attribute. Chorus et al. (2014), however, do not suggest the blind adoption of hybrid RUM–RRM models and rather suggest the adoption of the same practices outlined for the choice between pure RUM or RRM models (e.g. comparison of model fit or simultaneous estimation of different models). 
Deriving WTP measures from RRM models is difficult, which is a crucial limitation given that WTP estimation is one of the main purposes of most environmental DCE applications, although relevant progress has recently been made (Dekker and Chorus 2018).

In terms of best practices, the following suggestions can be derived from the literature, which are for the most part in line with the indications of Chorus et al. (2014): (1) choose RUM or RRM in contexts in which one of the paradigms typically performs better than the other (e.g. RRM for car choices); (2) in studies which specifically focus on RRM models, carefully choose the formulation of the opt-out option (if present) and the type of experimental design; (3) estimate both RUM and RRM models on a given dataset and then choose the model with the best fit for further analyses and for deriving relevant output for policymakers (e.g. elasticities); and (4) rather than choosing either of the two models, implement both simultaneously and jointly use the outcomes from RUM and RRM to construct a number of behavioural scenarios using a RUM, RRM or hybrid RUM–RRM model.

A comprehensive website that, among other features, provides estimation code for different random regret models (the P-RRM model, the µRRM model, the G-RRM model and various latent class models) and different software packages (Biogeme, Apollo in R, Matlab, Latent Gold Choice) is van Cranenburgh (2020). Also available on this website is advice on how to generate decision-rule-robust experimental designs (see also van Cranenburgh et al. 2018; van Cranenburgh and Collins 2019).

6.5 Attribute Non-attendance

As mentioned in Sect. 6.3, respondents do not necessarily consider all attributes within a choice set when making their choices. Non-attended attributes in the choice set imply non-compensatory behaviour: no matter how much the level of an attribute improves, if the attribute itself is ignored by the respondent, the improvement will fail to compensate for a worsening in the levels of the other attributes. As a consequence, respondents using such a strategy raise a problem for neoclassical analysis as they cannot be represented by a conventional utility function (Hensher 2006). In the absence of continuity, there is no trade-off between two different attributes. Without a trade-off, there is no computable marginal rate of substitution and, crucially for non-market valuation, no computable WTP.

ANA may arise due to several reasons, such as an individual’s attitudes (Balbontin et al. 2017), knowledge/familiarity with the attributes (Sandorf et al. 2017), task complexity (Weller et al. 2014; Collins and Hensher 2015), unrealism of the attribute’s levels (i.e. respondents ignore an attribute because they feel the proposed levels are unattainable) and genuine disinterest towards an attribute (Alemu et al. 2013). Carlsson et al. (2010) and Campbell et al. (2008) found that ignoring ANA impacted model fits and welfare estimates.

The identification of ANA has been carried out with different methods in the literature. The two most common approaches are stated ANA and inferred ANA (Hensher 2006; Scarpa et al. 2009a, 2010). A third approach to identifying ANA, based on eye-tracking, has recently been proposed; it has been referred to as visual or revealed ANA (Balcombe et al. 2015).

The stated ANA approach involves asking respondents directly whether they ignored one or more attributes when making choices. This is usually done by including specific questions in the survey after the choice scenarios. Several approaches have been proposed in the literature to inform discrete choice models with the answers to such questions. A common approach, described in Hensher et al. (2005) and adopted by subsequent studies (Hensher et al. 2007b; Kaye‐Blake et al. 2009; Kragt 2013), is to specify choice models in which the coefficients of attributes that respondents stated they ignored are constrained to zero. Some authors have extended this kind of approach by reducing the magnitude of ignored coefficients by means of shrinking parameters, instead of constraining them to zero (Alemu et al. 2013; Balcombe et al. 2014, 2015; Chalak et al. 2016; Hess and Hensher 2010). Alternative approaches involve (i) specifying error component models with different scale parameters for subsets of respondents who ignored different numbers of attributes (Campbell et al. 2008); (ii) estimating heteroskedastic MNL models that account for the variance induced by ANA (Scarpa et al. 2010); (iii) estimating separate attribute coefficients for the group of respondents who stated they did not ignore the attributes and for those who stated they did (Hess and Hensher 2010; Scarpa et al. 2013). Another approach to modelling attribute decision rules involves the use of a hybrid choice modelling approach, in which each individual question about attribute attendance is treated as a binary indicator of a latent variable (Hess and Hensher 2013).
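The zero-constraint approach can be sketched as follows: a 0/1 attendance mask, built from a respondent's stated-ANA answers, multiplies the coefficient vector so that ignored attributes drop out of the systematic utility (attribute levels and coefficients below are illustrative).

```python
import numpy as np

def logit_probs(X, beta, attend):
    """MNL choice probabilities with a stated-ANA constraint: the 0/1
    attendance mask zeroes the coefficients of attributes the respondent
    stated to have ignored."""
    v = X @ (beta * attend)        # systematic utility with masked coefficients
    ev = np.exp(v - v.max())       # numerically stable softmax
    return ev / ev.sum()

X = np.array([[1.0, 4.0, 2.0],     # 2 alternatives, 3 attributes
              [0.0, 6.0, 3.0]])
beta = np.array([0.7, -0.2, 0.4])  # illustrative coefficients

attend_all = np.array([1.0, 1.0, 1.0])  # full attendance
attend_ana = np.array([1.0, 0.0, 1.0])  # second attribute stated as ignored

print(logit_probs(X, beta, attend_all))
print(logit_probs(X, beta, attend_ana))
```

In estimation the mask varies by respondent, so the same coefficient vector generates different effective utility functions across the sample depending on the stated attendance patterns.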

Along with differences in how ANA information is incorporated in choice models, stated ANA approaches also differ in terms of how such information is collected. In particular, stated ANA can be divided into two forms: serial ANA and choice task ANA. In serial ANA, respondents are asked at the end of the sequence of choice tasks whether they systematically ignored one or more attributes when making their choices. In choice task ANA, by contrast, the question is asked after each choice task.

Scarpa et al. (2010) compared serial and choice task ANA and found that accounting for choice task ANA significantly improves model fit and yields more accurate marginal WTP estimates. Caputo et al. (2018) found similar results, and they suggest that respondents may not follow the same attribute processing strategies throughout the entire sequence of choice tasks. As such, they conclude that collecting ANA information at the choice task level may be more desirable than at the serial level.

Inferred ANA, in contrast, consists of inferring ANA behaviour through the estimation of analytical models. This approach typically makes use of an equality-constrained latent class model in which the classes represent different attribute processing strategies rather than latent preference groups: during estimation, parameters are set to zero in specific classes to account for ignored attributes, while non-zero parameters are constrained to be equal across classes (Campbell et al. 2011; Caputo et al. 2013; Glenk et al. 2015; Hensher et al. 2012; Hensher and Greene 2010; Hole et al. 2013; Lagarde 2013; Scarpa et al. 2009b; Thiene et al. 2015).

In most applications, estimated coefficients are assumed to take the same values across classes (Scarpa et al. 2009b; Hensher and Greene 2010; Campbell et al. 2011). Only a few studies have investigated preference heterogeneity within ANA classes: Thiene et al. (2015) mixed ANA classes with preference classes, whereas Hess et al. (2013), Hensher et al. (2013) and Thiene et al. (2017) adopted a Latent Class-Random Parameters (LCRP-MXL) specification that accounts for both attribute non-attendance and continuous taste heterogeneity. Another method to infer ANA, proposed by Hess and Hensher (2010), involves the estimation of the individual posterior conditional distributions of coefficients from a RP-MXL. In particular, the authors, as well as other studies employing this approach (e.g. Scarpa et al. 2013), retrieved the mean (\(\mu\)) and the standard deviation (\(\sigma\)) of such distributions and computed the ratio between them (\(\sigma/\mu\)). When the ratio for an attribute is high (>2), it can be assumed that the respondent did not attend to it when making his/her choices.
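A minimal sketch of this detection rule follows; the posterior conditional moments below are made up for illustration, whereas in practice they would come from the estimated RP-MXL (and the absolute value of the mean is used so the rule also works for negative coefficients).

```python
import numpy as np

def non_attendance_flags(cond_means, cond_sds, threshold=2.0):
    """Flag respondent-attribute pairs as non-attended when the ratio of the
    posterior conditional standard deviation to the absolute conditional
    mean exceeds the threshold (a cut-off of 2 is used in the literature)."""
    return cond_sds / np.abs(cond_means) > threshold

# Illustrative posterior conditionals for 3 respondents and 2 attributes
means = np.array([[0.90, 0.05],
                  [1.10, 0.60],
                  [0.02, 0.70]])
sds = np.array([[0.30, 0.40],
                [0.40, 0.50],
                [0.30, 0.20]])

print(non_attendance_flags(means, sds))
```

A large ratio means the respondent-level coefficient is poorly distinguished from zero relative to its size, which is read as evidence that the attribute played no role in that respondent's choices.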

Finally, the so-called visual or revealed ANA involves detecting ANA by means of eye-tracking technologies, which monitor the fixations on, and the time spent on, each attribute (Balcombe et al. 2015, 2016; Spinks and Mortimer 2016; Chavez et al. 2018). This approach, which seems very promising compared to the other two (Uggeldahl et al. 2017), has the advantage of retrieving information without eliciting it from respondents, providing a less biased measure than stated ANA (Balcombe et al. 2015). Data retrieved using this approach are usually modelled as in the stated ANA approach, that is, by estimating parameters that shrink the coefficients of non-attended attributes (Balcombe et al. 2015; Chavez et al. 2018). These studies found inconsistencies between stated and visual ANA, and that models informed by both approaches performed best in terms of statistical fit. Furthermore, the time spent on choice tasks was found to diminish over the sequence. In particular, Spinks and Mortimer (2016) found that the number of attributes ignored by each respondent can vary among choice tasks, supporting the existence of differences between choice-task-level and serial non-attendance.

A central question in the ANA literature concerns which of the three approaches should be used to account for it. Several studies have advocated the use of inferred ANA over stated ANA, owing to limitations of the latter approach. Some authors, in particular, have questioned whether respondents’ statements are reliable (Campbell and Lorimer 2009). Respondents may not answer follow-up questions completely truthfully for several reasons, such as social pressure to care about an attribute (especially when surveys are carried out by means of face-to-face interviews) or to consider all attributes relevant (Balcombe et al. 2011). Yet inferred ANA may suffer from limitations of its own, such as questionable modelling assumptions. Another issue with using respondents’ statements is the potential endogeneity bias that arises from conditioning a model on self-reported ANA (Hess and Hensher 2013). Several studies have employed both the stated and the inferred non-attendance approach (Hensher et al. 2007a; Hensher and Rose 2009; Campbell et al. 2011; Scarpa et al. 2013; Mørkbak et al. 2014). The overall finding is that results from inferred and stated ANA are inconsistent with each other, and that the inferred approach generally provides a better model fit. Mørkbak et al. (2014) highlight that ANA is not a problem unique to DCEs, as it is also present in real-life and incentivised settings.

Finally, another important question is whether there are situations in which accounting for ANA is particularly advisable. Based on the evidence concerning the underlying drivers of ANA behaviour, it seems especially important to account for it in studies with complex designs featuring, for example, a high number of attributes and/or alternatives (Weller et al. 2014), in contexts in which part of the population is unlikely to be very interested in certain attributes (e.g. categories of visitors in destination studies), and in cases in which some respondents are likely to have low familiarity with some of the attributes. On the other hand, ANA seems to have a lower impact on choices in applications in which the target population is bound to be very knowledgeable about the good/service under evaluation (e.g. doctors evaluating the attributes of medicines; Hole et al. 2013).

6.6 Anchoring and Learning Effects

Anchoring is a term used in psychology to describe the disproportionate influence that an initially presented value may have on individuals’ judgements (Tversky and Kahneman 1974). In the environmental valuation literature, anchoring or starting point bias refers to the concern that initial bids in a choice experiment may provide respondents facing unfamiliar environmental goods with an anchor that biases the elicitation of their true preferences (Mitchell and Carson 1989). Although anchoring may be due to both informative and non-informative information, while starting point effects are always due to informative information, the two concepts are usually treated interchangeably in the literature (for a more detailed discussion see Glenk et al. 2019).

Anchoring effects, as with other context effects found in the SP literature, have challenged the alleged stability and coherence of preferences assumed by the microeconomic theory underlying DCE. In the context of DCE for the valuation of public goods, anchoring or starting point bias refers to the use of previous information (e.g. information provided by instructional choice sets or initial choice sets and cost bids) as reference points that affect subsequent choices and, accordingly, welfare estimates. The literature distinguishes two forms of anchoring or starting point effect: price vector anchoring effects, i.e. the effect on preferences of using different price or cost vectors; and starting point anchoring effects, i.e. the possibility that the price used in the first choice set influences respondents’ preferences.

Evidence regarding the existence of anchoring effects in DCE is mixed: while some authors have found no evidence of preference instability after changing the range of prices used in a survey (Ohler et al. 2000; Ryan and Wordsworth 2000; Hanley et al. 2005), others have found that increasing the price levels had a significant upward effect on preferences and estimated WTP (Carlsson et al. 2007; Carlsson and Martinsson 2008a; Mørkbak et al. 2010). Although they may be present in many SP experiments, price vector effects have been found to be more likely to appear when non-use environmental goods are involved (Burrows et al. 2017).

Ladenburg and Olsen (2008) find that certain groups of respondents are susceptible to starting point bias whereas others are not. Importantly, their results indicate that the impact of the starting point bias decays as respondents evaluate more and more choice sets. When faced with an unfamiliar choice situation, respondents are initially influenced by value questions but, as they progress through a sequence of choice sets, they become more familiar with the choice situation and discover their own preferences, in line with the Discovered Preference Hypothesis (Braga and Starmer 2005). Learning and fatigue effects have also received specific attention (Campbell et al. 2015; Carlsson et al. 2012; Meyerhoff and Glenk 2015).

Careful survey design is a precondition for minimising biases in DCE. A clear description of the decision rule (i.e. the conditions under which the environmental good is provided) is crucial not only for minimising strategic behaviour (see Sect. 2.9), but also for obtaining WTP estimates that are more internally consistent and less dependent on anchoring (Aravena et al. 2018). Randomisation of choice sets, attributes and alternatives is also recommended to reduce the impact of starting point effects (Glenk et al. 2019). Practitioners should also be aware that using multiple valuation questions through a sequence of choice sets might affect the consistency of elicited preferences. However, the recent literature has found learning effects across repeated choice sets, given the unfamiliarity that respondents usually show when valuing environmental goods and services. Thus, the repetitive nature of choice tasks may indeed help respondents through a process of learning about their true preferences and provide more consistent parameter estimates (Bateman et al. 2008; Ladenburg and Olsen 2008; Brouwer et al. 2010).
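
The randomisation recommended above is straightforward to implement at the survey-administration stage. The sketch below shuffles, independently for each respondent, the order of the choice sets and the order of the alternatives within each set; the design and labels are hypothetical:

```python
import random

def randomise_design(choice_sets, seed=None):
    """Return a per-respondent copy of the design with the sequence of
    choice sets and the order of alternatives within each set shuffled
    independently, so no respondent-invariant first price can act as a
    common anchor."""
    rng = random.Random(seed)
    sets = [list(s) for s in choice_sets]  # copy; leave the master design intact
    for s in sets:
        rng.shuffle(s)   # randomise alternative order within a set
    rng.shuffle(sets)    # randomise the sequence of choice sets
    return sets

# Hypothetical two-task design with a status quo alternative in each task.
design = [["alt_A", "alt_B", "status_quo"],
          ["alt_C", "alt_D", "status_quo"]]
print(randomise_design(design, seed=1))
```

Seeding per respondent (e.g. with the respondent ID) keeps each individual's presentation order reproducible while still varying it across the sample.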