Introduction

Standard practice in choice modelling assumes that all information related to the choice task and available to the respondent is used in evaluating the alternatives. However, a growing body of work across a number of different fields has posed the question as to whether some respondents may actually make their choices based on only a subset of the attributes that describe the alternatives at hand. This phenomenon is typically referred to as attribute non-attendance or attribute ignoring, and an in-depth review of work in this area is given in Hensher (2010).

Three quite separate strands of research can be identified in this context. Analysts have focussed on why non-attendance occurs (e.g. Alemu et al. 2011; Cameron and DeShazo 2011), how it can be accommodated in our models (e.g. Hess and Rose 2007) and what its impacts are on model results (e.g. Hensher et al. 2005). The latter two points are strongly inter-related, with the impact on results being to a possibly large extent a function of the specific care used by the analyst in model specification and estimation.

Work has gradually shifted from relying on respondent-reported information relating to non-attendance (e.g. Puckett and Hensher 2008; Rose et al. 2011; Scarpa et al. 2010; Carlsson et al. 2010) towards a focus on inferring the actual processing strategies from data on observed or experimental behaviour (e.g. Scarpa et al. 2009; Hess and Hensher 2010). A key role in this context was played by the early discussions in Hess and Rose (2007), who proposed the use of a latent class approach to accommodate attribute non-attendance, a method since adopted by numerous other studies (e.g. Hensher et al. 2012; Scarpa et al. 2009; Hensher and Greene 2010; Hole 2011a; Campbell et al. 2010). With this approach, different latent classes relate to different combinations of attendance and non-attendance across attributes. For each attribute treated in this manner, there exists a non-zero coefficient (to be estimated), which is used in the attendance classes, while the attribute is not employed in the non-attendance classes, i.e. the coefficient is set to zero. In a complete specification, covering all possible combinations, this would thus lead to \(2^K\) classes, with K being the number of attributes.

The overall findings of the growing body of work using the latent class specification point towards a significant portion of people ignoring attributes, including cost variables. As an example, Scarpa et al. (2009) produce results that imply a staggering rate of 90 % of respondents not attending to cost when using the standard latent class specification. Significant rates are also retrieved in Campbell et al. (2011), where 61 % of respondents are revealed as not attending to cost. It is noteworthy in the context of the present paper that, using a more flexible specification (albeit with a different method from the one proposed here), Campbell et al. (2012) later observe a very significant drop in that share.

In the present paper, we argue that an important shortcoming of this simple latent class approach is the reliance on only two possible values for each coefficient, one of which is fixed to zero. The key point is that this specification is a confirmatory latent class model rather than an exploratory latent class model—the structure is imposed by the analyst, and class allocation probabilities are estimated. This equates to a reduction in model flexibility, and a more flexible specification would estimate the parameter values in each class. This more general specification would allow the model to still yield one class capturing zero sensitivity (i.e. non-attendance) for a given attribute if this has sufficient weight in the data, but could similarly produce two distinct non-zero values. This is not the case for a model which constrains the coefficients in one class to be equal to zero. Our key argument is that if substantial heterogeneity not related to attribute non-attendance exists in the data, then the simple confirmatory model will at best produce inferior model results, by not being able to capture this heterogeneity, but more worryingly, may produce results affected by serious confounding between non-attendance and taste heterogeneity.

The above arguments point to an issue that has not been duly recognised in existing work in this area, which has often taken the estimated probabilities for the class with coefficients fixed to zero as an exact measure of the rate of attribute non-attendance. However, as we will discuss in this paper, it is also possible that the share of respondents allocated to the non-attendance class need not necessarily be respondents with a zero sensitivity, but simply respondents with a relatively low sensitivity. This is reinforced by recent discussions in Hess and Hensher (2012), Campbell et al. (2012) and Collins et al. (2012) suggesting that real non-attendance may be rarer than commonly assumed and that low attribute importance may be a more plausible explanation of behaviour. In the most extreme scenario, a possibility exists that no respondents actually ignored a given attribute and that the latent class construct simply captures regular heterogeneity. Clearly, if this is the case, findings overstating the true degree of non-attendance are thereby misleading.

On the other hand, it is clear that confounding can also arise in the opposite direction, i.e. where respondents with zero sensitivity are not explicitly accommodated in the model and where their behaviour is captured through reducing the overall sensitivity levels. This is likely to arise especially in cases where just a single point value is estimated, but could also be the case in models with more than one value per coefficient, especially when the rate of non-attendance is too low to pull the estimate in one class close enough to zero to become insignificant. Even in models allowing for continuous random taste heterogeneity, respondents with real attribute non-attendance have the potential to affect the overall results.

In this paper, we first illustrate the shortcomings of the confirmatory latent class approach by contrasting its performance with an exploratory model, i.e. one where both values are estimated to allow for both high and low sensitivity for each attribute. Each time, we note improvements in model performance, and, for the majority of attributes, we now observe non-zero sensitivity in both classes. This thus highlights confounding between non-attendance and standard heterogeneity in the confirmatory model. However, it should be clear that the exploratory model arguably has the reverse weakness, with some of the share of respondents captured in the lower sensitivity class potentially being respondents with a true zero sensitivity. On the other hand, a simple continuous mixture model is not likely to be able to accommodate non-attendance with standard distributional assumptions. To deal with this limitation, we put forward a framework that combines a continuous mixed logit model with a latent class model (see Greene and Hensher 2013 or Bujosa et al. 2010 for recent discussions of such a structure in a more general context). The model still makes use of two values for each coefficient, where one value is still constrained to zero, but where the value in the second class is allowed to vary across respondents using a continuous distribution. The aim of the model is for this second class to capture regular heterogeneity, increasing the probability that any behaviour captured by the class with the zero coefficient does indeed relate to attribute non-attendance. We highlight the applicability of the method using three separate datasets. Each time, we observe significant gains in model fit over the simple latent class specification and show a sizeable reduction in the share of respondents assigned to the non-attendance classes. 
It is worth noting that an alternative approach would be a model in which the non-zero heterogeneity would be captured by discrete rather than continuous mixtures within the non-zero class.

Before proceeding with the remainder of the paper, it should also be noted that any reductions in wrongly inferred non-attendance would serve to ease issues in the computation of willingness-to-pay (WTP) measures, especially where the cost attribute is concerned, although this does not necessarily avoid extreme WTP estimates. In our view, WTP implications of non-attendance have thus far not received sufficient discussion in the literature. Non-attendance of attributes other than cost simply leads to a reduction in the mean WTP measure for that attribute across the sample population (if the affected respondents are included in sample level calculations). However, what is not always acknowledged is that implied non-attendance to the cost attribute for a part of the sample population leads to major issues in the computation of WTP measures, as the implied WTP for these respondents would be infinite for the remaining attributes.

The remainder of this paper is organised as follows. The methodology section contrasts the standard latent class approach to non-attendance with our proposed method. This is followed by a discussion of the results from three case studies. Finally, we present the conclusions from the research.

Methodology

The widely used confirmatory latent class approach to non-attendance relies, in its complete form, on \(2^K\) different classes, where K is the number of attributes in the model. Each of the \(2^K\) different classes makes use of a different combination of estimated coefficients and coefficients fixed to zero. Crucially, a given coefficient will take the same value in all classes where that attribute is included, thus not allowing for additional random heterogeneity, a point we will return to below. While the majority of classes make use of combinations of zero and non-zero coefficient values, the two extreme classes make use of a full set of estimated parameters and a set of all zero coefficients respectively (i.e. a purely random choice process).

In mathematical terms, we make use of a vector β containing a separate element for each of the K attributes. In addition, we have an S×K matrix \(\Uplambda,\) in which each row contains a different combination of zero and one elements, where \(S=2^K.\) As an example, if the first class equates to full attendance, then the first row of the matrix, say \(\Uplambda_1,\) is a vector with each element being equal to one. Next, let \(A\circ B\) be the element-by-element product of two equally sized vectors A and B, yielding a vector C of the same size, where the kth element of C is obtained by multiplying the kth element of A with the kth element of B. Using this notation, the specific values used for the taste coefficients in class s are then given by the vector \(\beta_s=\beta\circ\Uplambda_s.\)
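The construction of \(\Uplambda\) and of the class-specific coefficient vectors can be sketched in a few lines. This is only an illustration: the function names, the choice of K = 3 and the numerical values in `beta` are assumptions made for the example and are not taken from the case studies.

```python
from itertools import product

def attendance_matrix(K):
    """Return all 2**K rows of zeros and ones (1 = attended, 0 = ignored).

    The ordering is an arbitrary choice: the first row is full attendance
    and the last row is non-attendance to every attribute.
    """
    return [list(row) for row in product([1, 0], repeat=K)]

def class_coefficients(beta, lam_row):
    """Element-by-element product of beta with one row of the matrix."""
    return [b * l for b, l in zip(beta, lam_row)]

K = 3
Lam = attendance_matrix(K)
beta = [-0.05, -0.30, -1.20]  # hypothetical values, for illustration only
beta_by_class = [class_coefficients(beta, row) for row in Lam]
```

With K = 3 this gives S = 8 classes, and every class reuses the same three estimated values wherever the attribute is attended to.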

The likelihood of the observed sequence of T choices for respondent n, say \(y_n,\) is then given by:

$$ L\left(y_n\mid\beta,\pi\right)=\sum_{s=1}^S\pi_s\prod_{t=1}^TP \left(i_{nt}^*\mid\beta_s=\beta\circ\Uplambda_s\right), $$
(1)

where \(i_{nt}^*\) is the alternative chosen by respondent n in choice task t. With \(\beta_s=\beta\circ\Uplambda_s\) where \(\Uplambda\) is fixed a priori, we need to estimate the vector β as well as \(\pi=\left<\pi_1,\ldots,\pi_S\right>,\) the vector of probabilities for the S different classes.
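A minimal sketch of the contribution in Eq. 1 for a single respondent is given below. It assumes a multinomial logit kernel with utilities that are linear in the attributes and no alternative specific constants; the function names and the data layout (one list of attribute rows per choice task) are assumptions made for the example.

```python
import math

def mnl_prob(chosen, X, beta_s):
    """Logit probability of the chosen alternative in one task.

    X[j][k] is the level of attribute k for alternative j.
    """
    utils = [sum(b * x for b, x in zip(beta_s, row)) for row in X]
    m = max(utils)  # subtract the maximum for numerical stability
    expu = [math.exp(u - m) for u in utils]
    return expu[chosen] / sum(expu)

def lc_likelihood(choices, tasks, beta, Lam, pi):
    """Weighted sum over classes of the probability of the whole sequence."""
    total = 0.0
    for lam_row, pi_s in zip(Lam, pi):
        beta_s = [b * l for b, l in zip(beta, lam_row)]
        seq = 1.0
        for chosen, X in zip(choices, tasks):
            seq *= mnl_prob(chosen, X, beta_s)
        total += pi_s * seq
    return total

# Hypothetical respondent: two tasks, three alternatives, two attributes.
tasks = [[[10, 2], [12, 1], [8, 3]], [[5, 1], [6, 2], [7, 0]]]
choices = [0, 2]
Lam = [[1, 1], [1, 0], [0, 1], [0, 0]]
value = lc_likelihood(choices, tasks, [-0.1, -0.5], Lam, [0.4, 0.3, 0.2, 0.1])
```

Note that the product over tasks is taken inside each class, so a respondent is assumed to follow the same attendance pattern across all T choices.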

While the above exposition uses a full set of \(2^K\) classes, it should be acknowledged that, with a few notable exceptions (e.g. Campbell et al. 2010; Hensher et al. 2012), existing studies typically make use of a simplification, generally focussing on just a subset of attributes that are given a latent class treatment. Although this potentially gives rise to further confounding issues, our focus in this paper is on a more general issue with the use of the simple latent class framework for the purpose of accommodating attribute non-attendance.

A final point of importance relates to the estimation of class probabilities. The probability for class s is given by \(\pi_s,\) with \(0\leq\pi_s\leq 1\) and \(\sum_{s=1}^S\pi_s=1.\) Rather than imposing constraints in estimation, an easier approach is to use \(\pi_s=\frac{e^{\delta_s}}{\sum_{m=1}^Se^{\delta_m}},\) with one \(\delta_m,\) i.e. the parameter used in the class allocation probabilities, being fixed to zero. Nevertheless, this specification still involves estimating \(2^K-1\) separate δ terms, of which many will be very negative, equating to very small class probabilities. In the context of the applications presented in this paper, we make use of a simplified approach, by instead setting

$$ \pi_s=\prod_{k=1}^K\left(\Uplambda_{s,k}\left(1-P_{\hbox {N-A},k}\right)+\left(1-\Uplambda_{s,k}\right)P_{\hbox {N-A},k}\right), $$
(2)

where \(\Uplambda_{s,k}\) gives the entry in \(\Uplambda\) relating to attribute k in class s. With this specification, previously used by Hole (2011b), we only need to estimate K separate δ elements (with \(P_{{\rm N-A},k}=\frac{e^{\delta_k}}{e^{\delta_k}+1}\)), as opposed to \(2^K-1,\) leading to significant reductions in the number of parameters. While this simplification equates to an assumption that the non-attendance is independent across attributes, we observed only small reductions in model fit with this approach, along with increases in model stability. Indeed, the resulting model seemed to be less susceptible to issues with local maxima than its counterpart with separately estimated class allocation probabilities.
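The class probabilities in Eq. 2 can be sketched as follows, under the independence assumption stated above. The function names and the δ values are hypothetical; the row ordering simply follows `itertools.product`.

```python
import math
from itertools import product

def non_attendance_prob(delta_k):
    """Logistic transform giving the non-attendance probability for one attribute."""
    return math.exp(delta_k) / (math.exp(delta_k) + 1.0)

def class_probs_independent(delta):
    """Probabilities of all 2**K classes built from only K delta parameters."""
    p_na = [non_attendance_prob(d) for d in delta]
    probs = []
    for row in product([1, 0], repeat=len(delta)):
        p = 1.0
        for lam, pk in zip(row, p_na):
            # attended (lam = 1) contributes 1 - P_NA, ignored contributes P_NA
            p *= lam * (1.0 - pk) + (1 - lam) * pk
        probs.append(p)
    return probs

delta = [-1.0, 0.5, 2.0]  # hypothetical values, for illustration only
probs = class_probs_independent(delta)
```

By construction the eight probabilities sum to one, and summing over all classes in which attribute k is ignored returns the non-attendance probability for that attribute, which is what makes the K-parameter form convenient for reporting.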

As already alluded to in the introduction, the above specification is an example of a confirmatory latent class model, with restrictions on parameters instead of freely estimated coefficients. Our key argument is that a class with a given coefficient fixed to zero is likely to capture not just respondents with a zero sensitivity for that attribute but also respondents with a low sensitivity. To illustrate this, we first note that within a given dataset, the values for a given coefficient are likely to vary continuously across respondents, with both high and low sensitivities arising, as well as a possibility of zero sensitivity for some respondents. Let us assume that the true distribution of non-zero sensitivities covers a wide interval, as is commonly observed in applications. A number of possible scenarios can now be imagined.

Firstly, if there is a sizable share of respondents with zero sensitivities and we do not account for their presence in the model, then they are likely to have a downwards (in absolute terms) effect on the estimated coefficients. In a fixed coefficients model, they will simply reduce the absolute value of the point estimate. In a continuous random coefficients specification, they will reduce the estimated mean and potentially increase the estimated variance. Finally, in a model estimating a finite number of different values for each coefficient, the presence of these respondents is likely to pull at least one of these values towards zero, and, if the share of non-attending respondents is large enough, the coefficient in one class should collapse to zero. This reasoning is the key argument behind using the confirmatory latent class approach discussed above, i.e. it should capture non-attendance even if the share of such respondents in the sample is small, and potentially too small to be retrieved in an exploratory specification.

Conversely, let us imagine a situation where there is no non-attendance for a given attribute, but where we maintain strong residual heterogeneity (i.e. both low and high sensitivities). Such heterogeneity can be captured either in a continuous random coefficient specification or a latent class model with a number of distinct values for each coefficient. It is this latter case that our key concern applies to. If an analyst estimates a model with only two values for a given coefficient, with one of them being fixed to zero, then the presence of continuous heterogeneity in the data is likely to impact on both the estimate in the non-zero class, and the probability of the zero class. In the context of a model with a fixed coefficient for each attribute, the presence of respondents with very low sensitivities will simply be captured in a downwards pull of that point estimate. If we however use two classes for each coefficient, with one value being constrained to zero, then their behaviour could also be captured by this latter class. The relative impact of such respondents on the probability of the class at zero and the estimate in the non-zero class will depend on the shape of the true distribution. The key point is that respondents captured in the non-attendance class may not necessarily have zero sensitivities.

The model with parameters estimated for both classes to reflect higher and lower sensitivity is given by:

$$ L\left(y_n\mid\beta,\pi\right)=\sum_{s=1}^S\pi_s \prod_{t=1}^TP\left(i_{nt}^*\mid\beta_s= \left(\beta_1\circ\Uplambda_s+\beta_2\circ(1-\Uplambda_s)\right)\right), $$
(3)

with:

$$ \pi_s=\prod_{k=1}^K\left(\Uplambda_{s,k}P_{k,1}+\left(1-\Uplambda_{s,k}\right)P_{k,2}\right), $$
(4)

where \(P_{k,1}=\frac{1}{e^{\delta_k}+1}\) and \(P_{k,2}=\frac{e^{\delta_k}}{e^{\delta_k}+1}.\) This exploratory LC model is applied for each case study and contrasted with the remaining specifications.
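The difference between Eqs. 1 and 3 lies only in how the class-specific coefficients are assembled, which the following sketch makes explicit. The function names and numerical values are assumptions made for the example.

```python
import math

def exploratory_coefficients(beta1, beta2, lam_row):
    """Use beta1 where the row has a one and beta2 where it has a zero."""
    return [b1 * l + b2 * (1 - l) for b1, b2, l in zip(beta1, beta2, lam_row)]

def second_value_share(delta_k):
    """Share of the second (lower sensitivity) value for one coefficient."""
    return math.exp(delta_k) / (math.exp(delta_k) + 1.0)

beta1 = [-0.4, -1.0]   # hypothetical high sensitivity values
beta2 = [-0.1, -0.2]   # hypothetical low sensitivity values
mixed = exploratory_coefficients(beta1, beta2, [1, 0])
restricted = exploratory_coefficients(beta1, [0.0, 0.0], [1, 0])
```

Setting every element of the second vector to zero reproduces the confirmatory coefficients \(\beta\circ\Uplambda_s,\) so the confirmatory model is the restricted special case of the exploratory one, with one additional parameter per attribute separating the two.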

This model will be less susceptible to capturing low sensitivity as non-attendance, but on the other hand may not be able to reveal true non-attendance if the rate thereof is low compared to the remaining heterogeneity.

The solution put forward in the present paper is to make use of a combined latent class and mixed multinomial logit model, hereafter referred to as LC-MMNL, with LC used for the latent class model. Our proposed model still makes use of two values for each coefficient within a latent class framework, much as in Eq. 1. However, β is no longer a vector of point values for the different coefficients; instead, each element in β now follows a distribution in the sample population. This ensures that, for any given class in our LC-MMNL model, those coefficients not fixed to zero allow for additional random heterogeneity. The likelihood function for respondent n is simply rewritten as:

$$ L\left(y_n\mid\Upomega,\pi\right)=\sum_{s=1}^S\pi_s\int\limits_{\beta} \prod_{t=1}^TP\left(i_{nt}^*\mid\beta_s=\beta\circ\Uplambda_s\right) f\left(\beta\mid\Upomega\right)\mathrm{d}\beta, $$
(5)

where \(f\left(\beta\mid\Upomega\right)\) gives the multivariate distribution for all elements in β, with a vector of parameters \(\Upomega.\)
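The integral in Eq. 5 has no closed form and is typically approximated by simulation. The sketch below averages the latent class likelihood over R draws of β. It makes several simplifying assumptions that should be read as such: independent Lognormal distributions with a sign change on every coefficient, plain pseudo-random draws in place of the MLHS draws used in the applications, and a logit kernel without alternative specific constants. All function names and data values are hypothetical.

```python
import math
import random

def mnl_prob(chosen, X, beta_s):
    """Logit probability of the chosen alternative; X[j][k] is attribute k of alternative j."""
    utils = [sum(b * x for b, x in zip(beta_s, row)) for row in X]
    m = max(utils)
    expu = [math.exp(u - m) for u in utils]
    return expu[chosen] / sum(expu)

def sequence_prob(choices, tasks, beta_s):
    """Probability of the full sequence of choices for given coefficients."""
    p = 1.0
    for chosen, X in zip(choices, tasks):
        p *= mnl_prob(chosen, X, beta_s)
    return p

def lcmmnl_simulated_likelihood(choices, tasks, mu, sigma, Lam, pi, R=250, seed=0):
    """Average over R draws of the class-weighted sequence probability."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(R):
        # one draw of beta per respondent, negative Lognormal for each element
        beta = [-math.exp(m + s * rng.gauss(0.0, 1.0)) for m, s in zip(mu, sigma)]
        for lam_row, pi_s in zip(Lam, pi):
            beta_s = [b * l for b, l in zip(beta, lam_row)]
            total += pi_s * sequence_prob(choices, tasks, beta_s)
    return total / R

tasks = [[[10, 2], [12, 1], [8, 3]], [[5, 1], [6, 2], [7, 0]]]
choices = [0, 2]
Lam = [[1, 1], [1, 0], [0, 1], [0, 0]]
pi = [0.4, 0.3, 0.2, 0.1]
sim = lcmmnl_simulated_likelihood(choices, tasks, [-2.0, -1.0], [0.5, 0.5], Lam, pi)
```

Because the draws are taken at the level of the respondent and the same draw is used in every class, the random heterogeneity only enters through the elements of β that are not switched off by the relevant row of \(\Uplambda.\) When every standard deviation is zero, the function collapses to the confirmatory latent class likelihood of Eq. 1.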

The advantage of this approach is that it will allow some (but not necessarily all) of the random heterogeneity to be captured in the randomly distributed non-zero elements within \(\beta_s.\) This added flexibility should reduce the risk of a class where attribute k is not used (i.e. coefficient fixed to zero) simply capturing heterogeneity, including low sensitivity, rather than genuine attribute non-attendance. We specifically use the word reduce as some risk of confounding clearly still remains.

Empirical work

In this section, we present results from three separate case studies using stated choice data. Our motivation for using three separate datasets is to produce reliable conclusions that are not based on just a single data source as is all too often the case. Each time, we conduct an in-depth comparison between model structures, looking at model fit, patterns of heterogeneity and implied willingness-to-pay measures.

For each study, we first estimate a simple multinomial logit (MNL) model. This is followed by a confirmatory LC specification as given in Eq. 1, with class allocation probabilities defined as in Eq. 2, and an exploratory LC specification as given in Eq. 3, with class allocation probabilities defined as in Eq. 4. Finally, we also estimate two random coefficients structures, namely a simple MMNL model, and a LC-MMNL model as given in Eq. 5. All models were coded in Ox 6.2 (Doornik 2001). In the latent class structures, multiple runs with different starting values were used to attempt to avoid issues with inferior local maxima.

For the two mixture models, the estimation made use of 250 MLHS draws (Hess et al. 2006) per respondent and per random parameter, where Lognormal distributions were used. This specific choice of distribution is motivated by the fact that the Lognormal is particularly well suited to capture both low and high attribute sensitivity, given that it has a low median with a long tail and high mean. This should permit the distribution to capture such low attribute sensitivity rather than it being falsely captured by the non-attendance class. In addition, for the datasets at hand, previous experience showed good performance with the Lognormal distribution.

The results focus on a number of key indicators, in particular the differences in fit between the various models, the implied rates of non-attendance in the LC and LC-MMNL models, and the heterogeneity patterns for individual coefficients. As alluded to in the introduction, the computation of WTP indicators is fraught with complications in the presence of a non-zero share for non-attendance, especially when this is the case for the cost attribute. In the present paper, the focus on WTP is reduced, and we give more weight to the above key indicators. Nevertheless, we present summary WTP statistics too, where, in the confirmatory LC model and the LC-MMNL model, we exclude the non-attendance class for fare.

First case study

Data and model specification

Our first case study makes use of data collected in a stated choice (SC) survey for rail and bus commuters, conducted online in the UK in early 2010. A sample of 368 respondents were each faced with ten different choice tasks, each time involving the choice between three unlabelled alternatives, of which the first is a reference trip (invariant across the ten tasks). The three alternatives were described by six attributes, namely travel time (in minutes), fare (in \(\pounds\)), the level of crowding (trips out of ten for which you have to stand), the rate of delays (trips out of ten affected by delays), the average length of delays (across delayed trips), and the provision of a free delay information service (via sms text message). Linear effects were specified for the three continuous attributes, while the coefficients for crowding and the rate of delays were multiplied by the rate of affected trips (a division of these two attributes by a factor of ten led to them ranging from zero to one). A simple dummy term was included for the delay information service, and alternative specific constants (ASC) were estimated for the first two alternatives.

Results

The main estimation results for the first case study are summarised in Table 1. This shows the point estimates from the MNL and LC models, while, for the MMNL and LC-MMNL models, the use of the Lognormal distribution implies that we are actually estimating the parameters of the normally distributed logarithms of the individual marginal utility coefficients, where a sign change applies to all coefficients except the information service coefficient. For example, \(\mu_{\ln\left(-\beta_{\rm tt}\right)}\) gives the mean for the normal distribution for the logarithm of the negative of the travel time coefficient, with \(\sigma_{\ln\left(-\beta_{\rm tt}\right)}\) giving the associated standard deviation. The sign of the mean estimates relates to the absolute size, rather than sign, of the resulting Lognormal coefficient. A negative estimate for \(\mu_{\ln\left(-\beta_{\rm tt}\right)}\) simply means that the median of the travel time coefficient is greater than −1 but below zero. A positive estimate would imply a median that is smaller than −1. In all models, we also estimate two additional alternative specific constants (ASC 1 and ASC 2).
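The link between the estimated parameters of the underlying normal distribution and the implied Lognormal coefficient follows from standard Lognormal properties, and can be sketched as below. The function name and the input values are hypothetical; the quantities returned describe the positive variable \(-\beta\) (or \(\beta\) itself where no sign change applies).

```python
import math

def lognormal_summary(mu, sigma):
    """Median, mean and coefficient of variation implied by mu and sigma."""
    return {
        "median": math.exp(mu),
        "mean": math.exp(mu + 0.5 * sigma ** 2),
        "cv": math.sqrt(math.exp(sigma ** 2) - 1.0),
    }

summary = lognormal_summary(-2.0, 1.0)  # hypothetical estimates
```

A negative mean for the logarithm gives a median below one in absolute terms, in line with the interpretation above, while the coefficient of variation depends on the standard deviation alone and is therefore a convenient scale-free measure of heterogeneity.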

Table 1 Main estimation results for first case study

In the latent class models, we have two sets of values for coefficients, i.e. \(\beta_1\) and \(\beta_2,\) where, in the confirmatory LC model and in the LC-MMNL model, the elements in \(\beta_2\) are constrained to zero. For the sake of presentation, the elements for each coefficient were ordered in such a way that the larger value (in absolute terms) is in class 1. In all three latent class structures, we additionally estimate six δ parameters which are used to compute the class allocation probabilities, as shown in Eq. 4. As an example, a negative value for \(\delta_{{\rm tt},2}\) would mean that the share for \(\beta_{{\rm tt},2}\) is smaller than 50 %. The results from Table 1 are then used to compute probabilities for the two values for each coefficient, as reported in Table 2, where, in the confirmatory LC model and the LC-MMNL model, the second class corresponds to the non-attendance class, while, in the exploratory LC model, it corresponds to the low sensitivity class.

Table 2 Implied probabilities for second class in LC models in first case study

We first note that all four advanced models comprehensively outperform the MNL model, with gains in log-likelihood of between 538.9 and 714.6 units, for at worst an increase by 12 in the number of parameters. The exploratory latent class model outperforms the confirmatory latent class model, with an improvement in log-likelihood of 106.46 units, which is highly significant at the cost of six additional parameters. Using the adjusted \(\rho^2\) measure, we note that the MMNL model outperforms both LC models, suggesting that it is able to give a more complete representation of the random heterogeneity. In the combined LC-MMNL model, the probabilities for non-attendance collapse to zero (i.e. very negative δ) for all but the final two coefficients (average delay and delay sms). Given the resulting difference in specification, the LC model is no longer nested inside the LC-MMNL model, but the differences in fit are apparent when looking at the adjusted \(\rho^2\) measure. Finally, the LC-MMNL model obtains an improvement in fit over the MMNL model of 10.23 units, which, although modest in absolute terms, is still statistically significant at the cost of two additional parameters. More sizable differences arise when looking at the actual estimation results, and these justify our recommendation for using the LC-MMNL model.
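The significance statements for nested models can be checked with a likelihood ratio test. The sketch below assumes the standard chi-squared reference distribution and uses the closed form of its upper tail, which exists only for an even number of degrees of freedom; the function names are hypothetical, and the log-likelihood gains are the ones quoted above.

```python
import math

def lr_statistic(loglik_gain):
    """Likelihood ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * loglik_gain

def chi2_sf_even_df(x, df):
    """Upper tail probability of a chi-squared variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form only valid for even, positive df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i      # builds (x/2)**i / i!
        total += term
    return math.exp(-half) * total

# Exploratory LC against confirmatory LC: 106.46 units, six parameters.
p_lc = chi2_sf_even_df(lr_statistic(106.46), 6)
# LC-MMNL against MMNL: 10.23 units, two parameters.
p_mix = chi2_sf_even_df(lr_statistic(10.23), 2)
```

Both tail probabilities fall well below conventional thresholds, which agrees with the description of the two improvements as statistically significant.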

We observe substantial increases in coefficient magnitude (i.e. scale) when moving from the MNL to the confirmatory LC model, which is consistent with the marginal utility coefficients now relating to the non-ignoring classes. The results from Table 2 show very large shares for the second value for each coefficient in the confirmatory LC model, and thus high rates of implied attribute non-attendance. These are arguably so high as to be considered unrealistic, with e.g. 44 % of commuters apparently ignoring fare, where the implied rate of attribute-ignoring is even higher for the quality-of-service attributes. As already mentioned above, the exploratory LC model obtains statistically significant improvements in log-likelihood over the confirmatory LC model. With the values for β2 now being freely estimated for each coefficient, the class allocation probabilities cannot be compared to the confirmatory model as they relate to different values. Crucially, the estimates in the low importance classes are significantly different from zero with the exception of the delay information attribute. If the high rates of non-attendance in the confirmatory model had been in line with real behaviour in the data, we would have expected the estimates in the second class in the exploratory model not to be significantly different from zero. These observations should serve to confirm our assertions that a two-class LC model will capture the heterogeneity by using classes for both high and low sensitivity, and that classes with coefficients constrained to zero are likely to at least in part capture low attribute sensitivity rather than attribute non-attendance.

Turning our attention to the continuous mixture models, i.e. MMNL and LC-MMNL, we note that both models retrieve significant random heterogeneity for all six coefficients, as highlighted by the estimates for the σ parameters. In the LC-MMNL model, the shares for the second class (i.e. β = 0) drop to zero for travel time, fare, crowding, and the rate of delays. For the remaining two attributes, namely the average delay and the provision of a free delay sms service, high implied rates of ignoring remain, but they are still lower than in the confirmatory LC model. These findings give further support to our assertion that the confirmatory LC model is likely to produce misleading results for attribute non-attendance, with the latent class construct, specifically the class at zero, serving as a proxy for capturing variations in sensitivity across respondents not related to attribute non-attendance.

We next proceed with a comparison between the estimates for the MMNL and LC-MMNL models. We first note drops in significance for the parameters relating to average delay and the delay information service, which is expected given that they now only relate to a smaller share of respondents. While the mean estimates for other coefficients remain quite stable, indicating no major scale differences, the estimates for \(\mu_{\ln\left(-\beta_{{\rm average}\; {\rm delay}}\right)}\) and \(\mu_{\ln\left(-\beta_{\rm delay\;sms}\right)}\) are more positive in the LC-MMNL model, indicating a stronger absolute sensitivity in class 1, as would be the case when class 2 captures low or zero sensitivity. It is also of interest to look at the degree of continuous heterogeneity for parameters in the MMNL and LC-MMNL model after transformation to the Lognormal scale. The results are reported in Table 3, where, with the specification in Eq. 5, these calculations clearly only relate to the non-ignoring classes for each given coefficient in the LC-MMNL model. By inspecting the coefficients of variation (cv) for the two attributes where a non-zero probability for non-attendance is obtained in the LC-MMNL model (i.e. average delays and delay sms service), we note a drop in heterogeneity when moving from the MMNL model to the LC-MMNL model, where some of what was previously captured as continuous heterogeneity is now accommodated in the non-attendance class. For the remaining four coefficients, we see two increases and two decreases in heterogeneity. This occurs despite the fact that the treatment of these four coefficients is identical in the two models. A possible interpretation is that the specification used for one coefficient can have an impact on the estimated heterogeneity for a different coefficient.

Table 3 Heterogeneity patterns in non-ignoring classes for continuous mixture models in first case study

Finally, Table 4 summarises the WTP findings for the different models. In the MNL, exploratory LC, and MMNL models, these relate to sample level estimates, while, in the confirmatory LC models, they exclude the non-attendance class for fare. For the three models with sample level estimates (i.e. MNL, exploratory LC, and MMNL), the mean WTP measures are higher in the two mixture models, where the levels of heterogeneity are higher in the MMNL model when compared with the exploratory LC model. When comparing the MNL and confirmatory LC models, the latter clearly obtains lower WTP measures given the inclusion of the zero class for all coefficients other than fare in these calculations. This highlights the problematic issue of calculating WTP measures from such models; we obtain a downwards bias in the WTP measures. Indeed, the issue of whether what is picked up is true non-attendance aside, the calculation accounts for the zero WTP for respondents captured in the non-attendance classes for non-fare attributes, but not the infinite WTP for respondents captured in the non-attendance class for fare.
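The source of the downwards bias can be made explicit with a short worked expression. This is one reading of the calculation described above rather than a formula taken from the tables: it assumes linear fare and attribute effects, the independence of non-attendance across attributes from Eq. 2, and the symbol \(\overline{\rm WTP}_k\) is introduced here purely for illustration. Conditional on attendance to fare, the mean WTP for a non-fare attribute k in the confirmatory LC model is

$$ \overline{\rm WTP}_k=\left(1-P_{\hbox {N-A},k}\right)\frac{\beta_k}{\beta_{\rm fare}}+P_{\hbox {N-A},k}\cdot 0=\left(1-P_{\hbox {N-A},k}\right)\frac{\beta_k}{\beta_{\rm fare}}. $$

The share \(P_{\hbox {N-A},k}\) therefore scales the mean down directly, while respondents in the non-attendance class for fare, for whom the ratio has a zero denominator, are left out of the average altogether.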

Table 4 Implied WTP measures for first case study

Turning to the MMNL and LC-MMNL models, the differences are relatively small. This is in line with having retrieved a positive share for the non-attendance class for only two of the coefficients, and the absence of non-attendance for fare. Additionally, the results would suggest that similar WTP patterns can be obtained by the MMNL model on its own, confirming our hypothesis that the Lognormal distribution is well suited to dealing with low or zero attribute sensitivity.

Second case study

Data and model specification

Our second case study once again makes use of data collected through an online SC survey conducted in the UK in early 2010, using a sample of 996 rail commuters, each facing eight choice tasks. In each choice task, a respondent is given a choice between three alternatives, described by attributes with levels pivoted around reported reference values, where the reference alternative was however not included as one of the options in the survey. The five attributes used to describe the alternatives were travel time (min), fare (\(\pounds\)), the guarantee of a reserved seat (yes/no dummy), the provision of free wifi (yes/no dummy), and whether the ticket had rebooking flexibility (yes/no dummy). The specification is similar to that used in the first case study, with the exception that a log-transform was used for the fare attribute, given strong evidence of decreasing marginal sensitivity to fare.

Results

The results are once again split across four tables, using the same approach to notation as in the first case study. Specifically, the main estimation results are presented in Table 5, with implied probabilities of non-attendance in Table 6, estimated levels of heterogeneity in Table 7, and implied WTP patterns in the non-ignoring classes in Table 8. We use the same five models as in the first case study.

Table 5 Main estimation results for second case study
Table 6 Implied probabilities for second class in LC models in second case study
Table 7 Heterogeneity patterns in non-ignoring classes for continuous mixture models in second case study
Table 8 Implied WTP measures for second case study, calculated at a fare of \(\pounds 40\)

We note that all four advanced models once again offer hugely significant improvements in log-likelihood over the simple MNL model, where slightly better fit is again obtained for the models allowing for continuous random heterogeneity. The exploratory LC model comprehensively outperforms the confirmatory LC model, with a gain in log-likelihood by 132.6 units at the cost of five additional parameters. For the continuous mixture models, the differences are larger than in the first case study, with an improvement in log-likelihood by 45.94 units for the LC-MMNL model over the MMNL model. This is a direct result of the fact that we are able to estimate non-zero probabilities for non-attendance for all five marginal utility coefficients, in contrast with the first case study. This new model also obtains an improvement in log-likelihood by 211.10 units over the LC model. Both improvements are significant at the highest levels of confidence, coming at the cost of 5 additional parameters.
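The significance statements above rest on likelihood ratio tests, which can be sketched as follows using the log-likelihood gains and parameter counts quoted in the text. The helper names are ours, and the chi-square tail probability is computed from a standard recurrence so that only the standard library is needed. The usual chi-square reference distribution should be read as an approximation here, since a restriction that places a class share on the boundary of its range does not satisfy the standard regularity conditions.

```python
import math

def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if stat <= 0.0:
        return 1.0
    x = stat / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)                 # Q(1, x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))      # Q(1/2, x)
    while a < df / 2.0 - 1e-9:
        # Recurrence: Q(a + 1, x) = Q(a, x) + x^a e^{-x} / Gamma(a + 1)
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return q

def lr_test(ll_gain, extra_params):
    """Likelihood ratio statistic and asymptotic p-value for nested models."""
    stat = 2.0 * ll_gain
    return stat, chi2_sf(stat, extra_params)

for label, gain, k in [("exploratory vs confirmatory LC", 132.6, 5),
                       ("LC-MMNL vs MMNL", 45.94, 5)]:
    stat, p = lr_test(gain, k)
    print(f"{label}: LR = {stat:.2f}, df = {k}, p = {p:.3g}")
```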

In terms of effects on parameter estimates, we observe increases in scale in the marginal utility coefficients when moving from MNL to the confirmatory LC model. Rather counter-intuitively, we also obtain a negative estimate for \(\beta_{\rm flex}\), although it needs to be noted that this value only applies to a small share of respondents, with a class probability of just nine percent. In the exploratory LC model, we again retrieve small and large values for all coefficients, where all estimates are of the expected sign in both classes. We note that the estimates for \(\beta_{{\rm wi-fi},2}\) and \(\beta_{{\rm flex},2}\) are not significantly different from zero. These were also the two coefficients with the highest rates of implied non-attendance in the confirmatory LC model. The share for this class drops only slightly in the exploratory LC model. For those three coefficients where the estimates in the second class are significantly different from zero, we also see some increases in class allocation probability compared to the confirmatory LC model.

When moving from MMNL to LC-MMNL, we observe that the coefficients increase in magnitude (after transforming from the underlying Normal distribution). Also, while there is a shift towards statistical significance for \(\mu_{\ln\left(\beta_{\rm seat}\right)},\) the opposite applies for \(\mu_{\ln\left(\beta_{\rm wi-fi}\right)}\) and \(\mu_{\ln\left(\beta_{\rm flex}\right)},\) where major increases in standard errors are also observed for the standard deviation parameters, especially \(\sigma_{\ln\left(\beta_{\rm flex}\right)}.\) The changes, including increases in magnitude, are most substantial for the final two coefficients which are the ones with the highest implied probabilities for non-attendance. In comparison with the confirmatory LC model, the LC-MMNL results show visibly lower rates of implied non-attendance, which is reassuring in the case of travel time and fare. However, high rates of implied non-attendance are still retrieved for the wifi and ticket flexibility attributes, where this finding is more realistic, given the likelihood of such quality of service attributes only being relevant to a part of the travelling population. This is also consistent with the results for the exploratory LC model. However, the remaining 16 % rate of implied non-attendance for the fare attribute remains problematic. In line with the fact that for each coefficient, some of the heterogeneity in the LC-MMNL model is captured by the non-attendance classes, we observe a reduction in the remaining continuous random heterogeneity for each coefficient in comparison with the MMNL model (cf. Table 7).

Turning our attention finally to the implied WTP patterns (cf. Table 8), we see drops when moving from MNL to confirmatory LC, in line with the implied rates of non-attendance and the fact that the non-attendance group for fare is excluded from these calculations. The negative WTP for flexibility is the result of the negative estimate for \(\beta_{\rm flex}\) in the confirmatory LC model. In the exploratory LC model, we note substantially higher WTP measures, which are a result of the much lower value for \(\beta_{{\rm log-fare},2}\); this model produces evidence of very low as well as very high fare sensitivity. The differences between MMNL and LC-MMNL are more pronounced than was the case in the first study, and this is in line with the higher rates of remaining implied non-attendance. We see decreases for all four measures, as a result of accounting for the presence of the non-attendance class for these attributes but disregarding the 16 % probability for \(\beta_{{\rm log-fare},2}\) in these WTP calculations.
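Because fare enters the utility function in logarithmic form in this case study, the implied WTP depends on the fare level at which it is evaluated, which is why Table 8 reports values at a fare of \(\pounds 40\). The sketch below shows one way such a calculation can be set up, assuming a utility term of the form beta_log_fare * ln(fare) and a marginal (derivative based) rate of substitution. The function name and the coefficient values are hypothetical, and the exact formula used for Table 8 is not stated in this section.

```python
def wtp_log_fare(beta_attr, beta_log_fare, fare):
    """Marginal rate of substitution between an attribute and fare when
    utility contains beta_log_fare * ln(fare)."""
    marginal_utility_fare = beta_log_fare / fare   # d U / d fare
    return -beta_attr / marginal_utility_fare

# Hypothetical coefficients, for illustration only.
beta_log_fare = -4.0
beta_wifi = 0.30
for fare in (20.0, 40.0, 80.0):
    print(fare, wtp_log_fare(beta_wifi, beta_log_fare, fare))
```

Under this specification the WTP is proportional to the fare, so a lower absolute value of the log-fare coefficient in one class translates directly into higher WTP measures at any given fare.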

Third case study

Data and model specification

Our final case study makes use of data collected in an online SC survey by the Latin American airline LAN in 2009, using a sample of 915 respondents, with ten observations per respondent. In each scenario, a respondent was faced with a choice between three flight options described by six attributes, where there was no reference alternative. The attributes used to describe the alternatives were fare (in thousands of Chilean pesos), the accrual of frequent flyer benefits (LANPASS kms), the ability to reserve a specific seat (no seat reservation/reservation of standard seat/reservation of preferential seat), the ability to change reservations before the departure date, the ability to change reservations after the outbound departure date (yes/no dummies), and the ability to obtain a refund for unused tickets (yes/no dummy). A linear in attributes specification of the utility function was used, with constants for the first two alternatives.

Results

The results are once again split across four tables, using the same approach to notation as in the first two case studies. The main estimation results are presented in Table 9, with implied probabilities of non-attendance in Table 10, estimated levels of heterogeneity in Table 11, and implied WTP patterns in the non-ignoring classes in Table 12. The same five models as in the first two case studies are estimated.

Table 9 Main estimation results for third case study
Table 10 Implied probabilities for second class in third case study
Table 11 Heterogeneity patterns in non-ignoring classes for continuous mixture models in third case study
Table 12 Implied WTP measures for third case study

As shown in Table 9, we again have the familiar pattern of differences in model fit. All four advanced models comprehensively outperform the MNL model. While the differences between the remaining four models are once again smaller, we see that better performance is obtained by the continuous mixture models, while the exploratory LC model again easily rejects the confirmatory LC model (137.08 units, at the cost of seven parameters), and the LC-MMNL model also obtains a significant improvement over the MMNL model (51.19 units in log-likelihood, at the cost of six additional parameters). In the combined LC-MMNL, the probability for non-attendance collapsed to zero for the regular seat attribute, meaning that the LC model is no longer nested inside the LC-MMNL model. However, the difference in fit is clearly significant when looking at the adjusted \(\rho^2\) measure. As in the previous two case studies, we observe some changes in significance for the individual parameters when moving from MMNL to LC-MMNL.
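For models that are not nested, a comparison based on the adjusted \(\rho^2\) measure can be sketched as below. The sketch assumes the common definition 1 - (LL(model) - K)/LL(0), with K the number of estimated parameters, and an equal-shares null log-likelihood for 915 respondents, ten tasks and three alternatives. The fitted log-likelihoods and parameter counts are hypothetical placeholders and are not the values reported in Table 9.

```python
import math

def adjusted_rho2(ll_model, ll_null, n_params):
    """Adjusted rho-squared: 1 - (LL(model) - K) / LL(0)."""
    return 1.0 - (ll_model - n_params) / ll_null

# Equal-shares null log-likelihood for 915 respondents x 10 tasks x 3 options
# (assumed here; the paper may use a different base model).
ll_null = 915 * 10 * math.log(1.0 / 3.0)

# Hypothetical fitted log-likelihoods and parameter counts (not from Table 9).
ll_lc, k_lc = -7400.0, 16
ll_lc_mmnl, k_lc_mmnl = -7250.0, 22
print(adjusted_rho2(ll_lc, ll_null, k_lc),
      adjusted_rho2(ll_lc_mmnl, ll_null, k_lc_mmnl))
```

The penalty term K means that a model with more parameters only obtains a higher adjusted measure if its gain in log-likelihood exceeds the number of parameters added.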

By allowing for a class with zero sensitivities, we observe the usual increases in marginal sensitivities when moving from MNL to LC, and also in the mean sensitivities when moving from MMNL to LC-MMNL. In the latter, the means of the underlying normal distributions are no longer significant for \(\mu_{\ln\left(\beta_{\rm accr. km}\right)}\) and \(\mu_{\ln\left(\beta_{\rm chg. after}\right)},\) noting that this does not imply the same to be true for the marginal utilities after transformation.

The results from Table 10 show quite a varied picture in terms of implied attribute non-attendance in the confirmatory LC model, ranging from 13 % for the regular seat attribute to 81 % for the refund attribute. The propensity of non-attendance for ticket price is in line with findings from a similar choice setting in Rose et al. (2011). As mentioned above, in the LC-MMNL model, the implied rate of non-attendance drops to zero for the regular seat attribute which, notably, was the attribute with the lowest retrieved rate of non-attendance in the confirmatory LC model. It is also worth noting the high share for \(\beta_2\) for this coefficient in the exploratory model; this is a reflection of a large share of respondents with low but not zero sensitivity, and in line with the estimate for \(\beta_{{\rm regseat},1}\) in the confirmatory model. Moreover, we also observe reductions in the rates for all other attributes. With this specific dataset, a non-trivial rate of attribute non-attendance is realistic, given the use of a set of qualitative attributes that may not be relevant to all respondents. The 9 % rate of implied non-attendance for the fare attribute however still remains an issue. In the exploratory LC model, the estimates in both classes are statistically significant for all marginal utility coefficients except \(\beta_{{\rm chgafter},2}\) and \(\beta_{{\rm refund},2}\). This possibly suggests a share of respondents with real non-attendance for these attributes, where this share is however much lower than in the confirmatory LC model, or indeed the LC-MMNL model.

As expected, the results in Table 11 show reduced continuous heterogeneity in the LC-MMNL model compared to the MMNL model, with one exception, namely the regular seat attribute. This is the only attribute for which the implied rate for non-attendance in the LC-MMNL model is zero. For the remaining attributes, part of what was previously captured as heterogeneity in the MMNL model is now captured through the non-attendance class. For \(\beta_{\rm regseat}\), a possible explanation is that the heterogeneity for this attribute was captured by other random coefficients, i.e. the same argument of confounding that we used in the first case study for the attributes where increases in heterogeneity were observed.

Turning finally to the implied WTP patterns in the non-ignoring classes, the changes between MNL and confirmatory LC are this time slightly less pronounced, except for the accrual of LANPASS kms and the ability to obtain a refund for unused tickets. The most pronounced increase in WTP is found, similarly to the second case study, for attributes with the highest non-attendance rates. When moving to the exploratory LC model, we observe increases for all WTP measures, including those for \(\beta_{\rm chgafter}\) and \(\beta_{\rm refund}\), for which a class with zero sensitivity was retrieved. These heightened WTP measures are the result of the low fare sensitivity identified in the second class, with a share of 41 %. The WTP measures for the MMNL model are higher, while we see a drop in all mean WTP measures when moving to the LC-MMNL model, as a result of these calculations not taking into account the non-attendance class for fare.

Conclusions

In the face of ever more widespread use of confirmatory latent class structures for capturing attribute non-attendance, this paper has asked the question whether results showing high rates of implied non-attendance in such models are possibly misleading. We argue that a confirmatory LC structure where one support point is fixed to zero for each coefficient is vulnerable to confounding, where the phenomena captured by the model may relate more to taste heterogeneity, including low attribute importance, than to actual attribute non-attendance.

We investigate the issue using two different approaches. Firstly, we compare the confirmatory LC approach to an exploratory LC model in which both values are estimated for each coefficient, rather than one being fixed to zero. Secondly, we offer an alternative approach by putting forward the use of a model that additionally allows for continuous random heterogeneity, in the form of a combined LC-MMNL structure, where the coefficients in one class are constrained to zero while the second class allows for continuously distributed coefficients rather than just a point value. The hope here is that the continuous representation of heterogeneity in one class will allow the model to capture both high and low attribute sensitivity, increasing the chance that what is captured in the non-attendance class really is attribute non-attendance, rather than just low sensitivity.
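The confirmatory LC structure discussed above can be sketched in a few lines. The sketch assumes independent non-attendance across attributes, so that K attributes give 2^K attendance patterns, and a simple MNL kernel inside each class. All names, coefficient values and choice tasks are hypothetical; the paper's own specification (Eq. 5) is not reproduced here. In an LC-MMNL model, the coefficients in the attending classes would additionally be drawn from a continuous distribution and the likelihood averaged over draws, which is omitted for brevity.

```python
import itertools
import math

def class_structure(p_ignore):
    """Enumerate the 2^K attendance patterns and their probabilities,
    assuming independent non-attendance across attributes."""
    classes = []
    for mask in itertools.product((1, 0), repeat=len(p_ignore)):
        prob = math.prod(1.0 - p if m else p for m, p in zip(mask, p_ignore))
        classes.append((mask, prob))
    return classes

def mnl_prob(beta, alternatives, chosen):
    """Logit probability of the chosen alternative."""
    utils = [sum(b * x for b, x in zip(beta, alt)) for alt in alternatives]
    denom = sum(math.exp(u) for u in utils)
    return math.exp(utils[chosen]) / denom

def respondent_likelihood(beta, p_ignore, tasks):
    """Confirmatory LC likelihood for one respondent's sequence of choices."""
    total = 0.0
    for mask, prob in class_structure(p_ignore):
        beta_class = [b * m for b, m in zip(beta, mask)]  # zero if ignored
        seq = math.prod(mnl_prob(beta_class, alts, ch) for alts, ch in tasks)
        total += prob * seq
    return total

# Hypothetical two-attribute example (time, fare), two tasks, two options.
beta = [-0.05, -0.20]
p_ignore = [0.3, 0.1]
tasks = [([[30, 10], [20, 14]], 1), ([[25, 12], [35, 9]], 0)]
print(len(class_structure(p_ignore)), respondent_likelihood(beta, p_ignore, tasks))
```

In the exploratory variant, the zero in each non-attendance class would be replaced by a second freely estimated value, which is what allows that model to separate low sensitivity from zero sensitivity.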

Our findings from three separate empirical case studies seem to give definite support to our assertion that the confirmatory LC model is likely to produce misleading results for attribute non-attendance, with the class at zero serving as a proxy for capturing continuous variations in sensitivity across respondents. Across the three studies, the confirmatory LC model is outperformed by the exploratory LC model, which shows just a single coefficient out of six with a value that is not significantly different from zero in one class in the first case study, two out of five in the second case study, and two out of seven in the third case study. This suggests that some of the apparent patterns of non-attendance retrieved in the confirmatory LC model were a result of confounding. Clearly, with such an exploratory LC model, there is also a risk of confounding in the opposite direction (cf. Hole 2011b). Indeed, if the true share of non-attendance is small, and there is substantial remaining (i.e. non-zero) heterogeneity in the data, then it is likely that both classes will retrieve significant sensitivities. To some extent, this could be addressed by gradually increasing the number of classes, so that eventually the weight required for a single class would be small enough to capture the non-attendance. An example of working with up to three attendance classes and one non-attendance class is given by Campbell et al. (2012). However, increases in the number of classes will often lead to problems with inferior local optima and classes collapsing to the same value.

The experience with the LC-MMNL model in the present paper is that, across all three case studies, and across most coefficients, we see significant reductions in the shares for the non-attendance classes when additionally allowing for random heterogeneity within non-zero classes. In several cases the apparent non-attendance even drops to zero. Across the three LC models (confirmatory LC, exploratory LC, LC-MMNL), the results are consistent across models for some of the coefficients (e.g. wifi and flexibility in the second case study), while in others, the confirmatory LC model is affected by substantial confounding. This is arguably a strong indication that at least a part of what is typically captured by such classes in a confirmatory model is in fact not attribute non-attendance, i.e. zero sensitivity, but simply taste heterogeneity, and reduced attribute importance. This finding also calls for the reappraisal of a large body of empirical work making use of the simple confirmatory LC structure and showing significant shares of attribute non-attendance. As an aside, it should be noted that in our three case studies, the simple MMNL model always gives better fit than the confirmatory LC model (and indeed the exploratory LC model), suggesting that it is indeed able to give a more complete representation of the random heterogeneity.

As always, it is important to acknowledge the limitations of our study. Firstly, the choice of continuous distribution for β is sure to have an impact on the weight of the non-attendance classes. In our case studies, the Lognormal distribution had been found to work well with the data at hand in previous work. Additionally, it is arguably a distribution that is especially well suited for dealing with a mix of low and high attribute importance. Conversely, analysts should also use care as the distribution for the cost coefficient may be pulled towards zero in datasets where non-attendance really arises for this attribute, and this would lead to very long tails for the distribution of WTP measures. Secondly, an argument could be made to compare the LC-MMNL model used here to one with freely estimated \(\beta_k\) in both classes (potentially with random heterogeneity in both), much in the same way as we estimated both values in the exploratory LC model. Our results may thus still be affected by confounding, but hopefully to a lesser extent than applications using the standard confirmatory LC approach. Thirdly, and despite the above comment regarding identification, future work should also look into using more than two classes. Finally, it would also be of interest to revisit the specification of the class probabilities, and in particular, whether an independence assumption such as used in the present paper is justified more broadly.
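The point about long tails in the WTP distribution can be made concrete with a small sketch. It assumes that both the attribute coefficient and the cost coefficient follow Lognormal distributions (after a sign change where needed) and that the two are independent, which is a simplification of ours; under these assumptions the ratio is itself Lognormal. The function name and parameter values are hypothetical.

```python
import math

def wtp_ratio_summary(mu_attr, sigma_attr, mu_cost, sigma_cost):
    """Median and mean of WTP = beta_attr / beta_cost when both coefficients
    are independent Lognormals (parameters of the underlying Normals)."""
    mu = mu_attr - mu_cost
    var = sigma_attr ** 2 + sigma_cost ** 2     # variances add under independence
    return {"median": math.exp(mu), "mean": math.exp(mu + 0.5 * var)}

# Hypothetical parameters: only the spread of the cost coefficient changes.
for sigma_cost in (0.5, 1.0, 1.5):
    s = wtp_ratio_summary(-1.0, 0.8, -2.0, sigma_cost)
    print(sigma_cost, round(s["median"], 3), round(s["mean"], 3))
```

The median WTP is unaffected by the spread of the cost coefficient, while the mean grows quickly as more of the cost distribution sits close to zero, which is the long-tail effect described above.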

In closing, it should also be argued that while attribute non-attendance may be an interesting concept from a behavioural perspective, it poses significant complications in the interpretation of results, and especially the computation of WTP indicators when there is a certain share of respondents with implied non-attendance for the cost attribute. Indeed, it is for this reason that we have paid reduced attention to computing WTP measures in the present analysis: the presence of a non-attendance class for cost rules out WTP computation for a share of the sample population. If cost is fully attended, the problems are less severe, and we simply obtain a zero WTP for a given share of respondents for given attributes; in our case, this situation would only occur in the first case study where only non-cost attributes have a risk of non-attendance in the LC-MMNL model. If there is a non-zero probability for a non-attendance class for cost, then the WTP measures produced by an analyst are likely to be underestimated as this class cannot be taken into account in WTP computation; clearly, if this behaviour is simply captured through low sensitivity in an MMNL structure, no such problems arise.

More crucially, the question can be asked whether non-attendance for cost actually makes sense. It is likely to be an SP artifact rather than real life behaviour, so these respondents should not be used in WTP computation anyway. To some extent, the same argument can be made for other continuous attributes. Any attribute non-attendance retrieved in a stated preference context clearly only relates to the ranges of the attributes used in the survey. Finally, we should also note that model estimates may not be caused by actual non-attendance, but by the fact that our designs do not include sufficient scenarios in which the given attribute can influence the choice, a point that applies especially for less important attributes.

As usual, there are thus plenty of unanswered questions, but reducing non-attendance classes by capturing heterogeneity in a more adequate manner is certainly a first step to avoiding some of the problems. The evidence from both the exploratory LC model and the LC-MMNL model has shown that the rates of non-attendance retrieved in the widely used confirmatory LC model are likely to be severely overstated. For a number of reasons, including WTP computation, it could be preferable to simply capture such behaviour within a random heterogeneity specification, losing some insights into behaviour but avoiding a number of other issues.