Introduction

A forecast of the value of travel time (VTT) is a key parameter in the appraisal of infrastructure investment, since time saving is the main benefit of many transport investments. Some countries take the intertemporal income elasticity to be the elasticity estimated on cross-sectional data. Sweden for instance applies the cross-sectional income elasticity to the cost parameter in the national forecast model. Other countries, such as the UK and The Netherlands, use data collected at different points in time to make assumptions on the intertemporal income elasticity, either through meta-studies (Wardman et al. 2016) or repeated SP studies (Gunn 2001; Gunn et al. 1999); also Sweden applies this method for valuing transport benefits.

However, there are few such studies based on data collected at different points in time to study the development of the VTT over time, and they give puzzling results (see below). This paper contributes to this literature by analysing data from national VTT surveys collected at two or three points in time, in the Netherlands (1997 vs. 2009/2011) and Sweden (1994 vs. 2007 vs. 2008). The aim of this paper is twofold. First, we analyse how the VTT has changed over time by estimating the VTT on the two data sets for each year by applying the same econometric model inspired by Hess et al. (2017) and De Borger and Fosgerau (2008). Second, we analyse the comparability of the surveys. Even if the data from the different years were state-of-practice national stated choice surveys when they were conducted, they are subject to differences regarding sampling and recruitment methods, response rates, experimental design, and survey design. We therefore aim at documenting these differences and exploring to what extent they impact the comparability of the different studies.

The VTT depends on three parameters (DeSerpa, 1971) that all may vary over time for different reasons.

$$VTT = \frac{\mu }{\lambda } - \frac{{u_{t} - u_{w} }}{\lambda },$$

where \(\mu\) is the opportunity utility of time, \(\uplambda\) is the marginal utility of money and \({u}_{t}-{u}_{w}\) is the direct utility of travel time, i.e. the differences between the utility of time during travel (\({u}_{t}\)) and the utility of time spent working (\({u}_{w}\)).Footnote 1 At the individual level, \(\uplambda\) should reduce with income. However, the relationship between \(\uplambda\) and income found in the cross-section might not apply over time at the population level, for instance if the income distribution or population composition changes. The opportunity utility of time reflects the value of time as a resource and equals the utility that could be attained if the travel time could be used for some other activity. This utility may therefore change if people become more (or less) productive or busy at the origin or destination location. The direct utility of travel time (compared to time spent working or on some other reference activity) depends on the comfort of the travel time, but also on how productive or enjoyable the travel time is. It may change over time due to new technology (e.g. laptops and smartphones), crowding and congestion etc..Footnote 2

The average VTT in the population can also change due to changes in the composition of the population (e.g. in terms of employment, or age). The gross income might also have different impact on the VTT if the tax systems change. This paper does not, however, attempt to answer how the average VTT has changed over time in the national populations, but rather how the VTT has changed for similar individuals in the two samples.

The implications of moving away from the assumption that the VTT grows over time proportionally with wage rate or income because the marginal utility of travel time declines, would reduce the future year time benefits in the appraisal of transport projects. On the other hand, if the time parameter in forecast models decreases, more or longer trips would be forecast, increasing the benefits in the appraisal.

We know of four previous studies that have used repeated stated choice data collected roughly 10 years or more apart to study the temporal change in the VTT. The first was based on data conducted in the Netherlands in 1988 and 1997 (Gunn 2001; Gunn et al. 1999). The second study used data collected in Britain in 1985 and 1994 (Wardman 2001). The third study used data collected in Britain in 1994 and 2006 (Tapley et al. 2007), and the fourth used data collected for drivers in Sweden in 1994 and 2007 (Börjesson et al. 2012a). The first study found that the VTT had decreased within each income group, but the income increase was large enough to cancel out the trend decline at each real income level, such that the real average VTT remained largely unchanged. The second and third studies found a slight trend decline in the VTT, in spite of an income increase. Gunn (2001) suggests that the non-increasing VTT could be explained by a declining direct disutility of travel time.

The fourth repeated study by Börjesson et al. (2012a), using log willingness to pay with random variation, found no significant difference between the cross-sectional and intertemporal income elasticity. Using the same data, Börjesson (2014) found that the travel time coefficient had remained constant over time but that the travel cost parameter had declined in real terms–which is a central result for implementation in forecasting models. Using RP data Fox (2015) came to the same conclusion (see "Previous studies" section). Hence, of all the repeated SP surveys, only the Swedish repeated car study consistently indicated increased VTT over time. This is also the only study where exactly the same questionnaire, experimental design, and survey method were used in the second wave; the sampling of drivers was also identical, including the season, time of day and locations; a reanalysis of this data for the current paper confirms these earlier results. This indicates that differences in recruitment method, interview method, response rate, season, questionnaire and experimental design can become important reasons for differences in the estimated VTT. As far as possible, we control for such differences in this paper, attempting to do this more thoroughly than in the previous repeat studies. Still our results suggest that the differences that we cannot control for have such a large impact that the resulting VTT cannot be meaningfully compared. In particular it seems likely that the trend decline in response rates, partly due to changes in recruitment method, leads to a substantial downward bias in the VTT in the later year.

A fifth repeat study is Weis et al. (2021), covering only five years (2010 and 2015), where both waves were designed as a repeated follow up SP experiment to the Mobility and Transport Microcensus (MTMC). The authors state that the survey method, recruitment strategy and experimental design remained similar, even if some attributes were slightly revised. They also found that the VTT was stable between the samples. Another study, Flügel et al. (2020), compares the Norwegian VTT from 1997, 2009 to 2018 published in the official reports (i.e. without specific modelling), and finds that they had mostly increased from 1997 to 2009 (except for air trips) and from 2009 to 2018. However, for car, the VTT had declined from 2009 to 2018, in spite of income increases.

"Previous studies" section reviews evidence from cross-sectional models, meta studies and repeated RP studies. "Data" section briefly describes the data used and "Methodology" section discusses the model specification. "Results" section then summarises the results and discusses the findings; "Conclusions" section concludes. Appendices describe the data more fully and present the detailed model results.

Previous studies

This section reviews earlier studies on the intertemporal and cross-sectional income elasticity in addition to the repeated SP studies reviewed above. It shows that here is a considerable disagreement about VTT elasticity and how to forecast VTT when income changes, which is the main justification for more research such as the present study.

Stated preference cross-sectional data yield income elasticities in a wide span of 0.07–0.9, with a consensus value of 0.3–0.5 (see Table 1). Daly and Fox (2012) find a cross-sectional income elasticity in the middle of this consensus span but point out that income elasticities estimated on cross-sectional models cannot necessarily be applied when forecasting the VTT over time because they could be biased by unobserved variables influencing the VTT and correlating with income in the cross-section. To avoid such bias, the data should include longitudinal variation.

Table 1 Cross-sectional income elasticity of the VTT from SP studies. Note that income elasticities obtained from using banded income are usually subject to more uncertainty than those directly estimated as part of the model specification

Using RP cross-sectional data collected at different points in time in Toronto, Fox (2015) finds that the time parameter remained constant and that the cost parameter declined with income. Swärdh (2008) estimated the income elasticity on the VTT for commuting trips based on revealed choices on income and commuting time for workers changing jobs. Swärdh used Swedish register data from 1983, 1990 and 1993 and found that the intertemporal income elasticity did not significantly differ from unity.

Assuming that the marginal utility of money is inversely proportional to income, implying that the VTT would increase with income, would all else equal also imply reducing price elasticities for travel. That is, if the importance of cost declined over time because income increases, this would not only increase the VTT but it would also reduce the price elasticities for travel. However, the empirical evidence is mixed; Neither Hanley et al. (2002) nor Wardman (2014) find any decline in price elasticity over time. But Bastian et al. (2016) show that income elasticities of demand for driving have declined over time while fuel price elasticities increased over time, in several western countries.

Meta studies are regressions on the outcomes of many cross-sectional VTT studies, which are explained from attributes of the study area, survey period and the method used. Meta-analyses comparing studies based on data collected in different countries at one point in time would estimate a cross-sectional elasticity. Meta-analysis comparing studies based on data from one single country collected at different points in time would estimate an intertemporal elasticity. However, other meta-analyses use studies based on data from many countries and from different points in time, and it is therefore difficult to quantify the difference between intertemporal and cross-sectional elasticity. Also, meta studies should be interpreted with caution – precisely because the quality of the underlying studies is often unclear.

Wardman published several meta-studies on UK VTT data. As the period and the number of studies were extended, the income elasticity increased: from \(0.075\pm 0.029\) in Wardman (1998) to \(0.510\pm 0.300\) in (2001), \(0.723\pm 0.043\) in (2004) and to \(0.900\pm 0.035\) in Abrantes and Wardman (2011), the final study applying material from 1960 to 2008. The elasticity in the latest study is probably most accurate, as it was based on the largest data set, but also to a larger extent reflecting an intertemporal elasticity due to a wider time span. If the latter factor played a role, it indicates that the intertemporal income elasticity is higher than the cross-sectional.

Shires and de Jong (2009) used VTT studies from 30 (mainly OECD) countries almost exclusively for the period 1990–2003 for their meta-analysis. They found the income elasticity to be 0.47 for business, 0.67 for commuting and 0.52 for other purposes. Since they use studies from many countries and from several years, they estimate a mixture of cross-sectional and intertemporal elasticities. However, it is possible that the cross-sectional variation is dominant in the estimation results, since variation in the VTT between countries is much larger than the variation over time within countries. In a study for the European Investment Bank (Wardman et al. 2012), the data on the UK were extended again, but also combined with data for many other European countries over a long period. The income elasticity of the VTT then increased to 0.72, possibly due to a larger data set with larger variation.Footnote 3

Data

In this paper, we analyse data from five national VTT stated choice studies, two in the Netherlands and three in Sweden. The Dutch studies denoted NL97 and NL11 were collected in 1997 and 2011, respectively. The Swedish studies, denoted SE94, SE07 and SE08 were collected in 1994, 2007 and 2008, respectively. The SE07 study was conducted in Sweden for the specific purpose of studying the how the VTT had changed over time. Therefore, an exact replication of the SE94 study was carried out, for car drivers only. In this replication care was taken to use exactly the same questionnaire, design (but the cost differences were inflated by 40 percent corresponding to inflation and GDP growth) and sampling of drivers, including the season, time of day and locations, as in the SE94 study (even the mistakes were repeated on purpose). The same market research firm was used, receiving the same instructions, for the recruitment and interviews and the same computer software. The VTT over time using the SE94 and the SE07 study was studied in Börjesson et al. (2012a) and Börjesson (2014). But in the present study we analyse them together with the SE08 car data.

A third Dutch VTT study was conducted in 2009, NL09, where respondents were recruited from a commercial internet panel. However, the analysis of this data showed an implausibly low VTT. This was attributed to the composition of this internet panel: mostly people that have a lot of time to fill in these internet surveys and even sometimes supplement their incomes with the rewards from completing a large number of surveys. Therefore, it was decided to recruit another set of respondents in 2011 in (almost) the same way as was done in 1997. In the final analysis, and in the present paper, the absolute level of the VTT was based on the 2011 data set only, but the socio-economic interaction coefficients were based on the joint 2009/2011-datasets. This is not likely to have any major impact on the resulting VTT relevant for the present study.

Table 2 summarises the main characteristics of the five surveys. Further description of the surveys is provided in Appendix A. We conjecture that there are five major differences between the data sets, that cannot be corrected for in retrospect. These are described in the lower part of the table. The first is the number of attributes, implying that the model specifications cannot be identical for the Dutch and the Swedish data. All the surveys included stated choice experiments with only a time and a cost attribute, but the experiment for public transport (PT) travellers in the Swedish SE94 study also included attributes for frequency and the number of transfers, giving four attributes in total.

Table 2 Main characteristics of the VTT surveys included in this paper

The second major difference is the presentation format of the attributes. While two surveys presented the time and cost levels in an absolute way, the NL97,

SE94 and SE07 studies presented the difference of the travel time and cost compared to the time and cost of the reference trip, e.g. “Travel time: 10 min shorter than now” or “Travel cost: Same as now”. The NL97 questionnaire did ask the respondents to state the travel time of their reference trip, but not the travel cost. Therefore, the absolute cost levels in the choice experiment cannot be reconstructed.

The third major difference concerns choice types with respect to a reference trip. In a reference-based design, each choice matches one of the following four types (see Fig. 1): Willingness-To-Pay (WTP): a choice between the reference trip and a faster but more expensive trip; Willingness-To-Accept (WTA): a choice between the reference trip and a slower but cheaper trip; Equivalent Gain (EG): a choice between a trip that is faster than the reference trip and has the reference trip cost and a trip that is cheaper than the reference trip and has the reference travel time; or Equivalent Loss (EL): a choice between a trip that is slower than the reference trip and has the reference trip cost and a trip that is more expensive than the reference trip and has the reference travel time. The NL97 survey was partly reference-based and partly non-reference-based, the other four surveys were fully reference-based.Footnote 4 However, the SE94 and the SE07 study included only WTP and WTA type choices.

Fig. 1
figure 1

Four quadrants of reference-based choice pairs and an example of a non-reference-based choice pair

A fourth difference is that the time and cost of the reference trip differ between the years, but also the average time and cost differences between the alternatives differ between the years. Table 7 in Appendix A shows that, for the Dutch car data, the average travel time is lower in the later years, due to sampling differences (fewer drivers were recruited at fuel stations with longer travel distances in the later year). Table 8 shows that the travel time and cost differences are larger in the later Swedish study, because the design covered a larger range of trade-off values of time to uncover the tail of the VTT distribution. Throughout this paper, all Dutch prices are converted to the 2011 price level and all Swedish prices are converted to the 2008 price level.

A fifth major difference that cannot be corrected for relates to the response rate and recruitment method. The rates of respondents willing to participate in the surveys were much higher in the nineteen-nineties than in the more recent surveys. In addition, in the Swedish PT surveys, the recruitment method shifted from on board the bus/train, where respondents had little chance of avoiding being approached, to recruitment on the platforms. Recruitment on platforms makes it easier for busy travellers, with high VTT (possibly in a hurry to catch their train or bus), to escape recruitment or to decline. In the Swedish car study, number plate registration was used in SE94 and SE07, but the SE08 study used a random population sample of individuals having made a car trip during a pre-defined survey. The recruitment method in the NL97 and NL11 surveys was identical, but fewer respondents declined recruitment in the earlier year. Hence, it seems likely that the drop out of busier travellers with high VTT was larger in the later survey. And after being recruited, the rates of respondents that completed the survey also dropped in more recent years. This pattern of reduced response rates is observed in both the Netherlands and Sweden. Moreover, only the SE08 study gave a lottery ticket to all those recruited to the study.

The first three major differences relate to differences in the stated choice experiment, while the fifth arises in the recruitment process. The fourth difference is partly a result of differences in the design, but arises also partly in the recruitment process. None of these differences can be entirely controlled for in the estimation. However, differences in experimental design, in terms of size and sign effects, are controlled for as much as possible by including these variables in the model estimation. Moreover, the absolute levels of the time and cost of the reference trip are also controlled for by including them in the models (for the NL data sets, only time could be included since no information on the reference cost could be included).

Regarding the issues with self-selection, differences in the socio-economic sample composition can be controlled for by including such variables in the estimation if they have a significant impact. Hence, differences in the VTT that we find should not depend on the socio-economic composition of the samples. In this way we can control for differences in self-selection effects due to lower response rates (in the recruitment and in the survey) to the extent that they can be captured by observed sample characteristics. Note, however, that we do not attempt to answer how the average VTT has changed over time in the national populations, so we do not reweight the samples to make them representative of the population at the two points in time.Footnote 5 We only address how the VTT has changed for individuals in the two samples, when controlling for the socio-economic characteristics that impact the VTT. Moreover, since we do not know how the recruitment method impacts the sampling probability, we refrain from weighting observations in the estimation.

However, we cannot control for self-selection effects that arise from unobserved effects. Such effects can be large, not least because the VTT can vary also within the same person for different trips. In fact, we cannot control for changes in any factor impacting the VTT that was not included in the experiments. For instance, if reliability, congestion levels, crowding levels, comfort, average travel distance/travel time per day etc. impact the VTT and have changed over time we cannot control for this. Indeed, as stated in the introduction such factors might be one reason why the VTT would increase faster or slower than the income.

Methodology

Choice of model

The central problem that arises in attempting to compare VTT findings over time and/or between countries is dealing with the differences in the data designs, discussed in the previous section. Ideally, we would wish to have the same model formulation for both countries and for all years. However, because we cannot reconstruct the absolute cost levels in the choice experiment in the NL97 data, we cannot use the model applied for the Swedish data to the Dutch data. And, because the SE94 PT data includes many more attributes, we cannot use the model applied for the Dutch data to the Swedish data. Most important is to maintain consistency within each country, so that intertemporal VTT comparisons are less affected by data changes. We are also restricted to formulations that give reasonable estimates of VTT.

The log value specification, which has been found to give the best results in a number of studies (De Borger and Fosgerau 2008; Börjesson and Eliasson 2014; Hess et al. 2017) cannot be used for the SE94 PT data because it has more than two attributes. The log value approach works by comparing a postulated random VTT for each respondent with a limiting value calculated from the data. With more than two attributes, a single limiting value cannot be calculated, as the critical value for the respondent depends on multiple features of the data. For the Dutch data, we were able to use the log value approach, while for the Swedish PT data we used a ‘multiplicative’ formulation that has some features of the log value approach and was used, for example, in the multi-attribute experiments of the recent UK study (Hess et al 2017). For the Swedish car data, we used the log value specification with a mixing distribution to make the results comparable with Börjesson et al. (2012a), who used this model for SE94 and SE07. Utility functions with additive random errors were tried but yielded a considerably worse fit (such models for Swedish long-distance trips are shown in Tables 14 and 15 in Appendix B).

We do not aim at the most sophisticated model. We did not use mixed logit models (except for the Swedish car data). However, limited tests were made with a model estimating a log-normally distributed VTT for the Swedish long-distance data (Tables 14 and 15 in Appendix B). The results did not change much, though the VTT increased somewhat, which is not surprising given that the lognormal distribution has a flat tail. The main reason why we do not apply the mixed models in our main models is that when estimating the VTT distribution, in our case a log-normal distribution, also outside the range supported by the data, the results may depend on the parts of the distribution that are extrapolated outside the data range. The impact on the estimated mean that this has is in general larger, the larger the part of the tail that is not supported by the data. This means that a comparison of the VTT resulting from mixed models estimated on data from different years could be seriously biased if the range not supported by the data is large in any of the samples, in particular if the VTT distribution has different support of the data in the two samples. This might be the case since the experimental designs differ.

The specification is applied separately to the two years of sample data in each country, so that only the cross-sectional income elasticity is estimated. However, for the Swedish car data, the SE94, SE07 and SE08 were pooled in one model to make the results comparable with Börjesson et al. (2012a). Moreover, the recruitment method changed from en-route sampling in the SE94 and SE07 to an exogenous random sample in SE08. The selection probability increases with trip length in the earlier two but not in the last survey. Since trip length is positively associated with VTT, the observations must be weighted in the application to be comparable. In estimation however, we account for the difference in trip length by including travel distance in the pooled data model. For this reason, we only compare the difference in the VTT by yearly dummy variables in the estimation in the Swedish car data. Joint estimation data from both survey years was also applied for Swedish PT data but did not provide any further insights.

Netherlands

The common element in the SP experiments in the NL97 and NL11 data is a binary choice experiment with time and cost as the only attributes. In this paper, we use data from these experiments only. As in many econometric studies, the fit of all the models estimated in this paper increases with a multiplicative error formulation relative to an additive error formulation. For the time–cost experiments in the Dutch studies the utility specification is based on the logarithmic specification, i.e. the multiplicative error structure, used by de Borger and Fosgerau (2008)

$$V_{1} = \mu \cdot \log \left( { - \frac{\Delta c}{{\Delta t}}} \right) + \varepsilon$$
(1)
$$V_{2} = \mu \cdot log (W),$$

where \(W\) is the value of time and ε is a standard logistic error, so that a logit model results; μ is a scale parameter. V1 is the quicker and more expensive alternative and V2 is the slower and cheaper alternative: this definition leads to a negative expected value for µ. \(\Delta c\) and \(\Delta t\) are the absolute differences between the travel costs and travel times, respectively, in the binary choices in the experiment.

This specification was already used by Börjesson et al. (2012a) and more recently by Hess et al. (2017) and performed well in those studies.

It is well known that the VTT depends on the design variables, amongst other reasons through size and sign effects, and that reference travel time and travel cost also have considerable impact on the estimated VTT. Hence, the models must take these design and trip characteristics into account for a fair comparison across surveys.Footnote 6

In the Swedish model, dummy variables were included to control for type choices (see Eq. (8) in Sect. 8.2.2). However, in the NL97 survey almost half of the choices were not of the type WTP, WTA, EG or EL. As a result, dummy variables cannot be used to find a reference-free VTT. Instead, the size and sign effects of the cost and time differences are modelled explicitly. This is done in a similar way as in the recent UK value of travel time study (Batley et al., 2017; Hess et al., 2017). Hence Eq. (1) is rewritten as

$$V_{1} = \mu \cdot \log \left( { - \frac{{\nu \left( {dc_{1} } \right) - \nu \left( {dc_{2} } \right)}}{{\theta \cdot \nu \left( {dt_{1} } \right) - \theta \cdot \nu \left( {dt_{2} } \right)}}} \right) + \varepsilon$$
(2)
$$V_{2} = 0,$$

where \(\nu\) is the “value” function of the change \({(c}_{1}-{c}_{0})\), in which \({c}_{1}\) is the cost of the leftmost alternative, \({c}_{0}\) is the cost of the reference alternative. The value of the cost difference relative to the reference alternative is

$$\nu \left( {dc_{1} } \right) = sgn\left( {c_{1} - c_{0} } \right) \cdot {\text{exp}}\left( {\eta_{c} \cdot sgn\left( {c_{1} - c_{0} } \right)} \right) \cdot \left| {c_{1} - c_{0} } \right|^{{1 - \beta_{c} }}$$
(3)

where sgn is the sign function. In Eq. (2), \(\theta\) denotes the reference-free VTT. In Eq. (3), \({\eta }_{c}\) and \({\beta }_{c}\) allow for the reference-dependent value to be different for gains and losses and non-linearly impacted by the size of the gains or losses relative to the reference, in the cost dimension; analogous parameters adjust the time value function. Hess et al. (2017) also allowed for non-symmetric marginal valuations of gains and losses in the value function. Due to data limitations in the present study, however, Eq. (3) assumes that gains and losses have the same functional form.

The parameter \(\theta\) is defined as a function of several socio-economic and trip variables

$$\begin{aligned} \theta = & \left( {\theta_{Car} \cdot \delta_{Car} + \theta_{Train} \cdot \delta_{Train} + \theta_{LocalPT} \cdot \delta_{LocalPT} } \right) \cdot \\ & \left( {1 + fac_{Fem} \cdot \delta_{Fem} } \right) \cdot \left( {1 + fac_{HH1} \cdot \delta_{HH1} } \right) \cdot \\ & \left( {1 + fac_{Age3650} \cdot \delta_{Age3650} + fac_{Age51 + } \cdot \delta_{Age51 + } } \right) \cdot \\ & \left( {1 + fac_{Inc} \cdot \left( {\frac{Income - 35000}{{10000}}} \right)} \right) \cdot \\ & \left( {\frac{TravTime}{{60}}} \right)^{\zeta } \cdot \\ \end{aligned}$$
(4)

In the functions (2–4), the variables and estimated parameters are defined as follows.

µ

Estimated scale

TravTime

travel time of the reference trip

\(\zeta\)

travel time elasticity: separate values are estimated for each mode or groups of modes

Income

annual household income after taxes at price level 2011

\(fac_{Inc}\)

income factor: separate values are estimated for each mode

\(\theta\) Car, \(\theta\) Train, \(\theta\) LocalPT

estimated reference VTT for the mode given by the 0/1 \(\delta\) indicator

\(\beta\)t, \(\beta\)c

estimated size effects for time and cost

ηt, ηc

estimated sign effects for time and cost

\(fac_{xx}\)

estimated parameters estimated for 0/1 variables as follows

\(\delta_{Fem}\)

indicator for women

\(\delta_{HH1}\)

indicator 1-person household

\(\delta_{Age3650}\)

indicator for age between 36 and 50

\(\delta_{Age51 + }\)

indicator for age over 50

In the model estimation, the two parameters in each value function (\(\eta\) and \(\beta\)) and the parameters of the \(\theta\) function are determined, as well as the scale µ. Hess et al (2017) and De Borger and Fosgerau (2008) then show that, since the \(\theta\) is outside the value function in Eq. (2), \(\eta\) disappears from the calculation and the VTT \(W\) can be computed as

$$W = \theta \cdot \left| {t - t_{0} } \right|^{\kappa - 1} ,$$
(5)

where

$$\kappa = \frac{{1 - \beta_{t} }}{{1 - \beta_{c} }}$$

We expect the size effect to be larger in the time dimension, i.e. \({\beta }_{t}<{\beta }_{c}\), so that the VTT reduces with smaller time savings.

Unfortunately, the delta method of calculating the variance of the estimated mean VTTs that was applied for the Swedish estimation could not be used for the Dutch analysis because of the complexity of the utility functions. Therefore, we had to revert to sample enumeration to derive the t-ratios for The Netherlands as described in "Results" section.

Sweden

For the SE94 PT data the logarithmic difference specification with multiplicative error structure cannot be used because there are more than two attributes. For this reason, we apply a multiplicative error specification to the Swedish PT data, taking the logs of the observed parts of the utility specification separately

$$V_{1} = \mu \cdot log\left( {c_{1} + W \cdot t_{1} + ..} \right) + \varepsilon$$
(6)
$$V_{2} = \mu \cdot log\left( {c_{2} + W \cdot t_{2} + ..} \right),$$

where \(W\) is the VTT,\({c}_{1}\) and \({c}_{2}\) are the absolute travel costs in the two alternatives of each binary choice and \({t}_{1}\) and \({t}_{2}\) are the absolute travel times in the two alternatives. The specification of Eq. (6) cannot, however, be applied to the Dutch data, because for the 1997 respondents the absolute travel cost levels are unknown.

A central result from De Borger and Fosgerau (2008) is that the underlying reference-free value VTT (\({W}_{rf}\)) can be obtained as a geometrical mean of the VTT for the choice types WTA and WTP, and the geometrical mean of the VTT for the choice types EL and EG

$$W_{rf} = \left( {W_{WTP} \cdot W_{WTA} } \right)^{\raise.5ex\hbox{$\scriptstyle 1$}\kern-.1em/ \kern-.15em\lower.25ex\hbox{$\scriptstyle 2$} } = \left( {W_{EG} \cdot W_{EL} } \right)^{\raise.5ex\hbox{$\scriptstyle 1$}\kern-.1em/ \kern-.15em\lower.25ex\hbox{$\scriptstyle 2$} }$$
(7)

This specification holds assuming that the time and cost differences are symmetric, i.e. that the gains are under-weighted as much as losses are over-weighted. If Eq. (7) does not hold this is evidence for asymmetry of the size effect. Under the assumptions of De Borger and Fosgerau the sign effects can be implemented as multiplicative factors for EL, EG, and WTP choices, assuming the base value of time corresponds to WTA choices. The VTT is therefore parametrised as

$$W = {\text{exp}}\left( \beta + \beta_{{{\Delta }t}} log\left( {{\Delta }t} \right) + \beta_{{{\Delta }c}} \log \left( {{\Delta }c} \right) + \beta_{t} \log \left( t \right) + \beta_{EL} \theta_{EL} + \beta_{EG} \theta_{EG} + \beta_{WTP} \theta_{WTP} + \beta_{I} {\text{log}}\left( I \right) + \beta_{Imiss} \theta_{Imiss } \right) $$
(8)

where income is \(I\) and \({\theta }_{Imiss}\) is a dummy variable indicating missing income (approx. 20% of the sample in each year) and \({\theta }_{EL}, {\theta }_{EG}\) and \({\theta }_{WTP}\) are dummy variables indicating choice types. This formulation ensures that \(VTT\) is positive, while the ranges of β are unrestricted. Insignificant variables were removed to obtain the final model presented in Tables 11 and 12.

In the model estimation, the parameters of the W function are determined. To compute the variance of the estimated mean VTTs, to be able to compute the t-statistics for the difference in the estimated mean VTT between the years (two unknown means), the delta method giving the Cramér-Rao lower bound (Daly et al. 2012) was used on the Swedish data. The variance of the estimated coefficients is then

$$var\left( W \right) = W^{{\prime}{\text{T}}} {\Omega }^{*} W^{\prime}$$
(9)

where \({\Omega }^{*}\) is the robust covariance matrix simulated by the Biogeme software (Bierlaire 2003). If there are L parameters in the VTT function, \(W\), the vector \({\mathrm{W}}^{\mathrm{^{\prime}}}\) has L elements, namely the derivatives of \(W\) with respect to each of the L elemts in β. In our final model, the VTT function \(W\) includes 7 parameters so that

$$\begin{aligned} W^{\prime T} = & \left[ {\frac{\partial W}{{\partial \beta }},\frac{\partial W}{{\partial \beta_{\Delta t} }},\frac{\partial W}{{\partial \beta_{t} }},\frac{\partial W}{{\partial \beta_{c} }},\frac{\partial W}{{\partial \beta_{WTP} }},\frac{\partial W}{{\partial \beta_{I} }},\frac{\partial W}{{\partial \beta_{{{\text{Im}} iss }} }}} \right] \\ & \left[ {W,W\log \left( {\Delta t} \right),W\log \left( t \right),W\log \left( {\Delta c} \right),W\log \left( I \right),W\theta_{{{\text{Im}} iss}} ,W\theta_{WTP} } \right] \\ \end{aligned}$$

The covariance matrix \({\Omega }^{*}\) has dimension 7 by 7.

In the Swedish car data, the specification of Eq. (1) is used. However, in this model, data from all years (SE94, SE07 and SE08) and purposes were pooled to increase the number of observations. It extends Eq. (1) by allowing the intercept \(\beta\) to follow a normal distribution, assuming it to be constant for each individual (an additional parameter βσ measures the standard deviation of \(\beta\)), following Börjesson et al. (2012a).. W is parametrised as

$$\begin{aligned} W = & \exp (\beta + \beta_{WTP} + \beta_{1} \;\log \Delta t + \beta_{2} \;\log \;d + \beta_{3} \;\log \;t + \beta_{I} \;\min \;(\log \;I,\;\log I_{50} ) + \beta (\min ((\log \;I,\;\log \;I_{75} ) \\ & - \log \;I_{50} ) \cdot 1\{ \log \;I > \log \;I_{50} \} + \beta_{I75} (\log \;I - \log \;I_{75} ) \cdot 1\{ \log \;I > \log \;I_{75} \} + \beta_{employed} \;\theta_{employed} + \\ & \beta_{commute} \theta_{commute} + \beta_{recreation} \theta_{recraetion} + \beta_{school} \;\theta_{school} + \beta_{service} \;\theta_{service} \\ \end{aligned}$$

The variables and parameters used in these equations are defined as follows.

µ

Estimated scale

\(\beta\)

the base value for VTT

\(\beta_{xx}\)

parameters adjusting the VTT relative to the base \(\beta\) for the following variables

\({\Delta }t\)

travel time difference from the reference trip

\({\Delta }c\)

travel cost difference from the reference trip

\(\theta\) EL, \(\theta\) EG, \(\theta\) WTP

VTT differences for choice types: Equivalent Loss, Equivalent Gain and Willingness to Pay

I

Annual net income, price level 2008

\(\theta_{Imiss }\)

0/1 indicator for missing income

\(I_{50}\), \(I_{75}\)

median and 75% quantile of net annual income

t

travel time

d

travel distance

\(\theta_{commute}\), \(\theta_{recreation} ,\theta _{school}\),\(\theta_{service}\)

dummy variables identifying employed people and trip purpose

The controls include dummy variables for year and trip purposes, using 2007 and other trips as base year and trip purpose. Other dummy variables identify employed persons, flexible working hours, gender etc. Insignificant socio-economic terms were removed in the final model presented in Table 13.

Results

In this section we present and discuss the resulting VTT.

VTT by survey year

The models estimated for the two years and countries are presented in Tables 9 and 13 in Appendix B. For each country (except the Swedish car data), the VTT was then derived for all observations, using sample enumeration, in the pooled (combined) sample of respondents from the two years. By pooling the sample, the differences in the VTT found between the years will not be impacted by changes in design variables or socio-economic composition of the samples, if these effects are perfectly modelled. The sample enumeration was carried out as follows. The VTT was computed for each observation in the pooled sample, based on the estimated model (and the socio-economic statistics and design variables for the observation). Then the average of the resulting VTT distribution was computed and presented in Tables 3 and 4, for NL and SE, respectively.

Table 3 The average simulated VTT, The Netherlands, euro of 2011
Table 4 The average simulated VTT, Sweden, euro of 2008

Two versions of the pooled samples were derived for each country, only differing regarding the income variable. Take the NL data as example. In the first version of the pooled sample, all 1997 respondents kept their income as reported, while the incomes of the 2011 respondents were deflated by the ratio \(\frac{\mathrm{sample}\, \mathrm {mean}\, \mathrm{income}\, 1997}{\mathrm{sample}\, \mathrm {mean}\, \mathrm{income} \,2011}\) to correct for the income growth between 1997 and 2011. The second version of the pooled sample mirrored this procedure, but all 2011 respondents kept their income as reported, while the incomes of the 1997 respondents were multiplied by the ratio \(\frac{\mathrm{sample}\, \mathrm {mean}\, \mathrm{income}\,2011}{\mathrm{sample}\, \mathrm {mean}\, \mathrm{income}\,1997}\).

For each country, the VTT was derived from sample enumeration, applying three different combinations of model and sample version (shown in column one, two and six). Column one shows the average VTT resulting from the sample enumeration applying the NL97 (SE94) model on the first version of the pooled sample (reflecting income levels -97 and -94). Column two shows the average VTT resulting from the sample enumeration applying the NL11 (SE08) model on the second version of the pooled sample (reflecting income levels -11 and -08).

In the Swedish data the t-statistics are computed applying the Delta method as explained in "Sweden" section. In the Dutch data, the t-values were computed by varying the estimated coefficients in Eq. (5) between ± 1.96 times their standard deviation, taking correlations between parameters into account and again applying a sample enumeration, resulting in a distribution of the average VTT. The Swedish sample sizes are larger than the Dutch, implying higher significance levels.

However, in the main model we did not account for the panel structure of the data (repeated choices from the respondents). Limited tests were made with the Swedish long-distance data to account for the panel structure of the data. They indicate that the recognition of the panel nature of the data increases the error estimates, in particular in the 2008 data. For these reasons the significance levels are overstated, which needs to be considered when interpreting our results.

The VTT has increased between the years for some modes and purposes and decreased for others in the Dutch data, despite the higher incomes in the 2011 sample. In the Swedish data, the average VTT is lower in the later survey for all modes and purposes (but since standard errors are overstated the change might not be significant) except for long distance bus. The decline is five to ten percent.

Column six derives the VTT by sample enumeration, applying the NL97 (or SE94) model to the second version of the pooled sample (reflecting the income levels of NL11 and SE08). This column can be interpreted as the VTT for NL11 or SE08 that would have been forecast in the nineties, given that the income growth had been known. By comparing column six with column two, reflecting the real VTT outcome for the later year, we get an indication of the changes in the VTT that cannot be explained by the increase in income, assuming that the cross-sectional income elasticity estimated on the sample from the nineties applies over time, and that the experimental design and trip variable are correctly modelled. For Sweden, we find a 14 percent decline for rail trips (for long distance and for the two purposes of regional trips) and a 7 to 18 percent decline for bus. For the NL the results vary much more from an increase of 41 percent for business rail trips to a decline by 41 percent for local public transport business trips. It is hard to find any clear pattern or plausible explanation for this variation.

Table 13 shows a model estimated on the pooled samples SE94, SE07 and SE08 for car. As shown by Börjesson et al. (2012a) there are no significant difference in the VTT for SE94 and SE07 when accounting for differences in income, design, distance, purpose etc. However, the dummy for SE08 is -0.361 and significantly different from zero, implying that the VTT for SE08 is 70 percent of the VTT for SE07 (exp( − 0.361)).

Income elasticity

In this section we first derive the income elasticity calculated based on the change in the VTT between the two years and the income change. Note that this can only be interpreted as an intertemporal income elasticity under the assumption that all changes in VTT can be attributed to income changes once differences in the trip and design variables have been controlled for. This assumption might, however, not hold, for instance if the marginal utility of time changes due to changes in travel comfort or productivity, as suggested by the discussion in the introduction. We still believe that these figures are relevant as a benchmark, because the problem of determining how the VTT changes between the base year and future years in CBA is often handled by only applying an income elasticity for expected income changes after the base year.

Column three in Tables 3 and 4, shows the percentage change in the mean VTT computed from column one and two, and also the t-values of the change. Column four reports the percentage change in the mean income of the samples. The income increase is substantial in most cases, except for other trips by train and public transport in the Dutch data (which is likely to be related to the small sample size in the 2011 survey). Column five reports the implied income elasticity. The t-statistics for the income elasticity are computed based on the standard errors of the change in the VTT and in the income.

Column five shows that only one of the income elasticities for private trips is significantly different from zero in the Dutch data, and that this is negative, and that almost all are negative in the Swedish data (negative income elasticities are not micro-economically consistent).

In practical applications, an inter-temporal income elasticity is often taken to be an estimated cross-sectional elasticity even if there is no strong reason why they should be equal. For comparison, Table 5 therefore reveals the cross-sectional elasticities in the Dutch and the Swedish samples. In the Swedish data, the income elasticity equals the estimated income parameter since the we take the log of the utility function in the estimation. In the Dutch sample the elasticity was calculated by increasing all incomes by 1 percent in the same sample enumeration tool with which the average VTT was calculated. The Swedish numbers for public transport show a tendency towards a higher cross-sectional income elasticity in the later year. However, in the Dutch samples we see the opposite and more varied results: lower cross-sectional elasticities in the later year (with one exception). There is thus no conclusive evidence regarding the pattern of the cross-sectional elasticities for the two countries. The SE94 elasticities were relatively low compared to the international evidence (Table 1). The NL97 elasticities were relatively high. So, in both countries the more recent elasticities moved towards the international mean values.

Table 5 Cross-sectional income elasticities by year and country

Table 5 also show the income elasticities estimated for the Swedish pooled car samples in Table 13. In this model the cross-sectional income elasticity increased with income (as also found in Börjesson et al. (2012a)). This is consistent with the higher cross-sectional income elasticity in the later Swedish sample for public transport, where income is higher.

Discussion

We have observed an unexplained reduction in the VTT in all cases but the Swedish exact replication SE94-SE07. For the NL data there is also an unclear pattern of increases and decreases which is puzzling. For the Swedish PT modes, the decline is more consistent over modes and purposes. We also see a 30 percent lower VTT in the SE08 data than in the SE07 data for car.

There are two potential explanations for the lower VTT in many of the later surveys. The first is that the VTT has in many cases declined over time despite income increases. The second is that VTT is not in fact reduced, and that such findings are an artefact of the survey or experimental method and design or survey conduct. The first explanation could be valid if the travel time has become more comfortable or productive as explained in the introduction. It could also be an effect of changes in other factors that we cannot control for. If for instance reliability has decreased over time, this might have reduced the VTT, since a small time reduction might be interpreted as pointless if the travel time normally varies more than the time reduction.Footnote 7 On the other hand, if crowding or congestion have increased, this would work in the opposite direction and increase the VTT. There is however no evidence that reliability or crowding has increased over time for the bus and train services in the NL or in Sweden (long-distance and regional).

The first explanation could also be valid if the decline in the VTT is related to changes in the income distribution in the samples. Our models account for the income dependency of VTT, but if the functional form of this dependency is different from the one that we have assumed, a shift in the shape of income distribution could result in a lower estimate of the VTT for the same income level in later years. The income distribution has become slightly more skewed to the right over time in the two countries. Still, the shifts are modest in the Netherlands: the Gini coefficient increased slightly from 0.276 (1997) to 0.282 (2011). In Sweden the Gini coefficient increased more, from 0.253 (1994) to 0.311 (2008). This increase was almost exclusively a result of increased spread of income from capital (the distribution of earned income and transfers remained stable) (Björklund and Jäntti 2011). Moreover, in the Swedish sample, the income among the sixth of respondents with the lowest income did not increase between the years.Footnote 8 This could be one contributing factor for a possible decline in the VTT, but even if the income distribution has become slightly more skewed over time, this would probably not explain the large decline in the VTT that we find.

There is no reason to believe that differences in reliability, congestion or income distribution would be found for car drivers between 2007 and 2008, explaining the 30 percent lower VTT in the latter. However, the Swedish 2008 data was collected during the early part of the financial crisis. It cannot be ruled out that this impacted the results. Still, the crisis did not immediately impact the economy, and we have controlled for the respondents’ incomes in the models.

Moving to the second explanation, we identified in "Data" five differences between the yearly samples that we cannot entirely control for in the analysis. The first is that the number of attributes differs. The Swedish data for bus and train had four attributes in the 1994 experiment and two attributes in the 2008 experiment. Hess et al. (2020) suggest that the two-attribute experiment (the simple time-money trade off) tends to result in lower VTT than experiments with more attributes.

As discussed in "Data" section, the difference in the presentation of the reference time and cost in the experiment, differences in the type of choice questions, differences in the level of the time and costs of the reference trips, the time and cost differences between the alternatives, and differences in socio-economic composition might have impacted the results even though we did our best to control for the differences in the model estimation. Furthermore, there is a possibility of larger self-selection of travellers with low VTT caused by unobserved factors in the later surveys, due to the fall in response rate in the later surveys, and in Sweden also the change in the recruitment method (making it easier for busy PT travellers with high VTT to escape recruitment and even avoid being approached).

The second explanation seems more consistent with the Swedish replication car data and the RP studies (i.e. Swärdh (2008) and Daly and Fox (2012)), all showing that the VTT increases with income. In the Swedish replication study, none of the differences regarding stated choice experiment or recruitment process between the yearly samples were present. Moreover, it is unlikely that self-selection into the sample of drivers with low VTT would be much higher in the later wave of the replication study. This is because recruitment was identical in the two waves and conducted by number plate registration while the cars were moving, so busy drivers could not avoid being recorded during the recruitment. The response rates after recruitment of all approached travellers were still lower in the later wave, but the fall is modest in comparison to the other studies presented in this paper (65 percent in 1994 and 55 percent in 2007).

In the SE08 car data (with 30 percent lower VTT than in the SE07 data), the response rate had dropped to 36 percent and the recruitment method changed to a exogenous random population sample. Exogenous population samples are also normally applied when collecting NTS and other travel survey data, for which a general decline in response rates over the past decades has become a major issue (Prelipcean et al. 2018). Hence, selection bias could be a problem also in travel survey data.

The home interview RP data used by Daly and Fox also avoided differences arising from changes in experimental design and changes in the recruitment process. Rather the recruitment was conducted using an exogenous sample of individuals as in the SE08 car data. There can still be self-selection present due to low response rates in this form of data collection as well as in studies applying number plate registration (as in the Swedish car study). However, such recruitment still avoids a self-selection in the recruitment process because busy travellers can avoid being approached or decline recruitment on platforms and at gas stations (and probably in web-panels).

In the Weis et al. (2021) study, also finding stable valuations, the time span between the first and second wave was only five years, and both were designed as follow up surveys to the Mobility and Transport Microcensus (MTMC). This made the recruitment process, experimental design and survey design similar across the survey waves. The reported response rate for the later 2015 sample was also high, 76.9 percent, but this was from a sample already recruited to the MTMC, and the response rate of that MTMC is not reported. The response rates for the 2010 sample are not reported in the paper, but it seems probable that the response rate remained similar over the five years.

A relevant question is then how meta-analysis studies, such as Abrantes and Wardman (2011), can result in income elasticities of the VTT close to unity, while our repeated studies do not. A possibility is that the recruitment is more consistent across the included studies than in our surveys. However, since recruitment method or response rate is usually not included as an explanatory factor in the meta-regressions (or even reported), this is unknown. There are also comparisons across countries/areas with different income levels as well as several RP studies in the meta-analysis.

So, what does the analysis in this paper imply regarding the handling of VTT over time? Either we trust the results from the study and conclude that the VTT has declined over time for the public transport modes in Sweden and that the results for the NL data are mixed in terms of changes over time. Those results are similar to earlier repeated VTT studies, but are not consistent with results from meta-studies, the Swedish exact replication study and RP studies. The other option is not to trust the results from our analysis, or the other repeated SP studies that are not exact replications, because differences in the data collection methods and survey conduct have a bigger impact on the VTT than the real change in the VTT over time.

In the latter case, the remaining issue is how to analyse the VTT over time, if the repeated SP studies cannot be trusted. Or precisely, what data sources can be trusted for an analysis of the VTT and the VTT over time? Since this study does not produce exact results in terms of what has been the source of the bias, Table 6 lists three possible sources. The table gives an overview of the presence of these three sources in five different data types. Repeated SP surveys have errors from all three sources. If the key source of the error is the sensitivity to stated choice survey design, we must turn to RP data. However, RP data has other problems, primarily measurement error in input variables. This is often larger in the cost variable, which tends to attenuate the cost parameters and leads to under-estimation of the VTT in RP data if this is not addressed (Varela et al. 2018).

Table 6 Sources of errors of the analysis of the changed in VTT over time

However, if the key source of the error stems from self-selection during recruitment on platforms and gas stations, SP studies recruited from a random sample or number plate registration would be reliable. If the main source is sample selection due to low and declining response rates, NTS data or any form of data collected with low response rates in any of the waves might also produce biased results. If that cannot be controlled for by weighting etc., the remaining options are data stemming from passive data collection, such as mobile phone data (Andersson et al. 2022), road pricing data or ticket sales data. The best option might be to combine SP and RP data, but then it is still essential to understand the strengths and weaknesses of each type of data.

Conclusion

This paper explores the intertemporal income elasticity of the VTT using data from the Netherlands and Sweden, collected at two points in time, thirteen and fourteen years apart, respectively. We do not attempt to answer how the average VTT has changed over time in the national populations, but analysed only how the VTT has changed for similar individuals in the two samples. The results show mostly a declining VTT for a given income level. The results show also a large within-country heterogeneity across modes and purposes, in the cross-sectional income elasticity of the VTT, and in its development over time. This is essentially the same result as was produced by three of the four earlier studies comparing stated choice data at two points in time to study temporal change in the VTT. The main contribution of this paper is to document the differences between the studies collected at different years, indicating the reasons why it is difficult to uncover temporal changes in the VTT. We have concluded that either the VTT has declined for given income levels for many modes and purposes, or the replication study can for some reason not be trusted.

If the decline in the VTT can be attributed to changes in the survey method and experimental design, formulation of the questionnaire and choice questions, survey mode (paper/computer-aided etc., telephone), this would raise fundamental questions regarding the legitimacy of the SP method for VTT research as such, as well as for the extensive practice relying on these methods. If the outcome is critically sensitive to survey design, it would also imply that the estimated VTT could not be compared across surveys since survey techniques change over time, due to improvements in experimental design and econometric techniques and in survey technology. This would also reduce the validity of meta-studies unless these improvements are somehow included in the meta-regression.

However, an even more likely explanation is the fall in response rate over time, and that the recruitment method (on platforms and at fuel stations) has made it easier for busy travellers to escape recruitment in the later years. If the key explanation is that the response rate has dropped or that busy travellers can escape being recruited, this would not necessarily invalidate the stated choice method if recruitment methods could be improved to the extent that a representative sample is recruited. Recruiting representative samples of respondents is a challenge for all sorts of studies, including travel surveys (NTS data). Declining response rates in surveys has since long been well documented (De Heer and De Leeuw 2002). In the literature on election polls the issues of sample selection and representativeness are recognized as a key problem (Chen et al. 2019; Conduit and Akbarzadeh 2020), but this has possibly not received appropriate attention in the value of time literature. The problem might be particularly serious in VTT studies, if samples are getting increasingly selective with respect to VTT because busier people drop out first. Our finding can be applied in all contexts involving VTT (including transport forecasting models) when intertemporal patterns are analysed based on surveys with declining response rates. A possible way forward could be to develop and estimate some sort of model of response probability in further research.

The most important advice to practitioners and researchers would be to spend more resources and focus on increased response rates. Indeed, the response rate also including people approached but declining to be recruited to the survey is often not even reported. If it is not possible to recruit representative samples, the remaining options are data from passive data collection, such as mobile phone data, road pricing data or ticket sales data (Daly et al. 2017) for studying the VTT. We recommend more consideration of methods to make it more difficult for busy respondents to avoid being approached and recruited in future research.