Can repeated surveys reveal the variation of the value of travel time over time?

This paper studies intertemporal changes in the value of travel time (VTT) and investigates whether the change of VTT over time can be studied based on national VTT data, collected at two points in time. We use repeated national VTT data from the Netherlands and Sweden, collected 13 and 14 years apart. The results show mostly a declining VTT for a given income level. The results show also a large within-country heterogeneity across modes and purposes, in the cross-sectional income elasticity of the VTT, and in its development over time. The explanation most consistent with our results and those of others is that the VTT has in fact increased due to income increases, but that the repeated stated choice data cannot detect this given the data, methodology and population changes. In particular, it seems that the response rate has dropped considerably in the later surveys partly due to a higher share of (busy) respondents declining to be recruited. The main contribution of this paper is to document the differences between the studies carried out in different years, indicating the reasons why it is difficult to identify temporal changes in the VTT.


Introduction
A forecast of the value of travel time (VTT) is a key parameter in the appraisal of infrastructure investment, since time saving is the main benefit of many transport investments. Some countries take the intertemporal income elasticity to be the elasticity estimated on cross-sectional data. Sweden for instance applies the cross-sectional income elasticity to the cost parameter in the national forecast model. Other countries, such as the UK and The Netherlands, use data collected at different points in time to make assumptions on the intertemporal income elasticity, either through meta-studies (Wardman et al. 2016) or repeated SP studies (Gunn 2001;Gunn et al. 1999); also Sweden applies this method for valuing transport benefits.
However, there are few such studies based on data collected at different points in time to study the development of the VTT over time, and they give puzzling results (see below). This paper contributes to this literature by analysing data from national VTT surveys collected at two or three points in time, in the Netherlands (1997( vs. 2009( ) and Sweden (1994( vs. 2007( vs. 2008. The aim of this paper is twofold. First, we analyse how the VTT has changed over time by estimating the VTT on the two data sets for each year by applying the same econometric model inspired by Hess et al. (2017) andDe Borger andFosgerau (2008). Second, we analyse the comparability of the surveys. Even if the data from the different years were state-of-practice national stated choice surveys when they were conducted, they are subject to differences regarding sampling and recruitment methods, response rates, experimental design, and survey design. We therefore aim at documenting these differences and exploring to what extent they impact the comparability of the different studies.
The VTT depends on three parameters (DeSerpa, 1971) that all may vary over time for different reasons.
where is the opportunity utility of time, λ is the marginal utility of money and u t − u w is the direct utility of travel time, i.e. the differences between the utility of time during travel ( u t ) and the utility of time spent working ( u w ). 1 At the individual level, λ should reduce with income. However, the relationship between λ and income found in the cross-section might not apply over time at the population level, for instance if the income distribution or population composition changes. The opportunity utility of time reflects the value of time as a resource and equals the utility that could be attained if the travel time could be used for some other activity. This utility may therefore change if people become more (or less) productive or busy at the origin or destination location. The direct utility of travel time (compared to time spent working or on some other reference activity) depends on the comfort of the travel time, but also on how productive or enjoyable the travel time is. It may change over time due to new technology (e.g. laptops and smartphones), crowding and congestion etc.. 2 The average VTT in the population can also change due to changes in the composition of the population (e.g. in terms of employment, or age). The gross income might also have different impact on the VTT if the tax systems change. This paper does not, however, attempt to answer how the average VTT has changed over time in the national populations, but rather how the VTT has changed for similar individuals in the two samples.
The implications of moving away from the assumption that the VTT grows over time proportionally with wage rate or income because the marginal utility of travel time declines, would reduce the future year time benefits in the appraisal of transport projects.
On the other hand, if the time parameter in forecast models decreases, more or longer trips would be forecast, increasing the benefits in the appraisal. We know of four previous studies that have used repeated stated choice data collected roughly 10 years or more apart to study the temporal change in the VTT. The first was based on data conducted in the Netherlands in 1988 and 1997 (Gunn 2001;Gunn et al. 1999). The second study used data collected in Britain in 1985and 1994(Wardman 2001. The third study used data collected in Britain in 1994and 2006(Tapley et al. 2007, and the fourth used data collected for drivers in Sweden in 1994 and 2007 (Börjesson et al. 2012a). The first study found that the VTT had decreased within each income group, but the income increase was large enough to cancel out the trend decline at each real income level, such that the real average VTT remained largely unchanged. The second and third studies found a slight trend decline in the VTT, in spite of an income increase. Gunn (2001) suggests that the non-increasing VTT could be explained by a declining direct disutility of travel time.
The fourth repeated study by Börjesson et al. (2012a), using log willingness to pay with random variation, found no significant difference between the cross-sectional and intertemporal income elasticity. Using the same data, Börjesson (2014) found that the travel time coefficient had remained constant over time but that the travel cost parameter had declined in real terms-which is a central result for implementation in forecasting models. Using RP data Fox (2015) came to the same conclusion (see "Previous studies" section). Hence, of all the repeated SP surveys, only the Swedish repeated car study consistently indicated increased VTT over time. This is also the only study where exactly the same questionnaire, experimental design, and survey method were used in the second wave; the sampling of drivers was also identical, including the season, time of day and locations; a reanalysis of this data for the current paper confirms these earlier results. This indicates that differences in recruitment method, interview method, response rate, season, questionnaire and experimental design can become important reasons for differences in the estimated VTT. As far as possible, we control for such differences in this paper, attempting to do this more thoroughly than in the previous repeat studies. Still our results suggest that the differences that we cannot control for have such a large impact that the resulting VTT cannot be meaningfully compared. In particular it seems likely that the trend decline in response rates, partly due to changes in recruitment method, leads to a substantial downward bias in the VTT in the later year.
A fifth repeat study is Weis et al. (2021), covering only five years (2010 and 2015), where both waves were designed as a repeated follow up SP experiment to the Mobility and Transport Microcensus (MTMC). The authors state that the survey method, recruitment strategy and experimental design remained similar, even if some attributes were slightly revised. They also found that the VTT was stable between the samples. Another study, Flügel et al. (2020), compares the Norwegian VTT from 1997, 2009 to 2018 published in the official reports (i.e. without specific modelling), and finds that they had mostly increased from 1997 to 2009 (except for air trips) and from 2009 to 2018. However, for car, the VTT had declined from 2009 to 2018, in spite of income increases.
"Previous studies" section reviews evidence from cross-sectional models, meta studies and repeated RP studies. "Data" section briefly describes the data used and "Methodology" section discusses the model specification. "Results" section then summarises the results and discusses the findings; "Conclusions" section concludes. Appendices describe the data more fully and present the detailed model results.

Previous studies
This section reviews earlier studies on the intertemporal and cross-sectional income elasticity in addition to the repeated SP studies reviewed above. It shows that here is a considerable disagreement about VTT elasticity and how to forecast VTT when income changes, which is the main justification for more research such as the present study.
Stated preference cross-sectional data yield income elasticities in a wide span of 0.07-0.9, with a consensus value of 0.3-0.5 (see Table 1). Daly and Fox (2012) find a cross-sectional income elasticity in the middle of this consensus span but point out that income elasticities estimated on cross-sectional models cannot necessarily be applied when forecasting the VTT over time because they could be biased by unobserved variables influencing the VTT and correlating with income in the cross-section. To avoid such bias, the data should include longitudinal variation.
Using RP cross-sectional data collected at different points in time in Toronto, Fox (2015) finds that the time parameter remained constant and that the cost parameter declined with income. Swärdh (2008) estimated the income elasticity on the VTT for commuting trips based on revealed choices on income and commuting time for workers changing jobs. Swärdh used Swedish register data from 1983, 1990 and 1993 and found that the intertemporal income elasticity did not significantly differ from unity.
Assuming that the marginal utility of money is inversely proportional to income, implying that the VTT would increase with income, would all else equal also imply reducing price elasticities for travel. That is, if the importance of cost declined over time because income increases, this would not only increase the VTT but it would also reduce the price elasticities for travel. However, the empirical evidence is mixed; Neither Hanley et al. (2002) nor Wardman (2014) find any decline in price elasticity over time. But Bastian et al. (2016) show that income elasticities of demand for driving have declined over time while fuel price elasticities increased over time, in several western countries.
Meta studies are regressions on the outcomes of many cross-sectional VTT studies, which are explained from attributes of the study area, survey period and the method used. Meta-analyses comparing studies based on data collected in different countries at one point in time would estimate a cross-sectional elasticity. Meta-analysis comparing studies based on data from one single country collected at different points in time would estimate an intertemporal elasticity. However, other meta-analyses use studies based on data from many countries and from different points in time, and it is therefore difficult to quantify the difference between intertemporal and cross-sectional elasticity. Also, meta studies should be interpreted with caution -precisely because the quality of the underlying studies is often unclear.
Wardman published several meta-studies on UK VTT data. As the period and the number of studies were extended, the income elasticity increased: from 0.075 ± 0.029 in Wardman (1998) to 0.510 ± 0.300 in (2001), 0.723 ± 0.043 in (2004) and to 0.900 ± 0.035 in Abrantes and Wardman (2011), the final study applying material from 1960 to 2008. The elasticity in the latest study is probably most accurate, as it was based on the largest data set, but also to a larger extent reflecting an intertemporal elasticity due to a wider time span. If the latter factor played a role, it indicates that the intertemporal income elasticity is higher than the cross-sectional. Shires and de Jong (2009) used VTT studies from 30 (mainly OECD) countries almost exclusively for the period 1990-2003 for their meta-analysis. They found the income elasticity to be 0.47 for business, 0.67 for commuting and 0.52 for other purposes. Since they Table 1 Cross-sectional income elasticity of the VTT from SP studies. Note that income elasticities obtained from using banded income are usually subject to more uncertainty than those directly estimated as part of the model specification *Elasticities refers to income at the household level, unless otherwise reported a This reference does not explicitly state income elasticities of the VTT, but we have calculated these from the VTT estimated by income segment using detailed historic income distribution information from Statistics Netherlands (CBS). Note that these VTTs are calculated after weighting / expansion, so that these also include other effects that influence the VTT and that are correlated with income. Hence the resulting elasticities are not pure income elasticities with all other effects being equal b Dillén and Algers write: "The lowest income category is disregarded, and if the rest of the categories are grouped in two halves, then the household income elasticity of the VoT is 0.46 for single person households and ranges from 0.07 to 0.24 for 2 person households with and without children respectively" References Study area and data Year of data collection Income elasticity of VTT Gunn et al. (1999)  use studies from many countries and from several years, they estimate a mixture of crosssectional and intertemporal elasticities. However, it is possible that the cross-sectional variation is dominant in the estimation results, since variation in the VTT between countries is much larger than the variation over time within countries. In a study for the European Investment Bank (Wardman et al. 2012), the data on the UK were extended again, but also combined with data for many other European countries over a long period. The income elasticity of the VTT then increased to 0.72, possibly due to a larger data set with larger variation. 3

Data
In this paper, we analyse data from five national VTT stated choice studies, two in the Netherlands and three in Sweden. The Dutch studies denoted NL97 and NL11 were collected in 1997 and 2011, respectively. The Swedish studies, denoted SE94, SE07 and SE08 were collected in 1994, 2007 and 2008, respectively. The SE07 study was conducted in Sweden for the specific purpose of studying the how the VTT had changed over time. Therefore, an exact replication of the SE94 study was carried out, for car drivers only. In this replication care was taken to use exactly the same questionnaire, design (but the cost differences were inflated by 40 percent corresponding to inflation and GDP growth) and sampling of drivers, including the season, time of day and locations, as in the SE94 study (even the mistakes were repeated on purpose). The same market research firm was used, receiving the same instructions, for the recruitment and interviews and the same computer software. The VTT over time using the SE94 and the SE07 study was studied in Börjesson et al. (2012a) and Börjesson (2014). But in the present study we analyse them together with the SE08 car data. A third Dutch VTT study was conducted in 2009, NL09, where respondents were recruited from a commercial internet panel. However, the analysis of this data showed an implausibly low VTT. This was attributed to the composition of this internet panel: mostly people that have a lot of time to fill in these internet surveys and even sometimes supplement their incomes with the rewards from completing a large number of surveys. Therefore, it was decided to recruit another set of respondents in 2011 in (almost) the same way as was done in 1997. In the final analysis, and in the present paper, the absolute level of the VTT was based on the 2011 data set only, but the socio-economic interaction coefficients were based on the joint 2009/2011-datasets. This is not likely to have any major impact on the resulting VTT relevant for the present study. Table 2 summarises the main characteristics of the five surveys. Further description of the surveys is provided in Appendix A. We conjecture that there are five major differences between the data sets, that cannot be corrected for in retrospect. These are described in the lower part of the table. The first is the number of attributes, implying that the model specifications cannot be identical for the Dutch and the Swedish data. All the surveys included stated choice experiments with only a time and a cost attribute, but the experiment for public transport (PT) travellers in the Swedish SE94 study also included attributes for frequency and the number of transfers, giving four attributes in total.
The second major difference is the presentation format of the attributes. While two surveys presented the time and cost levels in an absolute way, the NL97, SE94 and SE07 studies presented the difference of the travel time and cost compared to the time and cost of the reference trip, e.g. "Travel time: 10 min shorter than now" or "Travel cost: Same as now". The NL97 questionnaire did ask the respondents to state the travel time of their reference trip, but not the travel cost. Therefore, the absolute cost levels in the choice experiment cannot be reconstructed.
The third major difference concerns choice types with respect to a reference trip. In a reference-based design, each choice matches one of the following four types (see Fig. 1): Willingness-To-Pay (WTP): a choice between the reference trip and a faster but more expensive trip; Willingness-To-Accept (WTA): a choice between the reference trip and a slower but cheaper trip; Equivalent Gain (EG): a choice between a trip that is faster than the reference trip and has the reference trip cost and a trip that is cheaper than the reference trip and has the reference travel time; or Equivalent Loss (EL): a choice between a trip that is slower than the reference trip and has the reference trip cost and a trip that is more expensive than the reference trip and has the reference travel time. The NL97 survey was partly reference-based and partly non-reference-based, the other four surveys were fully reference-based. 4 However, the SE94 and the SE07 study included only WTP and WTA type choices.
A fourth difference is that the time and cost of the reference trip differ between the years, but also the average time and cost differences between the alternatives differ between the years. Table 7 in Appendix A shows that, for the Dutch car data, the average travel time is lower in the later years, due to sampling differences (fewer drivers were recruited at fuel stations with longer travel distances in the later year). Table 8 shows that the travel time and cost differences are larger in the later Swedish study, because the design covered a larger range of trade-off values of time to uncover the tail of the VTT distribution. Throughout this paper, all Dutch prices are converted to the 2011 price level and all Swedish prices are converted to the 2008 price level.
A fifth major difference that cannot be corrected for relates to the response rate and recruitment method. The rates of respondents willing to participate in the surveys were much higher in the nineteen-nineties than in the more recent surveys. In addition, in the Swedish PT surveys, the recruitment method shifted from on board the bus/train, where respondents had little chance of avoiding being approached, to recruitment on the platforms. Recruitment on platforms makes it easier for busy travellers, with high VTT (possibly in a hurry to catch their train or bus), to escape recruitment or to decline. In the Swedish car study, number plate registration was used in SE94 and SE07, but the SE08 study used a random population sample of individuals having made a car trip during a predefined survey. The recruitment method in the NL97 and NL11 surveys was identical, but fewer respondents declined recruitment in the earlier year. Hence, it seems likely that the drop out of busier travellers with high VTT was larger in the later survey. And after being recruited, the rates of respondents that completed the survey also dropped in more recent  Table 7 See Table 7 See Table 8 See Table 8 See Table 8 a Household income was also stated but explained less of VTT. Joint taxation does not apply in Sweden years. This pattern of reduced response rates is observed in both the Netherlands and Sweden. Moreover, only the SE08 study gave a lottery ticket to all those recruited to the study. The first three major differences relate to differences in the stated choice experiment, while the fifth arises in the recruitment process. The fourth difference is partly a result of differences in the design, but arises also partly in the recruitment process. None of these differences can be entirely controlled for in the estimation. However, differences in experimental design, in terms of size and sign effects, are controlled for as much as possible by including these variables in the model estimation. Moreover, the absolute levels of the time and cost of the reference trip are also controlled for by including them in the models (for the NL data sets, only time could be included since no information on the reference cost could be included).
Regarding the issues with self-selection, differences in the socio-economic sample composition can be controlled for by including such variables in the estimation if they have a significant impact. Hence, differences in the VTT that we find should not depend on the socio-economic composition of the samples. In this way we can control for differences in self-selection effects due to lower response rates (in the recruitment and in the survey) to the extent that they can be captured by observed sample characteristics. Note, however, that we do not attempt to answer how the average VTT has changed over time in the national populations, so we do not reweight the samples to make them representative of the population at the two points in time. 5 We only address how the VTT has changed for individuals in the two samples, when controlling for the socio-economic characteristics that impact the VTT. Moreover, since we do not know how the recruitment method impacts the sampling probability, we refrain from weighting observations in the estimation.
However, we cannot control for self-selection effects that arise from unobserved effects. Such effects can be large, not least because the VTT can vary also within the same person for different trips. In fact, we cannot control for changes in any factor impacting the VTT that was not included in the experiments. For instance, if reliability, congestion levels, crowding levels, comfort, average travel distance/travel time per day etc. impact the VTT and have changed over time we cannot control for this. Indeed, as stated in the introduction such factors might be one reason why the VTT would increase faster or slower than the income.

Choice of model
The central problem that arises in attempting to compare VTT findings over time and/or between countries is dealing with the differences in the data designs, discussed in the previous section. Ideally, we would wish to have the same model formulation for both countries and for all years. However, because we cannot reconstruct the absolute cost levels in the choice experiment in the NL97 data, we cannot use the model applied for the Swedish data to the Dutch data. And, because the SE94 PT data includes many more attributes, we cannot use the model applied for the Dutch data to the Swedish data. Most important is to maintain consistency within each country, so that intertemporal VTT comparisons are less affected by data changes. We are also restricted to formulations that give reasonable estimates of VTT.
The log value specification, which has been found to give the best results in a number of studies (De Borger and Fosgerau 2008;Börjesson and Eliasson 2014;Hess et al. 2017) cannot be used for the SE94 PT data because it has more than two attributes. The log value approach works by comparing a postulated random VTT for each respondent with a limiting value calculated from the data. With more than two attributes, a single limiting value cannot be calculated, as the critical value for the respondent depends on multiple features of the data. For the Dutch data, we were able to use the log value approach, while for the Swedish PT data we used a 'multiplicative' formulation that has some features of the log value approach and was used, for example, in the multi-attribute experiments of the recent UK study . For the Swedish car data, we used the log value specification with a mixing distribution to make the results comparable with Börjesson et al. (2012a), who used this model for SE94 and SE07. Utility functions with additive random errors were tried but yielded a considerably worse fit (such models for Swedish long-distance trips are shown in Tables 14 and 15 in Appendix B).
We do not aim at the most sophisticated model. We did not use mixed logit models (except for the Swedish car data). However, limited tests were made with a model estimating a log-normally distributed VTT for the Swedish long-distance data (Tables 14 and 15 in Appendix B). The results did not change much, though the VTT increased somewhat, which is not surprising given that the lognormal distribution has a flat tail. The main reason why we do not apply the mixed models in our main models is that when estimating the VTT distribution, in our case a log-normal distribution, also outside the range supported by the data, the results may depend on the parts of the distribution that are extrapolated outside the data range. The impact on the estimated mean that this has is in general larger, the larger the part of the tail that is not supported by the data. This means that a comparison of the VTT resulting from mixed models estimated on data from different years could be seriously biased if the range not supported by the data is large in any of the samples, in particular if the VTT distribution has different support of the data in the two samples. This might be the case since the experimental designs differ.
The specification is applied separately to the two years of sample data in each country, so that only the cross-sectional income elasticity is estimated. However, for the Swedish car data, the SE94, SE07 and SE08 were pooled in one model to make the results comparable with Börjesson et al. (2012a). Moreover, the recruitment method changed from en-route sampling in the SE94 and SE07 to an exogenous random sample in SE08. The selection probability increases with trip length in the earlier two but not in the last survey. Since trip length is positively associated with VTT, the observations must be weighted in the application to be comparable. In estimation however, we account for the difference in trip length by including travel distance in the pooled data model. For this reason, we only compare the difference in the VTT by yearly dummy variables in the estimation in the Swedish car data. Joint estimation data from both survey years was also applied for Swedish PT data but did not provide any further insights.

Netherlands
The common element in the SP experiments in the NL97 and NL11 data is a binary choice experiment with time and cost as the only attributes. In this paper, we use data from these experiments only. As in many econometric studies, the fit of all the models estimated in this paper increases with a multiplicative error formulation relative to an additive error formulation. For the time-cost experiments in the Dutch studies the utility specification is based on the logarithmic specification, i.e. the multiplicative error structure, used by de Borger and Fosgerau (2008) where W is the value of time and ε is a standard logistic error, so that a logit model results; μ is a scale parameter. V 1 is the quicker and more expensive alternative and V 2 is the slower and cheaper alternative: this definition leads to a negative expected value for µ. Δc and Δt are the absolute differences between the travel costs and travel times, respectively, in the binary choices in the experiment.
This specification was already used by Börjesson et al. (2012a) and more recently by Hess et al. (2017) and performed well in those studies.
It is well known that the VTT depends on the design variables, amongst other reasons through size and sign effects, and that reference travel time and travel cost also have considerable impact on the estimated VTT. Hence, the models must take these design and trip characteristics into account for a fair comparison across surveys. 6 (1) In the Swedish model, dummy variables were included to control for type choices (see Eq. (8) in Sect. 8.2.2). However, in the NL97 survey almost half of the choices were not of the type WTP, WTA, EG or EL. As a result, dummy variables cannot be used to find a reference-free VTT. Instead, the size and sign effects of the cost and time differences are modelled explicitly. This is done in a similar way as in the recent UK value of travel time study Hess et al., 2017). Hence Eq. (1) is rewritten as where is the "value" function of the change (c 1 − c 0 ) , in which c 1 is the cost of the leftmost alternative, c 0 is the cost of the reference alternative. The value of the cost difference relative to the reference alternative is where sgn is the sign function. In Eq. (2), denotes the reference-free VTT. In Eq. (3), c and c allow for the reference-dependent value to be different for gains and losses and non-linearly impacted by the size of the gains or losses relative to the reference, in the cost dimension; analogous parameters adjust the time value function. Hess et al. (2017) also allowed for non-symmetric marginal valuations of gains and losses in the value function. Due to data limitations in the present study, however, Eq. (3) assumes that gains and losses have the same functional form.
The parameter is defined as a function of several socio-economic and trip variables In the functions (2-4), the variables and estimated parameters are defined as follows.  In the model estimation, the two parameters in each value function ( and ) and the parameters of the function are determined, as well as the scale µ. Hess et al (2017) andDe Borger andFosgerau (2008) then show that, since the is outside the value function in Eq.
(2), disappears from the calculation and the VTT W can be computed as where We expect the size effect to be larger in the time dimension, i.e. t < c , so that the VTT reduces with smaller time savings.
Unfortunately, the delta method of calculating the variance of the estimated mean VTTs that was applied for the Swedish estimation could not be used for the Dutch analysis because of the complexity of the utility functions. Therefore, we had to revert to sample enumeration to derive the t-ratios for The Netherlands as described in "Results" section.

Sweden
For the SE94 PT data the logarithmic difference specification with multiplicative error structure cannot be used because there are more than two attributes. For this reason, we apply a multiplicative error specification to the Swedish PT data, taking the logs of the observed parts of the utility specification separately where W is the VTT,c 1 and c 2 are the absolute travel costs in the two alternatives of each binary choice and t 1 and t 2 are the absolute travel times in the two alternatives. The specification of Eq. (6) cannot, however, be applied to the Dutch data, because for the 1997 respondents the absolute travel cost levels are unknown.
A central result from De Borger and Fosgerau (2008) is that the underlying referencefree value VTT ( W rf ) can be obtained as a geometrical mean of the VTT for the choice types WTA and WTP, and the geometrical mean of the VTT for the choice types EL and EG This specification holds assuming that the time and cost differences are symmetric, i.e. that the gains are under-weighted as much as losses are over-weighted. If Eq. (7) does not hold this is evidence for asymmetry of the size effect. Under the assumptions of De Borger and Fosgerau the sign effects can be implemented as multiplicative factors for EL, EG, and WTP choices, assuming the base value of time corresponds to WTA choices. The VTT is therefore parametrised as where income is I and Imiss is a dummy variable indicating missing income (approx. 20% of the sample in each year) and EL , EG and WTP are dummy variables indicating choice types. This formulation ensures that VTT is positive, while the ranges of β are unrestricted. Insignificant variables were removed to obtain the final model presented in Tables 11 and 12. In the model estimation, the parameters of the W function are determined. To compute the variance of the estimated mean VTTs, to be able to compute the t-statistics for the difference in the estimated mean VTT between the years (two unknown means), the delta method giving the Cramér-Rao lower bound ) was used on the Swedish data. The variance of the estimated coefficients is then where Ω * is the robust covariance matrix simulated by the Biogeme software (Bierlaire 2003). If there are L parameters in the VTT function, W , the vector W � has L elements, namely the derivatives of W with respect to each of the L elemts in β. In our final model, the VTT function W includes 7 parameters so that The covariance matrix Ω * has dimension 7 by 7. In the Swedish car data, the specification of Eq. (1) is used. However, in this model, data from all years (SE94, SE07 and SE08) and purposes were pooled to increase the number of observations. It extends Eq. (1) by allowing the intercept to follow a normal distribution, assuming it to be constant for each individual (an additional parameter β σ measures the standard deviation of ), following Börjesson et al. (2012a).. W is parametrised as The variables and parameters used in these equations are defined as follows. The controls include dummy variables for year and trip purposes, using 2007 and other trips as base year and trip purpose. Other dummy variables identify employed persons, flexible working hours, gender etc. Insignificant socio-economic terms were removed in the final model presented in Table 13.

Results
In this section we present and discuss the resulting VTT.

VTT by survey year
The models estimated for the two years and countries are presented in Tables 9 and 13 in Appendix B. For each country (except the Swedish car data), the VTT was then derived for all observations, using sample enumeration, in the pooled (combined) sample of respondents from the two years. By pooling the sample, the differences in the VTT found between the years will not be impacted by changes in design variables or socio-economic composition of the samples, if these effects are perfectly modelled. The sample enumeration was carried out as follows. The VTT was computed for each observation in the pooled sample, based on the estimated model (and the socio-economic statistics and design variables for the observation). Then the average of the resulting VTT distribution was computed and presented in Tables 3 and 4, for NL and SE, respectively.
Two versions of the pooled samples were derived for each country, only differing regarding the income variable. Take the NL data as example. In the first version of the pooled sample, all 1997 respondents kept their income as reported, while the incomes of the 2011 respondents were deflated by the ratio sample mean income 1997 sample mean income 2011 to correct for the income growth between 1997 and 2011. The second version of the pooled sample mirrored this procedure, but all 2011 respondents kept their income as reported, while the incomes of the 1997 respondents were multiplied by the ratio sample mean income 2011 sample mean income 1997 . For each country, the VTT was derived from sample enumeration, applying three different combinations of model and sample version (shown in column one, two and six). Column one shows the average VTT resulting from the sample enumeration applying the NL97 (SE94) model on the first version of the pooled sample (reflecting income levels -97 and -94). Column two shows the average VTT resulting from the sample enumeration applying the NL11 (SE08) model on the second version of the pooled sample (reflecting income levels -11 and -08).
In the Swedish data the t-statistics are computed applying the Delta method as explained in "Sweden" section. In the Dutch data, the t-values were computed by varying the estimated coefficients in Eq. (5) between ± 1.96 times their standard deviation, taking correlations between parameters into account and again applying a sample enumeration, resulting in a distribution of the average VTT. The Swedish sample sizes are larger than the Dutch, implying higher significance levels.
However, in the main model we did not account for the panel structure of the data (repeated choices from the respondents). Limited tests were made with the Swedish longdistance data to account for the panel structure of the data. They indicate that the recognition of the panel nature of the data increases the error estimates, in particular in the 2008 data. For these reasons the significance levels are overstated, which needs to be considered when interpreting our results.
The VTT has increased between the years for some modes and purposes and decreased for others in the Dutch data, despite the higher incomes in the 2011 sample. In the Swedish data, the average VTT is lower in the later survey for all modes and purposes (but since standard errors are overstated the change might not be significant) except for long distance bus. The decline is five to ten percent.
Column six derives the VTT by sample enumeration, applying the NL97 (or SE94) model to the second version of the pooled sample (reflecting the income levels of NL11 and SE08). This column can be interpreted as the VTT for NL11 or SE08 that would have been forecast in the nineties, given that the income growth had been known. By comparing column six with column two, reflecting the real VTT outcome for the later year, we get an indication of the changes in the VTT that cannot be explained by the increase in income, assuming that the cross-sectional income elasticity estimated on the sample from the nineties applies over time, and that the experimental design and trip variable are correctly modelled. For Sweden, we find a 14 percent decline for rail trips (for long distance and for the two purposes of regional trips) and a 7 to 18 percent decline for bus. For the NL the results vary much more from an increase of 41 percent for business rail trips to a decline by 41 percent for local public transport business trips. It is hard to find any clear pattern or plausible explanation for this variation. Table 13 shows a model estimated on the pooled samples SE94, SE07 and SE08 for car. As shown by Börjesson et al. (2012a) there are no significant difference in the VTT for SE94 and SE07 when accounting for differences in income, design, distance, purpose etc. However, the dummy for SE08 is -0.361 and significantly different from zero, implying that the VTT for SE08 is 70 percent of the VTT for SE07 (exp( − 0.361)).

Income elasticity
In this section we first derive the income elasticity calculated based on the change in the VTT between the two years and the income change. Note that this can only be interpreted as an intertemporal income elasticity under the assumption that all changes in VTT can be attributed to income changes once differences in the trip and design variables have been controlled for. This assumption might, however, not hold, for instance if the marginal utility of time changes due to changes in travel comfort or productivity, as suggested by the discussion in the introduction. We still believe that these figures are relevant as a benchmark, because the problem of determining how the VTT changes between the base year and future years in CBA is often handled by only applying an income elasticity for expected income changes after the base year.
Column three in Tables 3 and 4, shows the percentage change in the mean VTT computed from column one and two, and also the t-values of the change. Column four reports the percentage change in the mean income of the samples. The income increase is substantial in most cases, except for other trips by train and public transport in the Dutch data (which is likely to be related to the small sample size in the 2011 survey). Column five reports the implied income elasticity. The t-statistics for the income elasticity are computed based on the standard errors of the change in the VTT and in the income. Column five shows that only one of the income elasticities for private trips is significantly different from zero in the Dutch data, and that this is negative, and that almost all are negative in the Swedish data (negative income elasticities are not micro-economically consistent).
In practical applications, an inter-temporal income elasticity is often taken to be an estimated cross-sectional elasticity even if there is no strong reason why they should be equal. For comparison, Table 5 therefore reveals the cross-sectional elasticities in the Dutch and the Swedish samples. In the Swedish data, the income elasticity equals the estimated income parameter since the we take the log of the utility function in the estimation. In the Dutch sample the elasticity was calculated by increasing all incomes by 1 percent in the same sample enumeration tool with which the average VTT was calculated. The Swedish numbers for public transport show a tendency towards a higher cross-sectional income elasticity in the later year. However, in the Dutch samples we see the opposite and more varied results: lower cross-sectional elasticities in the later year (with one exception). There is thus no conclusive evidence regarding the pattern of the cross-sectional elasticities for the two countries. The SE94 elasticities were relatively low compared to the international evidence (Table 1). The NL97 elasticities were relatively high. So, in both countries the more recent elasticities moved towards the international mean values. Table 5 also show the income elasticities estimated for the Swedish pooled car samples in Table 13. In this model the cross-sectional income elasticity increased with income (as also found in Börjesson et al. (2012a)). This is consistent with the higher cross-sectional income elasticity in the later Swedish sample for public transport, where income is higher.

Discussion
We have observed an unexplained reduction in the VTT in all cases but the Swedish exact replication SE94-SE07. For the NL data there is also an unclear pattern of increases and decreases which is puzzling. For the Swedish PT modes, the decline is more consistent over modes and purposes. We also see a 30 percent lower VTT in the SE08 data than in the SE07 data for car. There are two potential explanations for the lower VTT in many of the later surveys. The first is that the VTT has in many cases declined over time despite income increases. The second is that VTT is not in fact reduced, and that such findings are an artefact of the survey or experimental method and design or survey conduct. The first explanation could be valid if the travel time has become more comfortable or productive as explained in the introduction. It could also be an effect of changes in other factors that we cannot control for. If for instance reliability has decreased over time, this might have reduced the VTT, since a small time reduction might be interpreted as pointless if the travel time normally varies more than the time reduction. 7 On the other hand, if crowding or congestion have increased, this would work in the opposite direction and increase the VTT. There is however no evidence that reliability or crowding has increased over time for the bus and train services in the NL or in Sweden (long-distance and regional).
The first explanation could also be valid if the decline in the VTT is related to changes in the income distribution in the samples. Our models account for the income dependency of VTT, but if the functional form of this dependency is different from the one that we have assumed, a shift in the shape of income distribution could result in a lower estimate of the VTT for the same income level in later years. The income distribution has become slightly more skewed to the right over time in the two countries. Still, the shifts are modest in the Netherlands: the Gini coefficient increased slightly from 0.276 (1997) to 0.282 (2011). In Sweden the Gini coefficient increased more, from 0.253 (1994) to 0.311 (2008). This increase was almost exclusively a result of increased spread of income from capital (the distribution of earned income and transfers remained stable) (Björklund and Jäntti 2011). Moreover, in the Swedish sample, the income among the sixth of respondents with the lowest income did not increase between the years. 8 This could be one contributing factor for a possible decline in the VTT, but even if the income distribution has become slightly more skewed over time, this would probably not explain the large decline in the VTT that we find.
There is no reason to believe that differences in reliability, congestion or income distribution would be found for car drivers between 2007 and 2008, explaining the 30 percent lower VTT in the latter. However, the Swedish 2008 data was collected during the early part of the financial crisis. It cannot be ruled out that this impacted the results. Still, the crisis did not immediately impact the economy, and we have controlled for the respondents' incomes in the models.
Moving to the second explanation, we identified in "Data" five differences between the yearly samples that we cannot entirely control for in the analysis. The first is that the number of attributes differs. The Swedish data for bus and train had four attributes in the 1994 experiment and two attributes in the 2008 experiment. Hess et al. (2020) suggest that the two-attribute experiment (the simple time-money trade off) tends to result in lower VTT than experiments with more attributes.
As discussed in "Data" section, the difference in the presentation of the reference time and cost in the experiment, differences in the type of choice questions, differences in the level of the time and costs of the reference trips, the time and cost differences between the alternatives, and differences in socio-economic composition might have impacted the results even though we did our best to control for the differences in the model estimation. Furthermore, there is a possibility of larger self-selection of travellers with low VTT caused by unobserved factors in the later surveys, due to the fall in response rate in the later surveys, and in Sweden also the change in the recruitment method (making it easier for busy PT travellers with high VTT to escape recruitment and even avoid being approached).
The second explanation seems more consistent with the Swedish replication car data and the RP studies (i.e. Swärdh (2008) and Daly and Fox (2012)), all showing that the VTT increases with income. In the Swedish replication study, none of the differences regarding stated choice experiment or recruitment process between the yearly samples were present. Moreover, it is unlikely that self-selection into the sample of drivers with low VTT would be much higher in the later wave of the replication study. This is because recruitment was identical in the two waves and conducted by number plate registration while the cars were moving, so busy drivers could not avoid being recorded during the recruitment. The response rates after recruitment of all approached travellers were still lower in the later wave, but the fall is modest in comparison to the other studies presented in this paper (65 percent in 1994 and 55 percent in 2007).
In the SE08 car data (with 30 percent lower VTT than in the SE07 data), the response rate had dropped to 36 percent and the recruitment method changed to a exogenous random population sample. Exogenous population samples are also normally applied when collecting NTS and other travel survey data, for which a general decline in response rates over the past decades has become a major issue (Prelipcean et al. 2018). Hence, selection bias could be a problem also in travel survey data.
The home interview RP data used by Daly and Fox also avoided differences arising from changes in experimental design and changes in the recruitment process. Rather the recruitment was conducted using an exogenous sample of individuals as in the SE08 car data. There can still be self-selection present due to low response rates in this form of data collection as well as in studies applying number plate registration (as in the Swedish car study). However, such recruitment still avoids a self-selection in the recruitment process because busy travellers can avoid being approached or decline recruitment on platforms and at gas stations (and probably in web-panels).
In the Weis et al. (2021) study, also finding stable valuations, the time span between the first and second wave was only five years, and both were designed as follow up surveys to the Mobility and Transport Microcensus (MTMC). This made the recruitment process, experimental design and survey design similar across the survey waves. The reported response rate for the later 2015 sample was also high, 76.9 percent, but this was from a sample already recruited to the MTMC, and the response rate of that MTMC is not reported. The response rates for the 2010 sample are not reported in the paper, but it seems probable that the response rate remained similar over the five years.
A relevant question is then how meta-analysis studies, such as Abrantes and Wardman (2011), can result in income elasticities of the VTT close to unity, while our repeated studies do not. A possibility is that the recruitment is more consistent across the included studies than in our surveys. However, since recruitment method or response rate is usually not included as an explanatory factor in the meta-regressions (or even reported), this is unknown. There are also comparisons across countries/areas with different income levels as well as several RP studies in the meta-analysis.
So, what does the analysis in this paper imply regarding the handling of VTT over time? Either we trust the results from the study and conclude that the VTT has declined over time for the public transport modes in Sweden and that the results for the NL data are mixed in terms of changes over time. Those results are similar to earlier repeated VTT studies, but are not consistent with results from meta-studies, the Swedish exact replication study and RP studies. The other option is not to trust the results from our analysis, or the other repeated SP studies that are not exact replications, because differences in the data collection methods and survey conduct have a bigger impact on the VTT than the real change in the VTT over time.
In the latter case, the remaining issue is how to analyse the VTT over time, if the repeated SP studies cannot be trusted. Or precisely, what data sources can be trusted for an analysis of the VTT and the VTT over time? Since this study does not produce exact results in terms of what has been the source of the bias, Table 6 lists three possible sources. The table gives an overview of the presence of these three sources in five different data types. Repeated SP surveys have errors from all three sources. If the key source of the error is the sensitivity to stated choice survey design, we must turn to RP data. However, RP data has other problems, primarily measurement error in input variables. This is often larger in the cost variable, which tends to attenuate the cost parameters and leads to under-estimation of the VTT in RP data if this is not addressed (Varela et al. 2018).
However, if the key source of the error stems from self-selection during recruitment on platforms and gas stations, SP studies recruited from a random sample or number plate registration would be reliable. If the main source is sample selection due to low and declining response rates, NTS data or any form of data collected with low response rates in any of the waves might also produce biased results. If that cannot be controlled for by weighting etc., the remaining options are data stemming from passive data collection, such as mobile phone data (Andersson et al. 2022), road pricing data or ticket sales data. The best option might be to combine SP and RP data, but then it is still essential to understand the strengths and weaknesses of each type of data.

Conclusion
This paper explores the intertemporal income elasticity of the VTT using data from the Netherlands and Sweden, collected at two points in time, thirteen and fourteen years apart, respectively. We do not attempt to answer how the average VTT has changed over time in the national populations, but analysed only how the VTT has changed for similar individuals in the two samples. The results show mostly a declining VTT for a given income level. The results show also a large within-country heterogeneity across modes and purposes, in the cross-sectional income elasticity of the VTT, and in its development over time. This is essentially the same result as was produced by three of the four earlier studies comparing stated choice data at two points in time to study temporal change in the VTT. The main contribution of this paper is to document the differences between the studies collected at different years, indicating the reasons why it is difficult to uncover temporal changes in the VTT. We have concluded that either the VTT has declined for given income levels for many modes and purposes, or the replication study can for some reason not be trusted. If the decline in the VTT can be attributed to changes in the survey method and experimental design, formulation of the questionnaire and choice questions, survey mode (paper/ computer-aided etc., telephone), this would raise fundamental questions regarding the legitimacy of the SP method for VTT research as such, as well as for the extensive practice relying on these methods. If the outcome is critically sensitive to survey design, it would also imply that the estimated VTT could not be compared across surveys since survey techniques change over time, due to improvements in experimental design and econometric techniques and in survey technology. This would also reduce the validity of meta-studies unless these improvements are somehow included in the meta-regression.
However, an even more likely explanation is the fall in response rate over time, and that the recruitment method (on platforms and at fuel stations) has made it easier for busy travellers to escape recruitment in the later years. If the key explanation is that the response rate has dropped or that busy travellers can escape being recruited, this would not necessarily invalidate the stated choice method if recruitment methods could be improved to the extent that a representative sample is recruited. Recruiting representative samples of respondents is a challenge for all sorts of studies, including travel surveys (NTS data). Declining response rates in surveys has since long been well documented (De Heer and De Leeuw 2002). In the literature on election polls the issues of sample selection and representativeness are recognized as a key problem (Chen et al. 2019;Conduit and Akbarzadeh 2020), but this has possibly not received appropriate attention in the value of time literature. The problem might be particularly serious in VTT studies, if samples are getting increasingly selective with respect to VTT because busier people drop out first. Our finding can be applied in all contexts involving VTT (including transport forecasting models) when intertemporal patterns are analysed based on surveys with declining response rates. A possible way forward could be to develop and estimate some sort of model of response probability in further research.
The most important advice to practitioners and researchers would be to spend more resources and focus on increased response rates. Indeed, the response rate also including people approached but declining to be recruited to the survey is often not even reported. If it is not possible to recruit representative samples, the remaining options are data from passive data collection, such as mobile phone data, road pricing data or ticket sales data  for studying the VTT. We recommend more consideration of methods to make it more difficult for busy respondents to avoid being approached and recruited in future research.
In 2013, results from a new national value of travel time study were officially released (Kouwenhoven et al. 2014;Significance et al. 2013). Stated choice data were collected in 2009 (using an existing commercial online panel) and 2011 (using en-route recruitment of travellers as in 1997).
In the 2011 surveys, each respondent was asked to answer three SP experiments. Experiment 1 was a simple time-cost trade-off experiment, with choice screens appearing as in Fig. 3.
In addition to this stated choice experiment, second and third experiments to determine the value of travel time reliability were added. The 2011 recruitment locations were mostly the same as in 1997; however, several new locations were added mainly to recruit respondents for public transport.
The sample statistics by mode and purpose are given in Table 7. For all trip purposes, travel time in the 2011-survey is lower for car and local PT but higher for train compared to the 1997-survey (see Table 7). The travel time and cost changes, ΔT and ΔC, are smaller in the later study for all modes and purposes except for train-business. The differences in travel time of the reference trip, and the differences in ΔT and ΔC between the surveys are related to differences in the recruitment method and in the experimental design.
All incomes in the NL97 survey were converted to net annual incomes. 9 All incomes and costs were inflated in line with the growth of the Dutch consumer price index 1997-2011. Similarly, all incomes and cost from the 2009 data collection phases of the NL11 survey were inflated in line with the growth of the Dutch consumer price index 2009-2011 so that all values for the Netherlands in the rest of this paper are given in 2011 prices.

Sweden, 1994 and 2008-surveys: SE94 and SE08
Two national VTT studies for all modes of passenger transport have been conducted in Sweden, the first in 1994 (Dillén and Algers 1998) and the second in 2007(Börjesson et al. 2012aBörjesson and Eliasson 2014); the references include details of the experimental design. The sampling of respondents and stated choice design experiments were not identical due to methodological development. The sample statistics of the data are included in Table 8. The travel time differences ΔT were similar in the surveys. However, the travel cost differences ΔC are substantially larger in the 2008 survey. The main reason for this is that the bid range was extended to better capture the tail of the VTT distribution (Börjesson et al. 2012b), which was not considered in the 1994 survey.
The recruitment of respondents for the public transport studies was conducted on board while travelling, in 1994. If they accepted the invitation, the 1994 subjects were mailed the survey questionnaire on paper, where details of the observed trip could be filled in as well as sheets for the stated choice interview. They were later contacted by telephone, repeatedly until reached, but at most seven times. When reached, a computer aided telephone interview was then undertaken on an agreed day, collecting all information on the mailed survey questionnaire and the stated choices. (Hence the paper questionnaire was never handed in but was only used by respondents to support their memory.) In 2008 the travellers were recruited on public transport platforms (hence it might have been easier for busy travellers to avoid recruitment). If they accepted participating in the study, the respondents' addresses and telephone numbers were collected. They then received a link to a web-based questionnaire; however, they could choose to respond to the questionnaire by a call-back telephone interview instead to reduce potential selection bias. Few respondents were interviewed over the phone. The questionnaires used in the 1994 and in 2008 were of a similar length. However, since the 1994 survey was responded to over the phone, we can assume it took more time than in 2008. Table 8 reports monthly gross income. Sweden has a progressive tax system, but the system remained relatively similar between the two years. All incomes and costs in the SE94 survey were inflated in line with the growth of the Swedish consumer price index 1994-2008 so that all values for Sweden in the rest of this paper are given in 2008 prices.
In both surveys, the instructions to the respondents and the choice context were virtually the same. The context of the SP experiment was that of a recently made trip and   the attribute levels were pivoted around the observed levels (the 'reference' values). However, the presentations of the alternatives differed between the surveys. In the first Swedish survey, only the alternative with the reference trip was described with absolute attribute levels (on the reference card, see Fig. 4). The attribute levels on of alternative trips were expressed in terms of difference in relation to the levels of the reference trip (i.e., the travel time was 10 min longer than the reference trip, the travel cost was 1 euro cheaper than the reference trip). However, the departure time and number of transfers were given in absolute levels. In the 2008 survey, the absolute time and cost levels were presented for all alternatives (see Fig. 5).

Appendix B: final model estimates
See Tables 9, 10, 11, 12, 14 and 15. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.