The German value of time and value of reliability study: the survey work

In 2012 Germany’s Federal Ministry of Transport and Digital Infrastructure (BMVI) initiated several projects in preparation for the new Federal Transport Infrastructure Plan (BVWP) 2030. This included an update of the general methodology and in particular of its cost–benefit analysis, which is used to evaluate the effects of hundreds of German infrastructure projects under study. As part of the work, the first official values of time (VOT) and values of reliability (VOR) for personal and business travel for Germany derived from a stated preference survey were estimated. From May 2012 until January 2013 nationwide data from more than 3000 participants were collected in a combined two-stage revealed and stated preference survey. This paper discusses the survey design, reports experience of the field phase and analyses the response behaviour of the sample. The stated choice experiments address mode, route, time of departure, workplace and residential location choice. The complex multi-attribute experiments of different types cover various aspects of short- and long-term travel choice attributes which respondents have to take into consideration during their decision process. Furthermore, overlapping variables of the stated and revealed preference experiments enabled a joint estimation on the whole data for deriving the VOTs and VORs. Additionally, numerous socio-demographic and attitudinal questions plus the large sample size for business and non-business trips make it a unique dataset offering various aspects of travel behaviour and their valuations to explore. Using a combined RP and SP survey reflects the state of the art of transport research.


Introduction
The German Federal Ministry of Transport and Digital Infrastructure (BMVI) has recently published the 2030 Federal Transport Infrastructure Plan (FTIP, Bundesverkehrswegeplan, BVWP), its medium- to long-term investment strategy for the country's transport infrastructure serving longer distance travel (BMVI 2016). As part of this, it updated the overall methodology of its central evaluation tool, cost-benefit analysis (CBA). The effects of hundreds of infrastructure projects in transport policies and investments are evaluated with CBA. In the context of updating the evaluation tool, one project estimated and recommended values of travel time (VOT) and travel time reliability (VOR) for personal and business travel (Axhausen et al. 2015a). The new VOTs were estimated to replace existing values, which were normatively derived for the BVWP'92 and had not been verified independently since then (BMVBS 2003). The VORs were estimated for the first time from survey data, although they are (still) not part of the standard appraisal. The aim of integrating reliability into the new BVWP is, in line with practice and science, to make transport systems not only faster but also more reliable (BMVI 2016). To address this, a research team around the IVT (ETH Zurich) estimated the VOT and VOR for the BMVI (Axhausen et al. 2015a). Another BMVI-initiated project calculated VOTs and VORs for freight, but this is not the subject of this paper (BVU et al. 2016).
Often, travel time savings make up the largest share of the gains in CBAs (Mackie et al. 2001). Micro-economic models of time allocation have been used to derive the valuations of technologically constrained time use since Becker (1965), Beesley (1965) and DeSerpa (1971), especially on the value of travel time (e.g. Truong and Hensher 1985; Bates 1987; Jara-Diaz 1990). The current state of practice draws largely upon past British (Hess et al. 2017; Wardman et al. 2016; Department for Transport 2015; Mackie et al. 2003; Wardman 1998), Dutch (Kouwenhoven et al. 2014; Significance et al. 2012), and Scandinavian studies (Börjesson and Eliasson 2014; Ramjerdi et al. 2010; Fosgerau et al. 2007). Time valuation moved from revealed preference (RP) data to a growing reliance on personalized stated preference (SP) experiments to estimate the VOTs and VORs by using suitably formulated discrete choice models of travel behaviour, especially of route and mode choice. RP data relate to the actual choices of respondents in real-world situations, where they reveal their preferences and tastes through the choices they make. SP data, in contrast, are hypothetical choices collected by presenting experimental or survey situations to respondents (Train 2003). Today, a personalised stated choice survey is the standard approach (e.g. Small 2012).
Swiss studies followed a variant path compared to international practice by employing more complex SP experiments including multiple modes and multiple elements of the generalized costs of travel in a series of overlapping choice contexts (Axhausen et al. 2004, 2008; Weis et al. 2012; Fröhlich et al. 2013). While these kinds of complex choice surveys have been applied for some years in Switzerland, more recently they were also acknowledged by researchers of other national VOT studies (e.g. Hess et al. 2017).
The design of the German value of time and value of reliability study builds on the experience of those studies in Switzerland. As described above, the features of the survey are complex multi-attribute experiments of different types covering various aspects of short- and long-term travel choice attributes, designed for the estimation of random utility models. During the decision process the respondent has to take all these attributes into consideration. This makes the choice situation more realistic (Louviere et al. 2000). Furthermore, overlapping variables of the stated and revealed preference experiments are suitable for a joint estimation on the whole data. Additionally, numerous socio-demographic and attitudinal questions plus the large sample size for business and non-business trips make it a unique dataset offering various aspects of travel behaviour and their valuations to explore. This paper presents the design of the German value of time and value of reliability study in detail. Further, it reports on the field phase of the study and analyses the response behaviour and the character of the attributes of the sample.

The survey idea and design considerations
The first official value of time and reliability estimation for Germany required utmost diligence. Choice experiments with multiple attributes are complex and sometimes difficult for respondents to understand. Thus, different blocks of choice experiments were conducted. Combining mode choice and departure time for the estimation of the VOR would have been too complex, which is why some relevant attributes were included in the route choice experiments. Also, some modes are not relevant for certain groups, as not everyone has a car available. This information was gathered beforehand in the socio-economic questions and used for the questionnaire assignment. For the estimation it was important to include not only actual decisions about a single trip or route but also long-term decisions, in order to measure the effect on future trips. These long-term decisions gave the respondents the opportunity to implement major changes in their choices, but they also include a discounted evaluation of the total of their short-term trips. All experiments included common variables, which made it possible to estimate the required joint model by pooling the data and even to include the collected RP data as a reference. This approach was most useful for creating a realistic rather than a purely hypothetical choice situation. It was even possible to estimate a pooled short-term and long-term model (Dubernet 2019; Dubernet et al. 2018). Figure 1 shows the steps of the study. As business travel is concentrated in a small share of the population, a complementary sample of such travellers was recruited in addition to a population-based sample to achieve an adequate sample size. Business travel was defined as all employment-related travel, excluding commute trips, emergency services and driving as work (delivery, bus, coach drivers, etc.). The category includes various kinds of business travellers, from local craftsmen to lawyers and consultants.
The additional sample of business travellers was recruited with an online access panel.
On the basis of the revealed preference (RP) data collected, a stated choice (SC) questionnaire was designed in a second step. The short-term SP experiments include mode choice, route choice, and combined route choice and reliability experiments. They are described in detail in section "Design of the short-term experiments". In order to allow the cross-checking of the results, this approach was expanded to include long-term choice contexts that also involve travel as an element, as had been trialled in an earlier Swiss study (e.g. Weis et al. 2012). The long-term SPs include residential location and workplace choice situations and are described in section "Design of the long-term experiments". At the end of each stated choice block, respondents could indicate whether one or all of the attributes had no impact on their decisions in the different choice situations, or whether all attributes were important to them. The results are described in section "Variable importance". All the SP questionnaires included additional attitudinal questions on risk acceptance, environmental protection and variety seeking in daily life. A descriptive analysis of the attitudinal questions can be found in section "Attitudes".
In addition to the survey itself, two secondary subjects of interest were investigated in the first phase of the project for further validation of the survey approach and design. The first issue was that business travellers are sometimes not free to choose the mode or even the route of their travel due to company policy and thereby cannot contribute valid SP experiments. This was checked before the main survey by conducting a small-scale qualitative study. Twenty-four decision makers were recruited to cover the regions of Germany as well as the range of firm sizes. While many firms indeed had policies in place, the sample reported that their employees were free to choose their routes and, in the vast majority of cases, also the mode of travel. This allowed us to go ahead with the SC experiments without having to fear a major bias in the results (see Chapter 3, Axhausen et al. 2015a, for a detailed description).
The other important issue for the BMVI was the treatment of small travel time savings. The empirical literature on short-term changes in travel behaviour shows that small travel time changes (e.g. < 5 min) are often ignored or not perceived by travellers. Still, in the long-term logic of cost-benefit analysis (CBA) this is irrelevant. Accounting for it would be inconsistent with the assumptions of CBA and would open the chance to manipulate its results by dividing or aggregating projects into smaller or larger units. After a literature search on the state of the art on size and sign effects (for example Daly et al. 2011; Austroads 2012), the collected data were used to test whether the size of the travel time differences offered to the respondents in the SP experiments had an impact on the valuations. After accounting for the other non-linearities, the models could not identify such effects. Thus our recommendation for the BMVI was to follow international practice and to value all savings equally (Ehreke 2016).

Design of the short-term experiments
In the non-business survey RP data on three trips undertaken by the respondents were collected in a first step. The purposes of the RP trips were pre-specified: commuting to work and the trips to the most important shopping and leisure (< 50 km) destinations.
Information on the last long-distance trip of over 50 km was also collected; if that trip was ground-based, data on the most recent air trip were collected as well. On some occasions the purpose of the reported last long-distance trip was business, so that the non-business sample also contains a small number of business trips. The rationale for collecting information on both short- and long-distance trips is the observation that the bulk of a person's everyday travel is to a very small number of destinations (Ahas et al. 2010a, b; Schönfelder and Axhausen 2010). So within a relatively short computer assisted telephone interview (CATI) a good range of trips could be obtained. Business travellers reported their last three business trips.
The reference trip of a respondent was chosen randomly but aiming for an overall share of about one third long-distance trips and two-thirds daily trips, so the reference trip was selected with a bias to longer trips given their rarity and the interest of the BVWP in long-distance travel. This selection was corrected in the analysis through a reweighting to match the distance-purpose distribution observed in the most recent German national travel diary survey (Follmer et al. 2010). The most recent trip became the reference in the business sample. During the CATI the destinations and the route of the reference trip were geocoded using the software Trip Tracer (DDS Digital Data Services GmbH 2012). The gathered trip information was complemented with the usual socio-demographic information and information about mobility tools as well as attitudinal questions.
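The reweighting step can be illustrated with a minimal sketch. It assumes simple cell weighting, in which each distance-purpose cell receives the ratio of its target share to its share in the sample; the study's actual weighting procedure may differ. The cell labels and shares below are hypothetical and are not taken from the study or from the national travel diary survey.

```python
from collections import Counter


def cell_weights(sample_cells, target_shares):
    """Weight per cell = target share / observed sample share."""
    counts = Counter(sample_cells)
    n = len(sample_cells)
    return {cell: target_shares[cell] / (counts[cell] / n) for cell in counts}


# Hypothetical sample of reference trips with long-distance trips oversampled.
sample = [("long", "leisure")] * 10 + [("daily", "commute")] * 20
# Hypothetical target shares (must sum to 1 over the cells in the sample).
target = {("long", "leisure"): 0.1, ("daily", "commute"): 0.9}

weights = cell_weights(sample, target)

# After weighting, the long-distance share equals the target share.
weighted_total = sum(weights[cell] for cell in sample)
weighted_long_share = sum(
    weights[cell] for cell in sample if cell[0] == "long"
) / weighted_total
```

Oversampled cells receive weights below one and undersampled cells weights above one, so the weighted sample reproduces the target distribution.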
The SP experiments were constructed around the reference trip. Information about the non-chosen options was added. The non-chosen alternatives and their attributes were based on information from a number of sources. Door-to-door car travel times were computed based on the average travel times reported by Tom-Tom Stats and a NavTeq-network for Germany using the MATSim framework (Horni et al. 2016). The average car travel costs were calculated based on the 2012 ADAC (General German Automobile Club) price-per-kilometer estimate for an average sized car in each car segment (range from mini to caravan) (ADAC 2012). The travel times, headways, transfers and prices on public transport including air travel were obtained from the relevant websites with an internet bot programmed by IVT.
The SP experiments had to be generated so as to gather as much information as possible with the smallest possible sample size. To this end, an efficient design based on variations of the reported attribute levels, a so-called pivot design, was computed using the software Ngene (Rose et al. 2009). In a pivot design the attribute levels shown to the respondents are pivoted from the reference alternatives of each respondent (ChoiceMetrics 2018). Table 1 shows the design and attribute levels of the different short-term experiments.
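The pivoting idea can be sketched as follows. This is only an illustration of how levels are expressed relative to a respondent's reported values; it does not reproduce the efficiency optimisation performed in Ngene. The function name, the relative shifts and the reference values are hypothetical; the levels actually used are those listed in Table 1.

```python
def pivot_levels(reference_value, relative_shifts):
    """Attribute levels expressed as relative shifts around a reported value."""
    return [round(reference_value * (1.0 + shift), 2) for shift in relative_shifts]


# Hypothetical reference trip: 30 minutes of travel time, 4.00 euro cost.
time_levels = pivot_levels(30.0, [-0.2, 0.0, 0.2])
cost_levels = pivot_levels(4.0, [-0.1, 0.0, 0.3])
```

Because the levels are tied to each respondent's own trip, two respondents assigned the same design row see different absolute values.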
Both samples received the SP experiments within a maximum of 2 weeks of having participated in the CATI. The business trip sample responded via a web-based survey system. The non-business sample could choose to respond with a paper-and-pencil form or with a web-based survey. Respondents in the non-business survey received three different SP experiments. To keep the response burden low, the business sample respondents only received two types of SP experiments, chosen from the mode choice, route choice and departure time choice (reliability) experiments, but no long-term SP. So in total, respondents were offered between 16 and 24 choice situations. Each type of SP experiment contained 8 choice situations. Table 2 shows the 18 possible combinations of the different SP experiments for the non-business sample, where each combination represents one type of questionnaire. The design of the business sample was basically the same, only without the long-term experiments of residential location and workplace choice.
In the mode choice experiments the respondent had to choose between three modal alternatives. The modes offered depended on the reported reference mode and were walking, cycling, car, public transport (PT) or one of the long-distance modes: train, air and coach. At the time of the survey, coach travel had just been de-regulated. The resulting lack of familiarity with the coach as a scheduled long-distance alternative resulted in unreliable estimates, and no results for the coach option were reported. Belgiawan et al. (2019) faced similar problems when comparing the mode choice experiments to other context-dependent data and deriving values of time for the coach option. Figure 2 shows an example of a mode choice experiment with the three alternatives bike, public transport and car. In the route choice experiments respondents were offered two route alternatives for either car or public transport. Figure 3 shows an example of a car route choice experiment.
The departure time and reliability experiment was formulated as a route-departure time choice with an indication of travel time variability. Three formats of different complexity were tested, each allowing the estimation of the mean-variance model of scheduling (Li et al. 2010). All three formats were retained after the pre-test, as it indicated no clear preference between them in spite of their growing complexity. Figure 4 shows the three different presentation types of reliability using the example of public transport, where each column (PT type 1, PT type 2, PT type 3) represents one type of experiment.
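How a mean-variance specification yields both valuations can be sketched generically. The example assumes a linear utility in cost, mean travel time and the standard deviation of travel time, with VOT and VOR obtained as ratios of the respective coefficient to the cost coefficient. The coefficient values are hypothetical and are not results of the study, whose model specification is more elaborate.

```python
def mean_variance_utility(cost, mean_time, sd_time, b_cost, b_time, b_sd):
    """Linear utility in cost, expected travel time and travel time dispersion."""
    return b_cost * cost + b_time * mean_time + b_sd * sd_time


def valuations(b_cost, b_time, b_sd):
    """VOT and VOR as ratios of the time coefficients to the cost coefficient."""
    return b_time / b_cost, b_sd / b_cost


# Hypothetical coefficients (utility per euro and per minute), not study results.
vot_per_min, vor_per_min = valuations(b_cost=-0.20, b_time=-0.05, b_sd=-0.04)
vot_per_hour = 60 * vot_per_min  # convert euro/min to euro/h
```

With time measured in minutes and cost in euros, both ratios are in euros per minute; the ratio of VOR to VOT is often reported as the reliability ratio.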
The travel time reliability was varied by providing different congestion probabilities and average congestion times (delay) for automobile travel and by providing the probability of delays (in minutes) from scheduled arrival time for public transport travel (delays were a percentage of the specified tolerance from the RP survey). Furthermore the mode choice experiments included the share of delayed arrivals and the route choice experiments the share of trips delayed.
As a result of the pre-test the RP questionnaire was shortened for the main survey. To make the trade-offs easier to understand for the respondents it was decided to also show monthly and not only trip based costs in the SP questionnaire in the main survey.

Design of the long-term experiments
Most value of time studies consider short-term decisions by framing experiments around a situation where respondents are presented with variations in the travel time and cost of different modes or routes. The question arises whether the focus on short-term decisions is the most appropriate. Can, for example, commuters vary much of their daily commute in the short run, or is it perhaps more reasonable that changes in commutes occur because of longer-term decisions that people make, such as where to work or where to live (Beck et al. 2017)?
Workplace and residential location influence many other behavioural choices of travellers as they define the marginal cost of further travel and the distances involved. Therefore the focus of several more recent empirical studies shifted to understand and explain everyday travel behaviour as a routine activity changing due to key events such as residential relocation or workplace decisions. A recent article by Müggenburg et al. (2015) reviews the theoretical framework and the most important studies investigating mobility behaviour in a long-term choice context. Schirmer et al. (2014) give a comprehensive overview of residential location choice literature and show that travel time, commuting and employment changes are significant determinants of choices.
Trading workplace or residential location, however, represents a long-term choice; it is a decision that is not made easily and cannot be changed quickly. In the long-term experiments the respondents could choose between their current work or living situation and a constructed alternative. The alternatives include travel related variables and in addition a description and variation of work and residential attributes of the respondents. The respondents were asked to make trade-offs between these transport and workplace or residence related attributes.
In the workplace games we presented choices via a labelled choice experiment where respondents were asked to choose between their current workplace and an alternative workplace that varied in commute times, commute costs, salary and other workplace attributes. The SP experiments were generated in the same way using efficient design as described in section "Design of the short-term experiments". The attributes and their variation can be found in Table 3. An example of this choice task is shown in Fig. 5. A respondent received eight long-term choice tasks in total.
The residential location games were similar to the workplace ones but with residential attributes. In addition to the travel cost and time for commute trips the alternatives also show the time and cost for car and public transport to the nearest shopping location. The residential attributes regard the appearance and location of the dwelling. All attributes and their variation can be found in Table 3. An example of this choice task is shown in Fig. 6.

Response behaviour
After the pre-test in May 2012 the two-step survey was carried out in six subsequent waves from July 2012 to January 2013. For their participation in the whole survey respondents of the non-business sample received a lottery ticket (benefiting the charity "Aktion Mensch", worth about 35 Euro) as an incentive. Respondents of the business sample were recruited by an online access panel and received the usual reward for their participation in the form of reward points for their panel account.
The population-based non-business sample was drawn from a dual frame of land-line and mobile numbers (60% and 40%) to ensure that the growing share of mobile-only persons is included (ADM Arbeitskreis Deutscher Markt-und Sozialforschungsinstitute e.V. 2014). The sample was incrementally controlled over the survey period so as to ensure spatial quotas in terms of the German federal states. Recent findings from the Norwegian VOT survey, and earlier from the Dutch VOT survey of 2012, show that the recruitment method also has an influence on the value of travel time. As a form of self-selection, internet panel members who regularly respond to interviews to earn extra money or because they have more time available have different (lower) values of time than those recruited by telephone or en route (Flügel et al. 2019). The pre-test trialled recruiting the business sample in the same way as the non-business sample, by calling respondents and asking them to participate. Significantly fewer respondents could be recruited for the business sample. To avoid hidden refusal it was hence decided to recruit participants from an online access panel where the trip purpose could be chosen beforehand. The RP data was collected in the same way as in the non-business sample, with a CATI.
Before sending out the SP game sets of the first wave (pre-test), the expected response rates for the paper-and-pencil and online non-business samples and for the business sample were predicted from, and compared to, other surveys conducted at the IVT in order to calculate the number of contacts needed for the aimed-for number of participants (Axhausen et al. 2015b). In the end all three observed rates settled in the expected range (see Fig. 7). The response rate was even higher than for the IVT Swiss value of time study (Axhausen et al. 2004). A recruitment rate of over 30% for the CATI, a 73% completion rate for the first phase (the RP survey) and response rates of 68% (non-business sample) and 91% (business sample) for the second phase, in spite of the complexity of the instruments, indicate a strong interest in the topic.
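The planning calculation can be sketched as a simple chain of retention rates. This assumes that each stage retains a fixed share of the respondents from the previous stage, which is a simplification of how the predictions were actually made; the target and the rates in the example are hypothetical.

```python
import math


def contacts_needed(target_completes, stage_rates):
    """Contacts required if each stage retains the given share of the previous one."""
    retained = 1.0
    for rate in stage_rates:
        retained *= rate
    return math.ceil(target_completes / retained)


# Hypothetical planning example: 100 completed SP questionnaires wanted,
# 50% recruitment success and 50% SP response among those recruited.
n_contacts = contacts_needed(100, [0.5, 0.5])
```

Rounding up keeps the plan conservative, since a fractional contact cannot be made.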
In the RP survey over 4000 persons completed the questionnaire providing sociodemographic characteristics and information on recent trips. During the recruitment phase the data was checked and controlled so that there was a sufficiently large sample of responses for all trip purposes.
Including the pre-test data, over 2400 non-business and over 830 business sample respondents completed the questionnaire including the SP games provided to them. Hence the sample contains almost 64,000 choice situations (Table 5). Figure 7 and Table 4 show that the response rate of the business study is overall higher than in the non-business study, as participants were recruited in a business market research online panel. Table 5 gives an overview of the distribution of the number of completed choice tasks by type of experiment and sample. Sufficient data for all five types of SP experiments was collected. Only the reliability experiments for business trips by plane do not contain many cases. As some of the long-distance flights of the non-business sample were also business trips, the number increased to 10 persons and 80 completed SPs. However, any disaggregated modelling for this trip purpose, mode and SP experiment has to be done carefully, as it did not always lead to reasonable results (see Dubernet 2019, for a more detailed analysis). Figure 8 shows the response rates by waves, sample and medium. As mentioned above, respondents in the business sample have an overall higher response rate (except in the pre-test).
The aimed-for number of participants in the business study was already reached after wave six, so that in the seventh wave only non-business SP game sets were sent out. In the non-business survey respondents were free to choose between completing the questionnaire online or as paper-and-pencil. Of almost 3200 respondents who indicated their willingness to participate in the SP experiments, only 5.6% or 186 persons in total chose to complete the questionnaire online. Hence, the response rate of the online non-business sample varies more than the other samples' rates, as its sample is much smaller. In any case, the response rates for that medium were the lowest.
To complete the full online SP questionnaire, respondents in the business sample needed between 1 min 18 s and 43 min 48 s, and on average 9 min 24 s. Participants in the non-business survey needed more time, taking between 5 min 6 s and 58 min, and on average 17 min, to fill in the survey questionnaire. As the long-term experiments were only given to the respondents in the non-business sample, they had to answer an additional block of 8 different choice situations. Nevertheless the absolute number of respondents of the non-business online SP survey is about ten times smaller than the absolute number of participants in the business online access panel.
Within two weeks after participating in the CATI, respondents received the SP games, and the overall time it took them to send back the questionnaires was recorded. Those who did not answer within 21 days after the send-out received a reminder at that time. Figure 9 shows that the reminder had only little impact on the two online surveys but did have an impact on the paper-and-pencil one.
Responses to the two online samples were faster than to the paper-and-pencil survey. Over half of the respondents of the online business sample answered within 2 days. After 1 week 80% of respondents had already completed the SP games. The reminder had almost no effect, as responses did not substantially increase after it was sent out. In the non-business sample half of the respondents took a maximum of 4 days to answer the SPs. Most of the respondents (80%) answered within 14 days. The reminder increased responses by about 2%. Sending a questionnaire by post and back takes more time in general than answering an online survey. The first completed SP arrived after 5 days, and half of the questionnaires were sent back within 2 weeks. The reminder, which also included the full questionnaire and was sent after 21 days, produced between 15 and 20% additional responses within about 4 further days. 80% of the questionnaires arrived within 28 days. So respondents took almost twice as long to complete the written questionnaire, even without counting the additional time for sending it by post. The last questionnaire arrived after 151 days.
In line with the experience from the pre-test, the main study confirmed that all three types of reliability presentation delivered equally high response rates (see Fig. 10). No clear pattern is recognizable between the presentation types. In the paper-and-pencil non-business survey, reliability presentation type 2 received most responses, whereas respondents in the non-business online survey responded best to the type 3 presentation of reliability. Type 1 gained most responses in the online business survey, whereas in total the difference between type 3 and type 1 is about 7%. If one has to decide between the different presentation types, it seems reasonable to prefer a graphical presentation of reliability, as it is easier for respondents to understand the experiment. Tseng et al. (2009), however, found the opposite, since some respondents had difficulties reading the presented graphs correctly.

Non-traders
Non-traders in a stated preference survey are respondents who always choose the same alternative among their choice sets regardless of the available alternatives' attributes. This may have several reasons, one of which is the presence of a very strong preference. In total 34% of the respondents never varied their choices in the mode choice experiments (see Fig. 11). Differentiated by mode, the share of non-traders is higher for car users and persons using non-motorised transport, whereas public transport users are more willing to vary their choices. Non-trading does not necessarily imply inconsistent responses. Travel undertaken for its own sake, rather than to reach a destination, may explain part of it. Non-motorized trips, for example, are often performed for their own sake, e.g. going for a walk. Public transport, on the other hand, often implies trading fares and schedules, and cannot be an automatic behaviour in the way that using individual transport may be. Hence, the relevant variables, such as trip distance and purpose and the availability of mobility tools, were included in the modelling process rather than excluding non-traders.
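The definition above translates directly into a check on each respondent's sequence of choices. The following sketch uses hypothetical respondent identifiers and answers; it only flags non-trading and says nothing about its cause.

```python
def non_trader_share(choices_by_respondent):
    """Share of respondents whose chosen alternative never varies."""
    non_traders = [
        rid for rid, choices in choices_by_respondent.items()
        if len(set(choices)) == 1
    ]
    return len(non_traders) / len(choices_by_respondent)


# Hypothetical mode choice answers (eight choice situations each).
answers = {
    "r1": ["car"] * 8,
    "r2": ["car", "pt", "car", "car", "pt", "bike", "car", "pt"],
    "r3": ["bike"] * 8,
    "r4": ["pt", "pt", "car", "pt", "pt", "pt", "car", "pt"],
}
share = non_trader_share(answers)
```

For unlabelled experiments the same check would be applied to the position of the chosen alternative (left or right) instead of a mode label.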
In the long-term workplace choice experiment the share of non-traders was about 43%, with 14% always choosing the new workplace. In the residential location choice experiment the share of non-traders was a bit higher at 51%, with only 7% always choosing the new residential alternative. Overall the share of non-traders was in the expected range and for some modes even lower than expected (Fröhlich et al. 2013). For the reasons described above, the unlabelled experiments (SP2 and SP3) include far fewer non-traders. Overall only 26 respondents (0.1%) always chose the left or right alternative. With 22 respondents, non-trading occurred mostly in the route choice and departure time experiments (SP3). Over the two unlabelled experiments 12 respondents always chose the left and 14 always the right alternative. The car route choice experiments included overall more non-traders (18 respondents) than the public transport choice experiments.

Fig. 11 Share of non-traders by mode in the mode choice experiments

Lexicographic behaviour
Lexicographic behaviour occurs when, over the course of the experiment, the respondent evaluates the choice alternatives on the basis of a subset of attributes, for example by always choosing the cheapest or fastest alternative (Hess et al. 2010). The authors state in the same paper that true lexicographic behaviour is hard to detect, especially in complex choice situations with multiple attributes as in this survey. For example, a respondent who appears always to choose the cheapest alternative may in a given situation have rejected the more expensive one because it involved more transfers during the trip. It is also sometimes hard to distinguish between lexicographic and non-trading behaviour (Hess et al. 2010). Nevertheless it is interesting to see how often the respondents decided to always choose the fastest or cheapest alternative, and here the five different types of choice experiments can be even more revealing.
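A simple screening for such patterns can be sketched as follows. The data layout, attribute names and values are hypothetical. Consistent with the caveats above, the check only flags candidates: a respondent who always picks the fastest option may still have traded off other attributes.

```python
def always_chooses_lowest(tasks, attribute):
    """True if the chosen alternative has the lowest `attribute` value in every task."""
    return all(
        alternatives[chosen][attribute] == min(a[attribute] for a in alternatives)
        for alternatives, chosen in tasks
    )


# Hypothetical respondent with two choice tasks; `chosen` is a list index.
tasks = [
    ([{"time": 30, "cost": 4.0}, {"time": 25, "cost": 6.0}], 1),
    ([{"time": 40, "cost": 3.0}, {"time": 35, "cost": 5.0}], 1),
]
fastest_only = always_chooses_lowest(tasks, "time")
cheapest_only = always_chooses_lowest(tasks, "cost")
```

Ties count as choosing the lowest value, which is one of several possible conventions.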
The mode choice experiments (SP1) contain data from a total of 2062 respondents, of whom 390 (19%) always chose the fastest option offered to them, 376 (18%) always chose the cheapest option and 264 (13%) always chose the alternative with the smallest share of delayed trips. As mentioned above, real lexicographic behaviour is extremely hard to detect, especially in mode choice experiments. For example, a person who always chooses the bike alternative could either be a non-trader with a general preference for cycling or could really choose the bike because it is the cheapest option with zero costs.
In the car route choice experiments 47% of the respondents (396 of 841) always chose the alternative with the shortest in-vehicle time and 26% (224) the one with the shortest overall travel time; 23% (195) always chose the cheapest alternative. In the public transport experiments 19% (60 of 316 respondents) only chose the cheapest option and 35% (112) the fastest. For the route choice and reliability experiments (SP3) the shares are lower. As these experiments contain even more (reliability-related) variables, this might be another sign that no true lexicographic behaviour can be identified.
Again, in SP4 and SP5 it cannot be distinguished whether a respondent chooses the current situation or follows a lexicographic attribute, for example always choosing the highest salary (40%) in SP 4 or the lowest commute time (41%) or rent (56%) in SP5.
Nevertheless, even if it is not possible to identify true lexicographic behaviour, the results give a general insight into the importance and dominance of certain attributes in the choice set. The findings also match those of the variable importance questions (section "Variable importance"). Furthermore, they validate the trade-offs generated through the experimental design, as most respondents did not always choose on the basis of a single low or high attribute value.
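The counts reported in this section rest on a simple screening rule: a respondent is flagged if the chosen alternative has the lowest value of one attribute in every choice task. A minimal sketch of such a rule is given below; the task representation, the attribute names and the treatment of ties (a tie with the minimum counts as consistent) are assumptions for illustration, not the procedure documented for the study. As discussed above, a flagged respondent is not necessarily a true lexicographic decision maker.

```python
def always_picks_min(tasks, attribute):
    """Check whether the chosen alternative always has the lowest `attribute`.

    tasks: list of (alternatives, chosen_index) pairs, where alternatives is a
    list of dicts mapping attribute names to numeric levels.
    A chosen alternative that ties with the minimum counts as consistent.
    """
    if not tasks:
        return False
    for alternatives, chosen in tasks:
        best = min(a[attribute] for a in alternatives)
        if alternatives[chosen][attribute] != best:
            return False
    return True


# Hypothetical respondent with two route choice tasks (time in min, cost in EUR).
tasks = [
    ([{"time": 30, "cost": 4.0}, {"time": 45, "cost": 2.5}], 0),
    ([{"time": 25, "cost": 5.0}, {"time": 20, "cost": 6.0}], 1),
]
fastest_only = always_picks_min(tasks, "time")
cheapest_only = always_picks_min(tasks, "cost")
```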

Item non-response
Another important issue for a survey is item non-response, which means that respondents do not answer a particular item among the questions. In the social sciences these are often items on sensitive private information such as income or education. The German value of time and value of reliability study showed only minor problems with item non-response: generally the shares of missing values were below 2% or occurred for less important variables. The questions about being an academic, number of jobs, children living in the household and the profession had shares of missing values higher than 20%, but were more or less covered by other questions, for example by education in general, the number of persons living in the respondent's household, or the type of employment (all < 1% missing values). Household income, which was essential for modelling and is usually one of the more sensitive questions, showed an item non-response rate of only 12.9%. A possible way to discover patterns or groups behind the non-response at a later stage is to estimate a separate coefficient for missing income. All other variables in the survey not shown in Table 6 had item non-response rates of less than 2%.
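As an illustration of the two ideas in this section, the following sketch computes an item non-response rate and prepares income for a model with a separate coefficient for missing income. Encoding a missing answer as None, setting the income term to zero for those cases and the function names are assumptions made here, not the coding used in the study.

```python
def item_non_response_rate(values):
    """Share of respondents without a valid answer (None) for one item."""
    return sum(v is None for v in values) / len(values)


def income_with_missing_dummy(income):
    """Return (income_value, missing_flag) pairs.

    Missing incomes get value 0.0 and flag 1, so that a model can estimate one
    coefficient for income and a separate one for the missing-income dummy.
    """
    return [(0.0, 1) if v is None else (float(v), 0) for v in income]


# Hypothetical monthly household incomes in EUR; None marks a refusal.
income = [2500, None, 1800, 3200, None, 4100, 900, 2700]
rate = item_non_response_rate(income)
coded = income_with_missing_dummy(income)
```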

Descriptive analysis
In this section we present an overview of the collected data using basic descriptive analyses. The same socio-demographic attributes (section "Socio-demographic attributes") were collected in both samples, whereas the data of the reference trip (section "Reference trip") differ slightly between the samples. For validation the sample was compared with other German nationwide travel behaviour survey data, the Mobilität in Deutschland 2008 (MiD 2008) (Follmer et al. 2010). The collected SP data (section "Short-term SP attributes") is again the same for both samples, differing only by trip purpose. Table 7 shows the categorical distribution (number of cases) and the percentage share of the socio-demographic attributes. Both columns show the unweighted number of persons in each sample.

Socio-demographic attributes
The total number of cases in Table 7 differs between the variables, as not all 3243 respondents answered all of the socio-demographic questions. However, only the valid percentage shares of the levels are shown. The education category Hauptschule represents lower secondary education with 8 (Volksschule) to 10 (Hauptschule) school years. Realschule represents secondary education with 10 school years, usually followed by an apprenticeship. The category Abitur includes the German Abitur or Allgemeine Hochschulreife, which allows the pupil to enter higher education at a university, and the Fachabitur, which allows entry to a Fachhochschule (technical college).
For the income categories shown, the lower bound lies above the printed value and the upper bound below it, e.g. income class 1000-1500 € represents an income above 1000 € and below 1500 € per month.
In the collected sample, older, higher-educated male respondents working full-time and owning a car are over-represented compared to the population average (Statistisches Bundesamt 2014; Follmer et al. 2010). One reason is the over-sampling of business trips, but this socio-economic group is also often more likely to participate in surveys (e.g. Follmer et al. 2010). The data set contains three weighting variables to achieve representativeness if needed. First, it contains a person weight (PFAKT) which can be used to match the weighted numbers of the MiD on the following dimensions: age, gender, education, employment, region, driver's license and car availability. The second weight variable (WFAKT) is based on the person weight and additionally contains a factor for trip frequency, trip length, trip purpose and the main mode of transport. The third weight (WFAKT2) additionally includes a weight factor for business trips which is based on the representative CATI sample, regional trip distribution and trip frequency. For weighting the sample it is recommended to use the weight variable WFAKT2, as it contains all weighting factors.
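How such a weight variable enters a descriptive analysis can be sketched as follows. Only the variable name WFAKT2 comes from the text; the record layout, the attribute name and the weight values are invented for illustration.

```python
def weighted_shares(records, category_key, weight_key="WFAKT2"):
    """Weighted share of each category level, using one weight per record."""
    totals = {}
    for r in records:
        level = r[category_key]
        totals[level] = totals.get(level, 0.0) + r[weight_key]
    total = sum(totals.values())
    return {level: w / total for level, w in totals.items()}


# Hypothetical records: males are over-represented and therefore weighted down.
records = [
    {"gender": "male", "WFAKT2": 0.5},
    {"gender": "male", "WFAKT2": 0.5},
    {"gender": "male", "WFAKT2": 1.0},
    {"gender": "female", "WFAKT2": 2.0},
]
shares = weighted_shares(records, "gender")
```

In this toy example the unweighted male share is 75%, while the weighted share is 50%.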

Reference trip
The following Table 8 shows the same variables as described above for the reported reference trip of the two samples. The parameters for the trip purposes commute, shopping, leisure and long-distance are derived from the non-business sample. The business sample provides the derived parameters for business trips. The questions differed slightly for long-distance and business trips, which results in the different or fewer variables shown in the table. The variables show that the collected data lies within the expected, plausible range compared to other SP surveys which have been conducted at the institute.
The information on long-distance trips was collected from all respondents, whereas the total number of cases of the daily trips shows the total number of assigned reference trips.
Table 8 (fragment; cases, share in %): Branch visit 111, 12; Conference 90, 10; Exhibition 63, 7
The variables for leisure trips in the table cover only reported trips under 50 km distance. Leisure trips over 50 km distance were recorded as long-distance trips. However, although the selection of the reference trip was randomized, it was controlled to give a more or less even distribution of respondents across the trip purposes. The numbers of cases within a trip purpose differ because, again, some questions were not answered at all or were answered with "I don't know" or "I don't want to say". If the value of a variable essential for the SP part was missing in the RP data set, a mean value was used in the SP experiments.
Figure 12 shows the travel distance distribution by type of the short-term SP experiment. As expected, trip distances for flights are longer than those for car and public transport trips. The mode choice games show the shortest trip distances, as only these include the non-motorized transport modes bike and walk.
Figure 13 shows the trip purpose distribution by experiment type for the whole SP sample containing all non-business and business trips. "Business trips" in the total sample are all trips from the business sample and additionally a small number of trips from the non-business sample, where respondents could also state "business trip" as the purpose of their reported trip. A further differentiation of these trips into the different kinds of travelling for business was not intended. For the transport mode flight only the purposes leisure and business were surveyed. As expected, commute, shopping and educational trips dominate in the mode choice experiments.
The distribution of the chosen alternative in the mode choice experiments for every trip purpose is shown in Fig. 14. Again the results are as expected. Car is the dominant mode for all trip purposes except educational trips. The share of taking the car is especially high for business trips. Walking as a mode has the highest share for shopping and educational trips. The bike is often chosen for commute and leisure trips. Public transport has the highest share for commute trips especially for educational reasons.

Long-term SP attributes
The two following tables (Tables 9, 10) show the parameters of the chosen alternative variables of the single choice experiments. In this description chosen alternative means the respondent's choice regarding workplace or residential location. The number of cases shows how often respondents chose an alternative. The parameters show that the collected data lies within the expected, plausible range.

Variable importance
At the end of each block all respondents had the opportunity to mark the impact of the attributes on their decision in the different choice situations. The respondents were asked to tick the attributes which they thought were rather unimportant to them or did not influence their choice at all. They could also state that they took all variables into consideration when choosing the alternative. Thus all the variables they did not tick were coded as important, and if they stated that all attributes were equally important to them, all the variables were likewise coded as important. Table 11 gives an overview of the overall number of respondents who answered that question, either by selecting unimportant variables or by stating that all variables were equally important. About one quarter of the respondents stated that all attributes of the long-term experiments were equally important to them.
As already described in section "Design of the short-term experiments", respondents were assigned different mode combinations in their mode choice SP games based on the mode of their reference trip (see Table 2 in section "Design of the short-term experiments"). The two modes car and public transport were available in all SP1 experiments. The third mode was either walk (combination 1), bike (combination 2), coach (combination 3) or plane (combination 4). For the variable importance, SP1 had to be split into these four combinations, as the respondents did not see the attributes of the modes not included. Between 8 and 22% of the respondents stated that all attributes were equally important to them. Especially in the mode choice experiments with a plane alternative, not all attributes were important for the respondents. On the other hand, this is also the combination with the fewest answers. Figure 15 shows the importance of the mode choice attributes of the modes car, public transport and walk or bike.
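The coding rule for the importance question described in this section can be sketched as a small function. The attribute names are hypothetical, and the sketch assumes that the respondent answered the question at all; respondents who skipped it would have to be handled separately.

```python
def code_importance(attributes, ticked_unimportant, all_equally_important):
    """Code each attribute as important (True) or unimportant (False).

    Attributes that were not ticked as unimportant are coded as important.
    If the respondent stated that all attributes were equally important,
    every attribute is coded as important.
    """
    if all_equally_important:
        return {a: True for a in attributes}
    return {a: a not in ticked_unimportant for a in attributes}


# Hypothetical attribute list for a mode choice game with a bike alternative.
attributes = ["car_time", "car_cost", "pt_time", "pt_cost", "bike_time"]
coded = code_importance(attributes, {"bike_time", "car_cost"}, False)
coded_all = code_importance(attributes, set(), True)
```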
Figure 16 shows the importance of the mode choice attributes of the modes car, public transport and coach or plane. Throughout all four combinations car travel time and public transport travel costs are among the three most important attributes. Public transport travel costs in general seemed to be more important than car travel costs. The car attributes except travel time were less important to the respondents than the public transport attributes. The travel times of the two slow modes walk and bike were rather unimportant to the respondents. In the two experiments where a slow mode was included, the delay of the public transport alternative was rather important. In the two experiments including only motorized modes, either travel time (combination 3 with the coach alternative) or travel cost (combination 4 with the plane alternative) was the most important attribute for the respondents' decisions.
In the public transport route choice experiments travel time, followed by the number of transfers, was more important than travel costs. Access and egress time seemed to be least important. However, the number of cases is very low (Fig. 17).
Figure 18 shows the importance of the variables of the long-term experiments. As it could already be seen in analyses of the data (Dubernet et al. 2018 to the respondents of SP 4. Car commute travel time and cost are important to respondents, in contrast to public transport time and cost, which is another indication of the main mode of transport for commuting in the data. The least important variable in the workplace games is the budget the respondent is responsible for. The attributes of the residence are more important than travel cost and time for the respondents. The exterior and environment of the residence seem to be even more important than the monthly rent or mortgage and the size of the apartment or house.

Attitudes
As described above, the questionnaires included additional attitudinal questions on risk acceptance (eight questions), environmental protection (four questions) and variety seeking in daily life (six questions). Respondents could state their approval or disapproval on a five-point Likert scale. The levels ranged from strong disagreement through neutral to strong agreement, with no opt-out. Figure 19 shows the percentage distribution of the respondents' agreement or disagreement with the statements. The percentage on the left side of the figure shows the share of respondents who disagreed with the statement (sum of strongly disagree and disagree). The percentage in the middle shows the share of neither agreement nor disagreement (neutral) and the one on the right the share of agreement (sum of agree and strongly agree). Most of the respondents agree with wearing a seat belt and that the environment needs to be protected. With their answers they present themselves as mostly open and interested in fellow humans and new things, but the majority of the respondents seem to be more risk averse than venturesome.
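The aggregation into disagreement, neutral and agreement shares used in Fig. 19 can be sketched as follows. The numeric coding of the scale (1 = strongly disagree to 5 = strongly agree) is an assumption for illustration; the coding in the data set may differ.

```python
from collections import Counter


def likert_shares(responses):
    """Return (disagree, neutral, agree) shares for one five-point item.

    Assumed coding: 1 strongly disagree, 2 disagree, 3 neutral,
    4 agree, 5 strongly agree.
    """
    counts = Counter(responses)
    n = len(responses)
    disagree = (counts[1] + counts[2]) / n
    neutral = counts[3] / n
    agree = (counts[4] + counts[5]) / n
    return disagree, neutral, agree


# Hypothetical answers of ten respondents to one statement.
answers = [1, 2, 3, 4, 4, 5, 5, 5, 3, 2]
disagree, neutral, agree = likert_shares(answers)
```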
For further modelling the attitudinal questions can be used, for example, to assign the respondents to different behavioural groups by means of a principal component or factor analysis. The statements that respondents discuss matters with friends when they disagree (einst_01) and that they represent their opinion on unpopular topics at social gatherings (einst_06) are linear functions of one another, which has to be considered in any further factor analysis (but not for PCA).
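Whether two items are exact linear functions of one another can be checked before a factor analysis with a simple correlation test, since a correlation of exactly +1 or -1 makes the correlation matrix singular. The sketch below is one possible check under assumed data: only the item names einst_01 and einst_06 come from the text, while the answer values and the reverse coding are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_linear_function(x, y, tol=1e-9):
    """True if y is an exact linear function of x (|r| equal to 1 within tol)."""
    return abs(abs(pearson(x, y)) - 1.0) < tol


# Hypothetical answers; einst_06 is constructed here as the reverse of einst_01.
einst_01 = [1, 2, 3, 4, 5, 2, 4]
einst_06 = [6 - v for v in einst_01]
other_item = [2, 2, 5, 1, 4, 3, 3]

collinear = is_linear_function(einst_01, einst_06)
not_collinear = is_linear_function(einst_01, other_item)
```

If such a pair is found, one of the two items would typically be dropped before the factor analysis.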

Conclusion
In the German value of time and value of reliability study new survey methods were applied to estimate new values of time and, for the first time, values of reliability to support the Federal Transport Infrastructure Plan 2030. This first estimate required special accuracy in the data collection process. Using a combined RP and SP survey reflects the state of the art of transport research. This paper presented the experience gained during data collection and preparation for further model estimations. It was shown that the collected data set holds rich information with a promising number of cases suitable for the calculation of short-term as well as long-term willingness-to-pay values. Each type of SP experiment includes enough cases to estimate single models per experiment as well as a joint model with all short-term games. The response rates were in the expected range, and the business online sample in particular exceeded expectations. All forms of reliability presentation achieved similar response rates and could be used in further surveys, although it seems to be easier for respondents to understand the experiment with a graphical display. The share of non-traders was in the expected range. The item non-response was very low for most of the variables.
The German value of time and value of reliability study was conducted in 2012 and 2013. In addition to the project report (Axhausen et al. 2015a), research work based on this data was published on various occasions covering different aspects of the survey since then. The modelling results of this work, inter alia, highlighted limitations of the survey design which should be reconsidered for future surveys:
• estimates for the hypothetical "coach" mode were not robust, highlighting the difficulty to include modes the respondents are not yet familiar with (Axhausen et al. 2015a; Dubernet 2019)
• air travel as a mode also presented modelling challenges. It is likely that this mode follows different decision processes than the others (in terms of planning horizon for instance) (Dubernet 2019).
• reference shopping trips were often very short, leading to very small variations in the stated choice experiments. A way to mitigate this effect in future studies might be to put a lower bound (in minutes) on the variations for the SP (Dubernet 2019).
• the long-term experiments included a large number of attributes for realism. This, however, made estimation of the VOT challenging, as the effect of those attributes was higher than the effect of time or cost of travel for the level of variation present in the dataset (Dubernet et al. 2018; Dubernet 2019).
• in the long-term cases, where the decision was always to keep the status quo or change to a new situation, a strong difference was observed in terms of gains versus losses (e.g. of salary). This makes the definition of a VOT in this case difficult, and future experiments should take this fact into account in the design (Dubernet et al. 2018; Dubernet 2019).
Some aspects could be improved based on the literature. For instance, the valuation of business travel time savings demands more attributes for adequate estimation, as travel time can be used e.g. for working (see Hensher 1977; Wardman et al. 2013, 2015). This effect might be important for other purposes as well (Kouwenhoven and de Jong 2018). Additionally, the methods and approaches should be developed further with every new estimation.
Nevertheless this has been the first official national value of time and reliability study collecting valuable data on transport behaviour and valuation in Germany.