How does travel satisfaction sum up? An exploratory analysis in decomposing the door-to-door experience for multimodal trips

Understanding how satisfaction with individual trip legs aggregates to the overall travel experience for different types of trips will enable the identification of the trip legs that are most impactful. For this purpose we analyze data on retrospective evaluations of entire multi-modal trip experiences and satisfaction with individual trip legs. We formulate and describe alternative aggregation rules and underpin them in theory and previous empirical findings. The results of a series of regression models show that for a large number of multi-modal trip configurations normative rules can better reproduce overall travel satisfaction than heuristic rules. This indicates that all trip legs need to be considered when evaluating the overall travel experience, especially for trip legs involving waiting and/or transferring time. In particular, weighting satisfaction with individual trip legs with perceived trip leg durations yielded the best predictor of overall travel satisfaction. No evidence for a disproportional effect of the last or most exceptional part of the trip was found. However, a larger dataset would be needed in order to replicate this work and potentially generalize the results. This research contributes to the literature on combining multi-episodic experiences and provides novel empirical evidence in the transport domain.


Introduction
Increasing the modal share of transit services is of great importance to cities worldwide, which face growing mobility needs to reach workplaces, services and leisure activities, and which aspire to a healthier lifestyle and to a socially egalitarian and environmentally sustainable society. Transit's share can be increased by improving travelers' perception of the quality of the services offered. Transit travelers' satisfaction can be defined as the overall level of fulfillment of travelers' expectations (Tyrinopoulos and Antoniou 2008). The existing link between transit travelers' satisfaction, ridership and loyalty proves the relevance of improving overall trip satisfaction (e.g. Van Lierop and El-Geneidy 2016; Nathanail 2008; Fellesson et al. 2009). There is a large body of literature on the impact of the main travel mode's service attributes on overall satisfaction. However, a large number of transit trips consist of multiple legs, including access, main and egress trip legs. The composition of these door-to-door trips, along with their added complexity, including different travel modes and durations, has been largely overlooked despite its potential impact on overall trip satisfaction. The increasing emphasis on integrated transport planning and the policy objective to create seamless door-to-door mobility for all (European Commission 2011) call for the investigation and deeper understanding of the determinants of travelers' multimodal travel satisfaction.
In order to address the aforementioned limitations, this study aims to answer the question "how do travelers combine memories of a series of pleasant and unpleasant episodes of their multi-leg trips to construct the evaluation of their overall experience?". Therefore the objective is to investigate the relative importance of satisfaction with access, main and egress legs for the whole travel experience of the combined trip.
A trip can be defined as a continuous sequence of legs from an origin to a destination and with a single main purpose (Axhausen 2007). A trip leg is a continuous movement with one mode of transport which includes any waiting times (and transfer times, if applicable) immediately before or during that movement. Thus, multi-modal door-to-door trips have an origin and a destination, are composed of two or more modes, consist of two or more trip legs of which one is identified as the main trip leg, and at least one as an access or egress leg.
We focus on exploring whether each trip leg's relative weight in the overall travel experience varies as a function of trip composition (trips with access, main and egress stages, and with main and either access or egress), trip complexity (two, three and four or more legs), the duration of the trip legs (both waiting and in-vehicle/walking times), the combination of transit modes (metro, train and bus), the presence of transfers and the trip purpose (commuting and non-commuting).
Using a modest sample size, this study deepens and expands the very limited knowledge on door-to-door trips (Susilo and Cats 2014;Ettema et al. 2016) and on the aggregation of retrospective multi-episode experiences in the transport field. To the best of our knowledge, only Suzuki et al. (2014) explored this topic when they tested the goodness of fit of a number of rules for the last normal trip (i.e. without any disruption). Their analysis was limited to commuters and did not distinguish between in-vehicle and waiting times.
Understanding how satisfaction with individual trip legs aggregates to the overall travel experience for different types of trips will enable stakeholders to identify which particular trip segment(s) need(s) to be improved. It will therefore allow practitioners to better evaluate the different parts of a trip and cater for travelers' needs by supporting the allocation of resources and prioritization of measures. In addition, this research will allow a fairer assessment of public transport operators' roles in contributing to overall travel satisfaction when different stages of the trip are provided by different operators.
In order to address the abovementioned issues, we first review how travel satisfaction is conceptualized and measured, together with previous findings on its underlying determinants, in "Literature review". "Satisfaction aggregation rules" details the satisfaction aggregation measures used in this study. This is followed by a description of the survey design and independent variables in "Survey description". Results of the analysis are then provided in "Analysis and results", including the estimation of the overall aggregated experience, multivariate travel satisfaction models, correlation analyses between overall and leg-specific satisfaction evaluations, and generalized aggregation models. Discussion and conclusions are presented in "Discussion" and "Conclusion", respectively.

Literature review
All trip legs, as part of a multi-episode experience, are believed to contribute to the overall trip experience. Susilo and Cats (2014) argued that access and egress trip legs influence satisfaction with the main leg, which is strongly correlated with overall satisfaction, and thus might be indirectly reflected through it. Similarly, Ettema et al. (2016) found that there was a significant amount of variance in overall satisfaction that could not be explained by the correlation with the satisfaction with the main mode only.
The level of accessibility to/from the transit mode, which has direct consequences for the access and egress legs, is believed to impact on the overall evaluation of the multimodal trip. In accordance with the latter, Givoni and Rietveld (2007) found that the quality of the station and the access and egress facilities influence the overall perception of the trip. More specifically, the availability of parking at the workplace, proximity to transit stations and the availability of shopping and entertainment activities in the surrounding areas are regarded to positively influence one's perceived accessibility (Woldeamanuel and Cyganski 2011).
Transfers, and the associated transferring times, are a fundamental component of multi-modal trips. Satisfaction with trips with a large number of legs, such as metro and train trips and trips with 4 or more legs, is expected to depend on satisfaction with the transfer experience. Generally, transfers have a negative effect on overall travel satisfaction (De Abreu e Silva and Bazrafshan 2013) and thus reduce demand for travel (Lythgoe and Wardman 2002).
The main trip leg's service attributes determining overall transit satisfaction consist of, for example, station environment, ease of transfer and on-board comfort (Susilo and Cats 2014); operational attributes such as reliability, travel speed and frequency (Mouwen 2015; Cats et al. 2015); and customer interface. Tyrinopoulos and Antoniou (2008) found that besides service frequency, vehicle cleanliness, waiting conditions, transfer distance and network coverage were the most important attributes influencing the quality of the service with PT in different Greek cities. Furthermore, these determinants may vary depending on user, trip and mode characteristics. For example, researchers found that feeling safe while waiting was more important for women than men (Susilo and Cats 2014) while ride comfort was found important for students (Abenoza et al. 2017) and the elderly (Dell'Olio et al. 2011). Considering transport modes, Iseki and Smart (2012) found that information was of relevance for heavy-rail users while accessibility in and to the station together with reliability of the service were more significant to bus and light rail users.
Trip complexity is defined in its broad sense as the number of trip legs and travel modes that the trip is comprised of. Susilo and Cats (2014) found that trips which consist of a larger number of legs are associated with a lower overall satisfaction. Similarly, the characteristics of the main travel mode of the trip influence travel satisfaction. This is true not only for broad travel mode categories (soft, private and transit modes) but also amongst different transit modes which have distinct characteristics in terms of comfort, coverage, punctuality or frequency. Overall satisfaction ratings by mode vary from study to study and were found highest for: rail (Ory and Mokhtarian 2005;St-Louis et al. 2014; for commuting trips), metro (e.g. Cao et al. 2015) or bus based trips (Mouwen 2015).
Different trip purposes may influence travelers' perception. While commuting trips are repetitive, exposed to rush hour crowding and dominated by temporal constraints, non-commuting trips are less frequent and are characterized by a more flexible travel time budget (Li 2003). Some researchers postulate that frequent travelers (Abenoza et al. 2017) and those with a seasonal ticket (and thus, it is assumed, frequent users) are overall more satisfied (Woldeamanuel and Cyganski 2011) because their trips are more stable and less exposed to salient events (Suzuki et al. 2014). In contrast, other findings indicate that regular users are more exposed to negative critical incidents than occasional users and thereby their satisfaction will be lower (Van't Hart 2012).
Travel time perception varies as a result of the activities carried out while traveling or waiting. Mokhtarian et al. (2015) found that listening to music/radio diminishes mental fatigue and increases trip pleasantness. In addition, they indicated that talking to others increases the probability of perceiving a trip as pleasant. Ettema et al. (2012) also indicated a positive effect of talking to others on the satisfaction with travel scale for commute trips from work. In the same vein, working or studying on-board increases the utility of the in-vehicle time (Susilo et al. 2012), which may in turn be reflected in higher travel evaluations.
Travelers' perceptions can be conceptualized in terms of utilities. Kahneman et al. (1997) distinguish between instant utility and remembered utility. The first is the basic unit of experience that is aggregated over time to form a total utility, while the second is defined as the retrospective evaluation of the pleasure and pain associated with a past experience. In turn, experienced utility is the satisfaction with the outcome of a choice whereas decision utility is "the degree to which the outcome is desired when the choice is made" (Suzuki et al. 2014). The results of a study (Sprumont et al. 2017) indicated that decision utility and stated satisfaction, as a proxy of remembered utility, were significantly correlated.
The present work deals with retrospective evaluations of an experience, the multi-modal trip. These retrospective evaluations are believed to encapsulate what is learned from the experience and may affect future behavior and decisions taken by the traveler; including recommendation, complaints and service repurchase (Carmon and Kahneman 1996). In addition, these evaluations might be affected by travelers' feelings at the time of taking the survey. Ariely and Carmon (2000) theorized that evaluations of past events are affected by feelings at the time of evaluation. Feelings have been operationalized in some travel surveys in the affective measures, i.e. positive deactivation and activation, of the satisfaction with travel scale, which was conceptualized by Ettema et al. (2011).
The mental process of arriving at an overall evaluation of a certain experience remains largely unknown. While it might be assumed that the overall experience is a simple average of the experience's individual components, a large number of studies in different domains of experience provide contrary evidence (Suzuki et al. 2014; Carmon and Kahneman 1996; Fredrickson and Kahneman 1993; Miron-Shatz 2009; Bruine de Bruin 2005; Ariely 1998). It is therefore presumed that memory is not constructed in a continuous fashion. Instead, only selected key aspects of the experience are remembered (Ariely 1998). To the best of the authors' knowledge, Suzuki et al. (2014) conducted the only study in the transport field so far. They found that the overall trip satisfaction of commuters can be modelled as a weighted average of the satisfaction with individual legs, where legs were weighted by their respective duration.
Taking into consideration all the aforementioned aspects, this study hypothesizes that accounting for travel behavioral phenomena such as the perception of different legs, modes, travel time components, as well as psychological effects such as recency and salience can better explain overall travel satisfaction than assuming a simple averaging rule. In the following, we formulate and describe alternative aggregation rules and underpin them in theory and previous empirical findings.

Satisfaction aggregation rules
There are various approaches for defining how door-to-door trip satisfaction is aggregated from the satisfaction with each trip leg. These approaches can be categorized into normative rules (equal average, moving duration weighted, complex duration weighted) and heuristic rules (peak, end, peak-and-end, serial position). Normative rules (rules 1-3) hypothesize that travelers consider all trip legs in creating an overall trip evaluation. The main difference between them stems from the strength of the influence that each of the trip legs exerts on the composition of the overall trip evaluation. The equal-weighted average rule (rule 1) assumes that all trip legs have the same weight. In turn, duration weighted moving (rule 2) and duration weighted complex (rule 3) hypothesize that the magnitude of the influence of each trip leg is given by its duration. Rule 2 considers the trip leg's IVT/walking time only, while rule 3 adds to this the waiting and/or transferring times.
In contrast, heuristic rules (rules 4-9) hypothesize that travelers consider one or two trip legs exclusively in composing an overall trip evaluation. The trip legs considered are determined based on theory from the psychological domain. These rules postulate that travelers only recall certain legs of their trips (i.e. last, most salient events) when composing their overall trip evaluation, thus neglecting some trip legs.
A trip $j$ is defined as a sequence of trip legs $j = (l_{j,1}, l_{j,2}, \dots, l_{j,|j|})$. The corresponding travel satisfaction values are $s_j$ for the overall trip and $s_{l_{j,i}}$, $i = 1, \dots, |j|$, for the trip legs.

Equal-weighted averaging rule
This rule is based on the idea that a simple average of all trip legs can provide a good estimate for the whole trip satisfaction.
$$s_j = \frac{\sum_{i \in j} s_{l_{j,i}}}{|j|} \qquad (1)$$

Duration-weighted averaging rules
Duration-weighted moving (DWM) considers in-vehicle/walking times only. Therefore these rules neglect the importance of waiting times in constructing an overall evaluation metric for the trip. Miron-Shatz (2009) and Suzuki et al. (2014), in the sociology and transport fields respectively, found that the summation of instant utilities made by means of the duration weighted rule provided the best fit.
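Under DWM, overall satisfaction is the average of leg satisfaction weighted by moving time:
$$s_j = \frac{\sum_{i \in j} t^{m}_{l_{j,i}}\, s_{l_{j,i}}}{\sum_{i \in j} t^{m}_{l_{j,i}}} \qquad (2)$$
where $t^{m}_{l_{j,i}}$ is the in-vehicle/walking time for leg $l_{j,i}$.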
Duration-weighted complex (DWC) adds waiting and/or transferring times to the weight of each leg:
$$s_j = \frac{\sum_{i \in j} \left(t^{m}_{l_{j,i}} + \omega\, t^{w}_{l_{j,i}}\right) s_{l_{j,i}}}{\sum_{i \in j} \left(t^{m}_{l_{j,i}} + \omega\, t^{w}_{l_{j,i}}\right)} \qquad (3)$$
where $t^{w}_{l_{j,i}}$ is the waiting time for leg $l_{j,i}$ and $\omega$ is the multiplier applied to each waiting time as a penalty. Duration-weighted complex can vary with respect to the weight or penalty assigned to the waiting times. Queue waiting negatively impacts the overall perception of the trip (Smidts and Pruyn 1994); however, the magnitude of the influence is still debated. Hine et al. (2001) found that travelers' perception of waiting times varied amongst travel modes, being higher for trains than for buses. Other authors report that travelers perceive waiting times as between 1.52 and 4.4 times longer than in-vehicle time (Mishalani et al. 2006; Dziekan and Kottenhoff 2007). Therefore waiting time weights, $\omega$, of 1, 2, 3 and 4 are specified in this work.
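The three normative rules can be illustrated with a minimal Python sketch. The `Leg` container, the function names and the example leg values are hypothetical and are not taken from the survey data; the weighting follows the verbal definitions of rules 1-3, with a waiting weight of 0 standing in for DWM.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Leg:
    """Hypothetical container for one trip leg (not part of the original survey)."""
    satisfaction: float        # leg satisfaction on the 1-5 scale
    moving_min: float          # perceived in-vehicle/walking time, in minutes
    waiting_min: float = 0.0   # perceived waiting/transfer time, in minutes


def equal_weighted(legs: Sequence[Leg]) -> float:
    """Rule 1: simple average of the leg satisfactions."""
    return sum(leg.satisfaction for leg in legs) / len(legs)


def duration_weighted(legs: Sequence[Leg], waiting_weight: float = 0.0) -> float:
    """Rules 2 and 3: average of leg satisfactions weighted by leg duration.

    waiting_weight = 0 ignores waiting time (DWM); a positive value plays the
    role of the waiting-time multiplier (DWC).
    """
    weights = [leg.moving_min + waiting_weight * leg.waiting_min for leg in legs]
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weighted duration must be positive")
    return sum(w * leg.satisfaction for w, leg in zip(weights, legs)) / total


# Illustrative trip: walk, bus with a 4-minute wait, walk (assumed values).
trip = [Leg(2, 5), Leg(5, 20, 4), Leg(3, 7)]
print(equal_weighted(trip))
print(duration_weighted(trip))                      # DWM
print(duration_weighted(trip, waiting_weight=2.0))  # DWC with a multiplier of 2
```

Because only the waiting-time multiplier changes between DWM and DWC, a single function with one parameter covers all the specifications considered (weights of 1 to 4).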

End rule
This rule tests whether the last trip experience, the one associated with the last trip leg, alone explains travelers' overall evaluation. Previous studies report contradictory results concerning whether an end effect exists and has a significant impact on the overall experience. For example, Finn (2010) in an experiment involving a learning experience, Diener et al. (2001) in an experiment that explored how the ending of a life influenced the desirability of that life, and Carmon and Kahneman (1996) in an experiment related to queuing experiences provided evidence supporting its existence. However, Miron-Shatz (2009) for common activities undertaken during a day, and Tully and Meyvis (2016) for sound stimuli and an obstacle race, did not find evidence for the existence of end effects.
Hence, overall satisfaction is determined by the last trip leg:
$$s_j = s_{l_{j,|j|}} \qquad (4)$$

Serial position rule
This rule asserts that the sequential order in which an experience is presented affects how it is evaluated. The first and last episodes of an experience, corresponding to primacy and recency effects, are the better remembered parts of an experience and the only ones that impact the overall evaluation of the experience. These effects were demonstrated by Page and Page (2010) studying the evaluation of a TV show; by Miller and Krosnick (1998) for the impact of candidate name order on election outcomes; and by Bruine de Bruin (2005) when evaluating end-of-sequence and step-by-step evaluation procedures used in the Eurovision song contest.
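The end and serial position rules can be sketched as follows. The function names are hypothetical, and weighting the first and last legs equally in the serial position rule is an assumption made here for illustration; the text only states that these two legs alone drive the evaluation.

```python
from typing import Sequence


def end_rule(leg_satisfactions: Sequence[float]) -> float:
    """End rule: overall satisfaction equals satisfaction with the last leg."""
    return leg_satisfactions[-1]


def serial_position_rule(leg_satisfactions: Sequence[float]) -> float:
    """Serial position rule: only the first and last legs count.

    Assumption: the two legs are averaged with equal weights.
    """
    return (leg_satisfactions[0] + leg_satisfactions[-1]) / 2


legs = [2, 5, 3]  # access, main, egress (illustrative values)
print(end_rule(legs))
print(serial_position_rule(legs))
```

Under this assumption the serial position rule coincides with the equal-weighted average for 2-leg trips, since the first and last legs are then the only legs.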

Peak rules
This rule examines whether the most salient experience is the most representative of the whole trip. An exceptional episode can be caused by a wide variety of events such as extreme on-board crowding, severe weather conditions which make the waiting time at a stop unpleasant, being able to find a seat in rush hour, feeling unsafe or experiencing longer or shorter travel times than expected. The rule is defined as the satisfaction with the trip leg that deviates most from the average satisfaction across all trip legs (Eq. 6). This salient event is the one employed by Suzuki et al. (2014).
The deviation can be either positive or negative depending on whether the episode is pleasant or unpleasant (Fredrickson and Kahneman 1993). The latter contradicts the findings made by Friman et al. (2001) who concluded that negative episodes, or as they call them negative critical incidents, have a higher impact on overall and attribute specific cumulative satisfaction, than positive ones.
Given that satisfaction evaluations are partly based on expectations (Parasuraman et al. 1991; Grönroos 1994) and that travel time is one of the main components for the evaluation of a trip (Ettema et al. 2016), deviation from expected travel time is the salient event used in the next rule (Eq. 7), which compares the perceived times on a given trip leg with the expected in-vehicle or walking time (moving time) and the expected waiting time on that leg. Both moving and waiting times are included because both might be perceived very negatively when delays are present (Carrel et al. 2015). The deviation can be either positive or negative. If no deviation from the expected time is observed then the equal-weighted averaging rule is applied.
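A minimal sketch of the two peak rules is given below. The function names are hypothetical. Measuring the satisfaction peak as the largest absolute difference from the mean leg satisfaction, measuring the time deviation as a relative difference between perceived and expected leg time, and breaking ties in favour of the earliest leg are all assumptions made for illustration.

```python
from typing import Sequence


def peak_max_rule(leg_satisfactions: Sequence[float]) -> float:
    """Satisfaction of the leg deviating most from the mean leg satisfaction."""
    mean = sum(leg_satisfactions) / len(leg_satisfactions)
    # max() returns the first maximal element, so ties go to the earliest leg.
    return max(leg_satisfactions, key=lambda s: abs(s - mean))


def peak_exp_rule(
    leg_satisfactions: Sequence[float],
    perceived_min: Sequence[float],
    expected_min: Sequence[float],
) -> float:
    """Satisfaction of the leg whose perceived time deviates most from expectation.

    Times are total leg times (moving plus waiting) in minutes; expected times
    must be positive. Falls back to the equal-weighted average when no leg
    deviates from its expected time, as stated in the text.
    """
    deviations = [abs(p - e) / e for p, e in zip(perceived_min, expected_min)]
    largest = max(deviations)
    if largest == 0:
        return sum(leg_satisfactions) / len(leg_satisfactions)
    return leg_satisfactions[deviations.index(largest)]


print(peak_max_rule([2, 5, 3]))
print(peak_exp_rule([2, 5, 3], perceived_min=[5, 24, 7], expected_min=[5, 19, 7]))
```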

Peak-end rules
This rule asserts that the recollection of a past overall experience is best aggregated by averaging only two distinct moments; the most salient event (peak) and the latest experience (end). Therefore, the effects of the duration of an experience upon retrospective evaluation are neglected (Fredrickson and Kahneman 1993;Ariely and Loewenstein 2000).
A large amount of empirical evidence in different fields supports the peak-end rule. For example, Fredrickson and Kahneman (1993) demonstrated it for film watching, Redelmeier and Kahneman (1996) for painful medical practices, and Baumgartner et al. (1997) for rating film and television commercials. However, Miron-Shatz (2009) did not reach the same conclusions for day-long experiences. The peak_max-end rule (Eq. 8) is formed by combining Eqs. 4 and 6, and the peak_exp-end rule (Eq. 9) by combining Eqs. 4 and 7.
The success of any of these rules may have the following implications for stakeholders. A better goodness-of-fit of the equal-weighted averaging rule may signify that improvements should be equally distributed over all parts of the travel experience. The success of the duration-weighted moving rule may imply that travel experience aspects of the modes, when in movement, should be prioritized. Travel experience aspects may include instrumental and non-instrumental variables. A successful duration-weighted complex rule might be interpreted as the largest importance being given to trip legs comprising waiting and/or transferring times, and thus to those involving transit modes. The higher the waiting time penalty, the higher the importance attached to transit legs. In this case, the corresponding stakeholders should focus on improving interchange facilities, reducing waiting times and improving the perception of waiting time. The success of the end rule may suggest that stakeholders need to care only about ensuring that travelers have a pleasant and satisfactory experience in the final leg of their door-to-door trip. In a typical 3-leg trip this implies focusing on last-mile facilities. In line with the former, the success of the serial position rule may suggest that first- and last-mile facilities should be prioritized.
Travel time reliability and adhering to expected travel times should be targeted if peak-exp travel time (rule 9) is found to better explain overall travel satisfaction. The negative effect that disruptions and negative salient events have on overall satisfaction may indicate the need to focus on them if peak-max satisfaction rule obtains the highest goodness-of-fit.

Example
The abovementioned satisfaction aggregation rules are illustrated using an example. Figure 1 displays an example of a door-to-door trip including all the relevant data needed to aggregate trip leg satisfactions into overall travel satisfaction evaluations. The door-to-door trip received a reported overall satisfaction of 4 on a five-point Likert scale (Fig. 1a). The trip comprises three trip legs and three stages (access, main and egress), each of them shown with its respective reported satisfaction value in the labels (2, 5 and 3 for access, main and egress respectively). The passenger walks from and to bus stops. The perceived travel time of the entire trip is 36 min, including 4 min of waiting time, while the traveler's expected travel time (T.T) for this trip was 31 min. The percentage time difference shows the percentage difference of the perceived travel time compared to the expected travel time. The latter is calculated at a trip leg level and as specified in Eq. 7. The percentage satisfaction dev. is the ratio of the trip leg satisfaction value to the simple average of all trip legs' satisfaction values, as specified in Eq. 6.
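As a worked illustration using only the leg satisfaction values stated above (2, 5 and 3), the rules that do not require leg-level durations can be evaluated by hand. The leg durations shown in Fig. 1 are not reproduced in the text, so the duration-weighted and expected-time rules are not computed here. Taking the peak-end value as the simple average of the peak and the end is an assumption based on the verbal description of that rule.

$$\bar{s} = \frac{2 + 5 + 3}{3} \approx 3.33 \quad \text{(equal-weighted average)}$$

$$s^{\text{end}} = 3 \quad \text{(satisfaction with the egress leg)}$$

$$\frac{2}{3.33} = 0.60, \qquad \frac{5}{3.33} = 1.50, \qquad \frac{3}{3.33} = 0.90$$

The main leg has the ratio furthest from 1, so it is the most salient leg and $s^{\text{peak}} = 5$. Averaging peak and end then gives

$$\frac{s^{\text{peak}} + s^{\text{end}}}{2} = \frac{5 + 3}{2} = 4$$

For this particular trip the values range from 3 (end rule) to 5 (peak rule), compared with a reported overall satisfaction of 4.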

Survey description
We analyze data collected in a survey that was conducted as part of the METPEX (A Measurement Tool to determine the quality of the Passenger EXperience) FP7 EU project (Susilo and Cats 2014; METPEX 2014). The survey was simultaneously carried out in spring 2013 in eight European cities: Bucharest (Romania), Coventry (United Kingdom), Dublin (Ireland), Rome (Italy), Stockholm (Sweden), Turin (Italy), Valencia (Spain) and Vilnius (Lithuania). These sites represent a diverse combination of city sizes, climates, transport infrastructures, urban fabric, culture and quality of the public transport service delivered.
The recruitment strategy included both on-site and retrospective surveys. Therefore depending on the nature of the survey, there were self-administered interviews distributed by email, phone, social network and online; and assisted interviews where travelers were approached on main streets and squares, on-board, at stations, in shopping centres and universities. Of the various on-site recruitment locations, on-board vehicles seemed to have worked best, mainly because the respondents were not in a position to rush somewhere else. The length of the whole questionnaire was between 20 and 30 min. In some of the data collection sites (i.e. Coventry and Stockholm) incentives to take the survey such as money or a cinema ticket were offered.
Two versions (on-site and retrospective) of a revealed preference questionnaire were translated into local languages. The on-site version focused on the trip currently being undertaken while the retrospective version focused on the main trip of the given day. The questionnaire was designed to address the entire door-to-door trip and different travel modes, and thus facilitate the analysis of overall travel satisfaction and how it varies as a function of the satisfaction with individual attributes, travel characteristics, trip legs and service factors.
The questionnaire consisted of five sections: (a) Traveler information: socio-demographic, mobility behavior and mode usage; (b) Attitudes: travel preferences and travel-related opinions; (c) Travel satisfaction with overall trip, individual trip legs and respective service attributes; (d) Underlying travel aspects: familiarity, adaptation and past experience; and (e) Contextual variables, including trip purpose, weather conditions and subjective well-being. For more detailed information on the survey design, see Susilo et al. (2015).
In order to study how overall satisfaction relates to satisfaction with individual trip legs, we employ the following variables collected in sections (a), (c) and (e):
• Leg-specific and overall satisfaction with the trip: Likert scale measurement from 1 (very dissatisfied) to 5 (very satisfied);
• Trip purpose: categorized into commuting to work or education, business, shopping, leisure or other;
• Transport modes used in each trip leg: including walk, bicycle, car as driver and as passenger, motorcycle, underground, train, tram/trolleybus, bus, special transport;
• Perceived duration of waiting times and on-board/walking times: given in minutes and for transit modes only;
• Expected time spent waiting and on-board/walking if everything goes as planned: given in minutes and for transit modes only. In the retrospective surveys, the expected travel time might be biased as respondents may adjust their expectations based on what they actually experienced;
• Trip complexity: number of trip legs.
In addition, in Sect. 5, selected socio-demographic and familiarity variables from sections (a) and (d) are used to characterize the dataset. These include gender, age, disability, frequency of travel by PT, income and access to private vehicle and bicycle.
In total, 554 responses were collected during the pilot survey, of which 363 records contain multi-leg trips. Multi-leg trips are defined as those trips involving at least two trip legs (i.e. an access or egress and a main trip leg). After the dataset was cleaned and verified for completeness, consistency and reliability across the different parts of the survey used, the sample size was reduced to 156. The remaining records were primarily collected in five sites (Stockholm, Bucharest, Dublin, Turin and Coventry), which together account for 89% of the data. Reported trips were distributed almost evenly between the retrospective and on-site formats. The main trip leg of the trip was not explicitly stated by the respondent. Instead, it was determined by summing on-board/walking times and waiting times for each trip leg and defining the longest one as the main leg.
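The identification of the main leg described above can be sketched in a few lines of Python. The function name and the example values are hypothetical, and resolving ties in favour of the earliest leg is an assumption, since the text does not say how ties were handled.

```python
from typing import Sequence


def main_leg_index(moving_min: Sequence[float], waiting_min: Sequence[float]) -> int:
    """Return the position of the leg with the longest total (moving + waiting) time.

    Ties are resolved in favour of the earliest leg (an assumption).
    """
    totals = [m + w for m, w in zip(moving_min, waiting_min)]
    return max(range(len(totals)), key=totals.__getitem__)


# Illustrative 3-leg trip: walk, bus with a 4-minute wait, walk.
print(main_leg_index(moving_min=[5, 20, 7], waiting_min=[0, 4, 0]))
```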

Analysis and results
This section first presents descriptive statistics (Sect. 5.1). The following three sections (Sects. 5.2-5.4) serve different and complementary purposes: first, investigating how travelers combine memories of their door-to-door trip to construct an overall evaluation of their trip; then, identifying the strength of the relationships between overall trip satisfaction and the satisfaction with different trip stages, as well as studying the extent to which the trip stages are inter-correlated; and finally, estimating generalized aggregation models which provide the relative weight that each trip stage has on the overall travel satisfaction. Table 1 presents the summary statistics of the socio-demographic characteristics and mobility patterns of the survey sample by purpose, main travel mode and trip complexity. The percentage of respondents per category is shown in bold and the share of missing values and non-responses is displayed in brackets. Data of respondents with missing values were used only in the exploratory stage and were excluded in the model estimation phase.

Descriptive analysis and satisfaction aggregation rules
Approximately two thirds of the respondents are young adults aged 18-34 (65%) who report their commuting trip (67%). In addition, young adults use a private vehicle or non-motorised modes (other) as their main travel mode more frequently than other age groups. About 7% of travelers reported having some type of disability, and their share doubles when considering only non-commuting or bus trips (14% each). Compared to the average duration of all trips (58 min), trips travelled by train (114 min) and non-commuting trips (76 min) have a longer average duration and larger variation in trip duration (see SD in the duration row in Table 1). This might be explained by the conjunction of a wider diversity of non-commuting trip purposes and the prominence of train (suburban, regional and inter-city) as the main mode. The unexpectedly high share of unemployed and low income commuters results from the large share of students within this group (40%). Bus is the main mode for 42% of the trips that consist of 2 legs, while metro is the main mode for 84% of the most complex trips comprising at least 3 legs.
A modification of the interconnectivity ratio proposed by Krygsman et al. (2004), which we define as the proportion of access and egress times to the main leg travel time, is employed in this paper to further characterize the trips by showing the ratio of time spent by travelers with regard to the main leg. The interconnectivity ratio provides an insight into the distribution of travel time between (primarily) non-motorised modes and the main travel mode. It is expected that most of the travel time is spent in the main mode, thus yielding ratios below 1. With the exceptions of trips by metro (1.15), 4-leg trips (1.30) and trips with transfers (1.02), travelers generally spend less time in access and egress legs together than in the main mode. A visual examination of the modes used in the access and egress legs displays the importance of non-motorised modes (60-63%), which are followed by a combination of non-motorised modes and transit modes (mixed transit, 18-21%), and exclusively transit modes (11-16%). Trips using only non-motorised modes for access and egress are particularly prevalent among bus and 2-leg trips, while their presence is much smaller in trips with 4 or more legs, with a transfer and with train as the main mode. Table 2 shows the means and standard deviations for overall satisfaction with the entire trip and for the values obtained for the 9 different trip categories when applying each of the satisfaction aggregation rules, as defined in Eqs. 1-9. Waiting time weights, ω, of 1, 2, 3 and 4 are specified in rule 3, DWC (Duration Weighted Complex), which weights leg satisfaction by its duration. The travel mode category 'other' has been dropped from the analysis since it encompassed very different travel modes (walk, bike, tram and private vehicles) and the results would be difficult to interpret. In addition, it is not possible to calculate the peak-and-end rules for 2-leg trips, thus they are shown in the table with NA (not applicable).
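The modified interconnectivity ratio introduced at the start of this section can be sketched as follows. The function name and the example leg times are hypothetical; whether leg times include waiting is left to the caller, since the text does not specify it.

```python
from typing import Sequence


def interconnectivity_ratio(leg_minutes: Sequence[float], main_index: int) -> float:
    """Access plus egress time divided by main-leg time.

    Values below 1 mean that more time is spent in the main leg than in all
    access and egress legs together.
    """
    main = leg_minutes[main_index]
    if main <= 0:
        raise ValueError("main leg time must be positive")
    return (sum(leg_minutes) - main) / main


# Illustrative trip: 5 min access, 24 min main leg, 7 min egress.
print(interconnectivity_ratio([5, 24, 7], main_index=1))
```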
As observed in Table 2, compared to the entire dataset, overall travel satisfaction is slightly higher for commuting trips and significantly higher for trips with 3 legs and trips with metro as the main mode. The highest overall satisfaction for commuting and metro trips is in line with some previous findings (Woldeamanuel and Cygansky 2011 and Cao et al. 2015, respectively). Coincidentally, the travel duration of all these trip categories is the shortest compared to their counterparts. This might be in line with previous research (e.g. Ory and Mokhtarian 2005) which reported that the level of travel satisfaction is a function of travel distance, and that the longer one travels the less satisfied one becomes.
The mean values of overall satisfaction yielded by the various aggregation rules manifest a large range, as large as 0.8 points for trips of 4 or more legs and as narrow as 0.2 points for 3-leg trips. In the following section we investigate which of the proposed rules provides the best estimate for overall satisfaction for different trip categories.

Satisfaction aggregation models
A series of regression models were estimated with "Overall satisfaction with the whole trip" as the dependent variable, and with each of the satisfaction aggregation rules defined in Sect. 3 determining the specification of the independent variables. Since overall satisfaction is an ordinal variable, ranging from 1 (very unsatisfied) to 5 (very satisfied), ordered logit models are most adequate. In terms of random utility theory, the ordered logit model can be expressed as:

$$y_k^{*} = X_k \beta + \varepsilon_k \qquad (10)$$

where $y_k^{*}$ is the latent dependent variable of individual k. $X_k$ is the explanatory variable set of individual k, which includes a satisfaction aggregation rule for individual k. Note that the intercept is dropped for identification purposes. $\beta$ is the corresponding vector of parameters to be estimated. $\varepsilon_k$ is the error term, which is assumed to follow an identically distributed logistic distribution. The latent dependent variable is then mapped to the observed dependent variable, $y_k$ (overall satisfaction on a 5-point Likert scale), with categories m = 1,…, 5, through a set of estimated thresholds.
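To make the link between the latent variable and the five observed satisfaction categories concrete, the following sketch computes category probabilities in the standard ordered logit form, where each probability is the difference of logistic CDF values at adjacent thresholds. The coefficient, the aggregated satisfaction value and the cut-points are hypothetical illustration values, not estimates from this study.

```python
import math


def logistic_cdf(z):
    """Cumulative distribution function of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(xb, cutpoints):
    """Return P(y = m) for m = 1..len(cutpoints)+1.

    xb is the linear index X_k * beta; cutpoints must be in ascending order.
    """
    cumulative = [logistic_cdf(c - xb) for c in cutpoints]
    bounds = [0.0] + cumulative + [1.0]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]


# Hypothetical values: one aggregation-rule score and four cut-points
# separating the five Likert categories.
beta = 1.2
aggregated_satisfaction = 3.5
cutpoints = [1.0, 2.5, 4.0, 5.5]

probs = ordered_logit_probs(beta * aggregated_satisfaction, cutpoints)
for category, p in enumerate(probs, start=1):
    print(f"P(y = {category}) = {p:.3f}")
```

With four cut-points the function returns five probabilities that sum to one; a larger linear index shifts probability mass towards the higher satisfaction categories.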

Aggregation rules for all different trip configurations are modeled. The total number of estimated models amounts to 130 (12 aggregation rules * 11 trip configurations − the 2 peak-end-rules for 2 leg trips). Table 3 shows the McFadden R-squared (Rs) in the first row and significance values (Sig.) in the second for each model. As can be observed, the very few insignificant results at the 90% confidence level are found for the end rule. The magnitude of the R-squared coefficients ranges between .023 and .362, being highest for metro trips and lowest for bus and 3-legged trips. These coefficients are in general lower than might have been expected when specifying a model with an overall satisfaction measure calculated by means of a very diverse set of aggregation rules. This may suggest that either the rules tested do not aggregate door-to-door travel satisfaction well or that overall travel evaluations are influenced by factors other than the aggregation of their components. In general, the average weighted rules that consider both moving (in-vehicle/walking) and waiting times (DWC) are the best predictors of overall remembered utility, highlighting the importance of waiting times. However, notable exceptions are found for different trip configurations. The peak rule in which the salient event takes place in the leg with the maximum deviation from the expected travel time (Peak_exp) is found to be the best aggregation rule for non-commuting trips (.155). Additionally, trips made with bus as the main mode are better aggregated (.132) by the peak rule whose salient event takes place in the leg with the maximum deviation from the average overall satisfaction (Peak_max). Furthermore, overall satisfaction with 2-leg trips is best reproduced using either the equal-weighted averaging or the serial position rules (.281), since in this case they are essentially the same.
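For readers less familiar with the goodness-of-fit measure reported in Table 3, the sketch below shows the usual definition of McFadden's R-squared as one minus the ratio of the fitted model's log-likelihood to that of a thresholds-only (null) model. The category counts and the fitted log-likelihood are hypothetical; they are not values from this study.

```python
import math


def null_log_likelihood(category_counts):
    """Log-likelihood of a model that only reproduces the observed category shares."""
    total = sum(category_counts)
    return sum(n * math.log(n / total) for n in category_counts if n > 0)


def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo R-squared: 1 - LL(model) / LL(null)."""
    return 1.0 - ll_model / ll_null


# Hypothetical counts of responses in satisfaction categories 1..5.
counts = [5, 10, 25, 40, 20]
ll_null = null_log_likelihood(counts)

# Hypothetical log-likelihood of a fitted model with one aggregation rule.
ll_model = 0.85 * ll_null

print(f"McFadden R2 = {mcfadden_r2(ll_model, ll_null):.3f}")  # 0.150
```

Because log-likelihoods are negative, a fitted model that improves on the null model has a log-likelihood closer to zero and therefore yields a value between 0 and 1.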
Model estimation results show that for a large number of multi-modal trip configurations, normative rules are superior to heuristic rules in retrospectively aggregating experiences. In particular, this is true for duration-weighted averaging rules in which a waiting minute is penalized 3 or 4 times relative to a moving minute. However, for certain trip configurations (2-leg, bus, non-commuting) some heuristic rules, including the peaks and the peak_max-end, yield the best goodness-of-fit.
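A duration-weighted average with a waiting-time penalty can be sketched as follows. The exact functional form of the DWC rule is given in Sect. 3 and is not repeated here, so the weighting used below (moving minutes plus ω times waiting minutes as the weight of each leg) is an assumption made for illustration, and the leg values are hypothetical.

```python
def duration_weighted_satisfaction(legs, waiting_weight):
    """Weighted mean of leg satisfaction.

    legs: list of (satisfaction, moving_min, waiting_min) tuples.
    waiting_weight: penalty applied to each waiting minute (omega).
    Assumed weight of a leg: moving_min + waiting_weight * waiting_min.
    """
    weights = [move + waiting_weight * wait for _, move, wait in legs]
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("total weighted duration must be positive")
    weighted_sum = sum(w * leg[0] for w, leg in zip(weights, legs))
    return weighted_sum / total_weight


# Hypothetical 3-leg trip: walk, bus (with a 10 min wait), walk.
trip = [(4, 10, 0), (2, 20, 10), (5, 5, 0)]

for omega in (0, 1, 3):
    score = duration_weighted_satisfaction(trip, waiting_weight=omega)
    print(f"omega = {omega}: {score:.2f}")
```

In this made-up trip the leg with waiting time has the lowest satisfaction, so a larger ω pulls the aggregate score down; setting ω to 0 ignores waiting time altogether, in the spirit of a moving-time-only rule.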
Evidently, the serial position and the end rule are the worst-performing aggregation rules. However, a remarkable exception is found with the serial position rule for the least complex multi-modal trips (2 legs), where this rule and the equal-weighted averaging rule prove to be best. This might indicate that the shortest multi-episodic experiences are easiest to recollect due to their simplicity, and respondents are thus less prone to recall biases. Additionally, other factors, such as their presumably lesser exposure to waiting times and fewer chances to experience negative and salient events, may reduce the likelihood that other aggregation rules prevail.
The performance of three other measures (DWM and peak-end-rules) is very modest. The poor results yielded by the average-weighted rule that considers in-vehicle/walking times only (DWM) were expected. These results reinforce the notion that moving times alone, and thus the quality of the services and factors influencing the moving experience alone, do not explain well travelers' perception of the overall trip (e.g. Ariely 1998).

Cross-correlations of trip legs and overall satisfaction
This section studies the strength of the relationship between overall satisfaction and the different trip legs and aims to unveil the most relevant leg for bus and non-commuting trips. A correlation matrix was constructed in order to identify the strength of the relationships between overall trip satisfaction and the satisfaction with different trip stages, as well as to study the extent to which the trip stages are inter-correlated. Multimodal transit trips consist of two or three stages: a main leg and an access and/or egress stage. Access and egress stages can comprise one or several legs before or after the main leg, respectively. Trip legs are aggregated into stages by taking the simple average of the trip legs. Figure 2 and Table 4 present cross-correlation matrices for different trip configurations with regard to their purpose, travel mode, trip complexity and presence of transfers, with each satisfaction item represented as a node and the correlation between two items illustrated by a link. The cross-correlation graphs were visualized using the NodeXL Excel add-in and offer an intuitive glimpse of the relationships between the various factors. In Fig. 2, correlations with the overall trip satisfaction are highlighted using red lines while correlations among trip stages are displayed in blue. The line style corresponds to the degree of correlation as follows: correlations above 0.7 are shown with a solid line, those between 0.5 and 0.7 with a dashed line and those between 0.3 and 0.5 with a dash-dot line. The coefficients shown are significant at the 90% confidence level and are always positive.
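The stage aggregation and the line-style coding described above can be sketched in a few lines. This is an illustrative reconstruction only: the use of the Pearson coefficient is an assumption (the type of correlation coefficient is not stated here), the treatment of values falling exactly on 0.5 or 0.7 is also assumed, and all function names and data are hypothetical.

```python
import math


def stage_satisfaction(leg_scores):
    """Simple average of the satisfaction scores of the legs in one stage."""
    return sum(leg_scores) / len(leg_scores)


def pearson(x, y):
    """Pearson correlation coefficient (assumed here for illustration)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def link_style(r):
    """Map a correlation to the line style used in the graphs, or None if no link."""
    if r > 0.7:
        return "solid"
    if r >= 0.5:
        return "dashed"
    if r >= 0.3:
        return "dash-dot"
    return None


# Hypothetical responses: access-stage leg scores and overall satisfaction.
access = [stage_satisfaction(legs) for legs in ([4, 5], [3], [2, 3], [5], [4, 4])]
overall = [4, 3, 3, 5, 4]
r = pearson(access, overall)
print(f"r = {r:.2f}, link style: {link_style(r)}")
```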
A number of general trends can be observed in Fig. 2. When compared to access and egress legs, the main trip leg is more strongly and consistently correlated (> 0.5) with overall trip satisfaction. In some of the trip configurations where DWC measures best aggregate the travel experiences (Fig. 2: a-General, e-Commuter, b-Metro, c-Train, i-4 legs and k-Transfers), this higher correlation could be due to the extensive share (84.6%) of transit modes used in the main trip leg, since transit modes generally involve waiting times, which in turn are perceived with a higher weight than moving times. In the remaining trip configurations with successful DWC rules, however, the lack of significance and the weakest correlations between access/egress legs and overall satisfaction (Fig. 2: j-Without transfers and h-3 leg trips) might be attributed to the very large share of non-motorised modes used in the access (73 and 81%) and egress legs (64 and 82%) respectively, as non-motorised modes are characterized by their lack of waiting time and their travel time reliability.
While peak rules were found to be most adequate for explaining satisfaction with bus and non-commuting trips, the impactful leg associated with the peak experience remained unknown. Cross-correlations, however, make it possible to determine the most probable impactful leg. For bus trips, the main leg is identified as the leg with the maximum deviation from the average satisfaction since this is the only leg showing a moderate but significant correlation coefficient (d-Bus) with overall trip satisfaction. For non-commuting trips, the main leg's correlation is 0.59, far above the 0.37 of access or egress legs, and thus the former should be considered the most deviant and relevant one. Commuting and non-commuting trips show very similar correlation patterns: a very strong correlation between access and egress stages and between main and overall satisfaction. The main difference pertains to the weaker correlations found for non-commuting trips between access and egress stages and main and overall satisfaction. This may indicate that commuters perceive their door-to-door trip as a more integrated experience. The similar weights attributed to the main and access/egress legs in constructing an overall evaluation of the trips with transfers are evident in Fig. 2c-Train, where all legs exhibit a very high correlation (> 0.5) with overall satisfaction. Furthermore, in the least complex trips (2-leg) all legs are similarly important and correspondingly all travel modes and trip stages are equally important, as indicated by the performance of the aggregation rules.
Interestingly, an examination of all correlation matrices shows that access legs are more strongly correlated with overall satisfaction than egress legs. This may suggest that the first trip stage influences the overall evaluation more than the last stage (egress) does and that therefore access should be prioritized over egress. The only study that commented on possible differences between the importance of access and egress trip stages (Van der Waard 1988) found that access walking time has a greater influence on bus trip route choice than egress walking time. Finally, a very strong inter-correlation is found across the board between access and egress legs (> 0.7), which may support the belief that first and last miles could be roughly considered as a single entity. However, train trips are a noticeable exception to this assertion. The very different travel mode composition of access and egress that generally occurs in train trips (e.g. Givoni and Rietveld 2006) is also a reality in our data, where non-motorised modes and other modes, mainly private vehicle, represent 60% of the modal split in the access legs while mixed transit and transit are used by 65% of travelers for egress (see Table 1). A plausible explanation for their negligible correlation coefficient is the specific characteristics inherent to each mode, which are evaluated differently by travelers.

Trip stages and overall satisfaction
By creating generalized aggregation models, the present section aims to go a step further in the study of the strength of the relationship between overall travel satisfaction and the trip stages. This is done by considering only the normative rules when aggregating individual trip legs' satisfaction into trip stages' satisfaction, since they include all trip legs. These generalized aggregation models would assist stakeholders in better evaluating the relative weight (in percent) of each trip stage in overall travel satisfaction.
A series of ordered logit regression models were estimated with "Overall satisfaction with the whole trip" as the dependent variable, and with trip stages' satisfaction values as the independent variables. Each of the normative rules is employed to aggregate trip legs' satisfaction into satisfaction with access and egress trip stages. The main trip stage corresponds to the longest trip leg. There are three conceptual models: one for trips including all trip stages (M1-all trip stages), and two for trips including main and either access (M2) or egress (M3) stages. By combining aggregation rules and trip leg inclusion options, the total number of estimated models amounts to 18. Table 5 shows the estimated coefficients of the independent variables (Coeff), the relative weight percent of the trip stages on overall satisfaction (W) and the McFadden R-squared (Rs). With few exceptions, the results are significant at the 90% confidence level.
The strength of the Rs coefficients is not directly comparable across conceptual models since they involve a different type of trip, combination of independent variables and number of respondents. However, Rs values are used to identify the best aggregation rule for each of the conceptual models. Equal-weighted average (M1 and M3) and duration-weighted moving (M2) are the models that best explain overall travel satisfaction and are thus hereinafter taken as the reference models. This finding might seem to contradict the results of Sect. 5.2, where the IVT and waiting and/or transferring times of the trip legs matter. However, in this case the aggregation is made at the trip stage level and not at the trip level. In addition, access and egress are made up of few components (between 1.2 and 1.3 trip legs) and their trip legs' duration (moving and waiting times) is shorter than that of the main leg. Therefore, taken together, the impact of the trip leg duration is less consequential. The similar importance given to each of the trip legs of access and egress seems to indicate that travelers naturally aggregate and clump together the components of these trip stages in a simple way. In general, the relative weight of the main leg is higher (M1-100% and M2-54%) than that of the remaining trip stages. This is consistent with the results obtained in Sect. 5.3. However, egress explains about 2/3 of overall travel satisfaction in M3. The fact that access and egress trip stages are composed of several trip legs indicates that at the trip leg level their relative weight is smaller than that obtained at the trip stage level.
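One simple way to express stage coefficients as relative weights in percent is to normalize each coefficient by the sum of all stage coefficients. Whether the W values in Table 5 were computed exactly this way is an assumption; the sketch and its coefficient values are hypothetical and serve only to illustrate how such percentages can be read.

```python
def relative_weights(coefficients):
    """Express each stage coefficient as a percentage of the coefficient sum.

    coefficients: dict mapping stage name to its estimated coefficient.
    Assumes non-negative coefficients with a positive total.
    """
    total = sum(coefficients.values())
    if total <= 0:
        raise ValueError("sum of coefficients must be positive")
    return {stage: 100.0 * value / total for stage, value in coefficients.items()}


# Hypothetical coefficients for a model with main and access stages.
weights = relative_weights({"main": 0.81, "access": 0.69})
for stage, share in weights.items():
    print(f"{stage}: {share:.0f}%")  # main: 54%, access: 46%
```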

Discussion
This research contributes to the literature (e.g. Suzuki et al. 2014; Miron-Shatz 2009; Kahneman 2000; Ariely 1998) on combining multi-episodic experiences and provides novel empirical evidence in the transport domain. The results shed light on retrospective evaluations of entire multi-modal trip experiences and how they comprise satisfaction with individual trip legs. The results of this study allow the identification of the trip legs that are most influential in determining travelers' overall trip evaluations. This can support practitioners and stakeholders in prioritizing measures for promoting transit service with limited resources. The estimation results for alternative satisfaction aggregation models (Sect. 5.2) highlight the importance of improving travelers' perception of their waiting time. The better goodness of fit of some aggregation measures confirms our assumption that the recollection of retrospective evaluations varies for different travel categories. Hence, trip characteristics should be considered when multi-modal trips are investigated.
In general, normative rules were found to be better than heuristic rules at retrospectively aggregating experiences. These findings resonate with previous research (e.g. Suzuki et al. 2014; Miron-Shatz 2009) and indicate that no trip leg can be neglected since all of them have an impact on overall travel satisfaction. The average weighted rule that considers both moving (in-vehicle/walking) and waiting times (DWC) performed particularly well, especially when applying a waiting time weight of 3 or 4 times in-vehicle or walking time (DWC3 and DWC4). Nevertheless, trips with train as the main mode are a notable exception, presumably because waiting times at train stations are more tolerable owing to the higher service reliability and the availability of station amenities. This statement concurs with Fan et al. (2016), who demonstrated that perceived waiting times diminish for stops that have amenities such as benches and shelters. Furthermore, other studies have shown that travel time perception at stations and stops can be reduced by making the stay more comfortable, including aspects of external and internal design, increasing safety and improving how time can be used while interchanging (Hernandez and Monzon 2016; Yoh et al. 2012).
The negative impacts that transfers exert on overall satisfaction found by De Abreu e Silva and Bazrafshan (2013) are not manifested in our study. In fact, trips involving transfers have higher than average overall satisfaction levels (see Table 2). However, the good performance of DWC3 and DWC4 rules for these trip configurations may suggest that improving the perception of waiting time, including when transferring, is important. Previous research showed that the degree to which satisfaction is negatively affected by transfers depends on aspects related to the service offered and the quality of the interchanges including: signposting of facilities, their internal design, safety, security (Hernandez et al. 2014) and their cleanliness and maintenance (De Abreu e Silva and Bazrafshan 2013); aspects that therefore should be enhanced by practitioners.
Unlike findings from other domains (Fredrickson and Kahneman 1993; Redelmeier and Kahneman 1996; Baumgartner et al. 1997; Varey and Kahneman 1992; Schäfer et al. 2014), none of the peak-end-rules shows a good performance for any of the trip configurations. This might indicate that recency effects are not present in the context of travel satisfaction, that our experiment is very different in nature from the ones previously carried out, or that a larger number of trips in our database lack salient events. The absence of a strong end effect does not necessarily imply that the egress leg is not impactful since this leg is embodied in some of the other successful rules. In addition, the egress leg is found to have a higher percent relative weight than the main mode in certain types of trips (main and egress trips). However, compared to many of the previous experiments, our study is applied to delimited experiences containing very few instant utilities and the overall memory recollection is done completely in retrospect.
M1 results from Sect. 5.4 may suggest that access and egress have no influence at all on overall satisfaction. However, this conflicts with findings from Sect. 5.2 and is inconsistent with the relative weight distribution yielded in the remaining normative models. A plausible explanation for this might be the indirect influence that access and egress exert on overall travel satisfaction through the main trip leg, as observed in the strength of the correlations between the access/egress and main stages (Sect. 5.3).

Conclusion
This work aimed to investigate how travelers aggregate different door-to-door travel experiences into an overall evaluation. In addition, this work investigated the relative weight of trip legs on overall satisfaction. It can be concluded that, as shown by the predominant success of DWC rules, all trip legs are relevant in influencing the overall trip's evaluation. However, the main trip leg together with legs including waiting and transferring times have a higher influence. This partially diverges from the findings of Susilo and Cats (2014), who argued that access and egress trip legs influence satisfaction with the main leg. The analysis was performed using a modest sample size, and the results should therefore be generalized with caution. To be able to generalize the results of this work, the experiment should be replicated with a larger dataset in multiple geographical contexts. In any event, the results set a precedent in the transport domain regarding how retrospective evaluations of different door-to-door trips are constructed, and can be used as a benchmark for future studies. In addition, this work provides a useful method that can be applied to different datasets for investigating which leg(s) of a certain type of trip is (are) more consequential. Given the possible gap between what travelers actually perceived and memory-based experienced utility, it is recommended to collect instant utilities during the experience or to use methods such as ecological momentary assessment (Stone et al. 1998) or day reconstruction (Miron-Shatz 2009; Kahneman et al. 2004) that have been proven to elicit recall bias-free instant utilities.
Future research should focus on investigating how the satisfaction aggregation measures work for different main transport modes (private vehicle, soft and other transit modes). Integrating the underlying causes of satisfaction with each trip leg, namely the service attributes of individual legs, into a single model of overall satisfaction with multi-modal trips will allow the trade-offs between different trip elements to be investigated. Including a set of instrumental variables such as socio-demographic and travel characteristics in the regression models will allow their influence on travelers' perceptions of individual legs to be studied. In addition, measuring satisfaction with waiting and transferring times as separate episodes within the multi-modal trip would allow their impact to be better singled out. Finally, exploring the correlation between satisfaction and different leg compositions with supply and infrastructure conditions will allow the relation between travelers' perceptions and objective measures of service provision to be anchored.