Data collected
For the analysis, we rely on panel data collected in the influence area of the TransMiCable. The data collection had two phases. First, a baseline face-to-face survey was conducted between February and November 2018, before the project was implemented; then we carried out a follow-up measurement from July 2019 to March 2020. The target population was adults who had lived within 800 m of the TransMiCable stations for at least two years and had no plans to move out for at least two more years.
The overall sample of the study was selected through a multi-stage sampling design. First, blocks were selected with a probability proportional to the density of parcels. Second, every third household was systematically selected. Lastly, we randomly selected one eligible adult per household. If the selected adult was not present or did not agree to participate, the household was replaced until about 800 adults per area had been recruited. This procedure allowed us to obtain a sample size powered to detect changes equivalent to standardized mean differences in outcomes that range between 0.3 and 0.4 (Sarmiento et al. 2020). Hence, the sample is statistically representative of the population of adults living in the influence area of the cable car.
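The three sampling stages described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the fieldwork protocol: the function name `select_sample`, the `parcel_density` and `households` keys, and the random starting point for the systematic step are hypothetical, blocks are drawn with replacement for simplicity, and the replacement of non-responding households is omitted.

```python
import random


def select_sample(blocks, n_blocks, seed=0):
    """Illustrative three-stage selection (hypothetical data layout).

    blocks: list of dicts with keys
        'parcel_density' -> positive number used as the selection weight
        'households'     -> list of households, each a list of eligible adults
    """
    rng = random.Random(seed)

    # Stage 1: draw blocks with probability proportional to parcel density.
    # Simplification: drawn with replacement, so a block may repeat.
    weights = [block["parcel_density"] for block in blocks]
    chosen_blocks = rng.choices(blocks, weights=weights, k=n_blocks)

    sample = []
    for block in chosen_blocks:
        # Stage 2: systematic selection of every third household,
        # starting from a random position (assumed, not from the source).
        start = rng.randrange(3)
        for household in block["households"][start::3]:
            # Stage 3: one eligible adult chosen at random per household.
            if household:
                sample.append(rng.choice(household))
    return sample
```

In this sketch the selection weights only need to be proportional to parcel density; they do not have to sum to one.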
At the baseline, we applied a general questionnaire of revealed preferences to 1031 individuals to gather sociodemographic, mobility, accessibility, and other health and social information. Among these persons, we randomly chose 343 participants to answer a stated preference experiment to assess their willingness to use the TransMiCable, a perceptual questionnaire about satisfaction with the current transport system, and a ranking survey to explore expectations of the cable car implementation. In the follow-up round, we contacted the same individuals to respond to a similar general questionnaire, a new stated preference exercise, and a ranking survey to explore perceptions of the TransMiCable. At this stage, 303 respondents (from the 343 selected) successfully answered all survey components, achieving a reasonable 11.6% attrition rate in the sample. A data cleaning process resulted in the exclusion of two observations for missing and incongruent information. Therefore, the final sample size consists of 341 observations for the baseline survey (i.e., expectations stage) and 301 for the follow-up round (i.e., perceptions of reality). Table 1 includes a description of the sample for both periods.
Table 1 Sample description
The sample is characterized by high unemployment, low education levels, and low household income. Most of the respondents were female; time living in the neighborhood has a mean of 19.3 years and a standard deviation of 13.2. Most of the individuals (85%) have used the cable car, while around 16% use it frequently for their daily trips. Many people do not use the cable car regularly because they carry out most of their daily activities inside the neighborhood.
Ranking questionnaire
We issued ranking surveys before and after the TransMiCable went into operation. We asked respondents in both periods to rank the three attributes they expected to improve or perceived to be the most improved after the cable car started operating. We decided that individuals would rank only three attributes to minimize boredom and fatigue effects in the choice process that may affect the reliability of the data, considering that as the number of choices increases people tend to respond less carefully (Bradley and Daly 1994). Also, individuals usually classify the best and worst alternatives more easily, given that they have greater certainty about preferred and extreme alternatives. At the same time, they are less sure about middle options, which may be ranked with less care (Ortúzar and Willumsen 2011). The set of attributes from which to choose was the following.
- A1. Reduce travel time
- A2. Improve comfort
- A3. Improve reliability of waiting time
- A4. Improve in-vehicle security
- A5. Improve security at the station
- A6. Improve road safety
- A7. Increase the number of places I can access
- A8. Increase the number of hours (departure time) I can travel
- A9. Improve the frequency of the service
- A10. Reduce the fare
- A11. Reduce pollution
- A12. Improve reliability of arrival time
- A13. Improve the neighborhood aesthetic
- A14. Improve the quality of life
- A15. Nothing
- A16. Other
The database for our analysis consists of the sample who answered the ranking component in both surveys. It is noteworthy that along with alternative A16 (Other), we encouraged respondents to specify which benefits, not in the list, they expected to improve after the cable car implementation. At the baseline, this allowed us to reclassify other responses into new alternatives. At the follow-up, we could not reclassify some answers because they referred to unique attributes, such as a better experience on the trip or better customer service offered by the cable operator, compared with informal paratransit services.
Rank data capture more of the variability in stated choices than single-choice experiments do. This reduces the bias that arises when all attributes improve: a single choice records only the attribute that improved the most, whereas the differences between expectations and perceptions may also depend on relative changes among the other attributes. Even when an individual does not rank an attribute within the top set, aggregating responses across individuals allows us to compare the relative weighting of attributes before and after the intervention, which can suggest whether expectations were met.
The survey also asked the respondents to rank the three attributes they expected or perceived to get worse. These additional data were used to test the best–worst approaches. However, this was futile since most of the sample responded ‘nothing’, so the new data did not provide enough variability to enrich the models.
Methods
The analysis reported in this paper is based on the modeling of the rank data collected. Rank data are a common source of information in quantitative research in many fields, such as psychology, sociology, and econometrics. Approaches to modeling rank data include order statistics, distance-based, decision tree, paired comparison, and multistage models. Probabilistic ordered models are the most popular approach owing to their long history and wide use in the statistics and psychology literature (Yu 2000).
Discrete choice models allow estimating the probability of choosing an alternative from a set of available alternatives, measuring the effect of covariates, and capturing the heterogeneity of this probability. In particular, the Luce Model (Luce 1958), an extension of the multinomial logit model, estimates the choice probability from a set of stated choices in the form of ranks. Luce’s theorem supports this formulation, stating that a ranking can be decomposed into a sequence of S–1 independent choice stages or pseudo-observations. Here, S refers to the number of alternatives in the ranking.
Based on this, we divided the rank into implicit choice observations. The first choice consists of the selection of the preferred alternative when all options are available. The second observation is the choice of the second-best alternative when the preferred attribute is not available. Finally, the third choice refers to the third-best alternative when the two preferred are not available. The choice probability is then given by Eq. 1.
$$\Pr\left(r_{1}, r_{2}, r_{3}\right) = \Pr\left(r_{1} \mid r_{1}, r_{2}, r_{3}, \ldots, r_{j}\right) \times \Pr\left(r_{2} \mid r_{2}, r_{3}, \ldots, r_{j}\right) \times \Pr\left(r_{3} \mid r_{3}, \ldots, r_{j}\right)$$
(1)
where rk refers to the alternative ranked in position k (k = 1, 2, 3) and Pr(r1, r2, r3) is the probability of observing a given rank order, given that alternatives r1, …, rj are available at the first stage. This modeling framework is also called exploded logit (Ortúzar and Willumsen 2011).
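The decomposition in Eq. (1) can be illustrated with a short Python sketch. It assumes a plain multinomial logit at every stage and takes the utilities as given; the function names `explode_ranking` and `ranking_probability` and the example utilities are hypothetical and are not part of the study's estimation code.

```python
import math


def explode_ranking(ranking, alternatives):
    """Split a ranking into (chosen, available set) pseudo-observations."""
    available = list(alternatives)
    stages = []
    for chosen in ranking:
        stages.append((chosen, tuple(available)))
        # The chosen alternative is unavailable in later stages.
        available.remove(chosen)
    return stages


def ranking_probability(ranking, utilities):
    """Probability of a ranking as a product of stage-wise logit choices."""
    prob = 1.0
    for chosen, available in explode_ranking(ranking, list(utilities)):
        denominator = sum(math.exp(utilities[alt]) for alt in available)
        prob *= math.exp(utilities[chosen]) / denominator
    return prob


# Hypothetical utilities for four attributes, for illustration only.
example_utilities = {"A1": 0.0, "A2": 0.0, "A3": 0.0, "A4": 0.0}
example_probability = ranking_probability(["A1", "A2", "A3"], example_utilities)
```

With equal utilities over four alternatives, the three stages contribute 1/4, 1/3, and 1/2, so the ranking probability is 1/24. A respondent who ranks only one or two attributes simply contributes fewer stages.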
In our specification, the probabilities in Eq. (1) follow the structure of a mixed logit model that accounts for the panel effect arising from the multiple observations per respondent (Ortúzar and Willumsen 2011). Equation (2) describes the utility function Uiq associated with alternative i and individual q:
$$U_{iq} = ASC_{i} + X_{iq} \times \beta_{iq} + \varepsilon_{iq} + \eta_{q}$$
(2)
where ASCi is the alternative-specific constant, representing the net influence of all unobserved or not explicitly included characteristics of the individual and the alternative in the utility function (Ortúzar and Willumsen 2011); Xiq are observed attributes (e.g., socioeconomic characteristics); βiq is a set of parameters to be estimated; εiq is a random error component that is independent and identically distributed Type I Extreme Value; and ηq is a normal error component with mean zero and a standard deviation to be estimated, accounting for the panel effect. This error component varies across individuals but is constant over the repeated implicit observations of each individual according to the density function f(θ|η), conditional on the population parameters θ. The unconditional probability is then given by Eq. (3), which can be estimated by applying simulated maximum likelihood methods (Train 2009):
$$P_{iq} = \int \prod_{k} \frac{e^{U_{iq}}}{\sum_{j = 1}^{j} e^{U_{jq}}} f(\theta \mid \eta)\, d\eta$$
(3)
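The integral in Eq. (3) has no closed form, so it is approximated by averaging over random draws of the error component. The sketch below illustrates this idea for one respondent under stated assumptions: utilities contain only constants, a single normal draw is held fixed across the respondent's ranking stages, and the draw enters each utility through a hypothetical alternative-specific `loading`, since a term common to all alternatives would leave logit probabilities unchanged. The function name and arguments are illustrative and are not the estimation software used in the study.

```python
import math
import random


def simulated_ranking_probability(ranking, asc, loading, sigma,
                                  n_draws=1000, seed=0):
    """Monte Carlo approximation of a ranking probability for one respondent.

    asc:     dict alternative -> constant (hypothetical values)
    loading: dict alternative -> weight on the error component (assumption)
    sigma:   standard deviation of the normal error component
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        # One draw per respondent, constant over all ranking stages.
        eta = rng.gauss(0.0, sigma)
        utility = {alt: asc[alt] + loading[alt] * eta for alt in asc}

        available = set(asc)
        prob = 1.0
        for chosen in ranking:
            denominator = sum(math.exp(utility[alt]) for alt in available)
            prob *= math.exp(utility[chosen]) / denominator
            available.remove(chosen)
        total += prob
    # Average of the conditional probabilities over the draws.
    return total / n_draws
```

In simulated maximum likelihood, the logarithm of this average is summed over respondents and maximized with respect to the constants and the standard deviation. Setting `sigma` to zero recovers the plain exploded logit probability.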
We estimated two groups of models. Firstly, we estimated aggregated models to compare the overall expectations before the construction of the cable car versus perceptions after its implementation. In this case, we estimated market share models considering only ASCi in the utility functions (βiq = 0) assuming homogeneity in the perceptions. The second group of models considers βiq parameters associated with the attributes sex, household income, occupation, and use of the TransMiCable. These models aim to capture perception heterogeneity according to these covariates. Both groups of models account for the panel effect through the inclusion of random error components ηq.
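As a brief illustration of the first group of models, consider a simplified reading in which βiq = 0 and the error component ηq is set aside (the estimated models retain it). Under these assumptions, the probability of ranking attribute i first depends on the constants alone:

$$P_{i} = \frac{e^{ASC_{i}}}{\sum_{j} e^{ASC_{j}}}$$

In this simplified case, a larger ASCi relative to the other constants translates directly into a larger predicted share of first-place rankings for attribute i.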
The final number of pseudo-observations for the exploded models is 979 for the baseline (i.e., expectations) and 837 for the follow-up stage (i.e., perceptions). Both totals fall below the maximum of three pseudo-observations per respondent (1023 and 903, respectively) because some respondents did not report the complete ranking and selected only one or two alternatives. Furthermore, we evaluated the perception heterogeneity considering the catchment areas of the TransMiCable stations but did not find significant differences across these areas. We hypothesize that population characteristics are homogeneous across the whole study area and that there are no significant spatial differences in respondents’ perceptions.