We divide the results into three parts. First, we present the descriptive analysis, including the data collection, the sample, and an evaluation of the non-trading behaviour; then, we cover the choice modelling analysis; finally, we report the scenario analysis as an application of the final model.
Sample description and descriptive analysis
A total of 1077 respondents finished the questionnaire, of which 1006 (93%) were considered valid after data cleaning (based on survey completion time and straight-lining checks throughout the whole survey). Table 3 shows the socioeconomic characteristics of the sample, the target population (highly urbanised areas in the Netherlands), and the overall Dutch values. Gender and the two urbanisation levels are well represented in our sample. The sample age distribution is broadly representative of the respective population, although middle-aged adults are slightly underrepresented and the elderly population slightly overrepresented. Shares for education, working status and household composition can only be compared to the national values. As expected, our (sub)urban sample has a higher percentage of higher educated individuals, working respondents and single households. Given the similarities between the analysed sample shares and their Dutch counterparts, we consider that our sample adequately mirrors the socioeconomic characteristics of the target population.
Table 3 Comparison between the survey sample and the Dutch population. Sources for the population data: (Centraal Bureau voor de Statistiek (CBS) 2018a), (Centraal Bureau voor de Statistiek (CBS) 2018b), (Centraal Bureau voor de Statistiek (CBS) 2018c), (Centraal Bureau voor de Statistiek (CBS) 2018d)
Out of the 1006 respondents, 308 were directed to the commuting trip purpose and 698 answered the survey for the leisure trip. In the leisure subsample, 42% of respondents were working individuals. Differences in working status between the two subsamples led to differences in age and education levels (a higher proportion of older and lower-educated individuals in the leisure subsample).
A substantial share of respondents (around 30%) exhibited non-trading behaviour in the SP experiment, even though all blocks contained scenarios with values of time that ranged from less than 5 €/h to over 30 €/h (an initial choice modelling analysis showed an average value of time of around 15 €/h). Half of the non-traders chose the individual alternative in all of the shown scenarios (we refer to these respondents as “individual-only” respondents), and the other half chose exclusively the pooled alternative (“pooled-only” respondents). Given the link between attitudes and behaviour (Molin et al. 2016), we perform an exploratory factor analysis (EFA) on the included privacy, cost and time related attitudinal indicators to shed light on the main reason behind the exhibited non-trading behaviour. We use principal axis factoring with direct oblimin rotation, and extract factors with eigenvalues greater than one (Kaiser–Meyer–Olkin measure KMO = 0.797 and Bartlett’s test of sphericity p < 0.001, indicating sampling adequacy and adequate correlation between the EFA items). The included statements and the related analysis are reported in “Appendix A”.
We extract three factors from the EFA (privacy, cost and time factors), as expected. We measure the reliability of the factors with Cronbach’s alpha coefficient, and obtain (for Cronbach’s alpha based on the standardized items) 0.61 (privacy factor), 0.70 (cost factor) and 0.57 (time factor). Values over 0.60 are considered acceptable in exploratory research (Nunnally and Bernstein 1994). Cronbach’s alpha, however, depends on the number of items that belong to a factor (Tavakol and Dennick 2011), which explains the somewhat lower value for the time factor (which consists of two items). Following Schmitt (1996) and Taber (2017), who argue that factors with lower alphas can also prove both acceptable and useful, and after checking that the two attitudinal time items are moderately correlated (their Pearson correlation is 0.40), we decide not to discard the time factor, given the exploratory (and not confirmatory) nature of our factor analysis.
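The dependence of the standardized alpha on the number of items can be illustrated with a short calculation. The sketch below assumes the usual standardized-alpha formula, α = k·r̄ / (1 + (k − 1)·r̄), where k is the number of items and r̄ the mean inter-item correlation; the function name is ours, and the three- and four-item cases are hypothetical comparisons, not factors from the study.

```python
def standardized_alpha(k: int, mean_r: float) -> float:
    """Standardized Cronbach's alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Time factor: two items with a Pearson correlation of 0.40 (values from the text).
alpha_time = standardized_alpha(k=2, mean_r=0.40)
print(f"two items,   r = 0.40: alpha = {alpha_time:.2f}")

# Hypothetical comparison: the same correlation with more items gives a higher alpha.
for k in (3, 4):
    print(f"{k} items, r = 0.40: alpha = {standardized_alpha(k, 0.40):.2f}")
```

With two items and a correlation of 0.40, the formula returns approximately 0.57, in line with the value reported for the time factor; the same correlation spread over more items would exceed the 0.60 threshold.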
The means of all attitudinal indicators display the same trend: “individual-only” respondents (15% of the sample) are the most privacy and time sensitive, and the least cost sensitive; the opposite holds for the “pooled-only” respondents (15%). The mean values of “traders” (70%) always lie between those of the two groups. ANOVA tests confirm that these differences are significant for all indicators at the 95% confidence level or beyond. The difference is largest between the “individual-only” and the “pooled-only” groups, and is significant at the 99% level (independent t-test). Therefore, we consider the existence of strong preferences to be the main underlying cause of the non-trading behaviour, and accept non-traders as valid respondents in the subsequent choice modelling analysis.
Further, pair-wise comparison between “individual-only” and “traders” shows statistically different means in all indicators (in all but one at the 99% level) while differences between “pooled-only” and “traders” are insignificant for some of the privacy indicators. This suggests that differences in preferences between “individual-only” and “traders” stem from both different values of time and willingness to share, while differences between “pooled-only” and “traders” stem mainly from differences in the values of time of the two groups.
Discrete choice model estimation
We estimate three model structures (see Table 4), as indicated in the Discrete Choice Modelling Methodology subsection. The first model is a mixed logit model with a random component to account for the panel structure of the data. All included parameters in this first model are significant and have the expected signs. Time and cost are modelled linearly as generic parameters (i.e., they have the same parameters for both alternatives). We find that working individuals have a larger time disutility, and include this taste heterogeneity in the model with an additional time disutility parameter for this segment of the population. The models tested show that the effect of the number of additional passengers is best modelled as a trip-specific disutility for the case of one or two extra passengers (same disutility for both situations). However, the WTS disutility for the four extra passengers scenario is higher (starting at 20% higher for 13 min rides, the shortest trip included in the experiment) and increases per minute of in-vehicle time. We speculate that individuals consider that a similar level of privacy and enough personal space is granted in both the single and the two co-rider scenarios, which may explain why the same disutility is attributed to both scenarios. This threshold is, however, surpassed in the four co-rider situation, leading not only to a higher value but also to a per-minute value. We find that having a high income, never using bus/tram/metro (BTM) and rarely cycling increase the preference towards the individual ride alternative. These effects are also included in the model specification. We also find that, unlike in Lavieri and Bhat (2019), commuting and leisure trip purposes are best modelled together [tested using a likelihood ratio test (Ben-Akiva and Lerman 1985)].
Table 4 Parameter values (and robust t-tests) of the mixed logit (ML) models and parameter values (and z-value) of the latent class choice model (LCCM) (p-value: ≤ 0.01 ***, ≤ 0.05 **, ≤ 0.1*)
Our second model adds random components to the time and cost attributes, to account for unobserved heterogeneity. Adding a random component to the WTS-related attributes did not improve the model. We tried different distributions for these random components: a normal distribution, a lognormal distribution, and a doubly-truncated (i.e., bounded) normal distribution. The latter two distributions avoid assigning positive parameter values to individuals (which would be counterintuitive for the time and cost attributes). Of the three distributions, the doubly-truncated distribution provides the best model fit (truncation is done by normalising the remaining surface). The time-related random component only affects the common time parameter and not the additional time-related parameter concerning working individuals. Unlike in the previous model, not using BTM (bus/tram/metro) did not prove to be significant, and is removed from the final model specification. The final adjusted rho-squared values of the two ML models are 0.281 and 0.291 respectively, indicating a better model fit of the second model specification. Both models are estimated using 10,000 Halton draws.
We additionally calculate the values of time (VOT) of the estimated models, which help us further compare their results. The basic VOT calculation (a direct division of the \(\beta_{time}\) coefficient by the \(\beta_{cost}\) coefficient) does not apply in the case of random coefficients. In this case, a second order approximation can be used (Seltman 2012). Given that the covariance of both parameters can be assumed to be zero (as a result of the choice model formulation), Frei et al. (2017) approximate the VoT in this case as follows:
$$\begin{aligned} VoT & = E\left[ {\frac{{\beta_{time} }}{{\beta_{cost} }}} \right] \approx \frac{{E\left[ {\beta_{time} } \right]}}{{E\left[ {\beta_{cost} } \right]}} - \frac{{Cov\left( {\beta_{time} , \beta_{cost} } \right)}}{{E^{2} \left[ {\beta_{cost} } \right]}} + \frac{{Var\left[ {\beta_{cost} } \right] \times E\left[ {\beta_{time} } \right]}}{{E^{3} \left[ {\beta_{cost} } \right]}} \\ & \approx \frac{{E\left[ {\beta_{time} } \right]}}{{E\left[ {\beta_{cost} } \right]}} + \frac{{Var\left[ {\beta_{cost} } \right] \times E\left[ {\beta_{time} } \right]}}{{E^{3} \left[ {\beta_{cost} } \right]}} \\ \end{aligned}$$
(1)
Unlike the model reported in Frei et al. (2017), our time and cost distributions do not follow a normal distribution but rather a doubly truncated normal distribution (\(z_{1}\) = −1.28, \(z_{2}\) = 1.28). Because the truncation is symmetric, the mean remains the same as in the non-truncated distribution, but the truncation shrinks the variance relative to the non-truncated case. The \({\text{Var}}[\beta_{\text{cost}}]\) introduced in (1) therefore has to be adjusted. For our symmetrical case, the corresponding formulation is as follows (we refer the reader to Burkardt (2014) and Johnson et al. (1994) for the general mathematical formulation):
$$Var\left[ {\beta_{cost} } \right] = \sigma^{2} \left[ {1 + \frac{{z_{1} \times \phi \left( {z_{1} } \right) - z_{2} \times \phi \left( {z_{2} } \right) }}{{\Phi \left( {z_{2} } \right) - \Phi \left( {z_{1} } \right)}}} \right]$$
(2)
where \(\sigma\) is the standard deviation of the non-truncated normal distribution (so that \(\sigma^{2}\) is its variance), and \(z_{1}\) and \(z_{2}\) are the lower and upper truncation bounds of the equivalent standard normal distribution. The functions \(\phi\) and \(\Phi\) are the probability density function and the cumulative distribution function of the standard normal distribution:
$$\phi \left( z \right) = \frac{1}{{\sqrt {2\pi } }} \exp\left( { - \frac{1}{2}z^{2} } \right)$$
(3)
$$\Phi \left( z \right) = \frac{1}{2} \left( {1 + {\text{erf}}\left( {\frac{z}{\sqrt 2 }} \right)} \right)$$
(4)
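As a minimal sketch of how Eqs. (1)–(4) combine, the following Python snippet evaluates the variance adjustment for the truncation bounds used here and plugs it into the zero-covariance VOT approximation. The function names and the coefficient values (`mean_time`, `mean_cost`, `sigma_cost`) are hypothetical placeholders chosen for illustration; they are not the estimates reported in Table 4.

```python
import math


def std_normal_pdf(z: float) -> float:
    """Standard normal density, Eq. (3)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution, Eq. (4)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_variance(sigma: float, z1: float, z2: float) -> float:
    """Variance of a symmetrically truncated normal (z1 = -z2), Eq. (2)."""
    mass = std_normal_cdf(z2) - std_normal_cdf(z1)
    return sigma ** 2 * (1.0 + (z1 * std_normal_pdf(z1) - z2 * std_normal_pdf(z2)) / mass)


def vot_second_order(mean_time: float, mean_cost: float, var_cost: float) -> float:
    """Second-order approximation of E[beta_time / beta_cost] with zero covariance, Eq. (1)."""
    return mean_time / mean_cost + var_cost * mean_time / mean_cost ** 3


# Share of the original variance that survives truncation at z = +/-1.28.
shrink = truncated_variance(sigma=1.0, z1=-1.28, z2=1.28)
print(f"variance retained after truncation: {shrink:.3f}")

# Hypothetical coefficients (utility per minute and per euro), for illustration only.
mean_time, mean_cost, sigma_cost = -0.10, -0.40, 0.15
var_cost = truncated_variance(sigma_cost, -1.28, 1.28)
vot_eur_per_hour = 60.0 * vot_second_order(mean_time, mean_cost, var_cost)
print(f"approximate VOT: {vot_eur_per_hour:.2f} EUR/h")
```

For the bounds \(z_{1}\) = −1.28 and \(z_{2}\) = 1.28, the truncated variance is roughly 44% of \(\sigma^{2}\). The correction term in Eq. (1) is positive when both mean coefficients are negative, so the approximated VOT lies above the simple ratio of means.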
The WTS calculations are analogous to the VOT ones (including \(\beta_{add pax}\) instead of \(\beta_{time}\)). The resulting VOT and WTS values are reported in Table 5. As can be observed, values for the ML model with random components are slightly higher. Not capturing the unobserved heterogeneity in the model formulation can thus lead to an underestimation of the VOT and WTS.
Table 5 Value of Time (VOT) and Willingness to Share (WTS) values for the estimated models
For the ML model with random components, we obtain a VOT of 16.25 €/h for non-working individuals and 20.08 €/h for working individuals. WTS values are much lower. They amount to 0.52 €/trip when the ride is shared with one or two additional passengers, and 2.85 €/h when the ride is shared with four additional passengers (recall that the ML model includes the four co-rider disutility as a time-dependent variable).
Next, we compare the obtained VOT and WTS values with previous studies, in particular those reported in Al-Ayyash et al. (2016) and Lavieri and Bhat (2019). These studies, like ours, include the time, cost and the number of additional passengers as explanatory variables. Al-Ayyash et al. (2016) is set in Beirut, Lebanon, and addresses university students and university employees. It estimates different parameters depending on how often individuals would be willing to adopt the pooled on-demand service for their university commuting habits, and it differentiates between car and public transport commuters. Their obtained VOTs (converted to Euros) range between 3 €/h and 13 €/h. These are lower than our obtained values, which may arguably be attributed to the lower purchasing power of individuals in Lebanon in comparison to those in the Netherlands. Lavieri and Bhat (2019), in turn, is set in the Dallas-Fort Worth Metropolitan Area, USA, and studies commuters. Its obtained VOTs are around 26 €/h for working trip purposes, and 21 €/h for leisure trip purposes, slightly higher values than those found in our study (~ 20 €/h for working individuals).
Regarding the WTS, results from Al-Ayyash et al. (2016) indicate that respondents are willing to pay between 0.5 € and 2 € to perform their ride in a vehicle that allows for a maximum of two extra passengers instead of riding a vehicle that allows for up to five extra passengers. This result resonates well with our findings. In Lavieri and Bhat (2019), the ratio between the parameter of additional passengers and cost yields a disutility of around 0.4–0.8 €/trip per additional passenger. Again, these values are in line with our findings.
We conclude the comparison between the studies by comparing the ratio between the WTS and the VOT values in the three studies. The ratios that can be obtained from the different traveller categories analysed in Al-Ayyash et al. (2016) lead to values around 0.1. To match their approach and obtain a comparable ratio from our study, we need to consider as WTS value the difference between the four co-rider scenario and the 1–2 co-rider scenario. We obtain ratios of 0.05–0.1 for trips lasting 30–60 min. WTS-VOT ratios in Lavieri and Bhat (2019) amount to 0.02–0.07 for the 1–2 co-rider scenario. In this case, our ratios are also in the same range, amounting to around 0.03. This comparison shows that the VOT and WTS values obtained in our study are well aligned with results reported in previous SP experiments.
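The ratios above can be approximated from the VOT and WTS values reported for the ML model with random components. The snippet below is a sketch of one plausible reading of the calculation: the text does not state which VOT (working or non-working) enters the ratio, so the choice of the working-individual VOT and the function names are assumptions.

```python
# Values reported for the ML model with random components.
VOT_NON_WORKING = 16.25   # EUR/h
VOT_WORKING = 20.08       # EUR/h
WTS_ONE_OR_TWO = 0.52     # EUR/trip, one or two additional passengers
WTS_FOUR_PER_HOUR = 2.85  # EUR/h, four additional passengers


def ratio_one_or_two(vot: float) -> float:
    """WTS-VOT ratio for the 1-2 co-rider scenario."""
    return WTS_ONE_OR_TWO / vot


def ratio_four_vs_one_or_two(vot: float, minutes: float) -> float:
    """Ratio using the WTS difference between four co-riders and 1-2 co-riders."""
    wts_four = WTS_FOUR_PER_HOUR * minutes / 60.0
    return (wts_four - WTS_ONE_OR_TWO) / vot


print(f"1-2 co-riders: {ratio_one_or_two(VOT_WORKING):.3f}")
for minutes in (30, 60):
    r = ratio_four_vs_one_or_two(VOT_WORKING, minutes)
    print(f"four vs 1-2 co-riders, {minutes} min trip: {r:.3f}")
```

Under these assumptions, the 1–2 co-rider ratio rounds to 0.03 for both VOT values, and the four-versus-1–2 ratio lies close to the reported 0.05–0.1 range for 30–60 min trips.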
Finally, we perform the LCCM analysis, using the first ML specification as a starting point. We determine the number of classes to be included in the model based on the BIC (Bayesian Information Criterion) index. The four-class model minimises the BIC index and yields a meaningful segmentation, and is therefore adopted. The final model, shown in Table 4, includes different pooling parameters for different classes. This indicates that the sharing attribute is best modelled using different specifications for different individuals. All time and cost parameters are significant at the 95% level and have the expected negative signs. Parameters related to the number of additional passengers are also negative, with a higher disutility the more extra passengers are in the vehicle, as expected. The majority of the passenger-related attributes are also significant at the 95% level. Three of the classes include an alternative specific constant (ASC) in their model specification. The positive sign of two of them implies a preference towards the pooled alternative over the individual one when time and cost parameters are zero and there is one extra passenger in the pooled option. A first explanation could be that these two classes prefer sharing their vehicle (e.g., because of environmental or social considerations). However, individuals in these classes do experience a higher disutility when sharing the vehicle with two individuals than with one, and this is again higher with four individuals than with two (negative related dummy coded parameters, largest for the four extra passenger specification). Therefore, we conclude that the positive ASC is not due to a preference towards sharing the vehicle, but is linked to the cost-saving characteristic of the pooled alternative.
The LCCM also includes four active covariates, which help define the classes and forecast class membership: being a working individual, having a high personal income, never using bus/tram/metro and being aged 18–34. Three of them also played a role in the ML specification, underscoring their relevance in explaining preference heterogeneity in our SP experiment.
To better understand the main differences between the classes, we calculate the VOT and WTS values for the different classes (Table 5) and depict percentage differences between classes regarding socioeconomic and mode use characteristics (Fig. 3). We also attach a motto to each class, as follows:
- LC 1 (29% of the sample): “It’s my ride”. Individuals in this class experience the highest disutility related to sharing their ride. This preference is confirmed by the attitudinal indicators: this class has the strongest attitude towards privacy, the highest sharing-related time sensitive attitude, and the lowest price sensitive attitude of all classes. “Individual-only” respondents are to be found in this class, amounting to over half of this class’ respondents. Sharing disutility for rides shared with four other passengers is proportional to the in-vehicle time (as specified for the ML model) for individuals in this class. Individuals in the other three classes (less averse to sharing) perceive it as a fixed per-ride disutility. Individuals in this class tend to be male, middle-aged (35–64), and have high personal incomes. Regarding current mobility, they differ from the other classes in their higher car usage, and lower bicycle and public transport usage.
- LC 2 (28%): “Sharing is saving”. They are the most positive towards the pooled alternative, which can be explained by their price sensitivity (the pooled option always offers them cheaper rides) and low sharing reluctance. These two characteristics explain why “pooled-only” respondents are to be found (almost exclusively) in this class. Individuals aged 65 and older, females and non-working respondents are more prevalent in this class.
- LC 3 (24%): “Time is gold”. These individuals display the highest value of time. They differ from “It’s my ride” individuals in their higher acceptance of pooling. This higher acceptance explains why, despite having a somewhat lower value of time, “It’s my ride” individuals have a more time sensitive attitude towards increases in time caused by sharing their ride. Their strong time sensitivity, together with the little disutility they attach to pooling per se, causes the ASC of this class to have a positive sign. Note, however, that the lowest added time for the pooled alternative is three minutes, and “Time is gold” individuals already associate a larger disutility with these three minutes of extra time than the positive utility stemming from the ASC, implying that if no cost differences existed, the individual alternative would be preferred for the scenarios included in the SP. Respondents also seem to be more time sensitive for shorter trips (i.e., for the ≤ 12 km version of the SP experiment), with 55% of individuals in this class having had the short version, versus 45–50% in the other three classes. Young (18–34), female, highly educated individuals characterise this class. Frequent car usage in this class is also higher than the average, second to “It’s my ride” individuals.
- LC 4 (19%): “Cheap and half empty, please”. This is a very cost sensitive class, with a value of time even lower than that of the “Sharing is saving” class. The main difference compared to the second class is the more negative preference of “Cheap and half empty, please” individuals towards the pooled alternative, especially when four extra passengers are in the vehicle (the disutility regarding pooling with an increasing number of passengers increases exponentially). This explains why, despite their lower value of time, “Cheap and half empty, please” individuals did trade between the individual and the pooled alternative in the SP experiment. This fourth class has a higher share of male and middle educated respondents than the sample average. The likelihood of belonging to this class is similar across age groups and working situations.
We now turn to validating the obtained models by comparing the prediction rates of the estimation and the validation subsamples (all models were estimated on 80% of the sample and the remaining 20% was kept for validation purposes). For the two ML models, we obtain prediction rates of 71% and 71% for the in-sample data and 73% and 72% for the out-of-sample data, respectively. Both offer adequate and similar prediction performance. We obtain similar prediction rates (72% and 75% for the estimation and validation samples respectively) for the LCCM using prior membership probabilities (i.e., using only information regarding the active covariates to infer the membership probabilistic distribution to each of the classes). Moreover, when using the individual’s posterior membership probabilities of the LCCM (i.e., statistical inference using an empirical Bayes method which includes information from the observed choices and not exclusively the active covariates to determine the individual’s probabilistic distribution to each of the classes), a 93% correct prediction rate is achieved for both estimation and validation samples. This, in turn, suggests that the presented classes succeed in describing the existing heterogeneity of different individuals regarding preferences towards time, cost and pooling attributes when choosing between individual and pooled on-demand services.
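The difference between prior and posterior membership probabilities can be sketched with Bayes' rule: the posterior probability of a class is proportional to the prior (covariate-based) probability multiplied by the likelihood of the individual's observed choices under that class. The snippet below is an illustrative sketch only. The function name is ours, the per-class choice probabilities are hypothetical, and the class shares from the text are used as stand-in priors, whereas in the LCCM the priors depend on each individual's active covariates.

```python
def posterior_membership(prior, pooled_probs, choices):
    """Posterior class probabilities given observed binary choices.

    prior        -- prior probability of each class
    pooled_probs -- pooled_probs[c][t]: P(pooled | class c) in choice task t
    choices      -- choices[t] is True if the pooled alternative was chosen
    """
    weighted = []
    for class_prior, probs in zip(prior, pooled_probs):
        likelihood = 1.0
        for p_pooled, chose_pooled in zip(probs, choices):
            likelihood *= p_pooled if chose_pooled else (1.0 - p_pooled)
        weighted.append(class_prior * likelihood)
    total = sum(weighted)
    return [w / total for w in weighted]


# Stand-in priors: class shares LC 1 to LC 4 from the text.
prior = [0.29, 0.28, 0.24, 0.19]
# Hypothetical probabilities of choosing pooled in three choice tasks, per class.
pooled_probs = [
    [0.1, 0.1, 0.1],  # LC 1, averse to sharing
    [0.9, 0.9, 0.9],  # LC 2, favours the pooled alternative
    [0.4, 0.4, 0.4],  # LC 3
    [0.5, 0.5, 0.5],  # LC 4
]
# A respondent who chose the pooled alternative in every task.
posterior = posterior_membership(prior, pooled_probs, [True, True, True])
print([round(p, 3) for p in posterior])
```

In this hypothetical case the observed choices shift most of the probability mass to the second class, which illustrates why posterior probabilities yield a much higher prediction rate than priors based on covariates alone.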
Scenario analysis
We subsequently perform a scenario analysis as a model application. These scenarios seek to quantify the impact of time, cost and the number of passengers on the willingness to request pooled rides (over individual ones). The scenarios are designed to demonstrate the impact of the modelling results and, thus, to understand their policy implications. Consequently, scenario analyses can help prioritise effective policies triggered by the behavioural change caused by new mobility services (Tarabay and Abou-Zeid 2019). The ML model with random components has the lowest BIC value of the three previously estimated models (3226.27 vs. 3262.51 and 3274.73), showing the best model fit, and is therefore the one used for the scenario analysis. The discrete choice model is based on disaggregate demand, but the aggregate demand is necessary to derive indicators at the population level, i.e. we need to weight individuals such that they mirror the real distribution of the population. Even if the socioeconomic characteristics of our sample are already quite representative, we weight our sample to mirror the age and gender shares of the target (urban) population for the scenario analysis.
Figure 4 shows the effect of varying time–cost trade-offs on the expected percentage of requested pooled rides for the one or two extra passenger scenario and the four extra passenger scenario (versus the individual shares). As expected, the pooled share increases with increasing price difference and decreasing time difference. For the same time–cost trade-off, the achieved pooled share with one or two additional passengers is 5–13 percentage points higher than with four additional passengers (mean 10.5, median 11.1). Pooled shares vary in our scenarios from 5 to 85%, showing the great impact that the studied range of additional times and fees has on the outcomes. In our experiment, the pooled alternative is preferred by 85% of the individuals when this entails a €3.00 price reduction and only 3 min of extra time with either one or two additional passengers.
For the base scenario (20 min and €6.00 for the individual ride; + 7 min and −€2.00 for the pooled ride), we obtain the following shares: 56% for the pooled alternative if sharing with one or two extra passengers and 43% in case of sharing with four extra passengers. These shares are well above current shares for pooled rides reported from deployed commercial on-demand services (reported in the Introduction). We highlight three main reasons for this difference. First, the real time–cost trade-offs presented to on-demand users may be less favourable to the pooled alternative than those represented in our base scenario. Second, our results are inevitably influenced by the attribute levels used in our SP design. Third, while some segments of the population are overrepresented among users of on-demand services, our scenarios make the breakdown between individual and pooled services for a representative sample of the overall population. For example, higher income individuals currently make up a disproportionate share of on-demand users. Their preference towards the individual alternative (confirmed by our estimated model) explains the lower pooled share in reality compared to the one found in our scenario. Still, these results suggest that there is potential for the share of pooled requests to increase once on-demand services as a whole become more commonplace.
In our model formulation, only absolute differences in time and cost between the two alternatives matter (given that an additional minute or Euro is associated with the same linear disutility in the individual and pooled alternatives). For the four passenger scenario, however, sharing disutility varies as a function of the total time of the pooled ride. Figures 5 and 6 show the influence of time and cost, respectively, on the share of pooled trips (while keeping the other variable constant). For every 10 min added to the individual ride, we see a drop of around 7% in the share of individuals who opt for the pooled alternative when the trip is shared with four additional passengers. This does not affect the shares when sharing rides with just one or two extra passengers, given that, in the ML model, the one and two co-rider disutility (unlike the four co-rider disutility) is a per-trip and not a per-minute value.
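The property described above can be illustrated with a simplified binary logit. The sketch below is not the estimated model: it uses fixed, hypothetical coefficients (`B_TIME`, `B_COST`, `B_SHARE_12`, `B_SHARE_4_PER_MIN`) and omits the random components, panel effect and socioeconomic covariates of the ML model. It only shows why lengthening the trip leaves the pooled share unchanged under a per-trip sharing penalty but lowers it under a per-minute penalty.

```python
import math

# Hypothetical coefficients, for illustration only (not the Table 4 estimates).
B_TIME = -0.08              # utility per minute of in-vehicle time
B_COST = -0.30              # utility per euro
B_SHARE_12 = -0.15          # per-trip penalty, one or two co-riders
B_SHARE_4_PER_MIN = -0.012  # per-minute penalty, four co-riders


def pooled_share(t_ind, c_ind, extra_time, discount, four_riders):
    """Binary logit probability of choosing the pooled ride."""
    t_pool = t_ind + extra_time
    c_pool = c_ind - discount
    v_ind = B_TIME * t_ind + B_COST * c_ind
    penalty = B_SHARE_4_PER_MIN * t_pool if four_riders else B_SHARE_12
    v_pool = B_TIME * t_pool + B_COST * c_pool + penalty
    return 1.0 / (1.0 + math.exp(v_ind - v_pool))


# Base trade-off from the text: +7 min and -2.00 EUR for the pooled ride.
for t_ind in (20, 30):
    s12 = pooled_share(t_ind, 6.0, 7, 2.0, four_riders=False)
    s4 = pooled_share(t_ind, 6.0, 7, 2.0, four_riders=True)
    print(f"individual ride {t_ind} min: 1-2 co-riders {s12:.3f}, four co-riders {s4:.3f}")
```

Because time and cost enter both utilities with the same coefficients, they cancel out except for their differences; only the per-minute sharing penalty keeps a dependence on total ride time.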