Introduction

Context

In many urban public transport (PT) systems worldwide, high passenger volumes result in high crowding levels on board PT vehicles. Experiencing (over)crowding in a PT system negatively affects how passengers perceive travel times and the attractiveness of urban PT journeys. This can change passenger route choice through the PT network (as found, for example, by Yap and Cats 2021), or deter travellers from using the PT system altogether and contribute to a mode shift towards other, potentially less sustainable, travel modes. It is therefore essential to understand how PT passengers perceive their journey time under crowded conditions. Such understanding can improve the accuracy of business cases and cost–benefit analyses when evaluating the expected costs and benefits of schemes that directly or indirectly reduce PT crowding levels (e.g. Jenelius and Cats 2015). Additionally, a more accurate understanding of how passengers value crowding has the potential to improve the crowding parameters in strategic transport models and PT assignment models, thereby improving passenger forecasts and model validation (see for example Hamdouch et al. 2011; Nuzzolo et al. 2012; Pel et al. 2014; Cats et al. 2016; Hänseler et al. 2020).

Relevant literature

Passengers on board crowded services perceive in-vehicle time more negatively than passengers on uncrowded PT services. This is typically expressed by an in-vehicle time crowding multiplier that increases with the on-board crowding level. Over the last two decades, many studies have aimed to infer this in-vehicle time crowding multiplier as a function of the load factor or the standing density (average number of standing passengers per m²). Initially, most of these studies relied on stated preference (SP) approaches, in which respondents were asked in (online) surveys which route or mode alternative they would choose under hypothetical crowding scenarios. For example, SP studies of crowding valuation were performed for UK rail services (MVA Consultancy 2008), metro and buses in Santiago de Chile (Batarce et al. 2016; Tirachini et al. 2017), and RER services in Île-de-France (Kroes et al. 2014). Wardman and Whelan (2011) and Li and Hensher (2011) provide an extensive overview and meta-analysis of SP-based studies of PT crowding valuation up until 2011.

In recent years, an increasing number of studies have used revealed preference (RP) data to estimate PT crowding valuation. Large-scale passenger demand data from Automated Fare Collection (AFC) systems and/or Automated Passenger Count (APC) systems such as load-weigh systems allow passengers’ crowding valuation to be derived from empirically observed route and mode choice behaviour, rather than from stated choices elicited in SP experiments. SP approaches have the inherent limitation that respondents’ stated behaviour in a survey may differ from their actual behaviour, which can bias the estimated coefficients. RP-based crowding studies have been applied to case studies in Singapore (Tirachini et al. 2016), Hong Kong (Hörcher et al. 2017), the Netherlands (Yap et al. 2020) and Washington, DC (Yap and Cats 2021). For Singapore, Tirachini et al. (2016) found that the in-vehicle time multiplier increases linearly by 0.18 for each additional standing passenger per square metre, resulting, for example, in an in-vehicle time multiplier of 1.55 at 3 standing passengers per m². Hörcher et al. (2017) found that for the Hong Kong metro the in-vehicle time multiplier at full capacity ranges between 1.72 and 1.98 depending on the seat probability, while Yap and Cats (2021) found an average in-vehicle time multiplier of 1.84 at full capacity for the Washington, DC metro system.

All the studies mentioned above estimate the perception of PT crowding based on data collected before the outbreak of the COVID-19 pandemic. It is plausible that passengers have perceived crowding more negatively since the start of the pandemic, as crowded environments generally pose a higher risk of contracting COVID-19. Furthermore, one might hypothesise that, after many countries introduced regulations on social distancing and capacity limitations, people are less likely to feel comfortable in, or to accept, very crowded environments. It is thus important to understand how PT passengers perceive on-board crowding in the post-pandemic era: changes in crowding perception might influence route and mode choice and might hamper a full demand recovery on PT routes perceived as (over)crowded, in effect imposing new de facto capacity limits.

More recently, a few studies have assessed passengers’ post-pandemic crowding perception based on stated preferences elicited from choice experiments. Shelat et al. (2022) found that on-board crowding and COVID-19 infection rates are the factors most strongly perceived as a risk of using PT. Basnak et al. (2022) confirmed in an SP study that crowding in Santiago de Chile is perceived more negatively post-pandemic than pre-pandemic, and also highlighted the perceived importance of wearing face coverings. Based on an SP experiment with data collected in two Norwegian cities, Flügel and Hulleberg (2022) conclude that crowding valuation in PT was significantly higher in November 2021 than in November 2018. Their results show that this crowding valuation had decreased by May 2022, but remained higher than before the pandemic. Cho and Park (2021) conducted surveys in the Seoul metropolitan area before and after the outbreak of the COVID-19 pandemic, which showed that passengers perceive post-pandemic crowding impedance as 1.04–1.23 times higher than pre-pandemic. Bansal et al. (2022b) estimated several logit models using a stated choice experiment conducted during the COVID-19 pandemic among 961 pre-pandemic users of the London Underground. They found an in-vehicle time multiplier of 1.73 when the metro operates at full capacity. Furthermore, they were able to test preferences for different stages of the epidemic as well as for different interventions, such as the vaccination rate and mandatory face coverings whilst travelling. However, to date no study has used observed passenger route choices from large-scale AFC and APC systems to re-estimate public transport crowding perception in the aftermath of the pandemic based on actual passenger behaviour, rather than on stated behaviour in surveys or choice experiments.

Study contribution

The main contribution of our study lies in deriving the crowding valuation of public transport passengers in the post-pandemic era entirely from observed, actual passenger route choices. To the best of our knowledge, our study is the first to adopt a revealed preference approach to derive post-pandemic crowding perceptions, thereby adding to the emerging evidence from studies that derive post-pandemic crowding perceptions from SP surveys (see Table 1). By relying on large-scale, empirical passenger demand data, we derive crowding valuations from more than 20,000 observed passenger journeys in the London PT network, a much larger sample than is typical for SP studies, which supports more robust estimates. Furthermore, this allows crowding perceptions to be compared with pre-pandemic crowding curves estimated using similar revealed preference approaches in past studies. We infer passenger crowding valuation by estimating a discrete choice model using maximum likelihood estimation based on observed passenger route choices on Transport for London’s PT network. Our results contribute to a better understanding of how on-board crowding in urban public transport has been perceived in a European context since the outbreak of the COVID-19 pandemic.

Table 1 Study contribution

A methodological contribution of our study is the use of data from APC systems to obtain on-board train loads and crowding levels. Previous RP-based studies of PT crowding valuation merged AFC and Automated Vehicle Location (AVL) data to infer train loads. Especially in high-frequency, high-density metro systems, this can be a challenging and complex process: a passenger-to-train assignment procedure needs to be implemented, combined with a route choice model when multiple feasible routes exist between the entry and exit station registered in the AFC data (see for example Zhu et al. 2017), which increases the potential for modelling errors. Instead, our study relies on empirical train load observations derived directly from load-weigh systems. This more direct approach can provide faster and more accurate train load estimates, as it relies only on an accurate calibration of the average passenger weight (as opposed to a range of route choice and assignment parameters, some of which are subject to estimation in this study).

The remainder of this paper is structured as follows. In the Section “Methods and data” we discuss the data semantics, choice set generation and model specification. The model estimation results and policy implications are discussed in the Section "Results and discussion", followed by conclusions and recommendations for further research in the Section "Conclusions and recommendations".

Methods and data

In this section, we discuss the required data inputs (Section "Data input"), choice set generation (Section "Choice set generation"), choice identification from the data (Section "Choice identification"), model specification (Section "Model specification") and the extraction of the attribute levels from the data (Section "Attribute levels").

Data input

As input for our study, we use passenger demand and occupancy data from London, United Kingdom. We focus on the urban PT network of the Greater London area, which is under the authority of Transport for London (TfL). Passengers pay for travel on this network with an Oyster card or a contactless payment card (such as a bank card), meaning that passenger demand data are captured by the AFC system in place. We only consider journeys made entirely by bus or by metro (London Underground: LU), because bus passengers are required to touch in next to the driver upon boarding and 99% of LU stations are equipped with closed ticket barriers. The demand data from the AFC system therefore provide reliable and complete information on travel patterns for these modes. Journeys made on other rail modes (such as the Docklands Light Railway and London Overground) are not included, as many of their stations are ungated (Transport for London 2022). Bus and metro journeys account for 88% of all journeys on TfL’s network, thus covering the vast majority of journeys (Transport for London 2023).

For metro journeys in London, each row in the AFC data contains the location and time of the first station entry and of the last station exit. As passengers are required to touch in and out at the station gates, empirical data are directly available for both station entry and exit. For buses, passengers are only required to touch in upon boarding, meaning that the boarding stop, boarding time and bus route are empirically available. The alighting stop for most bus journeys is inferred using the well-known trip-chaining algorithm (based on Sánchez-Martinez 2017); the remaining bus journeys, for which no alighting stop can be inferred, are scaled based on the inferred alighting probabilities for each downstream stop.
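The core idea of trip chaining can be illustrated with a minimal sketch. It is not the algorithm of Sánchez-Martinez (2017) used in this study, only the generic assumption behind trip chaining: a passenger alights at the downstream stop of the boarded route that lies closest to where they next tap in. The function name, the planar coordinates in metres and the 1,000 m maximum walking distance are hypothetical choices for illustration.

```python
import math

# Hypothetical maximum walking distance (in metres) between the inferred
# alighting stop and the next tap-in location; not a value from the study.
MAX_WALK_M = 1000.0


def infer_alighting_stop(downstream_stops, next_tap_xy, max_walk=MAX_WALK_M):
    """Trip-chaining guess of the alighting stop of one bus journey stage.

    downstream_stops: list of (stop_id, (x, y)) for the stops after the
    boarding stop on the boarded bus route, with coordinates in metres.
    next_tap_xy: (x, y) of the passenger's next tap-in, or None if unknown.
    Returns the stop closest to the next tap-in, or None when no downstream
    stop lies within max_walk (such journeys would instead be scaled).
    """
    if not downstream_stops or next_tap_xy is None:
        return None
    stop_id, stop_xy = min(downstream_stops,
                           key=lambda stop: math.dist(stop[1], next_tap_xy))
    if math.dist(stop_xy, next_tap_xy) > max_walk:
        return None
    return stop_id


stops = [("S2", (500.0, 0.0)), ("S3", (1500.0, 0.0)), ("S4", (2500.0, 0.0))]
print(infer_alighting_stop(stops, (1600.0, 200.0)))  # S3
print(infer_alighting_stop(stops, (9000.0, 0.0)))    # None
```

Returning None rather than forcing a match mirrors the distinction in the text between journeys whose alighting stop is inferred and journeys that are scaled.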

These data characteristics imply that AFC data do not directly provide information on loads and crowding levels in the metro network. Passenger assignment modelling is required to determine the most plausible route passengers take between the station entry and exit gates, as several plausible routes can exist between a given station pair in a high-density metro network such as London’s. As such route choice models typically require input parameters for waiting time and crowding valuation, using their output to estimate crowding valuation could lead to a self-fulfilling prophecy, or to incorrect estimates if these choice model parameters are wrong. Instead, for metro crowding we rely on APC data from load-weigh systems as an independent data source. On selected lines of the TfL metro network (the Central Line and the Victoria Line), the rolling stock is equipped with a load-weigh system. This system provides on-board passenger loads for each line segment, both per train and averaged per 15-min time interval, based on an assumed average passenger weight of 75 kg that has been validated with on-board train surveys. London buses are not equipped with APC systems, so bus load and crowding information is not directly available. Estimating bus loads by accumulating (observed) boarding and (inferred) alighting passengers for each bus trip relies on several assumptions: inferring the most plausible alighting stop for bus passengers, and scaling bus journeys for which the destination inference algorithm cannot infer an alighting stop. As a result, bus load estimates are much more uncertain than metro load estimates.
Therefore, given our aim to rely as much as possible on directly observed crowding data, we only estimate the crowding valuation for metro journeys, for which APC data are directly available. Both metro and bus journeys are nevertheless included in the passenger journey dataset, and all other attribute values are derived for both modes.

In this study we estimate three different models:

  • A pre-pandemic off-peak model based on 3–7 February 2020. We use this as an uncrowded baseline model to confirm whether the estimated in-vehicle time and waiting/walking time coefficients are in line with previous RP based model results.

  • A post-pandemic off-peak model based on 13–17 June 2022. This uncrowded model is estimated to assess whether base level in-vehicle time and waiting/walking time valuations have changed since the COVID-19 pandemic.

  • A post-pandemic peak model based on the same period, 13–17 June 2022. This model, focusing on AM and PM peak journeys, estimates the post-pandemic metro crowding valuation based on the load-weigh data available for this period.

During the selected post-pandemic period of 13–17 June 2022, no COVID-related restrictions were in place in London. After several lockdowns in 2020 and 2021, all sectors had been allowed to reopen fully since July 2021. Additionally, no capacity constraints or social distancing rules applied when travelling by PT, and passengers had no longer been required to wear face coverings on London’s PT network since February 2022. June 2022 therefore reflects a steady-state situation in the post-pandemic era, in which passengers had been able to experience regular PT travel conditions for several months.

For these models, we extracted all AFC passenger demand data for 3–7 February 2020 and 13–17 June 2022, as well as the available APC load-weigh data for 13–17 June 2022. Full PT passenger journeys are constructed by linking individual passenger transactions from the AFC data using the linkage criteria of the transfer inference algorithm set out by Gordon et al. (2013). Next, the ultimate origin stop and destination stop of each PT journey are clustered into an origin zone and a destination zone by applying hierarchical agglomerative clustering. Using this unsupervised learning approach, all bus and metro stops located within walking distance of each other (using complete linkage with a Euclidean distance threshold of 350 m) are grouped together, which assigns every PT passenger journey to an origin zone and a destination zone.
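The stop clustering step can be sketched in pure Python. This is an illustrative O(n³) implementation of agglomerative clustering with complete linkage, assuming stop coordinates in a planar metric projection; the function name is hypothetical. For a full network, `scipy.cluster.hierarchy.linkage(coords, method="complete")` followed by `fcluster(Z, t=350, criterion="distance")` is the usual route.

```python
import math
from itertools import combinations


def complete_linkage_clusters(points, threshold=350.0):
    """Agglomerative clustering of (x, y) points (metres) with complete linkage.

    Clusters are merged while the smallest complete-linkage distance (the
    largest pairwise distance between members of two clusters) does not
    exceed `threshold`. Returns one integer cluster label per point.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        return max(math.dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        (a, b), d = min(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        clusters[a].extend(clusters[b])  # a < b, so index a stays valid
        del clusters[b]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


stops = [(0.0, 0.0), (200.0, 0.0), (300.0, 0.0), (1000.0, 0.0), (1100.0, 0.0)]
print(complete_linkage_clusters(stops))  # [0, 0, 0, 1, 1]
```

Complete linkage guarantees that every pair of stops within a zone is at most 350 m apart. Single linkage would instead chain stops 300 m apart into arbitrarily long zones: `complete_linkage_clusters([(0.0, 0.0), (300.0, 0.0), (600.0, 0.0)])` returns `[0, 0, 1]`.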

Choice set generation

Determining which journeys to include in the choice set for the purpose of this study is a non-trivial task. To generate a choice set we apply the following criteria and filtering rules:

  • Exclude incomplete and unrealistic journeys. As a data cleaning step, journeys with unrealistic travel times (shorter than 5 min or longer than 120 min) are excluded, as are journeys with 4 or more interchanges, as these point to either a data error or a service disruption.

  • Include metro journeys for which load-weigh data are available. Given the importance of relying on an independent data source for metro crowding levels, we only include passenger journeys made exclusively on lines for which load-weigh data are available (the Central and Victoria lines).

  • Include metro journeys between station pairs with unambiguous routing. As we cannot empirically observe the exact route metro passengers take between station entry and exit, we only include metro journeys for which the chosen route can be determined from the network topology. This is required to reliably infer the in-vehicle time and waiting time corresponding to the route a passenger took between station entry and exit. For this, we calculate the two shortest paths between each metro station pair and only include station pairs for which either only one feasible, acyclic path exists, or the second-shortest path is at least 1.5 times as long as the shortest path.

  • Include journeys made in the appropriate time period. For the two uncrowded off-peak models, only journeys made entirely between 10–14 h or 20–23 h are included, so that the in-vehicle time and walking time coefficients are not distorted by uncaptured crowding effects. Conversely, for the crowded peak model only journeys made entirely within the AM peak (6–10 h) or PM peak (15–19 h) are included.

  • Only include origin–destination pairs with a sufficient number of observations for at least two different observed paths. As we rely entirely on observed passenger route choices, the choice set of each origin–destination pair consists only of observed paths. Deriving crowding perceptions solely from observed route choices therefore requires at least two physically distinct observed paths between each clustered origin and destination zone. To avoid including paths that are only chosen during unforeseen disruptions, we require each path to have a path choice probability of at least 5%. In addition, each path needs to be chosen by at least five passengers per day on average.

  • Include OD pairs with the appropriate crowding level. Once origin–destination zones with at least two different paths are identified, we check whether at least one of the paths is sufficiently crowded to contribute to estimating the peak-based crowding model. For this, the load factor (the passenger load divided by the seat capacity) of the metro path should exceed 50%, as above this level passengers start having to sit next to each other, which can increase the in-vehicle time valuation due to crowding. For the uncrowded off-peak models, the standing density of each path of an OD pair should not exceed 1 standing passenger per m², so that only paths and OD pairs where crowding is not expected to affect route choice behaviour are included.

  • Exclude OD pairs with a dominant path. OD pairs in which one path is dominant over the other paths in terms of crowding level, in-vehicle time and waiting/walking time are excluded, as they do not add any explanatory power to the model.
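The filtering rules above can be sketched as a small set of predicates. This is a minimal illustration under stated assumptions, not the study's implementation: the `PathSummary` record and all function names are hypothetical, the costs passed to the routing rule are assumed to be precomputed lengths of the acyclic paths between a station pair, distinct path records are assumed to be physically distinct, and bus paths keep a standing density of zero because bus crowding is not observed.

```python
from dataclasses import dataclass


@dataclass
class PathSummary:
    """Hypothetical per-path summary for one OD pair and time period."""
    path_id: str
    n_trips: int                    # observed journeys on this path
    ivt: float                      # expected in-vehicle time (min)
    wtt: float                      # expected waiting/walking time (min)
    load_factor: float = 0.0        # passenger load / seat capacity (metro)
    standing_density: float = 0.0   # standing passengers per m2 (metro)


def unambiguous_routing(acyclic_path_costs, ratio=1.5):
    """Station-pair rule: one acyclic path, or 2nd shortest >= 1.5 x shortest."""
    costs = sorted(acyclic_path_costs)
    return len(costs) == 1 or (len(costs) >= 2 and costs[1] >= ratio * costs[0])


def frequent_paths(paths, n_days, min_share=0.05, min_per_day=5.0):
    """Keep paths with >= 5% share and >= 5 journeys per day on average."""
    total = sum(p.n_trips for p in paths)
    if total == 0:
        return []
    return [p for p in paths
            if p.n_trips / total >= min_share and p.n_trips / n_days >= min_per_day]


def crowding_ok(paths, crowded_model):
    """Peak model: some path above 50% load factor; off-peak: all <= 1 pax/m2."""
    if crowded_model:
        return any(p.load_factor > 0.5 for p in paths)
    return all(p.standing_density <= 1.0 for p in paths)


def dominates(p, q):
    """p is no worse than q on every attribute and strictly better on one."""
    attrs = ("ivt", "wtt", "standing_density")
    no_worse = all(getattr(p, a) <= getattr(q, a) for a in attrs)
    better = any(getattr(p, a) < getattr(q, a) for a in attrs)
    return no_worse and better


def keep_od_pair(paths, n_days, crowded_model):
    """Apply the OD-level filters to the paths of one OD pair."""
    paths = frequent_paths(paths, n_days)
    has_dominant = any(all(dominates(p, q) for q in paths if q is not p)
                       for p in paths)
    return len(paths) >= 2 and crowding_ok(paths, crowded_model) and not has_dominant


metro = PathSummary("metro", 300, ivt=12.0, wtt=6.0, load_factor=0.8, standing_density=1.5)
bus = PathSummary("bus", 100, ivt=20.0, wtt=4.0)
print(keep_od_pair([metro, bus], n_days=5, crowded_model=True))   # True
print(keep_od_pair([metro, bus], n_days=5, crowded_model=False))  # False
```

In this made-up example the metro path is faster in the vehicle but has a longer out-of-vehicle time, so neither path dominates; the OD pair qualifies for the peak model but fails the off-peak density limit.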

Choice identification

The resulting choice set inputs for all three models are summarised in Table 2. As can be seen, the number of observations included in the choice set for the pre-pandemic uncrowded model equals 50,494; for the post-pandemic uncrowded model 46,400; and for the post-pandemic crowding model 20,970, resulting in a large number of observations for each of the models. As shown in Table 2, for most origin–destination pairs there are two different observed paths included. For a few OD pairs there are three physically distinctive paths satisfying all of the above criteria. Due to the abovementioned minimum crowding level requirement for the post-pandemic crowding model (Model 3), fewer OD pairs satisfy the specified threshold compared to the two uncrowded, off-peak models (Model 1 and 2), thus resulting in a lower number of observations included in the model.

Table 2 Choice set description

After applying the choice set generation criteria, there are no PT journeys with interchanges between bus and/or metro included in the final choice set for our case study. This implies that each OD pair included in the choice set is composed of one metro path alternative with unambiguous routing between the entry and exit station on either the Central Line or Victoria Line, together with one or two bus paths between that same OD pair.

Model specification

In order to derive passenger crowding valuation, we estimate a discrete choice model based on the observed route choices for the origin–destination pairs included in the choice set. The AFC system provides the observed route choices of individual passengers \(i\) between origin stop \(o\in S\) and destination stop \(d\in S\), with \(S\) the set of stops. The different observed paths between a given OD pair included in the choice set are denoted by \({a}_{od}\in {A}_{od}\). Each 15-min time interval is denoted by \(t\), whereas entire time periods (AM peak, inter-peak, PM peak, evening) are denoted by \(T\).

We adopt a standard random utility maximisation framework. To prevent biased estimates due to possible correlations between the unobserved components of different path alternatives \({a}_{od}\in {A}_{od}\), we explicitly account for overlap between paths using a path size correction factor as proposed by Ben-Akiva and Bierlaire (1999). The path size factor adds a deterministic term to the utility function that approximates the correlation between alternative paths. As a result, we can estimate a path size logit (PSL) model with overlap correction whilst benefitting from a convenient closed-form solution. The total utility of each path \(U(V,r,\varepsilon )\) is therefore composed of the structural, deterministic utility component \(V\), a path size term \(r\) and a random error term \(\varepsilon\) (Eq. 1). The probability \({P}_{a}\) of choosing each path \(a\) can then be calculated using the closed-form expression in Eq. 2.

$${U}_{{a}_{od}}={V}_{{a}_{od}}+{\beta }_{psl}\cdot {r}_{{a}_{od}}+{\varepsilon }_{{a}_{od}}$$
(1)
$${P}_{{a}_{od}}=\frac{\mathrm{exp}({V}_{{a}_{od}}+{\beta }_{psl}\cdot {r}_{{a}_{od}})}{\sum_{{a}'_{od}\in {A}_{od}}\mathrm{exp}({V}_{{a}'_{od}}+{\beta }_{psl}\cdot {r}_{{a}'_{od}})}$$
(2)
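Eq. 2 is a softmax over the path utilities shifted by the path size term. The sketch below computes it for one OD pair; the function name and the illustrative value β_psl = 1 are assumptions (in the study β_psl is estimated). Subtracting the largest exponent before exponentiating avoids overflow without changing the probabilities.

```python
import math


def psl_probabilities(V, r, beta_psl):
    """Path size logit probabilities for the paths of one OD pair (Eq. 2).

    V: systematic utilities V_a; r: path size terms r_a (<= 0, log scale);
    beta_psl: path size coefficient.
    """
    z = [v + beta_psl * ra for v, ra in zip(V, r)]
    z_max = max(z)                                 # numerical stability only
    expz = [math.exp(zi - z_max) for zi in z]
    total = sum(expz)
    return [e / total for e in expz]


# One metro path plus two bus paths sharing the same boarding stop
# (r = ln 0.5 for both bus paths), all with equal utility.
probs = psl_probabilities(V=[0.0, 0.0, 0.0],
                          r=[0.0, math.log(0.5), math.log(0.5)],
                          beta_psl=1.0)
print([round(p, 3) for p in probs])  # [0.5, 0.25, 0.25]
```

With β_psl = 1, the two fully overlapping bus paths jointly receive the share of a single alternative, whereas a plain multinomial logit would give each of the three paths one third.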

In line with the formulation suggested by Dixit et al. (2021), we adopt a node-based formulation of the path size term \(r\) to reflect overlap between different PT route alternatives. Dixit et al. (2021) demonstrate that a node-based correction factor, which captures the overlap between paths in terms of the decision points for passengers (boarding and transfer points), outperforms link-based PSL models in terms of model fit when modelling PT route choices. This reflects the principle that overlap between PT paths is only relevant at locations where passengers can actually make a decision, namely at boarding and transfer stops, rather than across all links of a path once a passenger has boarded a PT vehicle. The node-based path size term is defined in Eq. 3, where \(|{s}_{a}^{b}|\) is the number of decision nodes of path \(a\), \({s}_{j}\) is the \(j\)-th decision node of that path, and \({\delta }_{s,a}\) is the node–route incidence indicator, which equals 1 if decision node \({s}^{b}\) belongs to route \(a\) and 0 otherwise (following the definition of Duncan et al. 2020). If all paths \({a}_{od}\in {A}_{od}\) are direct paths without interchanges, the first boarding stop is the only decision node of each path (i.e. \(|{s}_{a}^{b}|=1\)), and the node-based path size term reduces to the simpler formulation in Eq. 4. When the decision nodes \({s}^{b}\) of the paths \({a}_{od}\in {A}_{od}\) do not overlap, \({r}_{{a}_{od}}\) equals ln(1) = 0. If \(|{A}_{od}|\) equals 2 and both paths overlap entirely in terms of decision nodes, \({r}_{{a}_{od}}\) equals ln(0.5); a more negative value of \({r}_{{a}_{od}}\) thus indicates a higher degree of node overlap between paths. As each OD pair in the choice set consists of one metro path together with one or two bus paths, overlap can only occur between the two bus paths. When an OD pair contains only one metro path and one bus path, there is no overlap and \({r}_{{a}_{od}}\) equals ln(1) = 0.

$${r}_{{a}_{od}}=\mathrm{ln}\left(\sum_{j=1}^{\left|{s}_{{a}_{od}}^{b}\right|}\frac{1}{\left|{s}_{{a}_{od}}^{b}\right|}\cdot \frac{1}{\sum_{{a}'_{od}\in {A}_{od}}{\delta }_{{s}_{j},{a}'_{od}}}\right)$$
(3)
$${r}_{{a}_{od}}=\mathrm{ln}\left(\frac{1}{\sum_{{a}'_{od}\in {A}_{od}}{\delta }_{{s}_{1},{a}'_{od}}}\right)$$
(4)
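The node-based path size term can be computed by counting, for each decision node, how many paths of the OD pair contain it (the sum of the incidences δ), averaging the reciprocals over a path's decision nodes and taking the logarithm. The sketch below is illustrative; the function name and node identifiers are hypothetical.

```python
import math
from collections import Counter


def node_path_size(decision_nodes):
    """Node-based path size terms r_a (Eq. 3) for the paths of one OD pair.

    decision_nodes: one list of decision nodes (boarding and transfer stops)
    per path. usage[s] counts how many paths contain decision node s, i.e.
    the sum over paths of the node-route incidence delta_{s,a'}.
    """
    usage = Counter(s for nodes in decision_nodes for s in set(nodes))
    return [math.log(sum(1.0 / usage[s] for s in nodes) / len(nodes))
            for nodes in decision_nodes]


# Direct paths only (Eq. 4): one metro path and two bus paths that board
# at the same bus stop. Expected: ln(1) = 0 for metro, ln(0.5) for both buses.
print(node_path_size([["metro_station"], ["bus_stop_A"], ["bus_stop_A"]]))
```

For paths with interchanges the same function evaluates the general form of Eq. 3: two paths sharing their first boarding stop but transferring at different stops each get ln((1/2 + 1)/2) = ln(0.75).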

The structural part of the utility function \(V\) is a vector of observable route attributes with their corresponding weights, as defined for the uncrowded off-peak Models 1 and 2 (Eq. 5) and for the crowding model (Eq. 6). The alternative-specific constants \({asc}^{b}\) and \({asc}^{m}\) for bus \(b\) and metro \(m\), respectively, capture a generic mode preference arising from unobserved attributes. We specify mode-specific in-vehicle time coefficients \({\beta }_{ivt}^{b}\) for bus and \({\beta }_{ivt}^{m}\) for metro to capture potential differences in in-vehicle time valuation between modes, as previous studies found statistically significant differences between rail and bus in-vehicle time valuation (e.g. Bunschoten et al. 2013). A generic waiting/walking (out-of-vehicle) time coefficient \({\beta }_{wtt}\) is specified in the utility function, as previous studies provide no strong behavioural evidence of differences in passenger waiting time valuation between PT modes. \({\beta }_{wtt}\) is specified such that it directly reflects the ratio between the waiting/walking time valuation and the in-vehicle time valuation.

Owing to privacy regulations, no unique or pseudonymised passenger identifier based on the smartcard ID was available in this study, as such an identifier could potentially allow individual passengers to be identified. It is therefore not possible to estimate a panel effects model that corrects for possible correlations between route choices made by the same passenger when multiple journeys by that passenger are included in the choice set. Instead, we report robust t-statistics and p-values based on the sandwich estimator of the variance–covariance matrix, to avoid overstating the statistical significance of the estimated coefficients.

$$\begin{aligned} V & = asc^{b} \cdot b + \beta_{ivt}^{b} \cdot t_{ivt}^{b} + \beta_{ivt}^{b} \cdot \beta_{wtt} \cdot t_{wtt}^{b} \\ & \quad + asc^{m} \cdot m + \beta_{ivt}^{m} \cdot t_{ivt}^{m} + \beta_{ivt}^{m} \cdot \beta_{wtt} \cdot t_{wtt}^{m} \\ \end{aligned}$$
(5)
$$\begin{aligned} V & = asc^{b} \cdot b + \beta_{ivt}^{b} \cdot t_{ivt}^{b} + \beta_{ivt}^{b} \cdot \beta_{wtt} \cdot t_{wtt}^{b} \\ & \quad + asc^{m} \cdot m + \beta_{ivt}^{m} \cdot t_{ivt}^{m} \cdot \left( {1 + \left( {\beta_{d}^{m} \cdot d^{m} } \right)} \right) + \beta_{ivt}^{m} \cdot \beta_{wtt} \cdot t_{wtt}^{m} \\ \end{aligned}$$
(6)
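Eqs. 5 and 6 can be written as two small functions. The sketch below is illustrative; the coefficient values in the example are hypothetical and are not the estimates reported in Table 4. It shows that \({\beta }_{wtt}\) scales the mode-specific in-vehicle time coefficient, and that the term \(1+{\beta }_{d}^{m}\cdot {d}^{m}\) acts as an in-vehicle time multiplier that collapses to Eq. 5 when the standing density is zero.

```python
def utility_bus(t_ivt, t_wtt, asc_b, beta_ivt_b, beta_wtt):
    """Bus part of Eqs. 5 and 6: beta_wtt scales the in-vehicle time coefficient."""
    return asc_b + beta_ivt_b * t_ivt + beta_ivt_b * beta_wtt * t_wtt


def utility_metro(t_ivt, t_wtt, asc_m, beta_ivt_m, beta_wtt, beta_d=0.0, d=0.0):
    """Metro part of Eq. 6; with beta_d = 0 or d = 0 it reduces to Eq. 5."""
    crowding_multiplier = 1.0 + beta_d * d
    return (asc_m + beta_ivt_m * t_ivt * crowding_multiplier
            + beta_ivt_m * beta_wtt * t_wtt)


# Hypothetical coefficients: beta_ivt_m = -0.1, beta_wtt = 2.0, beta_d = 0.2.
u_uncrowded = utility_metro(10.0, 5.0, 0.0, -0.1, 2.0)
u_crowded = utility_metro(10.0, 5.0, 0.0, -0.1, 2.0, beta_d=0.2, d=2.0)
print(u_uncrowded, u_crowded)  # approx. -2.0 and -2.4
```

At 2 standing passengers per m² the multiplier is 1 + 0.2 · 2 = 1.4, so in this made-up example 10 crowded minutes weigh as much as 14 uncrowded minutes.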

Attribute levels

The bus in-vehicle time \({t}_{ivt}^{b}\) is calculated directly as the difference between the (inferred) alighting time and the (observed) boarding time. The bus waiting time is calculated as half the actual headway between the bus each passenger boarded and its predecessor. Given the typically high frequencies of London bus routes, it is reasonable to assume that passengers arrive at the bus stop randomly, uniformly distributed over time. We use the actual rather than the scheduled headway between buses to calculate the passenger waiting time, so that the impact of possible service irregularities on extended waiting times is reflected in the average waiting time. For metro journeys, only the station entry and station exit times are empirically observed in the AFC data. The uncrowded metro in-vehicle time \({t}_{ivt}^{m}\) is set equal to the scheduled run time between the relevant station pair in the relevant time period. Metro run times are not affected by traffic conditions, and since automated train operation (ATO) is in place on the two metro lines considered, metro in-vehicle times vary very little. The remainder of the time between station entry and exit is attributed to out-of-vehicle time \({t}_{wtt}^{m}\), which is the sum of the walking time to/from the platform and the waiting time on the platform. We cannot further disentangle \({t}_{wtt}^{m}\) into separate walking and waiting times without making assumptions on the walking speed distribution of passengers and without considering the layout of individual stations to determine walking distances, information that is not directly available. Since the filtering rules leave no PT journeys with interchanges between or within bus and metro in the final choice set, a dedicated coefficient for the valuation of PT transfers in the utility function is not needed.
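The attribute derivation described above reduces to simple arithmetic on timestamps. In the sketch below, all times are hypothetical values in minutes since midnight and the function names are illustrative.

```python
def bus_in_vehicle_time(boarding_time, inferred_alighting_time):
    """Bus in-vehicle time: inferred alighting time minus observed boarding time."""
    return inferred_alighting_time - boarding_time


def bus_wait_time(boarded_departure, previous_departure):
    """Expected wait = half the actual headway to the preceding bus,
    assuming passengers arrive at the stop uniformly distributed over time."""
    return 0.5 * (boarded_departure - previous_departure)


def metro_out_of_vehicle_time(entry_time, exit_time, scheduled_run_time):
    """Gate-to-gate time minus scheduled run time = walking + platform waiting."""
    return (exit_time - entry_time) - scheduled_run_time


# Buses departing at 08:03 and 08:11 give an expected wait of 4 min.
print(bus_wait_time(491.0, 483.0))                      # 4.0
# Entry at 08:00, exit at 08:21.5, scheduled run time of 14 min.
print(metro_out_of_vehicle_time(480.0, 501.5, 14.0))    # 7.5
print(bus_in_vehicle_time(485.0, 503.0))                # 18.0
```

Using the actual rather than the scheduled departure of the preceding bus means that a bunched or delayed service directly lengthens the computed wait.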

We use the standing density on board the metro \({d}^{m}\) as the crowding metric, which reflects the average number of standing passengers per square metre as derived from load-weigh data for each route segment per 15-min time interval \(t\). The standing density equals zero if the passenger load \(q\) is smaller than the seat capacity \(\kappa\), implying that all passengers can have a seat, and increases up to 4 standing passengers per m² when all floor space available for standing \(\theta\) (in m²) is used. We test three different metrics for capturing the crowding perception associated with standing density: the average standing density across all links of a passenger journey (Eq. 7), the standing density on the first link of a passenger journey upon boarding (Eq. 8), and the maximum standing density at the busiest point of the passenger journey (Eq. 9). This enables us to assess which formulation of standing density matters most for passengers’ crowding valuation. The coefficient \({\beta }_{d}^{m}\) is specified such that it reflects the in-vehicle time crowding multiplier as a function of the standing density. The 15-min average standing density observed in the post-pandemic choice set does not exceed 3 standing passengers per m². This range of observed crowding levels is nevertheless high enough to estimate an RP-based crowding coefficient. To avoid extrapolating our results beyond the observed crowding range, we primarily focus on crowding levels up to 3 standing passengers per m² in our analysis.

$${d}_{i}^{avg}=\mathrm{max}\left(\frac{\sum_{{e}_{i}\in {E}_{i}}\frac{{q}_{e}-{\kappa }_{e}}{{\theta }_{e}}}{|{E}_{i}|}, 0\right)$$
(7)
$${d}_{i}^{first}=\mathrm{max}\left(\frac{{q}_{{e}_{1}}-{\kappa }_{{e}_{1}}}{{\theta }_{{e}_{1}}}, 0\right)$$
(8)
$${d}_{i}^{max}=\max_{{e}_{i}\in {E}_{i}}\,\mathrm{max}\left(\frac{{q}_{{e}_{i}}-{\kappa }_{{e}_{i}}}{{\theta }_{{e}_{i}}}, 0\right)$$
(9)
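The three standing density metrics of Eqs. 7 to 9 can be computed from the per-link loads of one journey. The sketch below follows the equations as written; note that Eq. 7 truncates the mean at zero rather than truncating each link, so seated-only links with \(q<\kappa\) lower the average. Function names and the load, seat and area values in the example are hypothetical.

```python
def segment_density(q, kappa, theta):
    """Raw standing density (q - kappa) / theta, before truncation at zero."""
    return (q - kappa) / theta


def standing_density_metrics(segments):
    """Eqs. 7-9 for one journey.

    segments: list of (q, kappa, theta) per traversed link in travel order,
    with q the passenger load, kappa the seat capacity and theta the
    standing area in m2.
    """
    raw = [segment_density(q, k, th) for q, k, th in segments]
    d_avg = max(sum(raw) / len(raw), 0.0)     # Eq. 7: truncated mean
    d_first = max(raw[0], 0.0)                # Eq. 8: first link
    d_max = max(max(r, 0.0) for r in raw)     # Eq. 9: busiest link
    return d_avg, d_first, d_max


# Hypothetical train: 300 seats, 100 m2 standing area, three links.
print(standing_density_metrics([(250, 300, 100.0), (400, 300, 100.0), (500, 300, 100.0)]))
# (0.8333333333333334, 0.0, 2.0)
```

In this example, a passenger boarding on the quiet first link faces no standing density on boarding, whereas the busiest-link metric registers 2 standing passengers per m². This is the kind of difference the three specifications are meant to discriminate between.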

The expected in-vehicle time \({t}_{ivt}\) and out-of-vehicle time \({t}_{wtt}\) are used as attribute values in the choice model for both the chosen and the non-chosen path(s) of each OD pair. These expected values are calculated as the average times observed across all individual passenger journeys included in the choice set for each path \({a}_{od}\in {A}_{od}\) during each time period \(T\). The expected standing density of a chosen or non-chosen path equals the average across all individual observations per 15-min time interval \(t\) for that path. Whereas the average in-vehicle time and waiting/walking time do not vary much within each time period \(T\), crowding levels vary more within each period, as demand is much more concentrated in time. If the time interval used to average crowding levels is too long, unevenly distributed crowding within that interval can be averaged out, dampening the crowding level that passengers experience and expect. Conversely, if the time interval \(t\) is too short, passengers will not have a clear expectation of crowding levels for, for example, individual metro trips.
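The averaging of standing density per path and 15-min interval, and the dampening effect of a longer interval, can be illustrated as follows. The observation tuples, path identifiers and function names are hypothetical. The same grouping with a coarser key (the time period \(T\)) would yield the expected in-vehicle and out-of-vehicle times.

```python
from collections import defaultdict
from statistics import mean


def interval_start(minute_of_day, width=15):
    """Start (minute of day) of the width-minute interval containing the time."""
    return (minute_of_day // width) * width


def expected_density(observations, width=15):
    """Average standing density per (path, interval).

    observations: iterable of (path_id, boarding_minute, density) tuples.
    """
    groups = defaultdict(list)
    for path_id, minute, density in observations:
        groups[(path_id, interval_start(minute, width))].append(density)
    return {key: mean(values) for key, values in groups.items()}


obs = [("M", 480, 1.0), ("M", 487, 2.0), ("M", 495, 3.0), ("B", 481, 0.0)]
print(expected_density(obs))
# {('M', 480): 1.5, ('M', 495): 3.0, ('B', 480): 0.0}
print(expected_density(obs, width=60)[("M", 480)])  # 2.0
```

With 15-min intervals the busy 08:15 interval keeps its value of 3.0, whereas a 60-min interval averages it down to 2.0.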

Results and discussion

This section first presents the estimation results of the three models (Section "Results"), followed by a discussion of their implications (Section "Discussion"). Model 1 refers to the uncrowded pre-pandemic model, Model 2 to the uncrowded post-pandemic model, and Model 3 to the post-pandemic crowding model.

Results

Maximum likelihood estimation is performed in PythonBiogeme (Bierlaire 2016) to infer, for each of the three models, the coefficients that best explain the observed passenger route choices. The Newton algorithm is used as the iterative method to solve this non-linear optimisation problem; it converged after 9, 12 and 12 iterations for Model 1, Model 2 and Model 3, respectively.
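To illustrate what the Newton iterations do, the following sketch maximises the log-likelihood of a plain multinomial logit model with NumPy. It is not the Biogeme implementation used in the study; the function name, the array layout (observations × alternatives × attributes) and the synthetic data are assumptions for illustration. A path size logit fits the same routine if the path size term is supplied as an extra attribute column, since it enters utility linearly with coefficient \({\beta }_{psl}\); a specification in which \({\beta }_{wtt}\) acts as a ratio on in-vehicle time may be non-linear in the parameters and would need an extension.

```python
import numpy as np


def mnl_newton(X: np.ndarray, choice: np.ndarray, tol: float = 1e-8, max_iter: int = 50):
    """Maximum likelihood estimation of a multinomial logit model by Newton's method.

    X has shape (N observations, J alternatives, K attributes); choice holds the
    index of the chosen alternative per observation. Returns (beta, iterations).
    """
    n_obs, n_alt, _ = X.shape
    y = np.zeros((n_obs, n_alt))
    y[np.arange(n_obs), choice] = 1.0                 # one-hot chosen alternative
    beta = np.zeros(X.shape[2])
    for iteration in range(1, max_iter + 1):
        v = X @ beta                                  # (N, J) systematic utilities
        v -= v.max(axis=1, keepdims=True)             # guard against overflow in exp
        p = np.exp(v)
        p /= p.sum(axis=1, keepdims=True)             # (N, J) logit probabilities
        grad = np.einsum("nj,njk->k", y - p, X)       # score vector
        x_bar = np.einsum("nj,njk->nk", p, X)         # probability-weighted mean attributes
        dx = X - x_bar[:, None, :]
        hess = -np.einsum("nj,njk,njl->kl", p, dx, dx)  # negative definite Hessian
        step = np.linalg.solve(hess, grad)
        beta = beta - step                            # Newton update
        if np.max(np.abs(step)) < tol:
            return beta, iteration
    raise RuntimeError("Newton iterations did not converge")


if __name__ == "__main__":
    # Synthetic choices with Gumbel errors, so the MNL is the true model.
    rng = np.random.default_rng(0)
    attrs = rng.normal(size=(20000, 3, 2))
    true_beta = np.array([-0.8, 0.5])
    chosen = (attrs @ true_beta + rng.gumbel(size=(20000, 3))).argmax(axis=1)
    estimate, n_iter = mnl_newton(attrs, chosen)
    print(estimate, n_iter)
```

Because the logit log-likelihood is concave in the coefficients, Newton's method started from zero typically converges within a handful of iterations, in line with the small iteration counts reported above.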

The initial and final log-likelihood, Rho-square and Rho-square-bar, Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) of all three models are reported in the model estimation summary in Table 3. The Rho-square-bar of crowding Model 3 (0.299) is 37% higher (8.1 percentage points higher) than that of uncrowded post-pandemic Model 2 (0.218). As mentioned in Section 2, all three models are estimated as closed-form path size logit (PSL) models. As a result, each model converged within a computation time of less than 5 s on a regular i7 PC.
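The fit measures in Table 3 can be sketched with their commonly used definitions, with \(K\) estimated parameters and \(N\) observations. These formulas are an assumption here, since this section does not state them and software conventions can differ slightly. The same snippet reproduces the relative and absolute Rho-square-bar gain quoted above.

```python
import math


def fit_summary(ll_init: float, ll_final: float, n_params: int, n_obs: int) -> dict:
    """Commonly used goodness-of-fit measures for a discrete choice model."""
    return {
        "rho2": 1.0 - ll_final / ll_init,
        "rho2_bar": 1.0 - (ll_final - n_params) / ll_init,  # penalised for K parameters
        "aic": 2.0 * n_params - 2.0 * ll_final,
        "bic": n_params * math.log(n_obs) - 2.0 * ll_final,
    }


def gain(new: float, old: float) -> tuple:
    """Relative gain (as a fraction) and absolute gain (in units of the measure)."""
    return (new - old) / old, new - old


# Rho-square-bar of Model 3 versus Model 2, as reported in the text.
rel, abs_pp = gain(0.299, 0.218)
print(f"{rel:.0%} higher, {abs_pp * 100:.1f} percentage points")
```

The relative gain is about 0.372 (37%) and the absolute gain 0.081, i.e. 8.1 percentage points.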

Table 3 Model estimation summary

Model estimation results are presented in Table 4. Inspecting the estimated coefficients, we conclude that all signs are plausible and in line with a-priori expectations and findings reported in previous studies. All in-vehicle time coefficients \({\beta }_{ivt}^{b}\) and \({\beta }_{ivt}^{m}\) are negative, as expected, and the ratio between waiting/walking time and in-vehicle time \({\beta }_{wtt}\) is positive for both uncrowded Models 1 and 2. For Model 3, we fixed \({\beta }_{wtt}\) to the value found in the off-peak model estimated for the same post-pandemic date range, as we have no reason to expect the average waiting time valuation relative to in-vehicle time to differ between peak and off-peak. Furthermore, the coefficient \({\beta }_{d}^{m}\) reflecting the in-vehicle time crowding multiplier is positive, confirming that in-vehicle time is valued more negatively as on-board crowding levels increase. The absolute robust t-value exceeds 1.96 for all estimated coefficients, so all estimated coefficients are statistically significant at the 5% level. Except for the standing density coefficient \({\beta }_{d}^{m}\) (robust p = 0.0274), the robust p-values of all coefficients are smaller than 0.01, indicating that these results are highly significant.
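The link between robust t-values and p-values can be checked with a two-sided test under a standard normal approximation, which is reasonable for large samples. This is a minimal sketch using only the Python standard library; the function name is our own, and 2.21 is the robust t-statistic reported for \({\beta }_{d}^{m}\) in the discussion of the crowding model.

```python
import math


def two_sided_p(t_stat: float) -> float:
    """Two-sided p-value 2 * (1 - Phi(|t|)) under a standard normal approximation."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


print(round(two_sided_p(1.96), 3))  # 0.05, the critical value for the 5% level
print(round(two_sided_p(2.21), 4))  # 0.0271, robust t-value of beta_d^m
```

The value of about 0.0271 is consistent with the reported robust p-value of 0.0274; a t-statistic of about 2.206, which rounds to 2.21, gives 0.0274.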

Table 4 Model estimation results

Discussion

Results from uncrowded models

For all three models, the alternative specific constant is positive for metro and fixed to zero for bus. This suggests that, all other things being equal, passengers have an overall preference for travelling by metro over bus based on attributes not observed in the choice model. A possible explanation is the typically higher reliability of metro services compared to bus, as metro has its own right of way, whereas bus journey times in London are much more variable due to road traffic conditions. Another explanation may be that metro journeys are perceived as more comfortable due to higher ride comfort and typically less abrupt acceleration and deceleration.

Based on the ratio between the metro and bus in-vehicle time coefficients \({\beta }_{ivt}^{m}\):\({\beta }_{ivt}^{b}\) of the uncrowded pre-pandemic Model 1, we find that uncrowded in-vehicle time on-board a metro is on average perceived 20% less negatively than uncrowded bus in-vehicle time. Earlier SP research on a so-called ‘rail bonus’ found that passengers value in-vehicle time on rail modes 67–80% less negatively than in-vehicle time by bus, due to a higher perceived comfort level and, for example, the ability to spend in-vehicle time productively by working (Bunschoten et al. 2013). Our findings from the pre-pandemic model are in line with this, confirming that passengers value metro in-vehicle time less negatively than bus in-vehicle time. The coefficient \({\beta }_{wtt}\), which reflects the ratio between waiting/walking time and uncrowded in-vehicle time, equals 1.94 for the pre-pandemic model. This implies that on average passengers value one minute of out-of-vehicle (walking or waiting) time as much as almost two minutes of in-vehicle time, which is in line with many previous studies on PT walking and waiting time valuation. For example, based on a meta-analysis, Wardman (2004) shows that, although they vary with mode and journey length, waiting and walking time valuations are often centred around twice the value of in-vehicle time. Bovy and Hoogendoorn-Lanser (2005) found in SP research that waiting time is valued 2.2 times as much as in-vehicle time, whereas Yap and Cats (2021) found a ratio of 1.62 between waiting time and in-vehicle time in an RP study of PT journeys in Washington, DC. The estimation results of our pre-pandemic baseline model are thus consistent with previous studies, which provides confidence that our choice set generation criteria and modelling approach are also suitable for deriving passenger preferences in the post-pandemic era.

Two findings stand out in the results of the post-pandemic uncrowded model. First, the ratio between the metro and bus in-vehicle time coefficients \({\beta }_{ivt}^{m}\):\({\beta }_{ivt}^{b}\) for Model 2 shows that metro in-vehicle time is now on average valued 15% less negatively than bus in-vehicle time. Whilst this still confirms a generic passenger preference for metro over bus regarding in-vehicle time, it suggests that the relative attractiveness of metro compared to bus in terms of in-vehicle time has decreased somewhat. A possible explanation is that since the COVID-19 outbreak passengers value travelling in enclosed, underground environments such as a metro system more negatively than before the pandemic, as these may be perceived as areas with higher infection risks. In contrast, bus travel on the surface, with frequent door openings at stops and the possibility for passengers to open windows, may be perceived as providing better ventilation and thus lower COVID-19 infection risks. Second, out-of-vehicle time is on average perceived 1.92 times as negatively as uncrowded in-vehicle time in the post-pandemic model, as reflected by \({\beta }_{wtt}\). As \({\beta }_{wtt}\) remains almost unchanged between the pre-pandemic and post-pandemic off-peak models, we conclude that PT waiting/walking time valuation relative to in-vehicle time has not changed substantially since the COVID-19 pandemic.

For all three models, the path size logit coefficient \({\beta }_{psl}\), which reflects overlap between route alternatives, is significant and negative. As mentioned in the Section “Model specification” of this paper, the node-based path size correction factor used in this study becomes more negative when more alternatives of an OD pair share the same boarding or transfer stop. The negative coefficient therefore implies that passengers generally perceive overlap between PT route alternatives in terms of boarding/transfer stops as positive, as utility decreases with less (or no) overlap between the paths of an OD pair. This is in line with recent findings by Dixit et al. (2021), who also found that passengers prefer (node-)overlapping routes to completely distinct routes, possibly due to the higher degree of resilience provided whilst waiting at a certain PT stop.
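To illustrate why a negative \({\beta }_{psl}\) rewards overlap, the sketch below computes the logarithm of one possible node-based path size factor. The exact correction factor of this study is defined in its model specification section and may differ: here each boarding/transfer stop is assumed to be weighted equally (rather than by link length, as in classic path size formulations), and the function name and stop identifiers are hypothetical.

```python
import math
from collections import Counter


def node_path_size_log(paths: dict) -> dict:
    """Log of an assumed node-based path size factor for each path of one OD pair.

    paths maps a path id to the set of its boarding/transfer stops. Each stop
    contributes 1 / (number of paths using it), averaged over the path's stops,
    so the factor equals 1 (log 0) for a path sharing no stop with any other path.
    """
    usage = Counter(stop for stops in paths.values() for stop in stops)
    return {
        pid: math.log(sum(1.0 / usage[s] for s in stops) / len(stops))
        for pid, stops in paths.items()
    }


# Hypothetical OD pair: paths A and B share boarding stop S1, path C is distinct.
ps_log = node_path_size_log({"A": {"S1", "S2"}, "B": {"S1", "S3"}, "C": {"S4"}})
print(ps_log)
```

Paths A and B obtain \(\ln 0.75 \approx -0.288\) and path C obtains 0. With a negative \({\beta }_{psl}\), the term \({\beta }_{psl}\ln PS\) is positive for A and B, so the overlapping paths gain utility relative to the distinct path, as the estimation results indicate.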

Results from crowding model

For the post-pandemic crowding model, the estimated metro crowding coefficient \({\beta }_{d}^{m}\) is significant at the 5% level, with a robust t-statistic of 2.21 exceeding 1.96. The value of this coefficient implies that once the passenger load on-board the metro reaches the seat capacity, the in-vehicle time multiplier increases by 0.42 for each additional standing passenger per square metre. The multiplier thus increases from 1.0 when all seats are occupied without standing passengers to 1 + (3 × 0.42) = 2.26 at a crowding level of 3 standing passengers per m2. Note that this crowding coefficient reflects the average in-vehicle time valuation across both seated and standing passengers. As we cannot empirically infer which passengers had a seat during their journey, it is not possible to further disentangle this coefficient into separate coefficients for seated and standing passengers. In addition to a linear crowding curve, we also tested a piece-wise linear function and a quadratic function. However, no significant results were found for these functions, suggesting that a linear increase of the in-vehicle time valuation with standing density adequately describes the data. This is in line with Hörcher et al. (2017) and Tirachini et al. (2016), who also found linear relations between standing density and the in-vehicle time crowding multiplier.
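The linear multiplier implied by \({\beta }_{d}^{m}=0.42\) can be written as a small function. This sketch uses the rounded coefficient; the `perceived_ivt` helper, which converts a ride into equivalent uncrowded minutes, is an illustrative addition rather than part of the study.

```python
def ivt_multiplier(standing_density: float, beta_d: float = 0.42) -> float:
    """Linear in-vehicle time crowding multiplier 1 + beta_d * d, with d floored at zero."""
    return 1.0 + beta_d * max(standing_density, 0.0)


def perceived_ivt(minutes: float, standing_density: float, beta_d: float = 0.42) -> float:
    """Crowding-weighted in-vehicle time, in equivalent uncrowded minutes (illustrative)."""
    return minutes * ivt_multiplier(standing_density, beta_d)


for d in (0.0, 1.0, 2.0, 3.0):  # observed range of standing densities
    print(d, round(ivt_multiplier(d), 2))
```

With these definitions, a 10-min ride at 3 standing passengers per m2 corresponds to about 22.6 uncrowded minutes. Evaluating the function at 4 standing passengers per m2 gives 2.68, but this lies beyond the observed range and is therefore a linear extrapolation.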

As observed crowding levels averaged per 15-min interval in our choice set did not exceed 3 standing passengers per m2, the multiplier at full capacity (assumed at 4 standing passengers per m2) can only be obtained by linearly extrapolating the estimated crowding coefficient. This yields an in-vehicle time multiplier of 2.68 when a train operates at full capacity, indicating that in the post-pandemic era metro passengers value in-vehicle time in very crowded conditions more than 2.5 times as negatively as uncrowded in-vehicle time. Since bus crowding levels are not explicitly incorporated in this model, bus crowding may be implicitly reflected either in the alternative specific constant of metro relative to bus (with \({asc}^{b}\) fixed to zero) or in the bus in-vehicle time coefficient \({\beta }_{ivt}^{b}\). This could explain why the bus in-vehicle time coefficient is more negative relative to the metro in-vehicle time coefficient in this model than in the two off-peak, uncrowded Models 1 and 2 (a ratio \({\beta }_{ivt}^{m}\):\({\beta }_{ivt}^{b}\) of 0.55 compared to 0.80 and 0.85, respectively).
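The percentages used in this discussion follow directly from the coefficient ratio. A ratio \(r\) means that one minute of metro in-vehicle time weighs \(r\) times as much as one minute of bus in-vehicle time, i.e. it is perceived \((1-r)\times 100\%\) less negatively. The sketch below uses the ratios reported in the text; the function names are hypothetical, and the coefficient values in the test are illustrative rather than the Table 4 estimates.

```python
def relative_ivt_weight(beta_ivt_metro: float, beta_ivt_bus: float) -> float:
    """Ratio beta_ivt^m : beta_ivt^b (both coefficients are negative)."""
    return beta_ivt_metro / beta_ivt_bus


def metro_discount(ratio: float) -> float:
    """Share by which metro in-vehicle time is perceived less negatively than bus in-vehicle time."""
    return 1.0 - ratio


# Ratios beta_ivt^m : beta_ivt^b as reported for the three models.
reported_ratios = {"Model 1": 0.80, "Model 2": 0.85, "Model 3": 0.55}
discounts = {m: round(metro_discount(r), 2) for m, r in reported_ratios.items()}
print(discounts)
```

This gives 20% and 15% for the uncrowded Models 1 and 2, against 45% for the crowding Model 3, which is the gap that may partly reflect bus crowding absorbed by the bus in-vehicle time coefficient.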

As mentioned in the Section “Attribute levels”, we tested three crowding metrics for standing density: the standing density a passenger experiences upon boarding, the average standing density across the entire passenger journey, and the maximum standing density at the busiest point of the journey. The model using the crowding level upon boarding (\({d}^{first}\)) was the only model resulting in a statistically significant standing density coefficient. This suggests that the PT crowding level upon boarding best captures passengers’ crowding valuation. A possible explanation is that the crowding level upon boarding is related to the passenger’s seat probability, an important determinant of whether a passenger can have a seat for the entire journey. This is consistent with the study of in-vehicle time valuation by Hörcher et al. (2017), who found a statistically significant coefficient for standing probability in addition to the coefficient reflecting standing density. Our results suggest that enabling appraisal processes to use either the crowding level upon boarding or the average journey crowding level is a direction worth exploring.

In Fig. 1 we compare the in-vehicle time crowding curve derived from our model to those of previous studies. For comparison purposes, we show the in-vehicle time multiplier between 0 and 4 standing passengers per m2 for all studies, interpolating or extrapolating where required. We first discuss three RP-based studies which used a methodology comparable to ours to derive pre-pandemic crowding multipliers from large-scale passenger demand data. At 4 standing passengers per m2, the estimated metro crowding multipliers of these three studies are comparable, ranging from 1.65 in Hong Kong (Hörcher et al. 2017) and 1.73 in Singapore (Tirachini et al. 2016) to 1.84 in Washington, DC (Yap and Cats 2021). An RP-based study of crowding valuation derived from observed, pre-pandemic route choices on an Asian metro network by Bansal et al. (2022a) yielded a crowding multiplier of 1.47 at extreme crowding levels. This is somewhat lower than in the three above-mentioned RP studies, which might stem from the fact that the crowding multiplier in Bansal et al. (2022a) was derived from compensatory route choices only.

Fig. 1 In-vehicle time crowding multiplier as a function of standing density

SP experiments conducted in Santiago de Chile before and after the pandemic allow a direct comparison. Before the pandemic, Batarce et al. (2016) found a crowding multiplier of 2.01 at 4 standing passengers per m2, which is higher than the multipliers found in the three afore-mentioned pre-pandemic RP studies. The post-pandemic SP study in Santiago de Chile by Basnak et al. (2022) yields a considerably steeper crowding curve, averaged across their latent class models for male and female respondents. For a scenario in which 100% of passengers wear a face covering, the post-pandemic SP-based crowding multiplier in Chile equals 2.54 at 4 standing passengers per m2, compared to 2.01 pre-pandemic. These SP-based studies in Chile thus provide strong evidence that respondents have valued crowding more negatively since the pandemic.

Specifically for London, we refer to two pre-pandemic studies on crowding valuation. The first is an RP study performed in 1988 by Transport for London, the results of which are summarised in Transport for London’s Business Case Development Manual (Transport for London 2019). This study focused on Seven Sisters metro station, where during peak hours one third of the trains started (empty) from this station whilst the other two thirds started three stations further upstream. Crowding valuation was derived from platform observations of whether waiting passengers decided to skip a crowded arriving train and wait for the next, empty train starting at this station. The resulting crowding multiplier of 2.32 at 4 standing passengers per m2 is notably higher than in other pre-pandemic studies, although this study was performed several decades ago using a different methodology than more recent RP studies. Second, a more recent average pre-pandemic crowding multiplier can be derived from the SP-based coefficients estimated for seated and standing passengers by Whelan and Crockett (2009), based on the average seat and total capacity of London metro stock. At 4 standing passengers per m2, this yields an average pre-pandemic in-vehicle time multiplier of 1.77. Our equivalent RP-based crowding multiplier for London in the post-pandemic era of 2.68 provides strong evidence that PT passengers in London have valued metro crowding substantially more negatively since the COVID-19 outbreak than indicated by both pre-pandemic London studies, despite their differences in methodology. The crowding valuation found in our study is comparable to the post-pandemic crowding valuation derived from SP research for Santiago de Chile by Basnak et al. (2022), which gives confidence in the magnitude of our estimated crowding coefficient.

For interpretation purposes, we remind the reader that passengers in London were no longer required to wear a face covering whilst travelling during the data collection period in June 2022. Our results therefore reflect crowding valuation in a more steady state, rather than during different stages of the COVID-19 pandemic recovery. The crowding multiplier of 1.73 at full capacity found by Bansal et al. (2022b), based on an SP experiment conducted in Spring 2021 among pre-pandemic users of London’s metro system, is lower than our results. This might be explained by the fact that this study separately estimated the impacts of vaccination rate, the daily number of COVID-19 cases and mandatory face covering on the in-vehicle time multiplier, or by the uncertainty whether all pre-pandemic users included in the experiment had already experienced post-pandemic metro travel at that time.

Conclusions and recommendations

Conclusions

In this study we derive the crowding valuation of public transport passengers using London’s metro network in the post-pandemic era entirely from observed, actual passenger route choices. In contrast to previous studies on post-pandemic crowding valuation, we adopt a revealed preference methodology which relies entirely on large-scale, empirical passenger demand data. Our results contribute to a better understanding of how on-board crowding in urban public transport has been perceived in a European context since the outbreak of the COVID-19 pandemic.

Based on the three estimated discrete choice models, we draw three main conclusions. First, the average post-pandemic out-of-vehicle time valuation remains unchanged at almost twice the uncrowded in-vehicle time valuation. We found a ratio between walking/waiting time and in-vehicle time of 1.94 pre-pandemic and 1.92 post-pandemic, from which we conclude that the relative waiting/walking time valuation has not changed since the COVID-19 pandemic. Second, whilst our results confirm a generic passenger preference for metro over bus regarding in-vehicle time, the relative attractiveness of metro compared to bus in terms of in-vehicle time has decreased somewhat post-pandemic. This possibly reflects a more negative perception of travelling in the enclosed, underground environment of the metro compared to bus travel. Third, our crowding model estimation results show that passengers’ average in-vehicle time multiplier increases by 0.42 for each additional standing passenger per square metre. In contrast, across the six pre-pandemic crowding valuation studies shown in Fig. 1, the in-vehicle time multiplier increased on average by 0.22 for each additional standing passenger per square metre. Compared to these SP and RP studies conducted before the pandemic in London and elsewhere, we thus find a clearly steeper post-pandemic crowding curve, from which we conclude that PT passengers have valued crowding more negatively since the COVID-19 pandemic.
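The average pre-pandemic slope can be roughly reproduced under two simplifying assumptions: each pre-pandemic curve is approximated by a straight line from 1.0 at zero standing density to the multiplier quoted at 4 standing passengers per m2, and the six studies are the pre-pandemic ones listed below. The study selection is our assumption (Bansal et al. (2022a) is left out), so this is a consistency check rather than the computation behind the reported figure.

```python
from statistics import mean

# Multipliers at 4 standing passengers per m2 as quoted in the comparison with
# previous studies. Which six studies enter the average is an assumption here.
pre_pandemic_at_4 = {
    "Hörcher et al. (2017)": 1.65,
    "Tirachini et al. (2016)": 1.73,
    "Yap and Cats (2021)": 1.84,
    "Batarce et al. (2016)": 2.01,
    "Transport for London (2019)": 2.32,
    "Whelan and Crockett (2009)": 1.77,
}


def linear_slope(multiplier_at_4: float, density: float = 4.0) -> float:
    """Slope of a straight line from 1.0 at zero standing density to the given multiplier."""
    return (multiplier_at_4 - 1.0) / density


avg_slope = mean(linear_slope(m) for m in pre_pandemic_at_4.values())
post_pandemic_slope = 0.42  # estimated beta_d^m in this study
print(round(avg_slope, 2), round(post_pandemic_slope / avg_slope, 1))
```

Under these assumptions the average slope is about 0.22, so the post-pandemic slope of 0.42 is roughly 1.9 times as steep.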

Study limitations and recommendations

We identify several study limitations and corresponding recommendations for further research. First, we recommend follow-up research dedicated to more systematic monitoring of crowding valuation over time. A limitation of our work is that we only used data from June 2022. Although PT demand in London had stabilised by then, we cannot determine whether this also holds for crowding perceptions. We therefore recommend estimating crowding valuation for different cross sections in time, for example on a quarterly basis since the start of the COVID-19 outbreak. This would enable monitoring of the dynamics of crowding valuation, for example whether crowding valuation reverts towards pre-pandemic levels as passengers are again more exposed to crowded environments.

Second, we recommend studying the external validity of our work by extending the study to other metro lines, cities and countries. Our study relies on PT route choices related to two metro lines in London equipped with load-weigh systems. We recommend studying crowding valuation on other metro and rail lines in London when APC systems allow for this, and performing a wider comparison of post-pandemic crowding valuation between cities and countries. This could provide insight into the representativeness of the data for the two included metro lines. Additionally, it could highlight possible cultural differences in crowding valuation, as well as the role of different public health measures (such as mandatory face covering or social distancing) in crowding valuation across countries.

Third, in our study the model specifying the crowding level upon boarding was the only model that yielded a statistically significant crowding coefficient and provided the best explanatory power, which we expect to be related to the passenger’s seat probability upon boarding. However, most previous crowding valuation studies found statistically significant results when using the average crowding level of a passenger journey rather than the crowding level upon boarding. Follow-up research is therefore suggested to assess whether similar results are found for different cross-sectional time periods and for different cities or countries.

Fourth, we recommend exploring heterogeneity in crowding valuation between passengers. In this study we estimated the average passenger crowding valuation using a closed-form PSL model. An interesting direction for follow-up research would be to estimate a latent class model and/or mixed logit model, to explore how much crowding valuation varies between passenger segments or between individual passengers.

Finally, we recommend assessing the extent to which the change in crowding perceptions impacts the effective capacity of PT networks and whether the higher crowding valuations contribute to observed reductions in PT demand levels.