Introduction

Irregularities and delays, or reliability deficiencies in short, are phenomena in public transport (PT) provision that have a negative impact on passengers’ perceptions and time use (van Oort 2011), as well as on operating costs (Fadaei and Cats 2016; Watkins et al. 2020). Moreover, according to a wide range of studies on passengers’ attitudes and perceptions, reliability is a key factor in the choice between PT and other modes of transport (Beirão and Sarsfield Cabral 2007; dell’Olio et al. 2011). In fact, travel time uncertainty, being the opposite of reliability, has been found to have a higher disutility than nominal travel time (Bates 2001). Unfortunately, and despite the huge potential implications for passenger satisfaction, patronage and operations, this common but problematic aspect of PT operations is usually not taken into account in appraisal methods that use path choice algorithms from state-of-the-practice macroscopic transport models (Carrel et al. 2013). Despite an urgent need for empirically grounded perspectives on the behavioural consequences of reliability changes in order to facilitate appropriate policy measures in terms of PT network design, research is scarce regarding the long-term behavioural effects (Carrel et al. 2013; Le et al. 2020) and the potential subsequent impact on the formation of habits (Evans 2008; Gärling and Kay 2003; Ouellette and Wood 1998; Verplanken et al. 1998) that can result from perceptions of (un)reliability. Apart from small-scale surveys by Carrel et al. (2013) and Le et al. (2020), previous research into longitudinal change in PT use has mostly covered how short-term travel patterns (Goulet-Langlois et al. 2018; Trépanier et al. 2007) and long-term mode choice (Chatterjee 2010; Thøgersen 2006) may be impacted by exogenous factors such as changes in transport provision.
The relative scarcity of research into medium- to long-term (at least one year) behavioural effects of PT reliability changes may be attributed to two fundamental challenges in obtaining empirical evidence that supports robust conclusions. First, the measurement of reliability itself has been a matter of continuous debate over at least the last 50 years, during which time innumerable attempts have been made to identify a good measure (Bagherian et al. 2016; Currie et al. 2011; Diab et al. 2015; Gittens and Shalaby 2015; Jenelius 2018; Noland and Polak 2002; Uniman et al. 2010). Second, detailed data regarding the behaviour of PT passengers have, at least historically, been difficult to capture, particularly behavioural responses to “vague” events such as fluctuations in reliability. Stated choice experiments have shed light on some phenomena that relate to reliability, such as passengers’ approaches to risk (Li and Hensher 2012) and other attributes in a path choice context (Noland and Polak 2002). However, the inherent methodological uncertainty related to the design and wording of these studies, particularly when studying individuals’ perceptions of the likelihood of travel time variations, as well as their reliance on stated intentions instead of longitudinally revealed behaviour, makes strong general claims about sensitivity, in terms of perceptions as well as responses, somewhat hazardous (Bates et al. 2001; Le et al. 2020). With the growing availability of large-scale disaggregate datasets covering supply-side automatic vehicle location (AVL) data, demand-side automatic fare collection (AFC) data, particularly “smart” travel cards (Kurauchi and Schmöcker 2017), and automatic passenger counters (APC), the amount of research into different aspects of detailed passenger responses to changes in PT provision has grown considerably. Nevertheless, as identified by, for example, Carrel et al. (2013), the process of medium- to long-term (from one year to the next) strategic passenger response to PT service reliability changes has been poorly researched empirically, although some successful simulation approaches (An et al. 2014) may be noted.

In this paper, we explore PT passengers’ long-term revealed sensitivity, in terms of their propensity to change travel paths over time in response to observed changes in travel conditions (in our case, PT service departure reliability), utilising a fixed effects panel approach. Travel path change, expressed in terms of the relative frequency of line route usage, was surveyed in two panel waves using AFC smart card transaction data, while AVL and timetable data formed the basis for two departure reliability indicators that measure headway regularity and departure schedule adherence (punctuality), respectively. Thus, we have broken down the concept of PT service departure reliability into these two observable components. According to previous research, punctuality may be particularly influential when passengers are expected to adapt to scheduled departure times (cf. the exposition presented by Carrel et al. (2013)). As succinctly summarised by Rudnicki (1997), from an operational perspective, punctuality may be defined in terms of deviations from the timetable, measured in absolute or relative terms. Individuals who wish to use PT line routes that operate with long headways perceive a lack of punctuality as particularly negative when they adhere to schedule-based waiting behaviour, as discussed by, for instance, Liu et al. (2010). In practice, a tolerance threshold is usually set for when a lack of punctuality is perceived as onerous by a majority of passengers, and the performance of the transport system is benchmarked according to the proportion of departures that exceed this deviation threshold value. Regularity, on the other hand, may be more associated with single (or multiple) line routes that run with a short (combined) headway. In such cases, the distribution of departures may be perceived as being more critical to prospective passengers than the strict timetable adherence of each departure time (Rudnicki 1997).

Long-term sensitivity in this case should be understood as being the degree to which individual passengers, over a measurement period of at least months, marginally change their relative usage rate of a path (expressed as a line route) in a given origin–destination pair in response to a (marginal) change in service departure reliability of that path. There should be at least two different alternative line routes and a non-zero difference in reliability for each of these line routes during the measurement period. Due to earlier findings of behavioural differences across trip purposes (Berggren et al. 2019), we controlled for day type (weekday versus weekend day) to check for systematic differences in long-term sensitivity to reliability changes assuming a different average composition of trip purposes across day types.

Thus, the overarching research hypothesis we wished to explore was that:

(a) Passengers perceive a gradual change in their marginal utility of a certain path as the reliability of that path changes over time.

Therefore,

(b) when a path is subject to an increase/decrease in departure reliability, this is reflected in the behaviour of passengers insofar that the proportional use of this path increases/decreases, all else being equal.

This second sub-hypothesis, measurable in contrast to the first, is conditional upon the existence of a considered choice set (potential trade-off) and that the alternatives of this choice set remain or improve differently in standard when relative reliability changes between paths. Thus, a generalisation of the research question may be formulated as to whether or not it is possible to link changes in the relative usage of a certain path to the relative change in departure reliability compared to other paths in the choice set of an individual passenger, when changes in other attribute values such as departure frequency, travel time variability and travel time are controlled for.

In a conventional discrete choice framework, there are latent and random utilities assigned to each choice option as a function of the attributes of each and every potential choice option in a considered choice set. It should be noted that, in contrast to that conventional framework, we studied only the relative usage of (a subset of) paths that were chosen at least once. Therefore, in our analysis the observation of increased (decreased) relative usage over time (a Boolean outcome) is the dependent variable, and relationships are represented by a binary logistic regression approach. In the generalised linear regression approach thus applied, the binary “choice” was between the options (1) to change (increase or decrease) the use of a specific path (line route) or (2) to not change the use of this path, for each observed OD pair. A latent and random utility was attributed to option (1) as a function of how the path-specific attribute values had changed, while the utility of option (2) was assumed to be constant with random variation. Since the dependent variable is dichotomous (change versus no change), this binary logistic regression may be regarded as a natural reformulation of the choice process.
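
As a minimal sketch of this formulation, the binary logit may be written as follows. The symbols are illustrative assumptions rather than notation taken from the model specification: \(y_{c}\) equals one if the relative usage of a path changes between the panel waves for analysis case \(c\), \(\Delta x_{c}\) collects the changes in the path-specific attribute values, \(\alpha\) absorbs the constant utility of the no-change option, and \(\beta\) holds the sensitivities to be estimated.

$$P\left({y}_{c}=1\right)=\frac{1}{1+\mathrm{exp}\left[-\left(\alpha +{\beta }^{\prime}\Delta {x}_{c}\right)\right]}$$

Under this sketch, a positive element of \(\beta\) would mean that a larger change in the corresponding attribute raises the probability of observing a change in path usage.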

The rest of the paper is organised as follows. First, in section "Research background", we outline previous research regarding the measurement of reliability and demand. Next, section "Service reliability and path choice" describes the theoretical framework regarding behaviour and PT service reliability, along with our rationale behind the choice of reliability measures and analytical approach. Section "Data and research setting" outlines the data, design, and analysis of our case study in Scania, Sweden, followed by sections "Results" and "Discussion", in which we present and discuss the results from the case study. Finally, section "Conclusion" contains conclusions drawn from the case study, along with research ideas for the future regarding the study of behavioural responses to reliability changes in PT.

Research background

Reliability and choice in PT

The importance of how uncertainty and risk affect individual preferences in choice processes has been a topic of research for the last 40 years or so, ever since Kahneman and Tversky showed the existence of non-linear, non-reciprocal response patterns in such circumstances (Kahneman and Tversky 1979; Tversky and Kahneman 1981). This phenomenon has also been shown within the context of travel time uncertainty in path choice, although primarily for car travel and using stated preference (SP) surveys (Abdel-Aty et al. 1995; Avineri and Prashker 2004). Here, a number of framing-related issues unfortunately cloud the validity of the presented results to some extent. Noland and Polak (2002) further discuss these framing dilemmas in terms of individuals’ sensitivity, or perceptual limits, vis-à-vis travel time uncertainty (i.e., what deviations from schedules are actually perceived as delays, and how do passengers respond?). According to the authors, significant issues concern, on the one hand, how these dilemmas are presented in terms of trade-offs between probabilities and absolute magnitudes of different delays in SP surveys and, on the other, how they depend on context, which in the PT realm essentially includes network structure and service frequency. Moreover, the magnitude of travel time deviation has been empirically shown (Yap et al. 2017) to have non-linear effects on path choice. Also, as Heiner (1983) highlights, successive learning from experience will itself affect the adaptability to uncertainty, ultimately to minimise mental stress and disutility. Here, the amplitude (occurrences of major disruptions) may have a more adverse effect on path choice than standard deviations (common, minor departure perturbations), a notion supported by both prospect theory (Kahneman and Tversky 1979) and empirical results (Diab et al. 2015; Yap et al. 2017).
An empirical account of the relationship between perceived reliability and path choice is presented by Carrel et al. (2013), in their survey of PT passengers, in which they find that departure regularity is the most important path-specific feature. They also mention the gradual change in passenger behaviour with respect to headway length and departure reliability, in which an interval may be discerned for PT passengers’ adaptation behaviour with respect to service departure frequencies F, between chiefly stochastic behaviour at “short” headways (H = 1/F) and mostly deterministic behaviour at “long” headways. However, the exact figure for this interval appears to be dependent on service reliability and information, with empirical results ranging from five minutes for reliable services (Frestad Nygaard and Tørset 2016; Luethi et al. 2007) to up to 30 min for less reliable services (Ingvardson et al. 2018). As noted by Furth and Muller (2006) and Trompet et al. (2011), passenger behaviour with respect to PT service reliability is also conditional on expected waiting times, which themselves are a function of both scheduled headway and its variability.

A number of attempts have been made to identify theoretically appealing and operationally feasible measures of reliability in PT. Currie et al. (2011) conducted a qualitative evaluation of ten different measures based on international expert judgement and found that excess wait times for high-frequency services and customer journey time delays were best suited in terms of availability and information value—the latter measure conditional on the availability of AVL data. However, the most comprehensive evaluation of reliability measure approaches to our knowledge is provided by Gittens and Shalaby (2015) in their account of 20 different measures in which both theoretical and empirical underpinnings were presented. They arrived at the conclusion that a useful reliability index should include both in-vehicle and wait time variation, and that it should take the service context, i.e., headway distributions, into account. Other researchers have proposed adjusted or completely new measures to cater to advances in data capture methods.

Jenelius (2018) used AVL and APC data to extract perceived travel times during periods of congestion and crowded conditions, based on disaggregate boarding and alighting figures on a stop and trip level. Bagherian et al. (2016) used scheduled timetables and AFC data to calculate the buffer times needed by passengers to compensate for reliability deficiencies during daily PT journeys. Dixit et al. (2019) used AFC and AVL data to calculate the behavioural impacts of both common, minor, deviations and large-scale disruptions to the provision of PT, also taking composite multimodal trips into account. PT passenger path choices made under uncertainty, as the topic is described by Noland and Polak (2002), are explicitly considered in the study by Yap et al. (2017). In their algorithm, they investigate the impact on multimodal PT trips from both common, small-scale variability in headways due to driver behaviour and varying passenger loads, and large-scale disruptions caused by, for example, technical failures or road closures.

Trip patterns from smart card data

The inherent difficulty in obtaining disaggregate data describing trip patterns made by PT passengers comprises two principal challenges when dealing with AFC systems that only register boarding events: (a) the inference of alighting locations, regardless of whether they are points for transfers or final destinations, and (b) the generation of complete origin–destination (OD) trip itineraries in order to obtain OD matrices. Critical factors of the latter include the degree of multi-leg PT trips in the system and the amount of location information provided by the AFC system. The first challenge is usually approached by using some kind of mirroring of the boarding profile (Navick and Furth 2002; Trépanier et al. 2007), with slightly different approaches for potential transfers and other activity destinations, respectively. Trépanier et al. (2007) reported an 80 per cent success rate in peak hours for their approach to infer alighting locations (bus stops). Barry et al. (2009) used a schedule-based shortest path algorithm to infer trip paths, thereby generating complete OD trips when no explicit location information was available for the intermediate trip points. Exact information on locations is sometimes available in contemporary AFC systems, but when it is lacking, AVL data have usually been merged with travel card validations in order to obtain full trip itineraries, one example of which has been reported by Farzin (2008). Different approaches have been reported regarding how to distinguish between and measure transfers, among other kinds of activities, in trip itineraries from travel cards. Different kinds of distance cut-offs have been utilised (Munizaga and Palma 2012; Zhao and Rahbee 2007), as have scheduled headways of line routes boarded after transfer (Gordon 2012; Wang et al. 2011). Seaborn et al. (2009) take bus in-vehicle times into account in their calculation of transfer time thresholds from travel card transactions at bus boardings and station gate passings, while Chu and Chapleau (2008) include a maximum transfer walking distance to distinguish transfers from activities. Finally, clustering of PT stops and stations has been introduced by Goulet-Langlois et al. (2016) in their approach to infer activities based on likely activity locations.

Service reliability and path choice

This section outlines the rationale behind our choice of PT service reliability indicators and the theory and practice behind our approach to gauge passenger responses to changes in reliability, as evidenced by individual passengers’ path choice.

In line with previous research regarding PT passengers’ response to uncertainty variation, we have set out to explore the long-term impact on path adjustment from passengers’ assumed perception of uncertainty during trips (Diab et al. 2015). Here we rely on the assumption that this perception of the predictability of the PT system may be sub-divided into (a) time-table adherence and (b) headway variability. Both these indicators of reliability are associated with the individual passenger’s desire to minimise the amount of non-useful time spent travelling, including time components such as waiting and in-vehicle travelling—components which have a certain degree of variation or fuzziness which should be specifically associated with negative perceptions of travel time uncertainty.

Similar to An et al. (2014), we propose an index to measure headway regularity that is normalised by headway durations, but only to represent the headway variation as initially proposed by Osuna and Newell (1972) in their formulation of a waiting time function. This is to be able to distinguish the relative impact on path choice from schedule adherence and headway variability. Thus, our regularity index takes the coefficient of headway variation \(CV_h\) as a starting point (Eq. 1):

$${CV}_{h}={\left[\frac{var\left({H}_{r}\right)}{{E}^{2}\left({H}_{s}\right)}\right]}^{1/2}$$
(1)

where \(H_r\) is the actual, and \(H_s\) the scheduled, service headway of interest. In our application, \(H_s\) was derived from the number of scheduled departures per hour, and the value of this line route-based measure per two-week analysis period was obtained by accounting for the relative weight of the number of trips recorded by hour and analysis period in the farecard dataset. \(H_r\), on the other hand, was derived directly from the AVL data regarding each PT line route.

To standardise the values of the regularity index to the interval (0,1] by taking scheduled headway into account and thus enable a comparison with measures standardised in the same numeric interval such as punctuality, we propose an adjusted measure, formulated in Eq. (2):

$${RI}_{h}=1/\left(1+\frac{var\left({H}_{r}\right)}{{E}^{2}\left({H}_{s}\right)}\right)$$
(2)

The mathematical properties of this index reflect an intuitive sense of behavioural responsiveness, since, as previously mentioned, the headways of PT line routes should adhere to those time spans within which passengers intending to travel by these line routes are expected to arrive randomly at the first boarding point. Given this constraint, as the variation in actual headway approaches zero, \(RI_h\) approaches one (for “very reliable services”), while, as the variance in actual headway moves toward infinity (for “very unreliable services”), \(RI_h\) approaches zero asymptotically. As noted by Furth and Muller (2006), \(CV_h\) is conditional on the headway itself: the variation of each departure time (schedule adherence) becomes more important for long headways, while the interval between successive departures dominates at short headways as the numerator of Eq. (1) increases in relation to the denominator.
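
The two measures in Eqs. (1) and (2) can be illustrated with a short Python sketch. The function names are hypothetical, the scheduled headway is passed as a single expected value, and the population variance is an assumption, since the text does not state which variance estimator was used.

```python
from statistics import pvariance


def headway_cv(actual_headways, scheduled_headway):
    """Coefficient of headway variation, Eq. (1): sqrt(var(H_r)) / E(H_s).

    Assumption: population variance of the observed headways.
    """
    return pvariance(actual_headways) ** 0.5 / scheduled_headway


def regularity_index(actual_headways, scheduled_headway):
    """Regularity index, Eq. (2): 1 / (1 + var(H_r) / E(H_s)^2), in (0, 1]."""
    ratio = pvariance(actual_headways) / scheduled_headway ** 2
    return 1.0 / (1.0 + ratio)
```

For example, observed headways of 5, 15, 5 and 15 minutes on a line route scheduled every 10 minutes have a variance of 25, which gives a coefficient of variation of 0.5 and a regularity index of 1/1.25 = 0.8, whereas perfectly even 10-minute headways give a regularity index of one.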

In order to gauge punctuality in an operationalised way using a binary measure, it is useful to define a threshold value of departure deviation from schedule, above which a departure may be regarded as non-punctual (Currie et al. 2011). To find such a value, we tested both the quite common five-minute threshold value for lateness, as referred to by Currie et al. (2011), and a value that could also capture smaller deviations. Due to its higher explanatory power, we decided to use two minutes as the final punctuality threshold. The reason for not setting it at zero is to partially exclude errors of measurement and rounding without causing unnecessary data loss. The chosen punctuality index may thus be formulated according to Eq. (3):

$${P}_{L}=\sum_{v=1}^{{V}_{L}}\sum_{s=1}^{{S}_{L}}{I}_{s,v}/{N}_{{S}_{L},{V}_{L}}$$
(3)

where \(I_{s,v}=1\) if an actual departure deviates at least 2 minutes from the schedule, and \(I_{s,v}=0\) otherwise; \(P_L\) is the punctuality index (proportion of non-punctual departures) for line route \(L\); \(s\) is a stop or station in the set of stops or stations \(S_L\) for line route \(L\); \(v\) is a vehicle run of line route \(L\); and \(N_{S_L,V_L}\) is the total number of stops or stations along line route \(L\).
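
A minimal Python sketch of Eq. (3) is given below. It rests on assumptions that go beyond the text: the function name is hypothetical, deviations are taken in absolute terms (so that early and late departures are treated alike), and the denominator is taken to be the number of observed stop departures across all runs of the line route.

```python
def punctuality_index(departures, threshold_min=2.0):
    """Share of non-punctual stop departures for one line route, Eq. (3).

    departures: iterable of (scheduled_minute, actual_minute) pairs,
    one pair per stop and vehicle run.
    Assumption: a departure is non-punctual when its absolute deviation
    from the schedule is at least threshold_min minutes.
    """
    deviations = [abs(actual - scheduled) for scheduled, actual in departures]
    if not deviations:
        raise ValueError("no departures observed for this line route")
    non_punctual = sum(1 for d in deviations if d >= threshold_min)
    return non_punctual / len(deviations)
```

With four observed departures deviating by 0, 2, 1 and 3 minutes, two of them reach the two-minute threshold, so the index is 0.5.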

As a measure of in-vehicle time variation, we introduce the OD-based reliability buffer time RBT (Gittens and Shalaby 2015; Uniman et al. 2010), defined as the difference between the 95th percentile and the median travel time (Eq. 4) per OD pair:

$${RBT}_{OD,L}={TT}_{95perc,OD,L,\{i\}}-{TT}_{median,OD,L,\{i\}}$$
(4)

The travel time for each passenger trip (\(TT_i\)) and OD pair was calculated based on the departure and arrival time attributes of the temporally and spatially matched PT vehicle trip from AVL data, and the quantiles \(TT_{95perc,OD,L}\) and \(TT_{median,OD,L}\) were calculated based on the sets of trips \(\{i\}\) for each day type (weekday or weekend day), OD pair \(OD\) and line route \(L\).
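
Eq. (4) can be sketched in Python as follows. The function name is hypothetical and the percentile interpolation method is an assumption, since the text does not specify how the quantiles were estimated.

```python
from statistics import median, quantiles


def reliability_buffer_time(travel_times):
    """Reliability buffer time, Eq. (4): 95th percentile minus median.

    travel_times: travel times (e.g. in minutes) of all matched trips for
    one combination of day type, OD pair and line route.
    Assumption: 'inclusive' linear interpolation between observations.
    """
    # quantiles(..., n=100) returns 99 cut points; index 94 is the 95th.
    p95 = quantiles(travel_times, n=100, method="inclusive")[94]
    return p95 - median(travel_times)
```

For 21 trips with travel times of 20, 21, ..., 40 minutes, the 95th percentile under this interpolation is 39 minutes and the median is 30 minutes, which gives a buffer time of 9 minutes.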

For each OD pair in the AFC dataset, the line route(s) registered for fare card boarding transactions per card ID were assumed to be the “choice set” for each passenger, in which each option comprised (a sequence of) line routes between origin and destination stops or stations. In this setup, the reliability measures were associated with the line route level only, regardless of date or running direction. The rationale here was that passengers are more likely to adapt their behaviour at the trip planning stage than to respond to specific path-based reliability properties en route during a trip (to which they have less chance to adapt). Thus, the intention was to capture general perceptions of service reliability per line route rather than the specific reliability for each vehicular trip.

Data and research setting

A selection of 20 PT line routes (bus and regional trains) was made for our analysis, covering principal travel demand patterns within the commuting region of south-west Scania in southern Sweden, particularly within and between the cities of Malmö and Lund (Fig. 1). To enable a panel analysis, two two-week time frames separated by one year were selected for data extraction from AVL and AFC databases, from November 2016 and 2017, respectively. The main rationale as to the durations and spacing of the panel waves was to gather sufficiently large datasets for each wave to enable an analysis of changes in long-term travel strategies due to changes in marginal service reliability without getting into calculation issues due to the sheer size of the datasets. A similar approach was applied by Le et al. (2020) in their analysis of PT usage changes due to changed service satisfaction.

Fig. 1 Map depicting the geographical context, extent and line types of the PT subnetwork of 20 line routes used in this study (Note that local routes 731–750, running in the city of Lund, are not specifically indicated due to spatial limitations)

We assumed an inflection point between short and long headways at 12 minutes, approximating the midpoint of the interval [10, 15] minutes mentioned in the Transit Capacity and Quality of Service Manual (Board, National Academies of Sciences, and Medicine 2013), above which the majority of PT passengers start consulting timetables and scheduling becomes the most important factor. Thus, shorter headways imply random arrivals at origin boarding points among PT passengers, and one would expect a stronger focus on headway regularity in order to minimise the expected wait times.

Properties of the PT network under study

The selected line routes of the case study comprise six urban bus routes, five suburban/regional bus routes and nine regional train routes, with characteristics indicated in Table 1. In the autumn of 2017, substantial road works were carried out (construction of a new light rail line in the city of Lund), affecting the punctuality and/or regularity of line routes 160, 166, 169, 731, 733, 736 and 750.

Table 1 Characteristics of the line routes included in the study. U—local city route, S/R—suburban or regional bus route, T—train route

The scope and location of the case study, and thus the selection of line routes, has been determined by three principal criteria: (a) There should be enough demand (card transactions) during at least two measurement periods one year apart to be able to obtain a sufficiently large dataset to enable panel analysis; (b) there should be a sufficient number of path options available between each origin and destination pair (reasonably representing the travel patterns of all PT travellers) and (c) the AFC and AVL data should be readily available for analysis.

The distribution of headways across line routes used by travel card holders in the respective datasets is quite heterogeneous (Fig. 2)—reflecting both the configuration of the timetables and ridership across line routes, the latter as indicated in Fig. 3.

Fig. 2 Distribution of the number of observed (travelled) OD pairs per scheduled headway in the time frames (panel waves) of 2016 (left) and 2017 (right), respectively, from the AFC dataset

Fig. 3 Distribution of analysis cases (CardID*line route*OD pair) across line routes in the panel data set

Data, its enrichment and refinement

Smart card transaction data covering trips made on the selected PT line routes of Table 1 were retrieved from the AFC system of the regional PT provider Skånetrafiken, covering two weeks in November 2016 and 2017, respectively. The data comprised time stamps, stop names, line route names, as well as card identifiers for each boarding event on the selected routes during the two-week time frames. In addition, data from stationary ticket vending machines (TVM) were used. However, the line route used for the actual trip was not specified in the TVM data but was inferred from subsequent on-board validations.

In order to enable analysis of the longitudinal behavioural effects of changes in service attributes such as reliability, trip types that were subject to the impact of changes in line route properties had to be obtained. Thus, it was necessary to identify recurring trips primarily involving commuting of any kind (Chu and Lomone 2016) that occurred during both panel waves. In our study, this was operationalised based on two main criteria (the first criterion also applied by Chu and Lomone): (a) The same card was used to make at least ten trips in both time frames (panel waves), and (b) origin–destination pairs and line routes recurred at least three times per time frame (panel wave).
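
The two criteria can be sketched in Python as a filter over trip records. This is an illustration under stated assumptions and not the SAS procedure used in the study: the function name, the record layout (dictionaries with the hypothetical keys `card`, `wave`, `line`, `origin` and `dest`) and the default wave labels are introduced here for the example only.

```python
from collections import Counter


def recurring_cases(trips, waves=(2016, 2017), min_trips=10, min_recurrence=3):
    """Return the (card, line, origin, dest) cases meeting both criteria.

    (a) the card made at least min_trips trips in every wave, and
    (b) the line route and OD pair recurred at least min_recurrence
        times in every wave.
    """
    per_card = Counter((t["card"], t["wave"]) for t in trips)
    per_case = Counter(
        (t["card"], t["line"], t["origin"], t["dest"], t["wave"]) for t in trips
    )
    cases = {key[:4] for key in per_case}
    return {
        case
        for case in cases
        if all(per_card[(case[0], w)] >= min_trips for w in waves)
        and all(per_case[case + (w,)] >= min_recurrence for w in waves)
    }
```

A card with seven trips on one line route and three on another for the same OD pair in each wave would thus contribute two analysis cases, while a card with only nine trips in a wave would be dropped entirely.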

As shown in Fig. 4, transaction data from a total of 244,790 cards were collected from the 2016 wave while the corresponding figure for the 2017 wave was 205,880 cards, of which 58,346 cards were identified in both waves from their ID. The grand total of transactions was 1,588,833 in the 2016 wave and 1,246,416 in the 2017 wave. However, of the 58,346 cards, only 10,230 remained after filtering out the cards that had been validated fewer than ten times in each wave. This figure was further reduced in the identification of recurring trips (per line route and OD pair), along with some minor data loss in the trip itinerary generation process, leaving 1,293 cards with 9,211 and 9,617 transactions in the respective time frames. The final dataset for regression analysis was made up of relative frequencies, wave by wave, for each case of travel card ID, line route ID, and OD pair, i.e., the revealed choice of each travel card holder to choose a particular line route (included in their revealed personal choice set) for each OD trip during each panel wave.
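
The relative frequencies can be illustrated with the following Python sketch. The function name and the record keys (`card`, `wave`, `line`, `origin`, `dest`) are hypothetical, and the choice of denominator (all retained trips by the same card on the same OD pair in the same wave) is an assumption about how the trip proportions were normalised.

```python
from collections import Counter


def trip_proportions(trips):
    """Share of a card's trips on each line route, per OD pair and wave.

    Returns a dict keyed by (card, origin, dest, wave, line).
    Assumption: shares are normalised over all of the card's trips on
    that OD pair within the wave.
    """
    per_case = Counter(
        (t["card"], t["origin"], t["dest"], t["wave"], t["line"]) for t in trips
    )
    per_od = Counter((t["card"], t["origin"], t["dest"], t["wave"]) for t in trips)
    return {key: count / per_od[key[:4]] for key, count in per_case.items()}
```

Comparing the proportion of a given case between the two waves then indicates whether the relative usage of that line route increased, decreased or stayed the same.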

Fig. 4 The data refinement procedure of analysis cases to be included in the panel data set. One analysis case comprises the attributes of Line route (\(L\)), \(RI_{L,2016}\), \(RI_{L,2017}\), \(P_{L,2016}\), \(P_{L,2017}\), CardID, Number of trips in 2016\(_{L,(OD)}\), Number of trips in 2017\(_{L,(OD)}\), Trip proportion\(_{L,(OD),2016}\), Trip proportion\(_{L,(OD),2017}\), Origin stop and Destination stop. *Unique cards in each wave, but the same card may be found in both waves

In addition to the AFC data, reliability data, in the form of scheduled and actual departure times at all stops, were obtained from the AVL system for the same line routes as the card transaction data (cf. Table 1) and merged with the dataset of analysis cases on line route ID and time frame identifiers (the merging procedure is outlined schematically in Fig. 5), including separate identifiers for weekdays and weekends, respectively. The raw AVL data consisted of scheduled and actual stop arrival and departure times of all vehicular trips, during the two panel waves, of the same 20 line routes as the AFC dataset for all stops and stations serviced by them for passenger boarding and alighting. If data regarding actual departure times were missing, the scheduled times were used instead.

Fig. 5 Schematic diagram of OD matrix generation process

Fig. 6 Schematic representation of the inference of alighting stops from transfers with walking links [adapted from Gordon (2012)]. A passenger is recorded, in the form of a transaction in the AFC system, as boarding at stop A1 of line route A and subsequently as boarding at stop B4 of line route B. The closest stop of route A, A6, which is serviced by the service run of A recorded as previously being boarded and which is within the search radius of 4 km, is then inferred as the alighting stop

Thus, two primary datasets were prepared for analysis—one with relative line route frequencies by panel wave included regardless of the day type and one in which each frequency was associated with either a weekday or a weekend day. Both data preparation and processing, as well as regression analysis, were performed using SAS software on an ordinary PC laptop with an i7 1.80 GHz CPU and 16 GB installed RAM.

Generation of OD matrices

The PT provider for Scania (Skånetrafiken), whose PT network is used in this study, uses tap-on only as fare validation for bus trips, and on-board validation by itinerant staff for train trips. Ticket types include pre-paid periodical or rechargeable cards used for pre-payment of single-trip tickets. All purchases of single-trip train tickets must be made in advance, either at TVMs or from the bus driver on a preceding bus trip leg.

Alighting stops had to be inferred in order to generate OD trips for subsequent behavioural studies of passenger line route usage. This was accomplished based on a series of steps, roughly outlined in the upper part of Fig. 5 and largely inspired by the work of Gordon (2012). For train trips for which the fare was pre-paid using TVMs, the line route was inferred by sorting all transactions by travel cardID and time stamp.

In the lower part of Fig. 5, the inference of full origin stop—destination stop trips using activity detection is outlined (further described below).

Like Gordon (2012) and a number of other authors, we use subsequent boarding stops as candidates for alighting stops for a particular leg of a trip (mirroring). However, in order to allow not only for transfers that use a single stop point but also for transfers between stop points in close proximity to each other, we include other candidate stops for possible alighting events within a network-congruent walking distance (according to a distance matrix) of four kilometres. This cut-off distance value is based on the walking tolerances that were surveyed and presented in Berggren et al. (2019) in the same temporal and spatial contexts as the travel card data obtained for the panel analysis. From this array of possible alighting stops per trip, the one selected was on the same line route as the boarding stop of that particular trip in the AFC data. Since this matching is based on both spatial criteria (boarding stop and line route) and temporal criteria (the most likely departure according to the minimisation problem expressed in Eq. (5)), each passenger (travel card ID) trip has a corresponding vehicle movement determined from the AVL data.

$${t}_{diff, opt}=\underset{i}{\mathrm{min}}\left|{t}_{boarding,AFC}-{t}_{dep,AVL,i}\right|$$
(5)

where tdiff,opt is the optimal time difference, tboarding,AFC is the time of boarding according to the AFC dataset, and tdep,AVL,i is the AVL departure time of candidate vehicle trip i.
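As an illustration only, the minimisation in Eq. (5) can be sketched in Python as follows. The function name, the time stamps and the list of candidate departures are hypothetical and are not taken from the study, which was carried out in SAS.

```python
from datetime import datetime


def match_departure(t_boarding_afc, avl_departures):
    """Return (index, |time difference| in seconds) of the AVL departure
    closest in time to the AFC boarding time stamp, cf. Eq. (5)."""
    diffs = [abs((t_boarding_afc - t_dep).total_seconds())
             for t_dep in avl_departures]
    i_opt = min(range(len(diffs)), key=diffs.__getitem__)
    return i_opt, diffs[i_opt]


# Hypothetical boarding transaction and candidate departures at the same stop
boarding = datetime(2016, 11, 14, 7, 32, 10)
departures = [
    datetime(2016, 11, 14, 7, 20, 0),
    datetime(2016, 11, 14, 7, 31, 30),
    datetime(2016, 11, 14, 7, 44, 0),
]
i_opt, t_diff_opt = match_departure(boarding, departures)
```

In this made-up example the second departure is selected, since it lies 40 s from the boarding time stamp while the other two candidates are more than eleven minutes away.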

Figure 6 illustrates a specific spatial case of transfer, or activity, inference when different stops are used for alighting and subsequent boarding. In our case, the headway in the peak period of the line route used for the post-intermission boarding was considered in order to discern inter-trip activities from transfers between PT services. Thus, if an intermission (“time gap”) between an alighting and a boarding event—occurring on the same day and meeting the spatial constraints mentioned above—lasted longer than the maximum weekday headway of the line route used for the subsequent boarding, it was regarded as an activity and the alighting stop was the trip destination—otherwise the intermission was regarded as a transfer event that was included in the trip.
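The classification rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function and argument names are hypothetical, and treating gaps that span a date change or violate the spatial constraint as activities is our reading of the text rather than a documented implementation detail.

```python
def classify_gap(gap_min, max_weekday_headway_min,
                 same_day=True, within_walk_radius=True):
    """Classify an intermission between an inferred alighting and the
    next boarding as a 'transfer' or an 'activity'.

    Assumption: a gap counts as a transfer only if it occurs on the same
    day, meets the spatial constraint and does not exceed the maximum
    weekday headway of the line route boarded next.
    """
    if same_day and within_walk_radius and gap_min <= max_weekday_headway_min:
        return "transfer"
    return "activity"


# Hypothetical gaps (minutes) for a route with a 15-minute maximum headway
short_gap = classify_gap(10, 15)                      # transfer
work_day_gap = classify_gap(540, 15)                  # activity
overnight_gap = classify_gap(10, 15, same_day=False)  # activity
```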

A random sample of time gaps is shown in Fig. 7 and its cumulative distribution (CDF) in Fig. 8. As indicated in the scatterplot of the first of these diagrams, a large portion of the gaps were less than 60 min. Although the median (and mode) headway in Fig. 7 is as low as 14 min, most gaps (95 per cent) were classified as activities since time gaps also included date changes, leaving just five per cent of them as transfers. This is also attributable to the relatively small selection of line routes in the data, making possible transfers between routes somewhat limited. Consequently, the mean number of trips (trip legs) per travel card ID and day was as low as 1.7 during both panel waves. The second “peak” in Fig. 7 is located around the nine-hour mark (400–650 min), suggesting that these time gaps correspond to work or education-related activities. Negative time gaps are due to inconsistencies between boarding time stamps and previous trip alighting stop arrival times that originate from the process of inferring alighting stops or stations in which the observed trips in the travel card data were matched with the service trips in the AVL data.

Fig. 7

Time gaps (10 per cent random sample of all weekday observations) between trip elements in minutes, plotted against headways (also in minutes) of the line route used for a subsequent trip, on days with multiple trip elements. The red line and the shaded area below it indicate gaps categorised as transfers

Fig. 8

Distribution (CDF) of time gaps, in minutes, between trip elements, on days with multiple trip elements

To control for the use of different stop points for the same ultimate origin or destination (e.g., home or work), street network-congruent walking time matrices for neighbouring stops within 600 m were applied to increase the flexibility of OD pair definitions (inspired by the work of Goulet-Langlois et al. (2016) and with the cut-off distance value based on the empirical findings of Berggren et al. (2019); see Footnote 5). Thus, only stops within the cut-off network walking distance to neighbouring stops and present on at least one occasion in the card transaction data were included in order to rule out irrelevant stops.

Identifying recurring trips and matching across time frames (panel waves)

In order to be able to study the panel effect of changes in reliability (punctuality and regularity), it is crucial to obtain trip types that are subject to the impact of changes in line route properties. Thus, only trips fulfilling this criterion during both measurement periods were included in the analysis. The rationale here is to control for changes in line route usage that are due to exogenous causes, such as a change of residence or employment location. In order to maximise the success rate of the matching of specific trips (travel card IDs using a specific line route in a specific OD pair) across the two panel waves, a procedure of stop clustering was applied, similar to the method applied by Goulet-Langlois et al. (2016). This hierarchical clustering principle is illustrated in Table 2. In this example, card no. 1 uses only four stops according to the full AFC data set, while card no. 2 uses five stops and card no. 3 a total of three stops. The stops in each set of stops identified as being used for boarding by each card holder were used as candidate stops in the matching of trips per OD stop pair, relaxing the requirement for an exact origin stop-destination stop pair match into an origin cluster-destination cluster match criterion, each cluster involving extended matching options compared to just one origin stop-destination stop pair per trip. Cut-off values for maximum network distances between the stops of each cluster were set to 0.72 and 0.78 km for 2016 and 2017 data, respectively. These values were double the median access/egress walking distances obtained in the survey presented in Berggren et al. (2019).
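The relaxation from stop-level to cluster-level matching can be illustrated with the short sketch below. It assumes, for simplicity, that a stop-to-cluster mapping has already been derived from the distance cut-offs; the mapping, the line route label and the card IDs are hypothetical, and the per-card construction of candidate stop sets used in the study is not reproduced here.

```python
def recurring_cases(trips_2016, trips_2017, stop_cluster):
    """Return the (card, line route, origin cluster, destination cluster)
    combinations observed in both panel waves.

    Each trip is a tuple (card_id, line_route, origin_stop, destination_stop);
    stop_cluster maps every stop to its cluster label.
    """
    def keys(trips):
        return {(card, route, stop_cluster[o], stop_cluster[d])
                for card, route, o, d in trips}

    return keys(trips_2016) & keys(trips_2017)


# Hypothetical clusters A and C, each containing two neighbouring stops
stop_cluster = {"A1": "A", "A2": "A", "C1": "C", "C2": "C"}
trips_2016 = [(1, "L4", "A1", "C1"), (2, "L4", "A2", "C2")]
trips_2017 = [(1, "L4", "A2", "C1"), (3, "L4", "A1", "C2")]
matched = recurring_cases(trips_2016, trips_2017, stop_cluster)
```

Card 1 boards at A1 in the first wave and at A2 in the second. An exact stop-pair criterion would discard this case, whereas the cluster-level criterion retains it; cards 2 and 3 each appear in one wave only and are dropped.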

Table 2 Simple example of stop clustering structure with three travel cards, two stop clusters (A and C) and six stops

As mentioned initially in this section, a total of 1,293 cards met the criteria regarding recurrence and cross-wave matching. However, since each individual may use more than one line route recurrently, the total number of cases amounted to 2,076 (cf. Fig. 4). Of these, a change in the relative usage of a line route was found in 437 cases (21 per cent of the total).

In order to study the possibility of differences in behaviour depending on whether the trip was made on a weekday or on a weekend, a day type dummy was introduced based on the date on the timestamp of each boarding event in the transaction (AFC) data. Here, the total number of cases was larger since the trip frequencies per OD pair and card ID were disaggregated based on an additional variable (day type) and in all possible day type combinations across the two measurement periods (2016 weekday vs 2017 weekday, 2016 weekend vs 2017 weekend, 2016 weekday vs 2017 weekend and 2016 weekend vs 2017 weekday). The resulting dataset contained 2,418 cases, of which 452 included a line route change between the 2016 and 2017 measurement periods.

Reliability measures and other covariates

Reliability data were derived from an AVL database covering actual and scheduled departure and arrival times for all stops along the same subset of PT line routes that was identified in the smart card dataset. From the data, regularity and punctuality indices, RI and P, were calculated for each line route, day type (weekday or weekend day), and panel wave time frame using Eqs. 2 and 3, respectively.

To check model robustness in relation to other common path choice-determining trip attributes, the potential covariates of mean OD line route travel time and ridership-weighted mean PT service frequency (see Footnote 6) per line route were also appended to each case in the analysis dataset. In addition, travel time variability was controlled for as measured by reliability buffer time using the definition specified in Eq. (4) (RBT, Gittens and Shalaby 2015; Uniman et al. 2010).

In this context it should be noted that the set of trips per line route and OD pair used to calculate the travel time distributions was obtained from the 57,351 (2016) and 43,109 (2017) cards present in the third filtering step of AFC data (section "Data, its enrichment and refinement", Fig. 4), meaning that there might be trips included that were made by cards not being present among the recurring trips in both panel waves (the final filtering step in Fig. 4). However, since all calculations were made per OD pair and line route, only travel time data for relevant recurring OD pairs and line routes were used.

There was a total of 232 cases with a cross-wave increase of line route usage in the data, which constitutes 11 per cent of all cases. The number of cases with decreased line route usage across waves amounted to 8.6 per cent or 179 cases. Thus, a large majority, or 80 per cent of all cases, did not include any change of line route usage. Descriptive statistics of the covariates used in model estimations are displayed in Table 3. As indicated, the mean headway regularity RIL has generally improved but the mean timetable adherence PL has decreased across panel waves. However, the median values have the opposite signs for each measure, indicating outlier values with large leverage impacts on the data (e.g., one line route had a punctuality loss of 105 per cent, i.e., the proportion of delayed departures increased from 22.6 to 46.4 per cent, while another gained 25 per cent). This outlier effect was also evident in travel times, regarding both mean values and the buffer time required to allow for travel time variability (RBT). Thus, median values were used in order to obtain the most representative view of the data. A total of 1,254 cases displayed increased RBT while 776 cases entailed a decreased need for a travel time buffer.

Table 3 Descriptive summary statistics for regressors/independent covariates used in the four sub models, based on the total dataset of analysis cases (travel card ID*line route*OD)

Viewing these statistics on a line route level, only five out of 20 routes showed an increase in punctuality, while the remaining routes were more delayed in the second than in the first panel wave. A similar picture was evident for headway regularity, in which five line routes improved while 13 had decreased regularity (two routes remained unchanged in this regard). For the ridership-weighted mean service frequency covariate, only two line routes had decreased levels while all others had increased departure frequencies across panel waves.

At the significance level of 0.05, the inter-variable correlation between the two measures of reliability was found to be significantly different from zero: the correlation between schedule adherence P and headway regularity RI ranged from 0.58 for Pearson’s correlation (Table 4) to 0.69 for Spearman’s correlation (not shown). PT supply frequency (the difference in ridership-weighted maximum departures per hour) correlated with P and RI, respectively, only according to the Spearman test, with a coefficient of 0.48. The other variables were not significantly correlated.

Table 4 Pearson’s correlation coefficients for the independent variables/regressors in the total dataset of analysis cases (cardID*line route*OD)

Modelling approach

We have used a fixed effects and logistic regression approach to analyse our panel data regarding the degree and magnitude of impact of changed service reliability (punctuality and regularity) on the individual passenger’s long-term trade-off between line routes in the given OD pairs. More specifically, we have regressed the likelihood of changing the relative path used for each OD pair and line route, given changes in reliability, as well as other potential covariates, including service frequency, travel time variability, and travel time, across panel waves. In a sense, it is a dual model approach since increased and reduced line route usage were analysed in separate models. These were not of a pure (discrete) choice variant, in that we did not keep track of the complete choice set for each individual passenger (travel card ID) in the models. Instead, owing to the structure of the panel data, competing line routes were not taken into account: only the impact of the changing attributes of the line routes registered at least once by each cardholder was modelled, while the effects of changing attributes of other line routes could not be traced. In the panel dataset, the relative frequencies of usage per line route, travel card ID, and OD stop pair were matched across the two panel waves (each comprising a two-week period in November 2016 and 2017, respectively), along with the line route reliability attributes of each panel wave.

A (Boolean) response variable indicated whether the analysis case implied a change in the relative line route usage frequency—positive, negative or not at all. The regressions were based on a linear logit transformation of the probability p of a certain outcome, as a function of observed covariates xk as shown in Eq. (6):

$$\mathrm{logit}\left(p\right)=\mathrm{log}\left(\frac{p}{1-p}\right)=\alpha +{\sum }_{k}{\beta }_{k}{x}_{k}$$
(6)

where α is the intercept, βk is the k-th regression coefficient and xk is the k-th explanatory variable.

Estimates of the regression parameters α and βk were given by maximum likelihood estimation.

A total of fourteen different models were tested, as indicated in Table 6. Two subsets of data were evaluated depending on increased or decreased line route ridership and across weekday covariates, respectively (models with acronyms ending with a “+” or “−”, respectively, in Table 6). In order to model positive changes in line route usage (increased ridership proportions), all cases of an increased line route ridership proportion were set to “1” and the remaining cases to “0”. For negative changes in line route usage (decreased ridership proportions), all cases with a decreased ridership proportion were set to “1” and all other cases to “0”. The explanatory variables were normalised to a fractional scale of change, in which, for example, a 50 per cent decrease in travel time or in a reliability index value was set to −0.5. Using the first model pair, M1, we analysed line route switching behaviour in relation to both schedule adherence (P) and headway regularity (RI), as well as the auxiliary covariates of travel time, RBT, and service supply. Using models M2 and M3, on the other hand, we were able to analyse the same dependent variable as in M1 but in relation to one reliability measure at a time (Table 6).
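The coding of the two response variables and the normalisation of the explanatory variables can be sketched as follows. The function names and the numerical values are hypothetical and serve only to illustrate the coding scheme described above.

```python
def fractional_change(value_2016, value_2017):
    """Change across panel waves on a fractional scale,
    e.g. a 50 per cent decrease gives -0.5."""
    return (value_2017 - value_2016) / value_2016


def code_responses(share_2016, share_2017):
    """Return the pair of Boolean responses (increased, decreased) for the
    '+' and '-' models; a case with unchanged relative usage is (0, 0)."""
    increased = 1 if share_2017 > share_2016 else 0
    decreased = 1 if share_2017 < share_2016 else 0
    return increased, decreased


# Hypothetical case: travel time halves and the line route share rises
delta_travel_time = fractional_change(40.0, 20.0)
y_plus, y_minus = code_responses(0.50, 0.75)
```

Under this coding, cases without any change in relative usage act as the reference category in both the “+” and the “−” models.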

Since the reliability measures were found to be correlated, we had each reliability index interact with a dummy variable, indicating cases with short maximum weighted headways (see Footnote 7; at or below twelve minutes) in order to find the potential impact of short versus long headways (above twelve minutes). The rationale is to be able to discern the potentially different approaches used by passengers to minimise wait times and scheduling effort for short and long headways, respectively. To establish whether changed service reliability in high frequency line routes, i.e. routes with a maximum twelve-minute headway, imposed a different behavioural response than low frequency routes, a test was conducted using a dummy for analysis cases involving high-frequency line routes. In the models M4-M5, the reliability measures interacted with this dummy.
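A minimal sketch of such an interaction is given below. Splitting the reliability change into one regressor for the short-headway group and one for the long-headway group is an assumption about the parametrisation, and the function name and input values are hypothetical.

```python
HEADWAY_THRESHOLD_MIN = 12  # at or below: short headway (high frequency)


def interaction_terms(delta_reliability, max_headway_min):
    """Split a reliability change into two regressors, one active for
    short-headway cases and one for long-headway cases."""
    short = 1 if max_headway_min <= HEADWAY_THRESHOLD_MIN else 0
    return delta_reliability * short, delta_reliability * (1 - short)


# Hypothetical cases: the same reliability change on a 10- and a 20-minute route
x_high_freq = interaction_terms(0.2, 10)
x_low_freq = interaction_terms(0.2, 20)
```

With this construction each headway group receives its own coefficient, so that the sign and magnitude of the reliability effect can differ between high- and low-frequency line routes.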

The influence of day type (weekdays versus weekends) on long-term change responses was tested in a data subset in which the day type for each relative line route usage was considered using a dummy variable for weekend days, which interacted with each reliability measure (M6-M7). In an additional model, not shown in Table 6 due to lack of space, we tested the difference in propensity to change line route usage across day types and panel waves regardless of the direction of change and without any additional covariates regarding service reliability, etc.

The fit of the different models was measured using McFadden’s pseudo R2 (McFadden 1973) and the receiver operating characteristic (ROC). The former indicates the respective model’s ability to explain the variation in the response variable, and the latter the sensitivity vs. fall-out rate regarding the models’ ability to detect responses (Hand and Till 2001). Pseudo R2 < 0.05 indicates low fit, Pseudo R2 > 0.20 indicates a very good fit, and Pseudo R2 > 0.40 is rarely observed (Andreß et al. 2013).
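For reference, McFadden’s pseudo R2 compares the log-likelihood of the fitted model with that of an intercept-only model. The sketch below uses the standard definition, 1 − lnL(model)/lnL(null); the function names, outcomes and fitted probabilities are hypothetical.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of outcomes y given fitted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def mcfadden_r2(y, p_model):
    """McFadden's pseudo R2 = 1 - lnL(model) / lnL(null), where the null
    (intercept-only) model predicts the sample share of ones for every case."""
    p_null = sum(y) / len(y)
    ll_null = log_likelihood(y, [p_null] * len(y))
    return 1.0 - log_likelihood(y, p_model) / ll_null


# Hypothetical outcomes with an uninformative and an informative model
y = [1, 0, 1, 0]
r2_uninformative = mcfadden_r2(y, [0.5, 0.5, 0.5, 0.5])
r2_informative = mcfadden_r2(y, [0.9, 0.1, 0.9, 0.1])
```

A model that merely reproduces the sample share yields a value of zero, while fitted probabilities closer to the observed outcomes move the measure towards one.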

To check the validity of our research hypotheses regarding PT service reliability and passengers’ long-term trade-off between line routes, we evaluated the logistic regression coefficients for each independent variable and calculated a measure of sensitivity for the impact of each independent variable on the propensity to change line route—both regarding increased and decreased proportional ridership. The sensitivity S, evaluated at each regressor βk, indicates the marginal change in the probability of changing the proportional use of a line route with respect to a marginal change of that specific covariate. The effect of intercept α is subtracted according to Eq. (7) to remove the effects of unexplained variation on the dependent variable.

$$S=\frac{{e}^{\left(\alpha +{\beta }_{k}\right)}}{1+{e}^{\left(\alpha +{\beta }_{k}\right)}}-\frac{{e}^{\alpha }}{1+{e}^{\alpha }}$$
(7)
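Eq. (7) can be evaluated directly from an estimated intercept and coefficient, as in the following sketch. The parameter values are hypothetical and are not estimates from Table 6.

```python
import math


def inv_logit(z):
    """Inverse logit, e^z / (1 + e^z), written in its equivalent form."""
    return 1.0 / (1.0 + math.exp(-z))


def sensitivity(alpha, beta_k):
    """Sensitivity S of Eq. (7): the probability at a unit change of
    covariate k minus the baseline probability given by the intercept."""
    return inv_logit(alpha + beta_k) - inv_logit(alpha)


# Hypothetical parameter values
s_unit = sensitivity(0.0, 1.0)
s_negative = sensitivity(-2.0, -1.0)
```

A positive coefficient gives a positive S and a negative coefficient a negative S, and the magnitude of S depends on the intercept as well as on the coefficient.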

Results

The estimation results of the fourteen feasible models, presented in Table 6, do support some parts of our research hypothesis—but not in all possible trip conditions and not for all reliability indicators applied. In this section, we first present the model fit and outcome from each model, followed by a joint interpretation and discussion regarding the general picture that the results convey.

The model with by far the best fit was model M1−, which resulted in a McFadden’s pseudo R2 equal to 0.154 and an area under the ROC curve of 0.81, while model M6+ had the worst fit with a McFadden’s pseudo R2 of 0.04 and an area under the ROC curve of 0.57. Since we assumed that there could be a certain degree of collinearity between a few of the auxiliary reliability measures (RI and P had a collinearity R2 value of 0.3 and RBT and RI had an R2 value of 0.05), the results of the multi-variate regression models were somewhat difficult to interpret. Thus, in our analysis, we focused on the simple models of M2 through M7 to isolate the variables and thus implicitly accepted a lower goodness of fit.

Strategic choice behaviour regardless of line route headway

When regressing the probability of an increased line route usage rate against both line route regularity and punctuality, as in model M1 + (Fig. 9), only regularity emerged as a significant driver towards a higher propensity to choose these line routes, while punctuality appeared to work the opposite way (Table 6). In terms of relative impact, the sensitivity (Table 5) to the regularity RIL entailed an increased long-term choice propensity of 0.89 units for each unit of increased regularity, while the corresponding (and reverse) impact of punctuality was ten times smaller.

Fig. 9

Visualisation of model M1+ by which the probabilities of positive ridership change are regressed against the weekday RIL difference (DiffRI_WD), with the other regressors (displayed in Table 3) fixed at their median values: Change in PL = −0.04, Change in travel time = −0.01, Change in RBT = 0.08 and Change in service frequency = 0.008

Table 5 Sensitivity to the marginal probability of changing (increasing or reducing, respectively) the proportionate line route ridership in relation to each independent weekday variable in model M1

Both reliability measures negatively impacted on decreased line route usage (Table 6) according to model M1- (Fig. 10), which appears reasonable since the increased reliability of a line route should counteract a passenger’s propensity to switch to a different route.

Fig. 10

Visualisation of model M1− by which the probabilities of negative ridership change are regressed against the weekday RIL difference (DiffRI_WD), with the other regressors (as displayed in Table 3) fixed at their median values: Change in PL = −0.04, Change in travel time = −0.01, Change in RBT = 0.08 and Change in service frequency = 0.008

Also, increased RBT positively impacted decreased line route ridership, which appears to be reasonable given the potential disutility of a longer travel time buffer to compensate for the increased travel time uncertainty captured by the RBT measure. However, increased RBT also affected increased line route usage positively, which is less intuitive.

The two other covariates of M1, travel time change and departure frequency change, had a less significant and, at least in part, unintuitive impact on line route switching behaviour. Thus, increased departure frequency in an OD entailed a reduced propensity to increase line route usage, and increased travel time entailed a reduced propensity to decrease line route usage. No significant effect on increased line route usage was found from increased OD travel time.

The sensitivities (Table 5) indicate roughly equal propensities, with the likelihood of a negative change in line route usage decreasing by five percentage points for each unit of reliability increase.

Models M2 and M3 indicate the impact on the line route switching behaviour of each reliability measure separately. Comparing M2 + with M1 + , regularity in itself had minimal explanatory power regarding increased line route usage compared to when it was accompanied by punctuality, travel time, and service supply attributes. In M2−, however, regularity had a stronger effect than when it was combined with the other line route attributes of model M1−. However, when punctuality in model M3 + was correspondingly analysed, it displayed a counter-intuitive negative impact on increased line route usage, but a somewhat stronger positive (decreased incidence of decreased line route usage) impact in model M3−.

Strategic choice behaviour and headway categories

The systematic impact of regularity on line route switching behaviour was most pronounced for line routes that had at least five departures per hour (HW1), as indicated by the signs and values of the parameters for RI in model M4 in Table 6. Accordingly, there was a positive impact of RI on increased line route usage (M4+) and a negative impact on decreased line route usage (M4−). However, for line routes that had fewer than five departures per hour (HW2), there were no such indications of a consistent impact since RI had a negative influence on both increased and decreased line route usage. A corresponding pattern was evident in model M5, in which the impact of punctuality P on line route usage was analysed (Table 6).

Influence of day type

Finally, the last models of Table 6, M6 and M7, analysed the effects of day type—weekday or weekend day—on the propensity to change the proportional use of line routes in a positive or negative direction. The results did not indicate a significantly different outcome regarding how reliability affected long-term line route choice depending on day type.

Table 6 Results from the logistic regression models (M1 to M7) in terms of parameter estimates. Significance at the 0.05 level is indicated in bold

Table 7 Mean values of relative line route usage frequencies per panel wave, for cases with changed relative frequencies across panel waves, subdivided by headway group

However, from the results of the additional day type model (not present in Table 6), in which neither the direction of change regarding line route usage nor reliability differences across panel waves and day types were taken into account, there was evidence of a difference in line route switching behaviour between weekdays and weekends. In this model, and depending on the direction of measurement, a significant parameter value of 0.76 for line route usage change between weekdays and weekend days and, correspondingly, values of − 1.6 to − 1.8 for differences between weekend days and weekdays, indicated a more pronounced switching behaviour during weekends than during weekdays. This difference across day types in “sticking” to a specific line route or path is in line with previous research by Kim et al. (2017), who also showed that PT passengers are more prone to sticking to specific paths on weekdays than on weekends. However, it should be noted that their analysis covered a period of nine consecutive weekdays and two consecutive weekends, while our data spans one year.

To summarise our findings, the regression analyses indicated that headway regularity appeared to have the most significant impact on line route switching behaviour regardless of service frequency: the greatest influence of improvements in regularity is to attract more trips, whereas the repelling effect of a decline in regularity is less pronounced. For punctuality, on the other hand, the clear and positive impact was restricted to line routes with at least five departures per hour. Thus, our data give some support to somewhat different kinds of adaptation behaviour to changes in reliability depending on headway, although only for punctuality. The reader should note that the data itself indicated a general flux of passengers from low-frequency train routes to high-frequency bus routes across our panel, as illustrated in Table 7. Since the train routes in our AVL dataset experienced a general decline in both regularity and punctuality during this period, while the most popular bus routes displayed improved reliability figures, this shift of passengers is consistent with an expected passenger response to reliability change. It also sheds some explanatory light on the somewhat ambiguous impact of the reliability of low-frequency line routes on changes in their usage.

Discussion

We applied average line route-specific departure reliability metrics by day type in order to analyse marginal changes in the relative passenger demand per line route. The main rationale was to get a generalised measure that targeted the general impression that passengers are expected to have of the service reliability of the line routes that they use on a recurring basis. Thus, the exact reliability of specific vehicular trips or on particular days or time periods was not of interest here, owing to the difficulty of associating each passenger’s line route perception with these disaggregate metrics. It would be of interest for future research efforts to look into whether passengers make trade-offs across line routes differently depending on e.g., travel direction and time of day, but in this paper, we restricted the analysis to focus only on day type-specific trade-off patterns.

In this study, indicators of revealed choice constituted the target of measurement. However, as noted, the explanatory power is rather low, and the model suffers from multicollinearity between some of the independent variables. These phenomena are not uncommon in models of human behaviour relying on the kind of empirical mobility data used in this paper but may be possible to manage more directly in an experimental setting such as that applied when recording stated choices of selected respondents in a controlled environment.

As our results suggest, the positive impact of improved service departure reliability (and negative impact from decreased reliability) on long-term line route choice probabilities is confined to cases where (a) departure reliability is being measured using line route headway regularity only, regardless of service frequency, or (b) where regularity and punctuality are only regarded for line routes with an hourly service frequency of at least five departures. To some extent, these results challenge the consensus among researchers and practitioners regarding the importance of schedule adherence for low-frequency services. However, we believe that these results may be attributed to properties of the data, particularly the phenomenon of a general decline in punctuality among less frequent line routes described in the last paragraph of Chapter 5 (and indicated in Table 7). In addition, there may be other factors at play that were not captured by the applied model framework, such as factors related to social choice as well as pure randomness. Nevertheless, the relationships between a relative change of these two indicators of departure reliability and travel demand warrant further research efforts to bring more clarity to the issue.

We believe that a number of behavioural phenomena may be at play in the data. Primarily, the relatively few cases that actually included a line route change across panel waves (21 per cent of all cases, 54 per cent of cases in which there were multiple distinct alternative line routes for the particular case OD) may indicate a kind of endowment effect (Kahneman et al. 1991), or satisficing behaviour, i.e., a reluctance on the part of a passenger to change away from a route with which (s)he is sufficiently content (Kahneman and Krueger 2006; Kaufman 1990). On the other hand, people who actually responded to reliability changes across years may display some aspect of risk-averse behaviour, although the net impact from reliability change on behaviour was relatively weak in our data. There is also the issue of perception, or sensitivity, to both absolute reliability and changes in reliability (Friman et al. 2001), in which we have focused on common deviations from the timetable (punctuality and regularity) and have left the potential behavioural effects of large-scale disruptions outside the scope of our analysis. However, as also described by Yap et al. (2017), the long-term effects of what Friman et al. call “remembered frequency of negative critical incidents” may also have substantial effects on long-term PT path choice. Our analysis has focused on what Carrel et al. (2013) call the trip planning phase, in that we have analysed long-term behavioural change. However, it is not possible for us to truly disentangle these effects from the short-term decisions that are made based on events occurring en route. Having said that, no large-scale interruptions were reported by the transport provider, nor by the media during the two periods of our data collection.

A potential cause for confounding effects is the modification to the PT route network that occurred between the panel waves, and which affected six of the line routes included in our study. Our approach to mitigating these effects included the clustering of origin and destination stops based on walking distance. However, there may be other effects on behaviour from this event, the mechanisms of which have not been fully captured by our models.

Another phenomenon that may have a potentially counteracting effect on behavioural response to reliability deficiencies is the occurrence of bus “overtakings”, as reported by Barabino et al. (2015). This feature of bus operations typically entails a re-ordering of bus trajectories, which may result in severe delays being recorded on some vehicle trips in the AVL system. Passengers may not perceive such delays as serious if their total travel time, and its predictability, is not negatively affected relative to what they expected. Fortunately, this phenomenon occurs quite infrequently in our AVL data when comparing scheduled and actual vehicle running times. In both data collection periods, a total of 2.5 per cent of all stop or station departure times, regardless of the day type or line route, involved a vehicle trip that had been subject to re-ordering of the actual running sequence of buses.
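One way to flag re-ordered departures in AVL data can be sketched as follows. This is an illustrative assumption rather than the definition used in the study: a trip is flagged at a stop when its position in the actual departure sequence differs from its position in the scheduled sequence. The function name, the tuple layout, and the example times (in seconds) are hypothetical.

```python
def reordered_trips(departures):
    """Flag trips whose running order at one stop differs from the schedule.

    departures: list of (trip_id, scheduled_time, actual_time) for a single
    stop and line route, with times in any sortable unit (e.g. seconds).
    Returns the set of trip ids whose position in the actual departure
    order differs from their position in the scheduled order.
    """
    by_scheduled = [t for t, _, _ in sorted(departures, key=lambda d: d[1])]
    by_actual = [t for t, _, _ in sorted(departures, key=lambda d: d[2])]
    # Compare the two sequences position by position.
    return {s for s, a in zip(by_scheduled, by_actual) if s != a}


# Made-up example: trip t2 runs 15 minutes late and is overtaken by t3.
example = [
    ("t1", 0, 60),
    ("t2", 600, 1500),
    ("t3", 1200, 1260),
    ("t4", 1800, 1850),
]
flagged = reordered_trips(example)
share = len(flagged) / len(example)
```

Under this position-based rule both the overtaken and the overtaking trip are flagged, so the resulting share counts every departure involved in a re-ordering. A single bus that jumps several positions also shifts the trips in between, which is one reason a stricter pairwise definition might be preferred in practice.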

Conclusion

In the study described in this paper, we applied a panel approach in order to study the long-term behavioural impact of changes in the level of PT service departure reliability. We used a panel of revealed trip data from tap-on travel card transactions of an AFC system in south-west Scania, Sweden, and combined them with true vehicle trajectories from AVL data covering the same line routes and time frames. The common challenge of obtaining full OD trips from data lacking tap-off information was approached by means of an alighting and transfer stop inference procedure. Despite the unavoidable uncertainties that such an inference procedure imposed on the observations, the results appear reasonable given the additional measurement noise. The main rationale behind the chosen panel approach was to make it possible to control for factors other than line route-based changes in service reliability. By using a stop clustering procedure and keeping both origin and destination stop clusters constant for each individual passenger (card identifier) between the panel waves, we were able to reduce the impact of other causes of path change, such as changes in the individual passenger’s locations of origin or destination (related to, for example, a change of residential or employment location). Other line route-based attributes such as travel times and service frequency were controlled for in the model formulation in order to elicit the relative impact of departure reliability on long-term path choice.

Our research hypothesis regarding the behavioural impact of departure reliability change in PT is rather bold, yet straightforward. Although we found that passengers in general tend to shift from marginally unreliable to marginally more reliable line routes, this appears to apply neither to all passengers (which could be assumed from a literal interpretation of our research hypothesis) nor to all service frequencies. In fact, the majority of cases we studied showed an indifferent reaction on the part of the passenger to changes in reliability. However, the behavioural response foreseen in the second part of the research hypothesis is clearly supported by our findings. Thus, the results suggest a positive impact on long-term choice probabilities from an increased level of service reliability on the line route level, but a consistent impact was found for only one of the measures applied in this paper, headway regularity, and only for cases with at least a certain minimum level of service frequency. In general terms, the main takeaway of this study is that service reliability has a significant impact on the strategic path choice of PT passengers according to the panel-data-based evidence presented in this paper, although with certain limitations.

The results presented in this paper should have important implications for modelling and forecasting the demand for public transport, and ultimately for PT network design. This would primarily concern the inclusion of line route reliability, particularly headway regularity, in path choice algorithms and, eventually, mode choice utility functions. In addition to the well-researched measure of RBT, penalties or substitution rates attributable to reliability will have to be estimated and tested in order to accurately model passengers’ response to reliability changes in PT. These would constitute a couple of directions for future research that are within immediate reach. However, the subject of long-term behavioural responses to changes in PT service reliability is surprisingly uncharted terrain in transportation research, where most studies thus far have focused on short-term effects. In addition, the way in which different passenger groups perceive and respond to irregularities and delays in PT constitutes another area of great interest, although there are few examples of large-scale studies of this kind of disaggregate revealed behaviour to date.