1 Introduction

This paper revisits a 2010 analysis of the punctuality of Norwegian railways. In the expanded study, carried out 5 years after the initial investigation, we looked into assumed causes of delays and low punctuality using additional data from the period 2010–2014. The original study (Olsson et al. 2010) indicated that the development in delay hours in 2005–2009 was due to a combination of three factors: (1) an increased error rate in infrastructure and rolling stock, (2) extensive work close to the tracks due to new investments and an increased amount of maintenance and (3) an inability to address normal variation in weather conditions. Following the initial investigation, measures were put in place by several actors in the Norwegian railway sector (infrastructure manager, passenger and freight train operators) in order to improve performance. Using correlation and regression analyses for the two periods, the article contributes to the scientific literature on empirical analyses of delay causes by presenting a double investigation into causes of delays and low punctuality, as well as an approach for an ex post evaluation of the effects of punctuality improvement initiatives.

The revisit, enabled by new data on performance in the years following the analysis, provides a rare opportunity for re-evaluating the original findings.

1.1 Quality indicators for railway performance: punctuality and delays

Punctuality is an important element in the quality of service for both passenger and freight transport on railways. Several studies have shown that punctuality is of critical significance to the customer satisfaction of railway travellers (Lam and Small 2001; Norheim and Ruud 2011; Salkonen and Paavilainen 2010). Punctuality is claimed to be one of the most important quality indicators in railroad operations (Nyström 2008; Seco and Gonçalves 2007). Several studies have concluded that punctuality is highly valued among railway customers (Andersson 2014; Coulombel and De Palma 2014; Kroes et al. 2007; Palmqvist et al. 2017; Paulley et al. 2006; Rietveld et al. 2001; Wardman 2001). Li et al. (2010) found that travellers are willing to pay more for their fare in exchange for a reduction in travel time variability. Kroes et al. (2007) highlight that in the greater Paris area, creating new infrastructure to develop the public transport network is of no use if the existing network is not of high quality. Hadiuzzaman et al. (2019) found that the perceived service quality for train travellers is related to both physical conditions and service of intercity trains. Monchambert and de Palma (2014) model the influence of punctuality on mode choice of commuters, illustrating that punctuality influences the choice between bus and train. According to van Hagen and Sauren (2014), NS (Nederlandse Spoorwegen) has developed a pyramid of customer preferences. The base of the pyramid is formed by the basic needs of reliability and safety. They describe punctuality (or reliability) as a “dissatisfier”, meaning that this quality aspect is rated negatively if it does not meet expectations. If the desire for punctuality is met, the passenger will experience a sense of control and be satisfied with the journey (but no more than that).

Punctuality and delays are often treated as interchangeable terms, but they have different meanings. Delays are measured in time units, whereas punctuality is expressed as a percentage. Parbo et al. (2016) say that punctuality, as a numerical measurement, is the percentage of trains arriving on time at the stations. To measure punctuality, one must define the threshold for when an arrival is counted as a delay. These registration limits vary from country to country. For example, in Germany, the UK and the Netherlands, the delay limits are 5, 10 and 3 min, respectively (Hansen and Pachl 2008). In Japan, a train is considered delayed if it is more than 1 min behind schedule (Tomii 2010). In Norway, a local train is on time if it reaches the destination with less than 4 min (3 min and 59 s) delay, whereas a long-distance train is on time if it reaches the destination with less than 6 min (5 min and 59 s) delay.

The 3 and 5 min limits used in Norway trace their origin to a time when punctuality was measured manually and in full minutes at the final destination of each train. Trains being more than 3 and 5 min delayed were considered as non-punctual. This was calculated as “>3” and “> 5” minutes and discussed as if the limits were 3 and 5 min, but these were actually in effect “≥ 4” and “≥ 6” minutes. To maintain consistency when delays came to be measured in seconds, the limits became 3 min and 59 s and 5 min and 59 s, respectively. Punctuality can be measured in several ways. The most common is based on arrival delay to the final destination (Parbo et al. 2016), which is discussed above. Depending on data availability, which has been increasing in recent years (Wallander and Mäkitalo 2012), it is also possible to measure punctuality at any station (Palmqvist 2019).
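As an illustration of the Norwegian limits described above, the following minimal Python sketch computes a punctuality percentage from arrival delays at the final destination. It assumes delays are recorded in whole seconds; the function names and the sample arrivals are hypothetical and are not taken from the registration system.

```python
# Limits in seconds: a train counts as punctual when its arrival delay at the
# final destination is at most 3 min 59 s (local) or 5 min 59 s (long-distance).
PUNCTUALITY_LIMIT_S = {"local": 3 * 60 + 59, "long_distance": 5 * 60 + 59}


def is_punctual(train_type: str, arrival_delay_s: int) -> bool:
    """Return True if the arrival delay is within the limit for the train type."""
    return arrival_delay_s <= PUNCTUALITY_LIMIT_S[train_type]


def punctuality_percent(arrivals: list[tuple[str, int]]) -> float:
    """Share of punctual trains, in percent, for (train_type, delay_s) pairs."""
    if not arrivals:
        raise ValueError("no arrivals to evaluate")
    on_time = sum(is_punctual(t, d) for t, d in arrivals)
    return 100.0 * on_time / len(arrivals)


# Hypothetical arrivals, for illustration only.
sample = [
    ("local", 120),          # 2 min late: punctual
    ("local", 239),          # 3 min 59 s late: still punctual
    ("local", 240),          # 4 min late: not punctual
    ("long_distance", 300),  # 5 min late: punctual for a long-distance train
]
print(punctuality_percent(sample))  # 75.0
```

With whole-second registration, "at most 3 min 59 s" and "less than 4 min" select the same trains, which is why the two formulations can be used interchangeably.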

An alternative measure of delay is travel time variability (Noland and Polak 2002; Rietveld et al. 2001), which is especially relevant for train traffic that is to a lesser extent based on timetables, such as some kinds of freight trains (Gorman 2009). While punctuality is typically measured based on the number of trains being delayed, Salkonen and Paavilainen (2010) call for a refocus of punctuality measures from the number of trains affected, to how (and how many) passengers are affected. This would mean to shift focus from train punctuality to traveller punctuality (Kristoffersson and Pyddoke 2019). While this may be a better measure of the railway system performance in terms of a user perspective, it is often a challenge to obtain data on passenger volumes (Sørensen et al. 2018). An alternative approach is to study train interactions, and especially the share of interactions that occur as planned (Palmqvist and Tomii 2019). König (2020) provides a taxonomy scheme for railway punctuality problems and connects the field of delay management to other parts of the planning process.

There are several approaches to research on punctuality, including optimization of timetables (Andersson et al. 2013), analysing historical traffic data (Benezech and Coulombel 2013; Takeuchi et al. 2007) and studying timetable supplements (Vromans 2005). Another tradition is to study delay causes. This paper continues this latter tradition. In the following, we give a brief overview of some research in the field.

1.2 Previous studies on causes of delays and low punctuality

Harris and Godward (1992) listed five variables that affect the punctuality of trains and investigated their correlations to punctuality for three British datasets (InterCity, Network SouthEast and Regional) and one Dutch dataset. The study showed that two variables were statistically significant in determining punctuality: distance covered from the departure station and train length (in number of carriages). The other factors investigated were the previous number of station stops, the age of the motive power unit and track occupation. About a decade later, Olsson and Haugland (2004) performed a similar study on data from Norwegian railways and presented the correlations of delays and punctuality with the following factors:

  • Number of travellers,

  • Load factor (number of passengers/seats),

  • Capacity utilization of tracks,

  • Cancellations,

  • Speed restrictions,

  • Development close to tracks,

  • Departure and arrival punctuality,

  • Regulations for handling exceptions.

The factors with the strongest correlation to punctuality were the number of travellers, the load factor of the trains and departure punctuality. The impressions of practitioners had led to the authors expecting a stronger correlation than what was actually found between punctuality and capacity utilization and the reduction of speed. The weak degree of correlation seen in Olsson and Haugland (2004) led to new hypotheses regarding the relationship between the studied factors and punctuality. Rather than having a linear effect on punctuality, the various factors are expected to have threshold values. If thresholds are exceeded, punctuality is affected. Factors, such as capacity utilization and speed restrictions, can have such thresholds. If the thresholds are not exceeded, the effect of these factors is often hidden in the noise from other factors.

Passenger behavior and numbers have been studied as an explanatory factor for delays (Heinz 2003; Palmqvist et al. 2020). Carey and Carville (2003) also highlight the importance of how train stations can influence punctuality. Harris et al. (2013) state that the time required for passenger boarding and alighting at stations is a critical element of overall train service performance. Bender et al. (2013) studied how to manage an unknown number of passengers that want to board a train, the objective being to minimize the total delay of all passengers.

Bergström and Krüger (2013) state that a major share of all delay time is associated with the tail of the delay distribution. Thus, extreme delays cannot be neglected when prioritizing between measures designed to improve rail infrastructure. Bergström and Krüger (2013) also claim that delays are not only concentrated in size but are concentrated in space and time and follow a precise power law with respect to days and an exponential distribution with regard to stations. In terms of punctuality, their study showed that large and small delays had an equal influence on the overall punctuality (assuming a small delay is more than the delay threshold of 4 min).

Several European studies have looked into the effects of severe weather conditions as a potential explanation for delays. Xia et al. (2013) studied how weather conditions affect train operator and railway infrastructure performance in the Netherlands. Their study documented that wind gusts, snow, precipitation, temperature and leaves had strong positive effects on the number of infrastructure disruptions, and a strong effect on train arrival punctuality and cancellation rates. Several weather-related aspects, such as snow, leaves, high temperatures and variation in temperature within the day, have negative impacts on punctuality and cancellation rates. Their impacts are, in general, indirect by causing disturbances in the railway infrastructure. Gusts, precipitation and low temperatures, on the other hand, have both direct and indirect effects. Ludvigsen and Klæboe (2014) presented four cases of how the harsh 2010 winter weather affected rail freight operations in Norway, Sweden, Switzerland and Poland. The study emphasized the reactive behavior of rail managers who mobilized to reduce the adverse outcomes. During the snowy and cold winter of 2009/2010, many problems occurred. Eleven countries experienced rolling stock problems (Enno Wiebe 2010). According to Szafránski (2011) and Trafikverket (2010), problems that occurred in Sweden were related to four different areas: infrastructure, contractor capacity, internal management and process, and public relations with passengers and operators. More recently, Wang and Zhang (2019) found that severe weather was strongly influencing train delays.

Railway capacity has been widely assumed to be an important factor regarding train punctuality, especially concerning the volume of secondary delays. Gibson et al. (2002) undertook an extensive study on the relationship between capacity utilization and delays in the UK. Their aim was to find empirical justification for congestion charges on the parts of the railway network with the highest capacity utilization. Wenyi et al. (2010) studied delays on Chinese high-speed railways, with a special focus on delays caused by the train control system; they found the train control system accounted for 10% of total delays of 5 min or more.

The availability of large amounts of data from railway systems has resulted in more analyses of punctuality and delays, laying the foundations for enhanced timetabling, simulations, driver assistance and train dispatching. Examples include Huang et al. (2011) and Zhou and Mi (2013), who analysed the effect of speed restrictions based on simulations. Liu et al. (2009) discuss optimal driving patterns on lines with speed restrictions.

In summary, in the railway literature there is a large tradition of simulation-based approaches for timetable and punctuality analyses. We have briefly reviewed empirical studies that investigate factors that influence punctuality. Common factors that are analysed in such studies include weather, availability of infrastructure and rolling stock, capacity utilization, passenger behavior and train dispatching.

1.3 Notes on special characteristics of Norwegian railways

The Norwegian railway network consists of railway tracks that extend to 4087 km across the country from North to South. There are 2622 km of electrified tracks, 242 km of double-tracks, 60 km of high-speed tracks, 696 tunnels and 2760 bridges (Jernbaneverket 2015). Two special features of the network are the relatively low share of double-tracks and high share of non-electrified railway (38% is not electrified). The major tracks for long-distance, regional and freight trains are single-tracked. Nearly all double-tracks are located in the larger Oslo-area, where there is a continuous construction of new double-tracked lines. The Oslo area is important for the rail traffic for two reasons: the majority of trains (with the majority of travellers) run there, and train traffic in the Oslo area affects the trains in large parts of the country.

Two different measurements related to delays are used in Norway. One is related to delays to the final destination and is a measurement of time or percentage of trains that are considered as delayed. The other measure is related to registered delay causes. When trains are more than 4 min delayed or additionally delayed, train dispatchers are required to register a delay cause. Delays are registered while under way, and a train number can get several registrations of additional delays. These delays are summarized and referred to as delay hours. A train may hence accumulate delay hours even though it arrives punctually at the final destination. The delay hours are allocated to operators, infrastructure or traffic management.

1.4 Development in overall punctuality: 2005–2010

The number of delay hours allocated to passenger trains was largely on a steady rise from 2005 to 2010. There was an increase in most categories for the period as a whole. For the operators on the Drammen–Eidsvoll route (mainly NSB and the Airport express train), the number of delay hours remained stable until the winter of 2009–2010. The winters of 2007 and 2009 also showed an increase in delay hours, but the variation in delay hours during the year remained remarkably stable. The winter of 2009–2010 stands out as having a higher peak in the number of delay hours and that it took longer for the number of monthly delay hours to return to normal levels.

For the infrastructure manager, the situation with regards to delay hours gradually deteriorated over the period from 2005 to 2010. In addition to an increase in the average number of delay hours from year to year, the variation from week to week also increased significantly. The winter of 2009–2010 was a continuation of the trend regarding the number of delay hours for the infrastructure manager.

1.5 Measures taken from 2010

Several efforts were put in place to improve the reliability of the rail system from 2010. Some measures were relatively straightforward: extra capacity (manpower) for removal of snow at strategically important parts of the infrastructure (especially switches), and electric heating of switches was gradually put in place. Conductor rails were installed in the Oslo tunnel to reduce the tearing down of the overhead power line.

New rolling stock was put in place (introduced from December 2012) to serve local and regional traffic in the greater Oslo region. Additionally, a new timetable for the greater Oslo area was put in place in December 2013 as well.

Publicity campaigns in the form of posters at the stations and on the platforms were produced to change traveller behavior regarding how to board and alight from the train, as described by Harris, Mjøen and Haugland. Markings were made on the platforms to facilitate positioning for the travellers preparing to board the train, as well as making it easier for the train drivers to stop at an appropriate place at the platform.

Finally, the funds dedicated to operations and maintenance for the Infrastructure Manager have increased year by year throughout the period (as have the funds for infrastructure investment).

1.6 Development in overall punctuality: 2010–2014

The years 2009 and 2010 stand out as years of crisis for Norwegian railways. In what has been regarded as an “annus horribilis”, many of the things that could go wrong did go wrong. The cold winter and the large number of delays ensured an increased focus on robustness in case similar conditions were to occur again. Figure 1 shows the delay hours on Norwegian railways in the period from 2005 to the end of 2014. The years following 2010 have a similar volume of delays as the years prior to 2010.

Fig. 1 Delay hours per month and three-month moving average for the Norwegian railways (sum of delays for goods and passenger trains), 2005–2014. Values are indexed: 100 equals the average number of delay hours per month for 2000
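The indexing and smoothing named in the caption can be sketched as follows. This is a minimal illustration only: the monthly values and the base value are hypothetical, and a trailing three-month window is assumed because the caption does not state whether the moving average is trailing or centred.

```python
def index_series(values, base):
    """Express each value relative to a base value, where the base equals 100."""
    return [100.0 * v / base for v in values]


def moving_average(values, window=3):
    """Trailing moving average; entries without a full window are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


# Hypothetical monthly delay hours and a hypothetical base-year monthly mean.
monthly_delay_hours = [400.0, 500.0, 600.0, 800.0]
base_monthly_mean = 500.0

indexed = index_series(monthly_delay_hours, base_monthly_mean)
smoothed = moving_average(indexed)
print(indexed)   # [80.0, 100.0, 120.0, 160.0]
print(smoothed)  # first two entries are None, third is 100.0
```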

2 Methods

2.1 The task force and its mandate

Following an initiative from the Ministry of Transportation, a task force consisting of representatives from the infrastructure manager (IM), three train operating companies (TOCs) for passenger transport, and the largest goods operator was established to review the development in punctuality and delay hours over the period from 2005 to the winter of 2009/2010. In 2005, the registration of punctuality and delay hours was automated (although the tagging of delay codes remained a manual task). Therefore, 2005 was a natural starting point for the review. The introduction of automated registration coincided with the highest average punctuality level registered for the Norwegian railways.

A team of two researchers was engaged to organize and facilitate four workshops with the task force and to conduct analyses between each session using data provided by the participants. The analyses laid the groundwork for the task force’s work on identifying the causes of delays and measures to reduce the number of yearly delay hours and to increase punctuality.

The following was defined as a mandate for the work: “Reviewing developments in delay hours and identify the underlying root cause(s) as for why the trend shows a steadily deteriorating development from 2005 until now (2010). This includes investigating potential connections with changes in operating conditions and traffic volume. Recognized statistical methods are to be used. Special in-depth analyses can be performed on identified problem areas or the implementation of measurements in areas where data are not available.”

The participating TOCs and the IM willingly shared their data, under the prerequisite that sensitive data should remain with the researchers and that all analyses would be presented to the task force (a list of the data used is included in Sect. 2.2). The willingness to share information based on sensitive data was grounded in the understanding of a secondary, unofficial goal of the task force: Creating a shared understanding of the current status and performance of the rail system and thus cooling down an ongoing public “blame game” in the media. Analyses were conducted between sessions, and preliminary findings were sent to the participants for review before each meeting. Potential root causes, referred to as “assumed causes of delays and low punctuality” in the following, along with questions regarding the analyses were discussed in the group.

The original task force analysis covered the period from 2005 to the first months of 2010. When we remade the analyses including the development from 2010 and onwards, we wanted to have data sets that were as comparable as possible. The analyses therefore cover two periods of five full years, January 2005 to December 2009, and January 2010 to December 2014, in addition to analyses of the full 10-year period.

For 2010–2014, new regression models indicate that some factors remain influential, although their influence on delays and low punctuality has changed as various measures have been put in place: low temperature (< −10 °C and < −15 °C) and snowfall (> 10 cm), reduced train lengths and increased volume of train services (freight and passenger trains) still contribute to delays and low punctuality. The relationship between Infrastructure Manager activity (operations, maintenance and investment) and delays appears to have been reversed in the second period. The overall punctuality has improved since 2010; however, the performance of the record year 2005 has not been duplicated.

This paper is a continuation of the work done in the task force. We study punctuality-influencing factors in the period 2005 to 2009, as well as the same factors in the second period from 2010 to 2014. The paper utilizes a wider dataset and a longer time period compared to the initial study by Zakeri and Olsson (2017).

2.2 Data sources

Data that captured factors which were expected to influence punctuality and delays internally and externally to the actors were collected with a resolution of per day, per week and per month. Throughout this article, the emphasis will rest on the results and the analyses of the aggregated “per month” dataset, covering the complete network.

2.2.1 Dependent variables: punctuality and delays

Measures of punctuality and delay hours collected from 2005 to the end of 2014 were treated as dependent variables in the regression analyses. The data used for the analyses are specified in Table 1, including the dependent variables representing punctuality and delays used in the study. Punctuality was measured as the percentage of trains arriving less than 4 min (local trains) or 6 min (freight and long-distance passenger trains) late at their final destination. Delay hours were based on registered delays at any point along a train route. If the delay of a train increases by 4 min or more between two measuring points, the cause of the delay increase is registered by the train dispatchers. The delay hour measure is the sum of these registered delays for a given line in a given time period.
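The difference between the two dependent variables can be illustrated with a minimal sketch. It assumes that the whole increase between two measuring points is counted once the 4 min limit is reached; the function name and the example train are hypothetical.

```python
REGISTRATION_LIMIT_S = 4 * 60  # an increase of 4 min or more is registered


def registered_delay_seconds(delays_at_points_s):
    """Sum the delay increases between consecutive measuring points
    that reach the registration limit (assumed counting rule)."""
    total = 0
    for prev, cur in zip(delays_at_points_s, delays_at_points_s[1:]):
        increase = cur - prev
        if increase >= REGISTRATION_LIMIT_S:
            total += increase
    return total


# Hypothetical long-distance train: delay in seconds at successive measuring
# points. It loses 5 min early on, recovers time, and arrives 2 min late.
train = [0, 300, 200, 120]

delay_hours = registered_delay_seconds(train) / 3600
arrives_punctual = train[-1] < 6 * 60  # long-distance limit of 6 min

print(round(delay_hours, 4), arrives_punctual)
```

In this example the train contributes 300 s of registered delay to the delay hour measure while still counting as punctual, which is the situation that makes the two measures related but not identical.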

Table 1 Overview of dependent variables, punctuality and delay hours

Punctuality and delay hours were chosen as variables because they are well established in the Norwegian railway industry. Traffic quality was measured based on these variables, and the indicators were discussed at regular quality evaluation meetings between the infrastructure manager and the operators. The variables, however, have some weaknesses. For example, the number of passengers is not represented in the measures.

2.2.2 Independent variables: causes of delay and low punctuality

The independent variables consisted of measures of Infrastructure Manager activity (operations, maintenance and investment), volume of transport services for freight and passenger services (ton and ton-kilometres for freight operators and train and seat kilometres produced by passenger train operators) and the number of passengers. Additional independent variables related to passenger transport included changes to the type of rolling stock used for passenger trains, especially reduced seat capacity. Measures of performance at the freight terminals included departure delays from the main terminal. Factors related to weather and climate were included in the analyses. Weather was hypothesized to negatively influence delays and punctuality when snowfall, precipitation and low temperature hit certain thresholds. Registrations from three measuring posts in and around Oslo (Blindern, Asker and Gardermoen) were collected and used in the analyses. The thresholds used were the number of days with temperatures below −10 °C and below −15 °C, the number of days with 10 mm or more of precipitation and the number of days with snowfall of 10 cm or more for each measuring post. These thresholds were defined in co-operation with the task force industry participants. An index was constructed for each factor combining the number of days each factor had exceeded the thresholds at each station. Table 2 is a summary of independent variables and how they are measured.
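One way such a weather index could be built is sketched below. The sketch assumes that the index is the plain sum, over the three measuring posts, of the number of days on which the threshold is met; the text does not specify how the station counts were combined, and all observations shown are hypothetical.

```python
def days_meeting(daily_values, predicate):
    """Count the days on which the threshold condition holds."""
    return sum(1 for v in daily_values if predicate(v))


def threshold_index(by_station, predicate):
    """Sum the day counts over all stations (assumed combination rule)."""
    return sum(days_meeting(vals, predicate) for vals in by_station.values())


# Hypothetical daily minimum temperatures (deg C) for a three-day period.
min_temp_c = {
    "Blindern": [-3.0, -11.0, -16.0],
    "Asker": [-5.0, -12.0, -14.0],
    "Gardermoen": [-9.0, -15.5, -20.0],
}
# Hypothetical daily snowfall (cm) for the same period.
snowfall_cm = {
    "Blindern": [0.0, 12.0, 10.0],
    "Asker": [0.0, 8.0, 15.0],
    "Gardermoen": [2.0, 10.0, 9.9],
}

cold_10_index = threshold_index(min_temp_c, lambda t: t < -10.0)
cold_15_index = threshold_index(min_temp_c, lambda t: t < -15.0)
snow_10_index = threshold_index(snowfall_cm, lambda s: s >= 10.0)
print(cold_10_index, cold_15_index, snow_10_index)  # 6 3 4
```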

Table 2 Overview of independent variables, potential drivers of delays and low punctuality

The factors that were assumed to influence punctuality are a less well-established group of variables. The choice of variables was based on literature studies and previous research but also based on data availability. As independent researchers, we had access to data that in general is considered to be company-internal information to the train operators. This was the case for data about passenger rolling stock availability, and measures from freight terminals. Data on weather and infrastructure spending were added because both aspects were highlighted by industry representatives in the task force.

2.2.3 Modelling delays and punctuality

Regression models of delays and punctuality on the network were developed iteratively as data were made available. The first step in the analyses consisted of calculating the correlation coefficient of the independent variables and the measures of punctuality and delay hours. Correlation does not mean causation, so the expertise of the task force was put to the test with identifying causes of delays and low punctuality.
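For reference, the correlation coefficient used in this first step can be computed as in the following minimal sketch. The monthly values are hypothetical and serve only to show the calculation; they are not taken from the study data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Hypothetical monthly observations: days below -10 deg C and delay hours.
cold_days = [0, 2, 5, 9, 1, 0]
delay_hours = [310, 350, 420, 560, 330, 300]
print(round(pearson_r(cold_days, delay_hours), 3))
```

A coefficient close to +1 or −1 only signals that two series move together; as noted above, whether one drives the other has to be judged with domain knowledge.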

The correlation within the set of independent variables was subsequently calculated to investigate if the variables were internally independent. The correlation between punctuality and delay hours was calculated. In addition, one-way analyses of variance (one-way ANOVA) were carried out to further investigate the relationship between the weather and delay hours and punctuality.
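The F statistic of a one-way ANOVA can be sketched as follows. The grouping is an assumption made for illustration: months are split into hypothetical weather classes and their delay hours are compared. The text does not state how the groups were formed in the study.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA: between-group mean square
    divided by within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n <= k:
        raise ValueError("need at least 2 non-empty groups and n > k")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical monthly delay hours, grouped by number of cold days in the month.
no_cold_days = [300.0, 310.0, 320.0]
few_cold_days = [340.0, 350.0, 360.0]
many_cold_days = [500.0, 520.0, 560.0]
f_stat = one_way_anova_f([no_cold_days, few_cold_days, many_cold_days])
print(round(f_stat, 2))
```

The statistic is then compared with an F distribution with k − 1 and n − k degrees of freedom to obtain a p value; that lookup is left out of the sketch.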

The discussions based on simple correlation coefficients form the basis for the regression analyses and the ordinary least squares models of the railway system. Three models were developed for each of the dependent variables: a complete model that used the complete data set of independent variables, a model consisting of only statistically significant variables in which independent variables were removed step-by-step based on their p value, and a reduced model in which only the assumed primary explanatory factors are included. The coefficients have been calculated for 2005–2009, 2010–2014 and 2005–2014. Standardized coefficients have been calculated for the reduced models.
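A minimal sketch of an ordinary least squares fit with standardized coefficients is given below. It is not the software or the model specification used in the study: the normal equations are solved directly, the stepwise removal by p value is omitted, and the two predictors and all values are hypothetical.

```python
from statistics import stdev


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Return [intercept, b1, ..., bk] that minimise the squared residuals."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def standardized(coefs, columns, y):
    """Scale each slope by sd(x) / sd(y) so that slopes become comparable."""
    sd_y = stdev(y)
    return [b * stdev(col) / sd_y for b, col in zip(coefs[1:], columns)]


# Hypothetical data constructed so that y = 2 + 3 * x1 - 1 * x2 exactly.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [3.0, 7.0, 7.0, 11.0, 11.0]

coefs = ols([x1, x2], y)
betas = standardized(coefs, [x1, x2], y)
print([round(c, 6) for c in coefs], [round(b, 3) for b in betas])
```

Standardized coefficients express each slope in standard deviations of the dependent variable per standard deviation of the predictor, which is what allows factors measured in different units (funds, days, kilometres) to be compared within one model.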

The availability of relevant data combined with detailed knowledge from the participants in the task force facilitated the identification process of connections between incidents, general (known) issues and delays. The analyses are based on a data set including the assumed delay causes and potential explanatory factors, from different databases belonging to the participants (including punctuality data and maintenance records for infrastructure and rolling stock), in addition to other available data sources (most importantly data on weather). The punctuality and delay data have been studied in relation to a number of possible explanatory variables to uncover correlations and causes. All of the data requested by the task force were made available. Preliminary results were presented and reviewed by the task force for quality assurance.

3 Analysis of punctuality and influencing factors

3.1 Factors influencing punctuality

To begin with, we show an analysis of correlations between the different dependent variables and between the independent ones. Table 3 presents correlation coefficients (Pearson product-moment correlation coefficient or Pearson’s r) for the correlation between delay hours (sum and delay hours allocated to the actor responsible for the delays) and punctuality (both overall and broken down into various types of traffic). These factors are treated as dependent variables in the subsequent analyses. There is a high degree of correlation between delay hours allocated to the various actors and punctuality for different types of traffic. The correlation between delay hours and punctuality is also high, indicating that although it is possible for trains to generate delay hours while still being on time at the final destination (and hence being punctual according to the statistic), these variables are not independent of each other.

Table 3 Pearson product-moment correlation coefficients (Pearson’s r) of current quality measures for the railway system, Delay hours and punctuality (dependent variables)

Table 4 presents the correlation between the assumed causes of delays and low punctuality. The assumed causes are treated as independent variables in the subsequent analyses. However, several of them are not independent of each other. The funds available for operations, maintenance and investment appear to be coupled so that the increase in funding throughout 2010–2014 has resulted in an even increase for each activity. The increased funding has also had a capacity-increasing effect on the network, indicated by a strong correlation between funds and freight ton-kilometres and passenger volumes.

Table 4 Pearson product-moment correlation coefficients (Pearson’s r) between assumed causes of delays and low punctuality

Weather, constituted by precipitation, snowfall and low temperatures, appears to be independent of all the other assumed causes of low punctuality and delays in the analyses. The same is the case for performance at the freight terminal, as captured by the variables “delay at terminal” and “ready at terminal”.

3.2 Correlation between assumed causes of delays and low punctuality and the performance of the railway system

The following tables present correlation coefficients (Pearson product-moment correlation coefficient or Pearson’s r) for the correlation between assumed explanation factors (or drivers of delays and low punctuality) and the performance of the railway system, as expressed by punctuality and/or delay hours.

We present the correlation coefficients for the periods 2005–2009 (before the initial study), 2010–2014 (the period after the study, when several actions were put in place) and the combined period of 2005–2014 in Tables 5 and 6. Correlation coefficients that are significant at the 0.01 level are marked with bold font, and those significant at the 0.05 level are in italic font in the tables. Although not all changes between the periods can be attributed to actions taken from 2010, special note should be taken of factors assumed to be causes of delays and low punctuality for which the direction of the relationship changes (from − to +) or the relationship becomes (in-)significant.

Table 5 Pearson product-moment correlation coefficients (Pearson’s r) between delay hours and assumed causes of delays and low punctuality
Table 6 Pearson product-moment correlation coefficients (Pearson’s r) between punctuality and assumed causes of delays and low punctuality

Table 5 presents the correlation coefficients between delay hours and assumed causes of delays and low punctuality. For the initial period under investigation, there are significant correlations between factors considered to be the responsibility of all the actors (infrastructure manager, passenger and freight operators) and the sum of delays on the railways. Increased infrastructure manager activity, increased number of passengers and increases in freight all coincide with increases in delays. In addition, low temperatures coincided with periods with increased numbers of delays. For the second period (2010–2014), many of these relationships changed or appeared to be dissolved. Infrastructure manager activity increased as delays were reduced, the relationship between the number of train and seat kilometres and delays became insignificant and the number of passengers increased as delays were reduced. The relationship between freight volumes and delays appears to have been reversed or dissolved. The relationship between weather and delays, however, correlates to a higher degree in the second period. The analyses of variance (ANOVA) between weather and punctuality are largely in line with the findings based on correlation. Low temperatures appear to influence overall delay hours and delay hours attributed to the train operating companies in both periods. There is a statistically significant relationship between snowfall and delay hours attributed to dispatching for both periods.

Delay hours attributed to the infrastructure manager follow a similar pattern. Special note should be taken of the strong correlation between infrastructure manager activities and these delay hours in the period 2005–2009. In the subsequent period of 2010–2014, the correlation is no longer significant. Delay hours attributed to the infrastructure manager in the second period appear to be largely independent of all the assumed causes of delays and low punctuality, the only exceptions being reduced train length (which is a factor under the train operators’ control) and funds for operations (which correlate negatively with IM delay hours in the second period; hence, increased funding indicates a reduced number of delay hours).

Delay hours attributed to the passenger train operating companies correlate more closely with weather in the second period than in the initial period. Production volume (seat and train kilometres) and the number of passengers correlate less closely in the second period, where the correlation is largely insignificant. The correlation between incidents of reduced train length and delay hours attributed to passenger train operators remains approximately unchanged from the initial to the second period.

Similarly, delay hours attributed to dispatching correlate more closely with the weather in the second period compared to the initial period. Production volume (seat and train kilometres) and the number of travellers correlate closely with delay hours attributed to dispatching in the initial period. In the second period, the relationship appears to be insignificant. As was the case with delay hours attributed to the infrastructure manager and the passenger train operators, incidents of reduced train length correlate with delay hours in the second period. The relationship between freight train volumes and performance at the terminals is reversed or dissolved in the second period.

Table 6 presents the Pearson correlation coefficients between assumed causes of delays and low punctuality and the punctuality of different types of train traffic. The causes of low punctuality are expected to lower the percentage of trains that reach their destination on time, so the correlation coefficients are negative when the relationship is as expected.

The correlation coefficients between the assumed causes of low punctuality and punctuality are in most cases in the region of −0.4 for the 2005–2009 period. The exceptions are production volumes (seat and passenger train kilometres and tons transported) and precipitation, all of which are close to zero and not significant.

The correlation between bad weather and low punctuality is lower for the second period for local passenger traffic and freight. For overall punctuality and bad weather, the correlation coefficients are higher in the second period with strong snowfall as the only exception.

The correlation between assumed causes of delays and low punctuality and overall punctuality changes very little between the two periods in the investigation. Punctuality for regional traffic, on the other hand, sees a reduction in factors with a significant correlation from eight in the initial period to a single significant variable in the second period. The number of variables that correlate at the 0.01 level with punctuality for local traffic is similarly reduced from 12 in the initial period to one in the second period (three variables have correlation coefficients that are significant at the 0.01 level in both periods, but the signs have changed).

There were several instances of reversed relationships between the assumed causes of low punctuality and punctuality for freight trains from the initial to the second period in the investigation.
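The comparison between periods described in this section can be made systematic by labelling each factor according to how its coefficient and significance change. The sketch below shows one possible rule set. The labels, the default significance level and the example values are assumptions for illustration, not values from Tables 5 and 6.

```python
def classify_change(r_first, p_first, r_second, p_second, alpha=0.05):
    """Label how a correlation changed from the first to the second period."""
    sig_first = p_first < alpha
    sig_second = p_second < alpha
    if sig_first and sig_second:
        # Significant in both periods: check whether the sign flipped.
        return "reversed" if r_first * r_second < 0 else "persisted"
    if sig_first:
        return "dissolved"
    if sig_second:
        return "emerged"
    return "insignificant in both"


# Hypothetical (r, p) pairs for the two periods.
examples = {
    "factor A": (0.45, 0.001, -0.40, 0.004),
    "factor B": (0.38, 0.006, 0.05, 0.700),
    "factor C": (0.10, 0.450, 0.35, 0.010),
}
for name, values in examples.items():
    print(name, classify_change(*values))
```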

3.3 Regression models of delays and punctuality

Following the pairwise correlations, we present regression models that propose a quantified relationship between the identified factors and the number of delay hours produced by the system. The models were developed using ordinary least squares in the gretl software (Gnu Regression, Econometrics and Time-series Library). The regression coefficients in Tables 7 and 8 have been standardized to facilitate comparison of the effect of each independent variable on the dependent variable.

Table 7 Regression models of delay hours
Table 8 Regression models for punctuality
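To make the notion of standardized coefficients concrete, the sketch below fits a least-squares model on z-scored variables and reports R². It is written in plain Python for illustration only; the study itself used gretl. The data, the function names and the small linear solver are assumptions of this sketch.

```python
from statistics import mean, pstdev


def zscores(values):
    """Rescale a series to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def standardized_ols(predictors, response):
    """Return (standardized coefficients, R squared) for a least-squares fit."""
    zx = [zscores(col) for col in predictors]
    zy = zscores(response)
    # Normal equations on z-scores; no intercept is needed because every
    # standardized variable has mean zero.
    ztz = [[sum(p * q for p, q in zip(ci, cj)) for cj in zx] for ci in zx]
    zty = [sum(p * q for p, q in zip(ci, zy)) for ci in zx]
    betas = solve(ztz, zty)
    fitted = [sum(b * col[i] for b, col in zip(betas, zx)) for i in range(len(zy))]
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(zy, fitted))
    ss_tot = sum(v ** 2 for v in zy)
    return betas, 1.0 - ss_res / ss_tot


# Hypothetical data: y is an exact linear function of x1 and x2.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3 * a + 2 * b + 10 for a, b in zip(x1, x2)]

betas, r_squared = standardized_ols([x1, x2], y)
print([round(b, 3) for b in betas], round(r_squared, 3))
```

Because all variables share the same scale after standardization, the magnitudes of the coefficients can be compared directly across predictors measured in different units.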

The initial calculations of the correlation coefficients for punctuality, delays and their assumed causes were presented during discussions of the task force to identify root causes and primary mechanisms explaining how the assumed causes actually influenced the railway system (and which actors were vulnerable to the various assumed causes). These discussions, which are described further in Sect. 4, formed the basis for the reduced set of assumed causes of delays and low punctuality included in the regression models.

Cold winter weather (represented by the indicators “number of days with less than −10 °C”, “number of days with less than −15 °C”, “number of days with more than 10 cm snowfall” and “number of days with more than 10 mm precipitation”) was identified as essential to all actors. However, although some weather factors are significant at the 0.05 level in every model (except delays due to dispatching 2010–2014), weather is not the dominant cause of delays at the “per month” level.

The passenger operating company acknowledged that splitting up trains to use shorter trains than planned produced delays in the system and contributed to delays attributed to other actors. This finding is supported by the regression models. The variable “Reduced train length” is significant in every model except for delay hours attributed to dispatching for 2005–2009.

The regression model for the sum of delay hours could be represented well by the reduced set of independent variables (R² of 0.83 and 0.79). The only variable that was significant at the 0.05 level for both periods was reduced train length. Funds for investment contributed to fewer delay hours in the second period, whereas funds for operation and management contributed to delays in the model. Freight volumes (ton-kilometers) and departure delays at the terminal changed from being significant at the 0.01 and 0.05 levels to becoming insignificant.

The regression model of delay hours attributed to the Infrastructure Manager, using the identical set of independent variables, resulted in a good fit for the initial period of investigation (R² of 0.73). The fit between modelled and actual delay hours for the second period was considerably lower (R² of 0.46).

The fit between actual and modelled delay hours attributed to the passenger train operating companies was very close (R² of 0.81 and 0.87). Low temperatures (number of days colder than −15 °C) were significant at the 0.05 level in both periods. The regression model for delay hours attributed to dispatching was the only one in which all assumed causes of delays and low punctuality were represented with positive coefficients. Half of these signs changed in the second period.

The fit of the regression models for punctuality was lower than that of the models of delay hours. This is to be expected as trains may catch up to their timetable after being delayed. Three models are presented: punctuality for commuter trains, average punctuality for all passenger trains and punctuality for freight trains. The three models are quite similar with regard to significant variables and development from the initial to the second period.
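The difference between the two indicators can be illustrated with a small Python sketch: delay hours accumulate every delay registered along the route, whereas punctuality only counts whether a train reaches its final destination within a threshold. The train records, the function names and the 4-minute threshold are hypothetical and are not taken from this section.

```python
def punctuality_percent(arrival_delays_min, threshold_min):
    """Share of trains (in %) arriving at the final destination within the threshold."""
    on_time = sum(1 for d in arrival_delays_min if d <= threshold_min)
    return 100.0 * on_time / len(arrival_delays_min)


def delay_hours(registered_delays_min):
    """Total delay registered along the route, converted from minutes to hours."""
    return sum(registered_delays_min) / 60.0


# Hypothetical trains: (delay registered en route, delay at final destination), in minutes.
trains = [(12, 2), (0, 0), (30, 25), (6, 0)]
THRESHOLD_MIN = 4  # assumed on-time threshold for this illustration

total_hours = delay_hours([en_route for en_route, _ in trains])
share_on_time = punctuality_percent([arrival for _, arrival in trains], THRESHOLD_MIN)
print(total_hours, share_on_time)
```

In this example, the first and last trains contribute to delay hours but still count as punctual because they recover before the final destination, which is one reason why the same explanatory variables can fit delay hours better than punctuality.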

Weather is not presented as the main cause of low punctuality in the models. However, snowfall is significant at the 0.1-level in three models. Funds for investment and funds for operations and management are significant in the second period for all three models. The same is the case for the production volume of passenger train services. Freight volumes are presented as a cause of low punctuality in the initial period of the study. However, in the latter period, the variable becomes either insignificant or appears as a factor contributing to high punctuality.

4 Discussion

The original study indicated that development in delay hours in 2005–2009 was due to a combination of three factors: (1) an increased error rate in infrastructure and rolling stock, (2) extensive work close to the tracks due to new investments and an increased amount of maintenance and (3) an inability to address normal variations of weather conditions. The study highlighted that the Norwegian railway had a systematic winter issue. Delays occurred to a greater extent in years with colder winters, compared to years with milder winters. In addition, construction work and activity on the tracks were related to increased delays. When construction activities increased, so did the delay hours.

The increase in delay hours and the decline in punctuality from 2005 to 2009 appeared as a year-on-year trend, with the winter of 2009/2010 at the extreme end. It was, therefore, to some extent expected that the railway’s performance would “regress to the mean” in the years following 2010. As was generally expected after record levels of delays in 2009 and 2010, the following years proved to be better with regard to delays, especially from 2012, as shown in Fig. 1.

In the years following the investigation, a range of measures was adopted by both IM and the TOCs. These measures include extra capacity (manpower) for the removal of snow at strategically important parts of the infrastructure (especially switches), and electric heating of switches was gradually put in place. Conductor rails have been installed in the Oslo tunnel to reduce the incidence of damage to overhead power lines. New rolling stock was put in operation (introduced from December 2012) to serve local and regional traffic in the greater Oslo region. Additionally, a new timetable was put in place in December 2013. Publicity campaigns in the form of posters at the stations and on the platforms were produced to change traveler behavior regarding how to board and alight from the train. Markings were made on the platforms to facilitate positioning for the travelers preparing to board the train, as well as making it easier for the train drivers to stop at the appropriate place at the platform.

Five years later, some of the measures taken are perceived as being highly successful, whereas the measures that were expected to have the greatest effect have yet to produce extensive results. The effects of the measures taken can be evaluated by extending the time series with data for the period 2010–2014 and revisiting the correlation and regression analyses.

In the following sections, we discuss in more detail the extent to which, and the mechanisms by which, the assumed causes of delays and low punctuality affect the rail system to produce delays. This analysis can be used as an ex post analysis of the effects of implemented measures, and especially the combined effects and their influence on the whole railway system. The regression models are suitable for identifying how relationships between assumed causes and the assumed resulting low punctuality and delays have changed. The measures that have been put in place vary with regard to when and how gradually they have been introduced. Some, such as the introduction of a new timetable, took effect on a fixed date. Others, such as the introduction of new rolling stock, have been gradually introduced over several years.

4.1 Winter—variations in weather

The various seasons each introduce distinct challenges for rail operations, such as frost heaving in early spring, sun kink during summer, and defoliation during autumn. Figure 1 reveals that the Norwegian railway has systematic winter issues. Every year sees a dip in punctuality during winter and an increase in both delay hours and the number of cancellations. One may expect trouble to occur in periods with especially severe weather, as is the case with other modes of transportation. It appears, however, that it is not only the “extreme” weather that causes problems for Norwegian railways but also regular winter. Only winters that are exceptionally mild have seen rail operations performing in line with the norm for the rest of the year.

The regression models only partially support the notion that weather is an important cause of delays and low punctuality. There may be several explanations as to why that is. One is the temporal resolution of the analyses: instances of “bad” or extreme weather rarely last for months, so their effect is diluted when the data are aggregated per month. Still, every model (with two exceptions for the second period) includes at least one weather variable that is significant at the 0.01 level.

There were two distinct issues with winter operations that concerned the representatives from the IM and TOCs. One was the failure rate of switches and crossings, which increased due to ice forming and obstructing them from locking in position. The second was snow on the rolling stock melting when trains passed through tunnels, to re-freeze once the trains left the tunnels. Both were of special importance in the Oslo area. There has been an increase in the rate of introducing electric heating to switches and crossings after 2010. The number of tunnels, on the other hand, is also increasing because new railway lines in densely populated areas tend to require lines to run underground.

The study indicates that snow has a stronger impact on punctuality at the final destination than on the number of delay hours, whereas low temperature is more important for delays. The correlations between snowfall and low temperature on the one hand and freight and passenger traffic on the other were similar in magnitude, implying that freight and passenger traffic are roughly equally influenced by winter conditions.

4.2 Infrastructure manager activity

The investigation focused on infrastructure manager activity rather than instances of infrastructure failures. Infrastructure failures (and partial failures) may result in temporary speed restrictions, a full stop of all traffic or no effect at all on traffic. Old infrastructure is usually more prone to failures than new (after the initial run-in period). When failures occur, identifying, locating and reaching the source of the failure often takes more time than the repair itself. However, the data on infrastructure condition and the number of failures were not significant when introduced in the regression model for 2005–2009. The corresponding data for 2010–2014 were not evaluated in the ex post investigation. Infrastructure failures are probably still relevant, for example, as part of the mechanism through which the weather influenced the number of delay hours.

Instead, a hypothesis was put forward that increased infrastructure manager activity (operations, maintenance activities, investment projects and upgrades), undertaken with the aim of reducing the number of delays and increasing capacity on the network, was itself a cause of delays and low punctuality.

The study documented the unfortunate link between action to reduce delays and the amount of delays in the railway system for the period 2005–2009. Maintenance activities and investment in upgrading infrastructure, as well as the construction of new infrastructure, contributed significantly to delays in the period up to 2009. The amount of IM operation and maintenance and construction work increased rather than decreased in the years following 2010. Still, both the correlation and regression models indicated a reversal in the relationship between investment and delays and low punctuality for the second period. This raises questions regarding the causal relationship between construction (investment), operations and maintenance and delays and low punctuality. The causal relationships are multi-faceted and are expected to include:

  1. Low punctuality. This highlights the need for increased maintenance and investments in the infrastructure (indicating that funds for construction and maintenance lag the development in punctuality).

  2. Ongoing work (both construction and maintenance). This affects traffic due to the introduction of speed restrictions and capacity limitations as work is conducted. The effect will increase in cases where the timetable is not adjusted according to current capacity/speed limits.

  3. Punctuality levels. These increase after work has been performed, but mainly after a run-in period for new infrastructure and after new times are introduced, as shown by Olsson (2006).
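The first mechanism in the list above implies a lagged relationship, in which funding responds to punctuality with a delay. One way to probe such a relationship is to correlate one series with a shifted copy of the other, as in the following sketch. The series, the two-step lag and the function names are hypothetical and do not come from the study.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation coefficient of two equal-length series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(driver, response, lag):
    """Correlate driver[t] with response[t + lag], for lag >= 0 time steps."""
    if lag == 0:
        return pearson(driver, response)
    return pearson(driver[:-lag], response[lag:])


# Hypothetical series: funding reacts to punctuality two time steps later.
punctuality = [90, 88, 85, 83, 86, 89, 91, 90, 87, 84]
funds = [22, 21, 20, 24, 30, 34, 28, 22, 18, 20]

r_same_period = lagged_correlation(punctuality, funds, 0)
r_two_steps = lagged_correlation(punctuality, funds, 2)
print(round(r_same_period, 2), round(r_two_steps, 2))
```

In this constructed example the same-period correlation is weak, while the lagged correlation is strongly negative, which shows how a same-period analysis can mask or distort a relationship that operates with a delay.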

During construction close to infrastructure in operation, as well as during some maintenance work, temporary speed restrictions are put in place. Speed restrictions increase travel time, reduce capacity and remove buffers on the network. From 2010, additional care has been taken to ensure sufficient buffers in the timetables for lines where investment projects were expected to take place. Additional measures to avoid investment, operations and maintenance resulting in delays and low punctuality include executing these in periods with less traffic (summer, nights and weekends). Even with these measures in mind, it should not be expected that investments actually work as buffers against delays; it is more likely that some element of chance or some confounding factors are in play.

4.3 Passenger transport and rolling stock

Reduced seat capacity showed a strong correlation (between 0.49 and 0.66) with the total number of delay hours, delay hours attributed to the main passenger train operator and delay hours attributed to dispatching. Reduced seat capacity due to shorter trains was only studied for passenger trains. Splitting of double sets would often result in a reduced number of seats and available doors to board and step off the train, and hence the need for longer stops at stations. In addition to reduced seat capacity, shorter trains may also mean that passengers standing along the full platform length may need to walk along the platform to board. The total number of passenger train kilometers was also identified as being correlated with the number of delay hours attributed to the IM.

In the regression models presented in Tables 7 and 8, a reduced train length (compared with the planned number) resulted in an increased number of delay hours attributed to the infrastructure manager as well as to the operator for both passenger and freight trains. This result was expected for passenger trains but not for freight trains and caused a debate in the task force. Periods in which fewer trains were split also saw reductions in the number of delay hours attributed to the infrastructure manager. This may indicate that splitting trains is a way of reducing the inconvenience to passengers when severe delays have occurred and rolling stock is not in a position to execute the timetable as planned, independent of who is responsible for the initial delays.

4.4 Freight

The total ton-kilometers (the product of tons of goods and the number of kilometers they were transported) were identified as significant for overall delay hours as well as for passenger operators’ and freight operators’ delay hours. The regression models support the notion of freight ton-kilometers as a factor contributing to delays and low punctuality only in the initial period. In the second period, the relationship appeared to be reversed, with the exception of delay hours attributed to the infrastructure manager.
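The ton-kilometer measure defined above is a sum of products over individual shipments. A minimal Python sketch with hypothetical shipment figures and a hypothetical function name:

```python
def ton_kilometers(shipments):
    """Sum of (tons carried x kilometers transported) over all shipments."""
    return sum(tons * km for tons, km in shipments)


# Hypothetical shipments for one period: (tons, kilometers).
shipments = [(800, 500), (1200, 350)]
print(ton_kilometers(shipments))  # 800*500 + 1200*350 = 820000
```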

Freight trains do not disturb regular traffic as long as they keep to their schedule. If they do not, the single-track lines of the Norwegian railway system result in delays, with the risk of these spreading to the rest of the network. Freight train punctuality effectively illustrates the vulnerability of the rail system.

Departure punctuality from the freight terminal has, therefore, traditionally been treated as an essential issue. Departure punctuality is closely related to “freight train ready at terminal”, which is logged by the freight operator. Sufficient turnaround times, available personnel and lift capacity are key factors for the operators to address to ensure punctual departure. However, the variables “ready at terminal” and “average departure delay from terminal” are significant in fewer of the regression models than might have been expected. A possible explanation lies in the nature of using averages. Higher resolution for the data (for instance, average departure delay and delay hours per day) will probably provide additional insight into the relationship between departure delays and performance on the rail network.

5 Conclusion

The paper has documented a revisit of an investigation into causes of delays and low punctuality in the Norwegian train system. Enabled by new data on performance in the years following the initial analysis, new analyses provide the grounds for re-evaluating the original findings. With data for 2010–2014 appended, new regression models indicate that, although their influence on delays and low punctuality has changed as various measures have been put in place, the same factors remain influential: low temperature (< −10 °C and < −5 °C) and snowfall (> 10 cm), reduced train lengths (mainly due to running single rather than double sets of rolling stock) and increased volume of train services (freight and passenger trains) still contribute to delays and low punctuality. The relationship between infrastructure management activity (operations, maintenance and investment) and delays and punctuality has been reversed in the models for the second period. In the first period, more activity was correlated with more delays, while this was not the case in the second period.

The scientific contribution of this paper is related to two aspects. First, we contribute to the literature on empirical analyses of delay causes. Second, we illustrate an approach for ex post evaluation of effects from punctuality improvement initiatives.