1 Introduction

The railway transport sector is a key enabler of economic growth worldwide. The United Kingdom (UK) has a railway network of 17,732 km of track (the 17th largest in the world) which is spread over wide geographical areas throughout the country [1]. The number of railway passengers as well as freight volumes has increased significantly in recent years. According to recent statistics published by the Office of Rail and Road (ORR), a total of 1.654 billion journeys were made in 2014–2015, making the UK’s railway network the fifth most used in the world [2]. The growth of journeys is partly attributed to a shift away from private motoring due to increasing road congestion, but also to the improved quality of railway transport services. The British railway industry was privatised over the period 1994–1997, but nowadays most of the railway tracks are managed by Network Rail (NR) [3]. Nevertheless, the network is still confronted with serious problems caused by premature failure of assets that require costly and time-consuming maintenance work.

The railway assets in general can be categorised into two types. The first is the infrastructure, which consists of fixed assets such as tracks, points and interlocking, bridges, signalling systems, electrical units, etc. The other is the rolling stock, which includes assets that move on the railway, e.g. locomotives, passenger coaches and freight cars. A rolling stock is a multi-component system that consists of wheels, bogies, doors, power unit, brake control unit, coupler, compressor, pantograph, etc. Figure 1 illustrates the major components of a British Class 800 rolling stock asset and their relationships to one another. A failure of any of the rolling stock components can cause a complete failure of the system and consequently lead to traffic delays and disruptions, passenger inconvenience and economic losses for train operating companies. Rolling stock failures may also result in the derailment of waggons and casualties among passengers and crew. For these reasons, it is crucial to develop practical methodologies for analysing and mitigating the risks associated with failure of various rolling stock components at a system level.

Fig. 1 Railway rolling stock components (www.hitachirail-eu.com)

In recent years, a great deal of attention has been paid to the study of the failure/damage mechanisms for railway infrastructure assets. However, few attempts have been made by researchers to develop failure criticality assessment models for rolling stock components. There are several tools and techniques that are currently used to determine and evaluate the risk of failures occurring in engineering systems throughout their entire life cycle—from design to production, operation and maintenance. One of the widely used techniques in this regard is the failure mode, effects and criticality analysis (FMECA) which is an extended version of the failure mode and effects analysis (FMEA) method [4, 5]. In the FMECA technique, all potential failure modes that could occur in various components of a system are systematically analysed. The causes of each failure mode and their associated impact on system operation are identified. A “risk” or “criticality” measure is then calculated for each failure mode based on the rate of occurrence of failure and severity of the possible consequences. Finally, the failure modes are prioritised or classified according to their levels of criticality and some preventive actions are proposed to improve the reliability of the system.

In this paper, the potential risks of unexpected failures occurring in rolling stock are identified, analysed and evaluated using a FMECA-based approach. The criticality of a failure is measured as the product of the likelihood of occurrence of the failure mode (O) and the severity of damage caused by the failure (S), where O and S are allocated numbers from 1 to 10. According to criticality levels ranging from 1 (lowest) to 100 (highest), the most critical failure modes in the rolling stock with respect to both reliability and economic criteria are identified. Finally, several potential protective measures to eliminate the root causes of rolling stock failures are provided. The presented model is applied to a rolling stock passenger door system in a Scottish train operating company and the results are discussed.

The remainder of this paper is organised as follows. Section 2 gives a brief overview of the risk evaluation in the railway industry. Section 3 presents a FMECA methodology for risk evaluation of rolling stock failures. In Sect. 4, a case study of the passenger train door system is described and the results are presented in detail. Finally, the paper is concluded in Sect. 5.

2 Risk Assessment in the Railway Industry

As stated in ISO 31000:2009 [6], risk is defined as “the effect of uncertainty on objectives” and an effect is “a positive or negative deviation from what is expected”. In general, risk is a combination of two factors: (i) the probability of occurrence of a failure and (ii) the magnitude of the consequences of the failure.

Risk analysis is defined as a systematic use of available information to characterise the likelihood that a specific event may occur and the impact of its likely consequences. The purpose of risk analysis is to determine the overall priority of a hazard, so that further actions can be taken to reduce and mitigate the most critical ones where resources are limited. Risk analysis can be either qualitative or quantitative or a combination of both. The qualitative risk evaluation methods use the judgement and opinions of knowledgeable experts to categorise the risks, while quantitative tools are based on probabilistic and/or statistical models that calculate risk over time. Typically, quantitative risk assessment techniques are more robust than the qualitative ones. However, the data requirements for quantitative risk assessment techniques are higher, which makes them difficult to apply.

In the last decade, many studies have been carried out to analyse the likelihood of failure of railway assets as well as to evaluate the impact of a failure on transport operations. Several risk assessment tools and techniques have been used for this purpose, including root cause analysis (RCA), fault tree analysis (FTA), event tree analysis (ETA), Weibull analysis, human reliability assessment (HRA), etc. The most relevant recent works on the subject are briefly reviewed below.

Haile [7] identified the strengths and weaknesses of the quantitative risk analysis (QRA) technique in application to railway system design and operation. Carretero et al. [8], Garcia Marquez et al. [9] and Pedregal et al. [10] used a Reliability Centred Maintenance (RCM) methodology for failure analysis of railway infrastructure assets. Podofillini et al. [11] developed a model to calculate the risks and costs associated with inspection of railway tracks. Zio et al. [12] proposed a risk-informed approach for improving the service level of railway networks as well as maintaining high standards of safety. Their approach uses importance measures to identify those sections of the network having the highest impact on the overall trains’ delay. Kumar et al. [13] developed an approach for risk assessment of railway defects that can be used to support the decision-making process for scheduling of railway inspection and grinding activities based on the type and the risk of defect. Macchi et al. [14] presented a two-stage methodology for maintenance management of the railway infrastructures. The first step of this methodology consists of a family-based approach for the equipment reliability analysis and the second step builds a reliability model for the railway system in order to identify the most critical items. Cheng et al. [15] applied the FMECA method to analyse the reliability of metro door systems. Kim and Jeong [16] used the FMECA method to evaluate the consequences of brake system failure in a railroad vehicle and then analysed the adequacy of preventive maintenance (PM) programmes for the asset. Recently, Rahbar and Bagheri [17] presented a framework to evaluate the risks associated with moving hazardous materials (hazmat) by rail transport.

As the review shows, very few studies assessing the criticality of railway rolling stock component failures and the subsequent impacts on infrastructure services have been conducted so far. In what follows, we propose a FMECA-based methodology to determine the criticality level of failures occurring in rolling stock assets.

3 FMECA Methodology to Rolling Stocks

The proposed methodology for risk evaluation of rolling stock failures, as shown in Fig. 2, includes nine steps. These steps are described in detail as follows:

Fig. 2 Risk evaluation methodology for railway rolling stock failures

Step 1

Select a rolling stock component for the study

A railway rolling stock is usually composed of two main parts, namely the car body and the bogie, each consisting of different components and each performing certain essential function(s). The main rolling stock components that can be considered for a risk analysis study include (but are not limited to) the following:

  • Door unit The train doors are “opened” and “closed” at each station to allow passengers to enter or leave the coach.

  • Scroll compressor It is a certain type of compressor used for HVAC and brake systems to compress air.

  • Bogie It is a framework carrying either four or six wheels attached to the coaches.

  • Pantograph It is a device mounted on the roof of the train to collect electric current from overhead lines.

  • Coupling system A coupler is a device used for connecting rolling stocks in a train.

  • Braking unit It is used to decrease the velocity of trains, enable deceleration, control acceleration and keep them stationary when parked.

  • Air spring suspension It gives a better ride and the pressure can be adjusted automatically to compensate for additions or reductions in passenger loads.

  • Heating ventilation and air conditioning (HVAC) It circulates air through the vehicle, supplying either hot or cool air depending on the desired temperature.

Step 2

Collect the component function information

As each of the components’ functions in rolling stock is different, the mechanism of the occurrence of failure will be different from one component to another. The risk analysts must have a good understanding of the components of the system and the way in which they interact with each other and with their surrounding environment. The component function information can be collected by answering some of the following questions:

  • What functions does the component perform?

  • Can rolling stock operate without this component?

  • Does the component contain redundancies or backups?

  • Will rolling stock fail if the component fails?

  • In which ways will the component affect the other components or the overall system?

In order to define the logical interaction of components within the rolling stock, a Reliability Block Diagram (RBD) can be useful. An RBD is a diagrammatic method for showing how components’ reliability contributes to the success or failure of a complex system. Each block represents a component of the system with a certain probability of failure or failure rate. The blocks are often configured (i.e. interconnected) in series structure, parallel structure, k-out-of-n structure, etc. [18]. In a series structure, the entire system will fail if one of the components fails. A parallel structure is used to show redundancy wherein the whole system can function properly as long as at least one component is working properly. For k-out-of-n structures, a system is considered functioning if at least k out of a total of n components are working properly (1 < k < n). As an example, the RBD of a railway train passenger door system is shown in Fig. 3.

Fig. 3 A reliability block diagram for the rail train passenger door system
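The three RBD structures described in Step 2 can be sketched in Python as follows. This is a minimal illustration that assumes statistically independent blocks; the block names and reliability values are hypothetical and are not taken from the case study or from Fig. 3.

```python
from math import comb, prod


def series_reliability(reliabilities):
    """Series structure: the system works only if every block works."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """Parallel structure: the system works if at least one block works."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


def k_out_of_n_reliability(k, n, r):
    """k-out-of-n structure with n identical, independent blocks of reliability r."""
    return sum(comb(n, j) * r**j * (1.0 - r) ** (n - j) for j in range(k, n + 1))


# Hypothetical block reliabilities (assumptions for illustration only).
door_drive = 0.98
control_unit = 0.95
limit_switches = parallel_reliability([0.90, 0.90])  # assumed redundant pair

system = series_reliability([door_drive, control_unit, limit_switches])
print(f"system reliability = {system:.4f}")
```

A series arrangement of the redundant pair with the other two blocks reflects the idea that a single non-redundant component can bring the whole system down, whereas redundancy raises the reliability of the block it protects.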

Step 3

Determine potential failure modes that can cause damage to the component through reviewing past failures

The identification of potential failure modes is an important part of risk analysis studies. For each component chosen, there exist some failure modes that can be determined by reviewing past failures, inspection records and non-destructive testing (NDT) measurements. The major failure modes in rolling stock components include disconnection, fracture, fatigue, cracking, degradation, deformation, stripping, wear, corrosion, binding, leakage, buckling, sagging, loosening, misalignment and obstruction. Any of these failure modes, or a combination of them, can cause rolling stock to fail. For some rolling stock components, more than one failure mode may be present.

Step 4

Identify root causes that contribute to failure of the rolling stock component through interviewing experts from various fields

After all the failure modes have been identified, the risk analysts begin to investigate what, how and why a failure happened, thus preventing recurrence. The failure root causes can be determined by interviewing experts including designers, train operators, inspectors, maintenance technicians, etc. and using some analytical techniques like Root Cause Analysis (RCA) and Fault-Tree Analysis (FTA) [19]. RCA is a useful process that helps analysts identify and understand the initiating causes of a failure. FTA is a top-down and deductive failure analysis method through which all undesired events that may lead to system failure are analysed.

Some common root causes of rolling stock failures are electrical/mechanical overloading, installation failure, software failure, hardware failure, material defects and calibration errors. It is worth mentioning that more than one failure cause (known as competing risks) may be found for some failure modes of the rolling stock.

Step 5

Assign a likelihood rating to each failure mode of the rolling stock component

The failure data are analysed using statistical techniques (e.g. Weibull analysis, regression models, data mining) to create models for estimation of the likelihood of rolling stock defects. The likelihood of occurrence of a failure is evaluated on the basis of failure rates (per year) estimated from historical data or expert knowledge. The failure rate of failure mode i is estimated by

$$ {\lambda}_i = \frac{\text{Total number of failures resulting from mode } i \text{ since installation}}{\text{Duration of operation (in years)}}. $$
(1)

Based on the failure rates obtained, a likelihood of occurrence rating based on a 10-point scale is assigned to each failure mode (see Table 1). As shown, the recommended likelihood rating scale varies from 1 to 10, where 1 represents “remote” and 10 indicates “almost certain”.

Table 1 Likelihood ratings for a failure in railway rolling stock
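Eq. (1) and the subsequent rating step can be sketched as below. The rating thresholds in `ASSUMED_BOUNDS` are hypothetical placeholders and should be replaced with the values from Table 1; the five-year observation period in the example is likewise an assumption made only for illustration.

```python
from bisect import bisect_right


def failure_rate(n_failures, years_in_operation):
    """Eq. (1): failures attributed to one mode divided by years of operation."""
    if years_in_operation <= 0:
        raise ValueError("years_in_operation must be positive")
    return n_failures / years_in_operation


# HYPOTHETICAL upper bounds (failures per year) separating ratings 1..10.
# These are NOT the thresholds of Table 1.
ASSUMED_BOUNDS = [0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50]


def likelihood_rating(rate, bounds=ASSUMED_BOUNDS):
    """Map a failure rate to a rating O in 1..10 (1 = remote, 10 = almost certain)."""
    return bisect_right(bounds, rate) + 1


# Example: 39 failures over an assumed 5 years of operation.
rate = failure_rate(39, 5)
print(rate, likelihood_rating(rate))
```

Keeping the thresholds in a single sorted list makes it straightforward to swap in a different rating scale without touching the lookup logic.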

Step 6

Assign a severity (consequence) rating to each failure mode of the rolling stock component

Each of the possible failure modes on rolling stock components has different impacts on train safety, transport operations as well as the environment. The failure consequences of a rolling stock component can be addressed from the following points of view throughout the service life-cycle:

  • Economic impacts Costs of inspection, maintenance and renewal (IMR), and penalty charges due to train delays or cancellation;

  • Social impacts Passengers’ dissatisfaction caused by service interruptions;

  • Safety impacts Fatalities or injuries due to train derailment;

  • Environmental impacts Greenhouse damages, chemical spills, etc.

In this study, the severity of failure is evaluated in terms of economic, social and safety losses and is described on a 10-point scale where 1 represents “no effect” and 10 indicates “dangerous without warning”. The recommended severity rating scale is presented in Table 2.

Table 2 Severity ratings for a failure in railway rolling stock

Step 7

Evaluate the criticality level of a rolling stock failure and prioritise the failure modes in descending order

The criticality level of a rolling stock failure is defined by a risk factor (R) which is calculated by multiplying the likelihood rating (O) by the severity rating (S), i.e.

$$ {\text{R}} = {\text{O}} \times {\text{S}}. $$
(2)

Since the likelihood of occurrence and the severity of damage have rating values between 1 and 10, the risk factor R will range from 1 to 100. The risk factors obtained for all failure modes are prioritised in descending order and the most critical ones with respect to both reliability and damage severity are identified. The most critical failure modes will be the ones occurring most frequently and leading to largest losses.
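The calculation in Eq. (2) and the ranking in descending order can be sketched as follows. The failure mode names and ratings are hypothetical examples, not entries from the worksheet of the case study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    occurrence: int  # likelihood rating O, 1..10
    severity: int    # severity rating S, 1..10

    def __post_init__(self):
        for label, value in (("occurrence", self.occurrence),
                             ("severity", self.severity)):
            if not 1 <= value <= 10:
                raise ValueError(f"{label} must be between 1 and 10")

    @property
    def risk_factor(self):
        """Eq. (2): R = O x S, ranging from 1 to 100."""
        return self.occurrence * self.severity


def prioritise(modes):
    """Return the failure modes sorted by risk factor, highest first."""
    return sorted(modes, key=lambda m: m.risk_factor, reverse=True)


# Hypothetical ratings (assumptions for illustration only).
modes = [
    FailureMode("light barrier fault", 3, 5),
    FailureMode("faulty push button", 7, 4),
    FailureMode("poor lubrication", 2, 2),
]
for mode in prioritise(modes):
    print(f"{mode.name}: R = {mode.risk_factor}")
```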

Step 8

Categorise the failure modes into five classes of criticality

The failure modes according to the level of their criticality are categorised into five classes, namely very low, low, medium, high and very high critical. These classes of failure criticality and the associated improvement actions are described in Table 3. A failure mode will be very low critical when its risk factor is between 1 and 4, will be low critical when the risk factor is between 5 and 9, will be medium critical when the risk factor is between 10 and 25, high critical when its risk factor is between 26 and 49, and very high critical when the risk factor is between 50 and 100.

Table 3 Five classes of failure criticality and the associated improvement actions

Obviously, the criticality classes defined in Table 3 can vary depending on the type of rolling stock, available maintenance resources, safety standards, railway operations, traffic density, train speed, etc. The completed criticality matrix provides a useful, graphical portrayal of the risk factors obtained from the analysis. Different regions of the criticality matrix represent different levels of criticality for rolling stock components. For example, as shown in Fig. 4, the red cells at the top right-hand corner of the matrix represent “very high critical” region, whilst the green cells at the bottom left-hand corner represent “very low critical” region.

Fig. 4 A criticality matrix for rolling stock failures
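The class boundaries given in Step 8 and the resulting criticality matrix can be sketched as follows. The boundaries are those stated in the text; the orientation of the grid (severity on the vertical axis, likelihood on the horizontal axis) is an assumption, chosen so that the top right-hand corner is “very high critical” and the bottom left-hand corner is “very low critical”.

```python
# Class boundaries as stated in the text: (upper bound of R, class name).
CLASS_BOUNDS = [(4, "very low"), (9, "low"), (25, "medium"),
                (49, "high"), (100, "very high")]

ABBREVIATION = {"very low": "VL", "low": "L", "medium": "M",
                "high": "H", "very high": "VH"}


def criticality_class(risk_factor):
    """Return the criticality class for a risk factor R = O x S in 1..100."""
    if not 1 <= risk_factor <= 100:
        raise ValueError("risk factor must be between 1 and 100")
    return next(name for upper, name in CLASS_BOUNDS if risk_factor <= upper)


def criticality_matrix():
    """10 x 10 grid of classes; rows run from S = 10 (top) to S = 1,
    columns from O = 1 (left) to O = 10 (assumed orientation)."""
    return [[criticality_class(o * s) for o in range(1, 11)]
            for s in range(10, 0, -1)]


for s, row in zip(range(10, 0, -1), criticality_matrix()):
    print(f"S={s:2d}  " + " ".join(f"{ABBREVIATION[c]:>2}" for c in row))
```

Because the boundaries live in one table, an operator with different maintenance resources or safety standards could adjust the classes without changing the rest of the sketch.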

Step 9

Propose potential protective measures to prevent recurrences

In order to achieve an acceptable level of criticality and enhance the reliability of the system, some improvement actions need to be proposed or initiated for medium, high and very high critical failure modes and components. Generally, the following protective measures can be considered to achieve a lower level of risk of failure in railway rolling stocks:

  • improving the reliability of individual components (parts improvement method);

  • adding redundancy to critical components in order to increase the mean time between failures (MTBF);

  • planning and undertaking scheduled cost-effective maintenance activities to minimise interruptions to railway transport services (e.g. see [20]);

  • utilising sensor-based technologies to continuously monitor the behaviour of rolling stock components; and

  • minimising the service disruption through shortening the repair lead times [21].

4 Application to Passenger Door Unit

In this section, the proposed risk evaluation model is applied to a passenger door system of the Class 380 electric multiple unit (EMU) that operates on the national railway network in Scotland [22]. The Class 380 trains are some of the newest and most advanced fleets available on the market, which account for around 10 % of the total number of trains operating on Scotland’s railway network. These trains have spacious seating, wide aisles, roof-mounted air conditioning, 230 V power sockets for laptops and handheld devices under each table, ample luggage provision, dedicated areas for cycles and wheelchairs, and Closed Circuit Television (CCTV) for added security.

There are several key components on the Class 380 trains that are often far more critical to the functionality of the system than the others. An analysis of performance data indicates that a great number of failures are associated with the door system (see Fig. 5), having a detrimental effect on train reliability and consequently on passenger satisfaction. A door system consists of the following major components:

Fig. 5 The Class 380 train’s passenger door unit

  • Door drive Gearbox, upper locking devices, synchronising cable and guides;

  • Control elements and switches Open/close limit switches and pushbuttons;

  • Door leaf Mounting of leaf, window and leaf-mounted guides;

  • Safety and emergency devices Mechanical switches, finger protection and light barrier;

  • Other components Interior panelling, wiring, lighting and steps.

The data required for this study were collected from the literature, the company’s maintenance management software system called EQUINOX and the UK’s railway performance management software DATASYS BUGLE [23]. These systems not only monitor all maintenance activities carried out by sub-contractors, but also record the trains’ activities from the operations side of business.

A fleet of 38 Class 380 trains (including 22 trains with four cars and 16 trains with three cars) is considered for this study. These trains have been in operation since early December 2010 and have experienced a total of 2493 failures within the duration of this study. Of these, 205 failures (i.e. 8.2 % of the total failures) were related to defects associated with door unit components. The total mileage that these trains have been in operation is 2,235,312 miles. Therefore, the mean number of failures (MNF) per train and the mean mileage between failures (MMBF) associated with the door unit are given by

$$ {\text{MNF}} = \frac{205}{38} \approx 5.39; \quad {\text{MMBF}} = \frac{2{,}235{,}312}{38} = 58{,}824 {\text{ miles}}. $$
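The fleet-level figures can be reproduced with a few lines of Python. All input values are those stated in the text; the variable names are illustrative, and the MMBF value follows the definition used in this section (total mileage divided by the number of trains).

```python
FLEET_SIZE = 38             # Class 380 trains in the study
TOTAL_FAILURES = 2493       # all failures recorded in the study period
DOOR_FAILURES = 205         # failures attributed to the door unit
TOTAL_MILEAGE = 2_235_312   # total miles in operation, as stated in the text

mnf = DOOR_FAILURES / FLEET_SIZE          # mean number of door failures per train
mmbf = TOTAL_MILEAGE / FLEET_SIZE         # MMBF as defined in this section
door_share = DOOR_FAILURES / TOTAL_FAILURES

print(f"MNF  = {mnf:.2f} failures per train")
print(f"MMBF = {mmbf:,.0f} miles")
print(f"Door-related share of all failures = {door_share:.1%}")
```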

The five why’s technique was used to identify the potential failure modes and determine the root causes of failures. An example of the technique applied to the door system is given below:

The results of the analysis show that the door defects are due to twelve primary sources (root causes), as illustrated in Fig. 6. These, in order, are given as follows:

Fig. 6 Failure mode frequencies for a passenger door system

  a. No fault found (NFF) No particular root cause was found for 87 door defects (i.e. 42.4 % of the total door defects reported).

  b. Faulty push buttons These were found to be the cause of 39 door defects (i.e. 19 % of the total door defects reported).

  c. Faulty door control unit (DCU) There have been 20 failures recorded with failure modes such as internal power supply failure, internal obstruction detection due to motor voltage and flash codes on the DCU.

  d. Mechanical failures 18 failures were reported to be in relation to actuator rods becoming loose or not disengaging from limit switches.

  e. Light barrier There have been seven failures due to the light barrier.

  f. Door drive There have been six failures in relation to the door drive of the system. These failures are due to different reasons such as motor failure, encoder failure and faulty connections to the drive system.

  g. Guard operating panel (GOP) Six failures were found to be due to GOP defects.

  h. Limit switches Six faults occurred in relation to limit switches or micro-switches on the drive system.

  i. Loose plugs Six failures were found to be due to loose plugs or loose connections within the plugs themselves.

  j. Obstructions There have been six failures caused by obstruction of the door leaves themselves, mostly due to dirt or debris stuck in the door tracks.

  k. Door roller Two failures were reported to be due to the rollers becoming detached from the housing and not being tough enough owing to damage.

  l. Lubrication There have been two failures as a result of poor lubrication on the door system.

Table 4 presents the frequency of door system defects that occurred in each train due to the above-mentioned 12 root causes.

Table 4 Frequency of door defects in each train due to various root causes
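The root cause counts reported above lend themselves to a simple Pareto-style summary. The sketch below uses the twelve counts given in the text; the function name `pareto` and the output layout are illustrative choices, not part of the original analysis.

```python
# Door defect counts per root cause, as reported in the text.
ROOT_CAUSES = {
    "No fault found": 87,
    "Faulty push buttons": 39,
    "Faulty door control unit": 20,
    "Mechanical failures": 18,
    "Light barrier": 7,
    "Door drive": 6,
    "Guard operating panel": 6,
    "Limit switches": 6,
    "Loose plugs": 6,
    "Obstructions": 6,
    "Door roller": 2,
    "Lubrication": 2,
}


def pareto(counts):
    """Return (cause, count, share %, cumulative share %) sorted by count, highest first."""
    total = sum(counts.values())
    rows, cumulative = [], 0
    for cause, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += count
        rows.append((cause, count, 100 * count / total, 100 * cumulative / total))
    return rows


for cause, count, share, cum_share in pareto(ROOT_CAUSES):
    print(f"{cause:<26} {count:>3}  {share:5.1f} %  {cum_share:5.1f} %")
```

The cumulative column shows that the first four root causes account for 164 of the 205 door defects.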

Qualitative assessment of the severity of different types of door defects was performed based on the negative impacts on transport services in terms of train delays, speed restrictions and service cancellations. The delay information was extracted from a database system called TRUST (TRain RUnning SysTem TOPS) that is used for monitoring the progress of trains and tracking delays on the UK’s railway network. The total delay time due to door defects was 518 min. The train operating company is penalised £50 per minute of delay in service. Thus, the total penalty charges due to train delays amount to 518 min × £50/min = £25,900.

A Delphi technique was used to elicit the experts’ estimates of the failure likelihood and damage severity. Three academics who have published several papers in the field of risk and reliability, three maintenance engineers from the operating company with over 15 years of experience, one designer from the design consultancy and one designer from the manufacturer company were involved in this FMECA study. The results of the risk evaluation for the rolling stock door system are given in a worksheet format in Table 5. As shown, the level of criticality for various failure modes ranges from 3 to 28, where less than 3 % of the failure modes fall into the “very low critical” category, around 15 % are classified as “low critical”, around 70 % are “medium critical” and 12 % fall into the “high critical” category. The “high critical” category includes nine failure modes, of which four have a risk factor of 27 and five have a risk factor of 28 (out of 100). To avoid the recurrence of these failure modes, it is crucial to plan and carry out PM actions in a cost-effective and timely manner.

Table 5 Risk evaluation results for the Class 380 train’s door system

The Class 380 trains are expected to run 160,000 miles per year and to be in operation for 300 days of the year. Thus, the average daily miles for each train will be 533 miles. The current maintenance programme includes eleven tasks as described in Table 6 [24].
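The daily mileage figure, and the way a mileage-based maintenance interval translates into operating days, can be checked with the short sketch below. The annual mileage and operating days come from the text; the 10,000-mile interval is a hypothetical example and is not one of the periodicities of the maintenance programme.

```python
ANNUAL_MILEAGE = 160_000  # miles per train per year (from the text)
OPERATING_DAYS = 300      # days in operation per year (from the text)


def daily_mileage(annual_mileage=ANNUAL_MILEAGE, operating_days=OPERATING_DAYS):
    """Average miles run by one train on an operating day."""
    return annual_mileage / operating_days


def days_between_tasks(interval_miles, miles_per_day=None):
    """Operating days that elapse between two mileage-based maintenance tasks."""
    if miles_per_day is None:
        miles_per_day = daily_mileage()
    return interval_miles / miles_per_day


print(f"{daily_mileage():.0f} miles per operating day")
# Hypothetical 10,000-mile maintenance interval (assumption for illustration).
print(f"{days_between_tasks(10_000):.2f} operating days between tasks")
```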

Table 6 Current maintenance programme for the passenger door system

The current maintenance activities were selected according to the original equipment manufacturer (OEM)’s recommendations as well as the experience of other fleets. It was found that when previous fleets were introduced in the UK’s railway network, too much intrusive maintenance was undertaken, which led to excessive delays. However, the Class 380 has different doors in the sense that they are electrically powered, whereas the older fleets have pneumatically operated doors. The controls of the pneumatic system can be adjusted, which was found to cause problems, and the technology at the time of manufacture was not sufficient to fit tamper-proof components. Overall, the current maintenance programme is not adequate and, in order to reduce the number of door-related defects, a new PM programme including fourteen tasks has been proposed by the company’s asset management team (see Table 7).

Table 7 Proposed PM programme for the passenger door system

By implementing such a PM programme, the reliability of the door system is expected to increase, as the majority of failures can likely be detected and rectified through mileage-based maintenance tasks at the periodicities given. However, a further study will be required to assess the performance of the proposed maintenance programme in terms of system availability, service reliability and safety and cost of IMR.

5 Conclusions and Future Work

In the current study, a failure mode, effects and criticality analysis (FMECA)-based approach was presented to identify, analyse and evaluate the potential risks associated with unexpected failure of rolling stock components. The criticality level of a rolling stock failure is calculated by multiplying the likelihood of occurrence of the failure mode (O) and the severity of damage caused by the failure (S), each being rated with a number from 1 to 10 (1 = lowest, 10 = highest). The failure modes according to the level of their criticality were categorised into five classes, namely very low, low, medium, high and very high critical. The most critical failure modes in the system with respect to both reliability and economic criteria were identified and possible methods for mitigation were discussed.

The analysis model was applied to the passenger door unit of a fleet of 38 Class 380 trains operating on Scotland’s railway network. The data required for the analysis were collected from the literature, the company’s maintenance management software system called EQUINOX, the UK’s railway performance management software DATASYS BUGLE and the UK’s train movements monitoring system called TRUST. The five why’s technique was used to identify the potential failure modes of door unit components and their root causes, including the defects in relation to pushbuttons, door control unit (DCU), mechanical failures, light barrier, door drive, guard operating panel (GOP), limit switches, loose plugs, obstructions, door roller and lubrication. The results of the risk evaluation showed that nine failure modes (12 % of the total number of failure modes identified) are “high critical” to door system functionality. The results of this study were used not only for assessing the performance of current maintenance practices, but also for planning a cost-effective preventive maintenance (PM) programme for different components of rolling stock. To avoid the recurrence of the failure modes, a new mileage-based PM programme including 14 tasks was proposed.

There is a wide scope for future research in the area of risk analysis in relation to railway rolling stock failures. Some of the possible extensions of the present work are as follows:

  a. proposition of a multiple criteria FMECA approach for risk evaluation of different rolling stock components;

  b. evaluation of the cost effectiveness of PM programmes for rolling stock with respect to risk evaluations (see [25]);

  c. development of a more quantitative approach to characterise the likelihood that a rolling stock failure may occur and the impact of likely consequences.