1 Introduction

Since the first subway line in China was put into operation in October 1969, more than 20 Chinese cities have built subway systems, with a total operating mileage of over 2400 km. Chinese subway companies have by now accumulated large amounts of failure data, and these data truly reflect field operating conditions. However, the data have certain shortcomings: much of it does not comply with uniform standards or is derived from complex data sources, and it may be missing important information [1, 2]. A large-scale subway system requires the successful prevention of major accidents and sudden incidents; otherwise, catastrophic results might occur. Therefore, how to analyze and handle such complex, large-scale operational failure data to ensure the safety of urban rail transit has become a major topic in subway reliability research.

Wang et al. presented a service life estimation method based on three-parameter Weibull maximum likelihood estimation for the component wear of high-speed multiple units [3]. A new product data management method was created in [4] to process the component maintenance and historical failure data of electric multiple units, which resulted in a 30% increase in reliability. A reliability analysis of the bogie system of Sweden’s railways, based on data collected by a wireless sensor network, was performed in [5]; the problem with this method was that the uncertainty in the time domain was not considered. In [6], random process and reliability theory were adopted to investigate the failure distribution rules and reliability of rail vehicle components. Yu et al. deduced the safety domain curve of high-speed trains by deriving the extreme sensitivity of system reliability [7]. Some articles used the nonparametric method to estimate the reliability function of a mechanism under extreme impact [8, 9], and Jiang analyzed the application of the proportional risk function in a repairable system [10].

Existing failure-data-based reliability analyses have mainly focused on railway passenger and freight vehicles and high-speed trains, and little attention has been paid to the subway system. In essence, the subway differs from the railway in various aspects, such as departure intervals, operating cycle, line conditions, failure position, failure frequency, and maintenance data.

Parametric reliability estimation can only be used when the lifetime distribution is known. When the distribution is unknown, it is usually studied with the probability paper method or the similar WPP graphical estimation method; these methods require plotting reliability against failure time, and the reliability model of the failure data is determined by further studying the shape of the curve. If the distribution model of a group of failure data is not known, survival analysis can instead treat every candidate model as a possible fit: each distribution model is fitted, the best-fitting one is selected, and finally parameter estimation and hypothesis testing are carried out. Survival analysis can effectively handle the uncertain failure time intervals that arise from censored data on subway vehicles and thus gives more reasonable reliability results. Therefore, we used survival analysis to perform the reliability analysis of subway vehicles, with the aim of accurately grasping the working status of key subway systems, including identifying failures, performing maintenance, and securing the subway’s operation. The survival analysis method has particular advantages in the processing and analysis of censored data, whether it is applied in non-parametric, parametric, or semi-parametric form.

2 Fault distribution model and methods analysis

2.1 Survival analysis

Survival analysis is a statistical technique for analyzing survival time. Based on data collected via experiment or survey, it statistically analyzes the survival time of living creatures, humans, or other things with a survival cycle and represents the results in the form of a survival function, probability density function, danger scale function (commonly called the hazard function), and average life [11, 12].

2.1.1 Survival function

The survival function, also called the reliability function, is defined as

$$ R\left({t}_j\right)=S\left({t}_j\right)=P\left(T>{t}_j\right)=1-P\left(T\le {t}_j\right)=1-F\left({t}_j\right) $$
(1)

In this equation, \( R(t_j) \) is the reliability function and \( S(t_j) \) is the survival function, that is, the probability that the individual failure interval T is greater than \( t_j \). It has the properties R(0) = 1 and R(∞) = 0. \( F(t_j) \) is the unreliability function, which indicates the probability that the product is unable to perform its function under the specified time and conditions; it is the distribution function of T [13].

$$ F\left({t}_j\right)=P\left(T\le {t}_j\right)=\frac{n\left({t}_j\right)}{N},\quad 0\le {t}_j<\infty $$
(2)

In this equation, N is the total number of product samples and \( n(t_j) \) is the number of samples that have failed by time \( t_j \).
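Eqs. (1) and (2) can be illustrated with a minimal Python sketch for a complete (uncensored) sample. The function name and the failure times below are hypothetical and serve only as an illustration; they are not taken from the subway data set.

```python
def empirical_reliability(failure_times, t):
    """Return R(t) = 1 - n(t)/N for a complete (uncensored) sample."""
    n_total = len(failure_times)                         # N in Eq. (2)
    n_failed = sum(1 for x in failure_times if x <= t)   # n(t) in Eq. (2)
    return 1.0 - n_failed / n_total


# Hypothetical times between failures, in days (illustrative only).
times = [2, 5, 5, 9, 14, 20, 31, 40]
for t in (0, 5, 20, 40):
    print(t, empirical_reliability(times, t))
```

For these values the estimate falls from 1 at t = 0 to 0 once every sample has failed, in line with the boundary properties of the reliability function.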

2.1.2 Probability density function

The probability density function \( p(t_j) \) is the ratio of the number of failures \( D_j \) during the period from \( t_{j-1} \) to \( t_j \) to the total number of observations N.

$$ p\left({t}_j\right)=f\left({t}_j\right)=P\left(T={t}_j\right)=\frac{D_j}{N},\quad 1\le j\le n $$
(3)

2.1.3 Danger scale function

The danger scale function \( \lambda(t_j) \) represents the instantaneous failure rate at the moment \( t_j \) of an observed object that had not failed by the moment \( t_{j-1} \). It is also called the hazard function, the damage function, or the failure rate function, and it measures how prone an individual is to fail at a given time [14].

$$ \lambda \left({t}_j\right)=P\left(T={t}_j|T\ge {t}_j\right)=\frac{p\left({t}_j\right)}{S\left({t}_{j-1}\right)} $$
(4)

The survival function and the danger scale function are related as follows:

$$ S\left({t}_j\right)={\prod}_{t_i\le {t}_j}\left[1-\lambda \left({t}_i\right)\right] $$
(5)
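The relationship between Eqs. (3), (4), and (5) can be checked numerically. The following minimal sketch assumes a complete sample with S(t_0) = 1 and takes the product over all failure times up to t_j; the function name and the failure counts are hypothetical.

```python
def discrete_survival_table(failure_counts):
    """failure_counts[j] is D_j, the number of failures observed at t_j.

    Returns one tuple per failure time:
    (p_j, hazard_j, S_j computed directly, S_j computed as a product).
    """
    n_total = sum(failure_counts)
    rows = []
    s_prev = 1.0       # S(t_0) = 1 (assumption: nothing has failed at t_0)
    s_product = 1.0    # running product of [1 - lambda(t_i)], Eq. (5)
    cumulative = 0
    for d_j in failure_counts:
        p_j = d_j / n_total                  # Eq. (3)
        hazard_j = p_j / s_prev              # Eq. (4)
        cumulative += d_j
        s_j = 1.0 - cumulative / n_total     # Eq. (1) with Eq. (2)
        s_product *= 1.0 - hazard_j
        rows.append((p_j, hazard_j, s_j, s_product))
        s_prev = s_j
    return rows


# Hypothetical failure counts at four successive failure times.
for row in discrete_survival_table([2, 3, 1, 4]):
    print(row)
```

In each row, the survival value computed directly from the failure counts agrees with the product form up to rounding error.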

2.1.4 Average life

Average life is the mean trouble-free operating time of a product. For repairable products, the average life is the mean operating time between failures [15].

$$ u=E(t)={\sum}_{j=1}^nS\left({t}_j\right)\left({t}_j-{t}_{j-1}\right) $$
(6)
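As a small worked example, Eq. (6) can be evaluated directly. The sketch below implements the sum literally and assumes t_0 = 0; the function name, failure times, and survival values are hypothetical.

```python
def average_life(times, survival):
    """Eq. (6): sum of S(t_j) * (t_j - t_{j-1}), assuming t_0 = 0."""
    total = 0.0
    previous_time = 0.0
    for t_j, s_j in zip(times, survival):
        total += s_j * (t_j - previous_time)
        previous_time = t_j
    return total


# Hypothetical failure times (days) and survival values at those times.
times = [5.0, 10.0, 20.0, 40.0]
survival = [0.8, 0.5, 0.4, 0.0]
print(average_life(times, survival))  # 10.5
```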

2.2 Model building and methods

Figure 1 shows a flowchart of reliability analysis via the determination of failure distribution model using survival analysis theory.

Fig. 1 Flowchart of determination of the failure distribution model

2.2.1 Fault data collection and pretreatment

For fault data collection and pretreatment, we use statistical methods to eliminate or merge fault entries and eventually determine the effective failure data of the subway vehicles.

2.2.2 Candidate distributions

A large number of articles were reviewed to determine the candidate distributions, which are the exponential distribution, lognormal distribution, two-parameter Weibull distribution, and three-parameter Weibull distribution [16].

2.2.3 The maximum likelihood estimation

The maximum likelihood estimation method was used in this study for the parameter estimation of the optimal distribution. The basic principle of this method is as follows: given a known population distribution with an unknown parameter θ, the value \( \hat{\theta} \) that maximizes the probability of the observed results is chosen from all possible values of θ. \( \hat{\theta} \) is then defined as the maximum likelihood estimate of θ, and this parameter estimation method is called the maximum likelihood estimation method [17].

If \( X_1, X_2, \ldots, X_n \) are samples from the population X, the joint density of \( X_1, X_2, \ldots, X_n \) is

$$ {\prod}_{i=1}^nf\left({x}_i,\theta \right) $$
(7)

If \( x_1, x_2, \ldots, x_n \) are the sample values corresponding to the sample \( X_1, X_2, \ldots, X_n \), the function is

$$ L\left(\theta \right)=L\left({x}_1,{x}_2,\cdots, {x}_n;\theta \right)={\prod}_{i=1}^nf\left({x}_i,\theta \right) $$
(8)

L (θ) is called the likelihood function of the sample. If

$$ L\left({x}_1,{x}_2,\cdots, {x}_n;\widehat{\theta}\right)={\max}_{\theta \in \Theta}L\left({x}_1,{x}_2,\cdots, {x}_n;\theta \right) $$
(9)

then \( \widehat{\theta}\left({x}_1,{x}_2,\cdots, {x}_n\right) \) is called the maximum likelihood estimate of θ.

Thus, determining the maximum likelihood estimate reduces to finding a maximum, which is a problem in differential calculus.

In many cases, \( f(x_i, \theta) \) is differentiable with respect to θ, and \( \widehat{\theta} \) can be solved from the equation

$$ \frac{d}{d\theta}L\left(\theta \right)=0 $$
(10)
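To make Eqs. (8) to (10) concrete, the following minimal sketch fits a two-parameter Weibull distribution to a complete sample using only the Python standard library. It is an illustration under stated assumptions, not the procedure used in this study: it ignores censoring, it maximizes ln L(θ) instead of L(θ) (both have the same maximizer), and it solves the resulting likelihood equation for the shape parameter by bisection on an assumed bracket of [0.05, 20]. The function names and data are hypothetical.

```python
import math


def weibull_log_likelihood(data, shape, scale):
    """Logarithm of Eq. (8) for the two-parameter Weibull density."""
    return sum(
        math.log(shape / scale)
        + (shape - 1.0) * math.log(x / scale)
        - (x / scale) ** shape
        for x in data
    )


def weibull_mle(data, lo=0.05, hi=20.0, tol=1e-10):
    """Maximum likelihood estimates (shape, scale) for complete data."""
    n = len(data)
    mean_log = sum(math.log(x) for x in data) / n

    def score(shape):
        # Likelihood equation for the shape after eliminating the scale,
        # with the sign chosen so that the function increases with shape.
        powers = [x ** shape for x in data]
        weighted = sum(p * math.log(x) for p, x in zip(powers, data))
        return weighted / sum(powers) - 1.0 / shape - mean_log

    # Bisection: score() is negative below the root and positive above it.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    scale = (sum(x ** shape for x in data) / n) ** (1.0 / shape)
    return shape, scale


# Hypothetical times between failures, in days (illustrative only).
data = [2.0, 5.0, 5.5, 9.0, 14.0, 20.0, 31.0, 40.0]
shape_hat, scale_hat = weibull_mle(data)
print(shape_hat, scale_hat, weibull_log_likelihood(data, shape_hat, scale_hat))
```

Working with the log-likelihood turns the product in Eq. (8) into a sum, which is easier to differentiate and numerically more stable.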

2.2.4 Degree of fitting

For the degree of fitting and hypothesis testing of the candidate models, Minitab software was used to perform the Anderson-Darling (A-D) test to verify the effectiveness of the models. The statistics from the A-D test can be used to compare how well several distributions fit, thereby identifying the optimal distribution. In engineering practice, the A-D test statistic \( A^2 \) can be calculated from the common discrete expression (11). Specifically, it is the weighted squared distance between the data points and the fitted curve; the closer a point is to the tails of the distribution, the larger its weight becomes [18, 19]. Hence, a smaller \( A^2 \) represents a higher degree of fitting. The expression is:

$$ {A}^2=-n-\frac{1}{n}{\sum}_{i=1}^n\left(2i-1\right)\left[\ln F\left({x}_i\right)+\ln \left(1-F\left({x}_{n-i+1}\right)\right)\right] $$
(11)

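In this equation, n is the sample size, \( x_1 \le x_2 \le \cdots \le x_n \) are the observations sorted in ascending order, and \( F(x_i) \) is the cumulative distribution function of the fitted distribution evaluated at \( x_i \). For the normal distribution, it is given by the standard normal cumulative distribution function \( \Phi \):

$$ F\left({x}_i\right)=\Phi \left(\frac{x_i-\overline{x}}{\sigma}\right) $$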
(12)

First, the p values of the four candidate distributions were compared. If p > 0.05, it indicated that the corresponding distribution was able to fit the failure data. The distributions with good fitting results were preserved, and then the A-D statistical variable was calculated. The distribution with a minimum A-D value was chosen as the optimal distribution model.
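The statistic in Eq. (11) and the selection of the distribution with the smallest A-D value can be sketched as follows. This is a plain implementation of Eq. (11) for complete data and is not a reproduction of Minitab's computation; p values are not calculated. The function name, the two candidate distributions, and the data are hypothetical, and the exponential candidate uses the sample mean as its scale.

```python
import math
import statistics


def anderson_darling(data, cdf):
    """A-D statistic of Eq. (11) for a fully specified CDF."""
    xs = sorted(data)  # the formula assumes ascending order
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        lower = math.log(cdf(xs[i - 1]))        # ln F(x_i)
        upper = math.log(1.0 - cdf(xs[n - i]))  # ln(1 - F(x_{n-i+1}))
        total += (2 * i - 1) * (lower + upper)
    return -n - total / n


# Hypothetical times between failures, in days (illustrative only).
times = [2.0, 5.0, 5.5, 9.0, 14.0, 20.0, 31.0, 40.0]
mean = statistics.mean(times)
std = statistics.stdev(times)

candidates = {
    # Eq. (12): normal CDF with the sample mean and standard deviation.
    "normal": lambda x: 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0)))),
    # Exponential CDF with the sample mean as scale parameter.
    "exponential": lambda x: 1.0 - math.exp(-x / mean),
}
scores = {name: anderson_darling(times, cdf) for name, cdf in candidates.items()}
best = min(scores, key=scores.get)  # smallest A-D value
print(scores, best)
```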

3 Example analysis and results

3.1 Fault data statistics

The structure of subway vehicles includes the running gear, traction system, brake system, control and diagnostic system, and the auxiliary system. All of these subsystems play a significant role in the vehicle’s reliability and safe operation. With the rapid development of the urban subway transportation system, subway failures and accidents occur frequently. Therefore, we investigated the reliability of the key subsystems in subway vehicles in this study.

The original operational failure data covering the above five systems were screened and summarized statistically; they comprised a total of 8000 entries from January 2009 to December 2013. Each failure was recorded with its number, date of occurrence, vehicle number, failure description, and failure consequences. Figure 2 shows the annual failure statistics for each subsystem, with the vehicle door system, the illumination system, and other systems incorporated into the auxiliary system. Figure 3 shows the operational failure statistics of the key components. By sorting the data based on the number of failures, we found that most failures were related to the auxiliary systems, followed by the traction system, running gear, braking system, and control and diagnostic systems.

Fig. 2 Annual fault distribution diagram of key system

Fig. 3 Statistics of the operational failure of subway key components

Because most of the life distribution data of subway vehicle systems are censored, we apply survival analysis to the time between failures in order to handle the censored data. In the fault data statistics, censored data fall mainly into two categories. The first is interval-censored data: if the maintenance work is reliable and a failure occurs between the current overhaul and the previous one, the fault time is known only to lie within an interval, and its specific value is unknown. The second is right-censored data: at the beginning and the end of the statistical period there are censored observations, for which the fault time is only known to be greater than a certain tracked value. We apply maximum likelihood parameter estimation with the common failure distribution functions to the censored data and calculate the A-D statistics to select the better-fitting distribution function.
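For reference, the standard way in which the two kinds of censored observations enter the likelihood function of Sect. 2.2.3 can be written as follows. The index sets E, R, and I (exactly observed, right-censored, and interval-censored failures) and the bounds \( c_i \), \( a_i \), and \( b_i \) are notation introduced here for illustration only.

$$ L\left(\theta \right)={\prod}_{i\in E}f\left({x}_i,\theta \right)\;{\prod}_{i\in R}S\left({c}_i,\theta \right)\;{\prod}_{i\in I}\left[F\left({b}_i,\theta \right)-F\left({a}_i,\theta \right)\right] $$

An exactly observed failure contributes its density, a right-censored observation contributes the probability of surviving beyond the tracked time \( c_i \), and an interval-censored observation contributes the probability of failing between the two overhauls at \( a_i \) and \( b_i \).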

3.2 Result and discussion

Based on the screened data, the operating time between failures was calculated and imported into Minitab [20, 21]. The “Reliability/Survival Statistics” tool was used to perform the maximum likelihood estimation for the four candidate distributions (the exponential distribution, lognormal distribution, two-parameter Weibull distribution, and three-parameter Weibull distribution). Figure 4 shows the fitting graph of the service life distribution for the traction system. Table 1 presents the p value, the A-D statistic, the selected optimal distribution, and the parameter estimates obtained from the maximum likelihood estimation method.

Fig. 4 Degree of fitting of the service life distribution of traction system

Table 1 Fault distribution fit test table of the key subsystems

The maximum likelihood parameter estimates in Table 1 for the operating time between failures of every subsystem were substituted into the reliability characteristic functions of the optimal distribution, thereby allowing for the derivation of the reliability characteristic functions of each subsystem (failure density function, cumulative distribution function, reliability function, and failure rate function). The mean time between failures was calculated from the operating time between failures and is shown in Tables 2 and 3. Similarly, the graph of the reliability characteristic functions of the traction system was plotted, as shown in Fig. 5.

Table 2 The reliability characteristic functions of each subsystem
Table 3 The reliability characteristic functions of each subsystem
Fig. 5 Reliability characteristic functions of traction system

Tables 2 and 3 show that the mean operating time between failures for the running gear, traction system, brake system, control and diagnostic system, and auxiliary systems was 14, 11, 25, 32, and 7 days, respectively. The mean operating time between failures increased, and the failure frequency correspondingly decreased, in the following order: auxiliary systems, traction system, running gear, brake system, and control and diagnostic systems. These results are consistent with the number of failures collected from the field data.

The reliability characteristic function model can be used to predict various reliability characteristics, such as the reliability, unreliability, and the mean time between failures. In addition, the subway system can reduce the occurrence of incidents by means of vehicle maintenance schedules based on the characteristic variables. For example, assuming that the reliability of the running gear of a subway vehicle should be above 95%, R(t) = 0.95 was substituted into the reliability characteristic function of the running gear in Tables 2 and 3. This gives the following formulas:

$$ t=13.5450{\left[\ln \frac{1}{R(t)}\right]}^{\frac{1}{0.9124}}=0.5224 $$
(13)
$$ F(t)=1-\exp \left[-{\left(\frac{t}{13.5450}\right)}^{0.9124}\right]=0.05 $$
(14)
$$ \lambda (t)=\frac{0.9124}{13.5450}{\left(\frac{t}{13.5450}\right)}^{-0.0876}=0.0896 $$
(15)

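It can be concluded that the reliability of the running gear falls to 95% after about 0.52 time units of operation (days, if t is measured in the same unit as the mean operating time between failures above), so maintenance should be scheduled at least at this interval in order to meet the reliability requirement of the running gear. Similarly, the maintenance plans for the other subsystems can be formulated.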
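The calculation in Eqs. (13) to (15) can be reproduced with a short sketch that uses the two-parameter Weibull form with the shape parameter 0.9124 and the scale parameter 13.5450 appearing in Eq. (13). The function names are hypothetical, and the mean life uses the standard Weibull mean, scale × Γ(1 + 1/shape), which is not written out in the text.

```python
import math

SHAPE = 0.9124   # Weibull shape parameter used in Eqs. (13)-(15)
SCALE = 13.5450  # Weibull scale parameter used in Eqs. (13)-(15)


def time_at_reliability(r):
    """Eq. (13): operating time at which the reliability equals r."""
    return SCALE * math.log(1.0 / r) ** (1.0 / SHAPE)


def unreliability(t):
    """Eq. (14): cumulative failure probability F(t)."""
    return 1.0 - math.exp(-((t / SCALE) ** SHAPE))


def failure_rate(t):
    """Eq. (15): failure rate function."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1.0)


def mean_life():
    """Standard Weibull mean: scale * Gamma(1 + 1/shape)."""
    return SCALE * math.gamma(1.0 + 1.0 / SHAPE)


t95 = time_at_reliability(0.95)
print(round(t95, 4))                 # 0.5224
print(round(unreliability(t95), 4))  # 0.05
print(round(failure_rate(t95), 4))   # 0.0896
print(round(mean_life()))            # 14
```

The last value agrees with the mean operating time between failures of 14 days reported for the running gear.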

4 Conclusions

Based on the operational failure data of subway vehicles, a reliability analysis method for subway subsystems was developed using survival analysis theory. By filtering, classifying, and preprocessing the failure data, the number of failures and the mean operating time between failures were obtained for each subsystem. The results showed that the mean operating time between failures increased, and the number of failures correspondingly decreased, in the following order: auxiliary systems, traction system, running gear, brake system, and control and diagnostic systems. The optimal failure distribution model of every subsystem was determined with Minitab. On this basis, a vehicle maintenance schedule can be formulated to direct daily maintenance work, which could noticeably reduce subsystem failures.

The reliability characteristic functions can be used to obtain a scientific estimation of the reliability characteristic variables. As domestic subway systems are being built rapidly and are becoming increasingly complex, the reliability characteristic functions can guide the construction and systematic maintenance of future subways. In the future, the reliability analysis of subways is expected to receive widespread attention and long-term development.

Owing to limitations of time and ability, this article focuses only on the subsystem level. In future work, we will analyze the reliability of specific components to find the specific causes of faults and provide guidance for train maintenance so as to reduce the incidence of failure.