Abstract
Assessment of system availability usually uses either an analytical approach (e.g., Markov/semi-Markov) or a simulation approach (e.g., Monte Carlo simulation-based). However, the former cannot handle complicated state changes and the latter is computationally expensive. Traditional Bayesian approaches may solve these problems; however, because of their computational difficulties, they are not widely applied. The recent proliferation of Markov Chain Monte Carlo (MCMC) approaches has led to the use of Bayesian inference in a wide variety of fields. This study proposes a new approach to system availability assessment: a parametric Bayesian approach using MCMC, which takes advantage of both the analytical and the simulation methods. In this approach, mean time to failure (MTTF) and mean time to repair (MTTR) are treated as distributions instead of being “averaged”, which better reflects reality and compensates for the limitations of simulation data sample size. To demonstrate the approach, the paper considers a case study of a balling drum system in a mining company. In this system, MTTF and MTTR are determined in a Bayesian Weibull model and a Bayesian lognormal model respectively. The results show that the proposed approach can integrate the analytical and simulation methods to assess system availability and could be applied to other technical problems in asset management (e.g., other industries, other systems).
1 Introduction
Availability represents the proportion of a system’s uptime out of the total time in service and is one of the most critical aspects of performance evaluation. Availability is commonly calculated from the Mean Time to Failure (MTTF) and the Mean Time to Repair (MTTR). However, those “mean” values are normally “averaged”; thus, some useful information (e.g., trends, system complexity) may be neglected, and some problems may even be hidden.
Assessment of system availability has been studied from the design stage to the operational stage in various system configurations (e.g., in series, parallel, k-out-of-n, stand-by, multi-state, or mixed architectures). Approaches to assessing system availability mainly use either analytic or simulation techniques.
In general, analytic techniques represent the system using direct mathematical solutions from applied probability theory to make statements on various performance measures, such as the steady-state availability or the interval availability (Dekker and Groenendijk 1995; Ocnasu 2007). Researchers tend to use Markov models to assess dynamic availability or semi-Markov models using Laplace transforms to determine average performance measures (Dekker and Groenendijk 1995; Faghih-Roohi et al. 2014). However, such approaches have been criticised as too restrictive to tackle practical problems: they assume constant failure and repair rates, which is unlikely to hold in the real world (Raje et al. 2000; Marquez et al. 2005). Furthermore, the time-dependent availability obtained under a Markovian assumption is not valid for non-Markovian processes (Raje et al. 2000).
Simulation techniques estimate availability by simulating the actual process and random behaviour of the system. The advantage is that non-Markov failures and repair processes can be modelled easily (Raje et al. 2000). Recent research is working on developing Monte Carlo techniques to model the behaviour of complex systems under realistic time-dependent operational conditions (Marquez et al. 2005; Marquez and Iung 2007; Yasseri and Bahai 2018) or to model multi-state systems with operational dependencies (Zio et al. 2007). Although simulation is more flexible, it is computationally expensive.
Traditionally, Bayesian approaches have been used to assess system availability as they can solve the problem of complicated system state changes and computationally expensive simulation data; however, their development and application were stalled by the strict assumptions on prior forms and by computational difficulties. Research is more concerned with the prior’s selection or the posterior’s computation than the reality (Brender 1968a, b; Kuo 1985; Sharma and Bhutani 1993; Khan and Islam 2012).
The recent proliferation of Markov Chain Monte Carlo (MCMC) simulation techniques has led to the use of Bayesian inference in a wide variety of fields. Because MCMC makes high-dimensional numerical integration tractable (Lin 2014), the selection of prior information and the descriptions of reliability/maintainability can be more flexible and more realistic.
This study proposes a new approach to system availability assessment: a parametric Bayesian approach with MCMC, with a focus on the operational stage, using both analytical and simulation methods. MTTF or MTTR are treated as distributions instead of being “averaged” by point estimation, and this is closer to reality; in addition, the limitations of simulation data sample size are addressed by using MCMC techniques.
The rest of this paper is organized as follows. Section 2 describes the problem statement, the balling drum system, the data preparation, and the preliminary analysis of failure and repair data. Section 3 proposes a Bayesian Weibull model for MTTF and a Bayesian lognormal model for MTTR and explains how to use an MCMC computational scheme to obtain the parameters’ posterior distributions. Section 4 presents a case study, results, and discussion. Section 5 offers conclusions and suggestions for further study.
2 Problem statement
This section presents the study problem statement, the balling drum system and its configuration, the system availability framework, and data preparation; it performs a preliminary analysis of failure and repair data based on which parametric Bayesian models are constructed subsequently.
2.1 Balling drum systems in the mining industry
Our study is motivated by a balling drum system in the mining industry. The case study mine consists of five balling drums, labelled 1–5 (see Fig. 1). All five balling drums receive their feed for production in the same manner. Each balling drum is expected to produce the same amount of pellets at its maximum. According to the working mechanism and an i.i.d. test, the drums are regarded as independent; if one of the balling drums breaks down, it does not affect the others, except that total production is reduced. We assume that the system fails only if all subsystems fail; therefore, it is treated as a parallel system.
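The availability of a single balling drum, denoted as A, can be computed by
$${\text{A}} = \frac{\text{MTTF}}{{\text{MTTF}} + {\text{MTTR}}}. \qquad (1)$$
According to Fig. 1, the five balling drums are in parallel. With \({\text{A}}_{\text{i}}\) denoting the availability of balling drum i, the total system availability, \({\text{A}}_{\text{system}}\), can be calculated as
$${\text{A}}_{\text{system}} = 1 - \prod\limits_{{\text{i}} = 1}^{5} \left( 1 - {\text{A}}_{\text{i}} \right). \qquad (2)$$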
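As a rough illustration of the single-unit and parallel-system calculations, the sketch below assumes the standard steady-state ratio MTTF / (MTTF + MTTR) for one drum and the independence and parallel assumptions stated above. The function names and the numbers (MTTF of 90 and MTTR of 10 time units) are hypothetical and are not taken from the case study data.

```python
from math import prod


def availability(mttf: float, mttr: float) -> float:
    """Steady-state availability of one unit: MTTF / (MTTF + MTTR)."""
    if mttf < 0 or mttr < 0 or mttf + mttr == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf / (mttf + mttr)


def parallel_availability(unit_availabilities) -> float:
    """A parallel system is down only when every independent unit is down."""
    return 1.0 - prod(1.0 - a for a in unit_availabilities)


# Hypothetical values, for illustration only (not the balling drum data).
a_single = availability(mttf=90.0, mttr=10.0)
a_system = parallel_availability([a_single] * 5)
print(f"single unit: {a_single:.4f}, five in parallel: {a_system:.6f}")
```

Because the system is treated as failed only when all five drums are down, the system figure is much closer to 1 than any single-drum figure; it says nothing about reduced production when only some drums are down.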
2.2 Data preparation and preliminary analysis
The study uses the failure and repair data of the five balling drums from January 2013 to December 2018. There are 1782 records. In the first step, the null values are removed, and the data are reduced to 1774 records.
The next step reveals there are different reasons for the TTF and TTR of individual balling drums. It is noticed that, for TTR data, if 150 shutdowns are considered normal (denoted as a threshold, see Fig. 2), then those exceeding 150 should be treated as abnormal and investigated using Root Cause Analysis (RCA).
After checking the work order types of these abnormal data, it is found that most of them are caused by “preventive maintenance”, which may be due to a lack of maintenance resources. To simplify the study, we assume all maintenance resources are sufficient for “preventive maintenance”; thus, abnormal data that might be caused by a shortage of spare parts or skilled personnel are not treated specially in this paper.
To determine the baseline distribution of Time to Failure (TTF) and Time to Repair (TTR), we conduct a preliminary study of failure data and repair data using traditional analysis. In this preliminary study, several distributions are considered: exponential distribution, Weibull distribution, normal distribution, log-logistic distribution, lognormal distribution, and extreme value distribution. Table 1 lists the results.
Based on the results, the Weibull distribution and lognormal distribution are selected for the TTF and TTR for balling drums 1–5; these are applied to the parametric Bayesian models in the next section.
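The selection criterion behind Table 1 is not restated here, so the following is only a sketch of one common way to compare candidate distributions: fit each by maximum likelihood and compare the Akaike information criterion (AIC). It uses two candidates with closed-form estimators (exponential and lognormal) and synthetic data; the data, seed, and function names are hypothetical, and AIC is an assumption rather than the criterion used in the study.

```python
import math
import random


def aic_exponential(t):
    """AIC of an exponential fit (1 parameter, MLE rate = n / sum(t))."""
    n = len(t)
    rate = n / sum(t)
    log_lik = n * math.log(rate) - rate * sum(t)
    return 2 * 1 - 2 * log_lik


def aic_lognormal(t):
    """AIC of a lognormal fit (2 parameters, MLE from the log-data)."""
    n = len(t)
    logs = [math.log(x) for x in t]
    mu = sum(logs) / n
    sigma2 = sum((v - mu) ** 2 for v in logs) / n
    log_lik = -sum(logs) - 0.5 * n * math.log(2 * math.pi * sigma2) - 0.5 * n
    return 2 * 2 - 2 * log_lik


# Synthetic repair times (hypothetical, not the balling drum records).
rng = random.Random(7)
ttr = [rng.lognormvariate(1.0, 0.5) for _ in range(500)]

aic_exp = aic_exponential(ttr)
aic_logn = aic_lognormal(ttr)
best = "lognormal" if aic_logn < aic_exp else "exponential"
print(f"AIC exponential={aic_exp:.1f}, lognormal={aic_logn:.1f}, best={best}")
```

A lower AIC indicates a better trade-off between fit and number of parameters; the other candidates listed above would need numerical optimisation and are omitted from this sketch.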
3 Parametric Bayesian Models
This section proposes a Bayesian Weibull model for TTF and a Bayesian lognormal model for TTR and explains the MCMC computational scheme used to obtain the posterior distributions.
3.1 Markov Chain Monte Carlo with Gibbs sampling
The recent proliferation of Markov Chain Monte Carlo (MCMC) approaches has led to the use of Bayesian inference in a wide variety of fields. MCMC is essentially Monte Carlo integration using Markov chains. Monte Carlo integration draws samples from the required distribution and then forms sample averages to approximate expectations. MCMC draws these samples by running a cleverly constructed Markov chain for a long time. There are many ways of constructing these chains. The Gibbs sampler is one of the best known MCMC sampling algorithms in the Bayesian computational literature. It adopts the thinking of “divide and conquer”: i.e., when one parameter (or block of parameters) is sampled, the other parameters are assumed to be fixed and known. Let \(\uptheta = \left( {\uptheta_{1} , \ldots ,\uptheta_{\text{k}} } \right)\) be a k-dimensional vector of parameters, and let \({\text{f}}\left( {\uptheta_{\text{j}} | \cdot } \right)\) denote the conditional distribution of the jth parameter given the remaining parameters. The basic scheme of the Gibbs sampler for sampling from \({\text{p}}\left(\uptheta \right)\) is given as follows:
- Step 1. Choose an arbitrary starting point \(\theta^{\left( 0 \right)} = \left( {\theta_{1}^{\left( 0 \right)} , \ldots ,\theta_{k}^{\left( 0 \right)} } \right)\);
- Step 2. Generate \(\theta_{1}^{\left( 1 \right)}\) from the conditional distribution \(f\left( {\theta_{1} |\theta_{2}^{\left( 0 \right)} , \ldots ,\theta_{k}^{\left( 0 \right)} } \right)\), and generate \(\theta_{2}^{\left( 1 \right)}\) from the conditional distribution \(f\left( {\theta_{2} |\theta_{1}^{\left( 1 \right)} ,\theta_{3}^{\left( 0 \right)} , \ldots ,\theta_{k}^{\left( 0 \right)} } \right);\)
- Step 3. For each subsequent j, generate \(\theta_{j}^{\left( 1 \right)}\) from \(f\left( {\theta_{j} |\theta_{1}^{\left( 1 \right)} , \ldots ,\theta_{j - 1}^{\left( 1 \right)} ,\theta_{j + 1}^{\left( 0 \right)} , \ldots ,\theta_{k}^{\left( 0 \right)} } \right)\);
- Step 4. Generate \(\theta_{k}^{\left( 1 \right)}\) from \(f\left( {\theta_{k} |\theta_{1}^{\left( 1 \right)} ,\theta_{2}^{\left( 1 \right)} , \ldots ,\theta_{k - 1}^{\left( 1 \right)} } \right)\); the one-step transition from \(\theta^{\left( 0 \right)}\) to \(\theta^{\left( 1 \right)} = \left( {\theta_{1}^{\left( 1 \right)} , \ldots ,\theta_{k}^{\left( 1 \right)} } \right)\) has then been completed, where \(\theta^{\left( 1 \right)}\) is one realization of the Markov chain.
- Step 5. Go to Step 2, using the most recent values in place of \(\theta^{\left( 0 \right)}\).
After \({\text{t}}\) iterations, \(\uptheta^{{\left( {\text{t}} \right)}} = \left( {\uptheta_{1}^{{\left( {\text{t}} \right)}} , \ldots ,\uptheta_{\text{k}}^{{\left( {\text{t}} \right)}} } \right)\) can be obtained. Each component of \(\uptheta\) can also be obtained. Starting from different \(\uptheta^{\left( 0 \right)}\), as \({\text{t}} \to \infty\), the marginal distribution of \(\uptheta^{{\left( {\text{t}} \right)}}\) can be viewed as a stationary distribution based on the theory of the ergodic average. Then, the chain is seen as converging, and the sampling points are seen as observations of the sample.
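The Gibbs scheme can be sketched on a toy target where both full conditionals are known exactly. The example below is not the model used in this paper: it assumes a standard bivariate normal with correlation rho, for which x given y is normal with mean rho·y and variance 1 − rho². The function name, the value of rho, and the chain lengths are hypothetical.

```python
import random
from math import sqrt


def gibbs_bivariate_normal(rho, n_iter, burn_in, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0                      # Step 1: arbitrary starting point
    cond_sd = sqrt(1.0 - rho * rho)      # sd of each full conditional
    samples = []
    for it in range(burn_in + n_iter):
        x = rng.gauss(rho * y, cond_sd)  # draw x from f(x | y)
        y = rng.gauss(rho * x, cond_sd)  # draw y from f(y | x), using the new x
        if it >= burn_in:                # discard the burn-in iterations
            samples.append((x, y))
    return samples


samples = gibbs_bivariate_normal(rho=0.8, n_iter=20000, burn_in=1000)
n = len(samples)
mean_x = sum(x for x, _ in samples) / n
mean_y = sum(y for _, y in samples) / n
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in samples) / n
var_x = sum((x - mean_x) ** 2 for x, _ in samples) / n
var_y = sum((y - mean_y) ** 2 for _, y in samples) / n
corr = cov_xy / sqrt(var_x * var_y)
print(f"mean_x={mean_x:.3f}, corr={corr:.3f}")
```

Each sweep updates one coordinate at a time while holding the other at its latest value, which is the one-step transition described in Steps 2 to 4; the retained draws after burn-in are then treated as (correlated) samples from the target.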
3.2 Bayesian Weibull model for TTF
Suppose the time to failure (TTF) data \({\text{t}} = \left( {{\text{t}}_{1} ,{\text{t}}_{2} , \ldots ,{\text{t}}_{\text{n}} } \right)^{\prime}\) for \({\text{n}}\) individuals are i.i.d, and each corresponds to a 2-parameter Weibull distribution \({\text{W}}\left( {\upalpha,\upgamma} \right)\), where \(\upalpha > 0\) and \(\upgamma > 0\). Then, the p.d.f. is \({\text{f}}\left( {{\text{t}}_{\text{i}} |\upalpha,\upgamma} \right) =\upalpha \upgamma {\text{t}}_{\text{i}}^{{{\upalpha} - 1}} { \exp }\left( { - {\upgamma \text{t}}_{\text{i}}^{{\upalpha}} } \right)\), while the c.d.f. is \({\text{F}}\left( {{\text{t}}_{\text{i}} |{\upalpha},{\upgamma}} \right) = 1 - { \exp }\left( { - {\upgamma \text{t}}_{\text{i}}^{{\upalpha}} } \right)\). The reliability function is \({\text{R}}\left( {{\text{t}}_{\text{i}} |{\upalpha},{\upgamma}} \right) = { \exp }\left( { - {\upgamma \text{t}}_{\text{i}}^{{\upalpha}} } \right)\).
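Denote the observed data set as \({\text{D}}_{0} = \left( {{\text{n}},{\text{t}}} \right).\) Therefore, the likelihood function for \({\upalpha}\) and \({\upgamma}\) is
$$L\left( {\alpha ,\gamma |D_{0} } \right) = \prod\limits_{i = 1}^{n} \alpha \gamma t_{i}^{\alpha - 1} \exp \left( { - \gamma t_{i}^{\alpha } } \right) = \alpha^{n} \gamma^{n} \left( {\prod\limits_{i = 1}^{n} t_{i} } \right)^{\alpha - 1} \exp \left( { - \gamma \sum\limits_{i = 1}^{n} t_{i}^{\alpha } } \right). \qquad (3)$$
In this study, we assume \(\upalpha\) to be a gamma distribution (Kuo 1985), denoted by \({\text{G}}\left( {{\text{a}}_{0} ,{\text{b}}_{0} } \right)\) as its prior distribution, written as \({\uppi}\left( {{\upalpha}|{\text{a}}_{0} ,{\text{b}}_{0} } \right)\); we assume \({\upgamma}\) to be a gamma distribution denoted by \({\text{G}}\left( {{\text{c}}_{0} ,{\text{d}}_{0} } \right)\) as its prior distribution, written as \({\uppi}\left( {{\upgamma}|{\text{c}}_{0} ,{\text{d}}_{0} } \right).\) Taking the gamma distribution in the shape-rate form used by WinBUGS, this means
$$\pi \left( {\alpha |a_{0} ,b_{0} } \right) \propto \alpha^{{a_{0} - 1}} \exp \left( { - b_{0} \alpha } \right), \qquad (4)$$
$$\pi \left( {\gamma |c_{0} ,d_{0} } \right) \propto \gamma^{{c_{0} - 1}} \exp \left( { - d_{0} \gamma } \right). \qquad (5)$$
Therefore, the joint posterior distribution can be obtained according to Eqs. (3)–(5) as
$$\pi \left( {\alpha ,\gamma |D_{0} } \right) \propto \alpha^{{n + a_{0} - 1}} \gamma^{{n + c_{0} - 1}} \left( {\prod\limits_{i = 1}^{n} t_{i} } \right)^{\alpha - 1} \exp \left( { - b_{0} \alpha - d_{0} \gamma - \gamma \sum\limits_{i = 1}^{n} t_{i}^{\alpha } } \right), \qquad (6)$$
and the parameters’ full conditional distributions with Gibbs sampling can be written as
$$\pi \left( {\alpha |\gamma ,D_{0} } \right) \propto \alpha^{{n + a_{0} - 1}} \left( {\prod\limits_{i = 1}^{n} t_{i} } \right)^{\alpha - 1} \exp \left( { - b_{0} \alpha - \gamma \sum\limits_{i = 1}^{n} t_{i}^{\alpha } } \right), \qquad (7)$$
$$\pi \left( {\gamma |\alpha ,D_{0} } \right) \propto \gamma^{{n + c_{0} - 1}} \exp \left( { - \left( {d_{0} + \sum\limits_{i = 1}^{n} t_{i}^{\alpha } } \right)\gamma } \right). \qquad (8)$$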
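To make the sampling scheme for the Weibull model concrete, here is a minimal sketch under stated assumptions. With the p.d.f. above and gamma priors in shape-rate form, the conditional of γ given α is itself a gamma distribution with shape n + c₀ and rate d₀ + Σ tᵢ^α, so it can be drawn directly. The conditional of α has no standard form, so this sketch swaps in a random-walk Metropolis step on log α (a Metropolis-within-Gibbs scheme); WinBUGS selects its own samplers internally and this is not a reproduction of them. The synthetic data, seeds, step size, and function names are hypothetical.

```python
import math
import random


def log_cond_alpha(alpha, gamma, n, sum_log_t, sum_t_pow, a0, b0):
    """Log of the unnormalised conditional density of alpha given gamma."""
    return ((n + a0 - 1.0) * math.log(alpha) + (alpha - 1.0) * sum_log_t
            - b0 * alpha - gamma * sum_t_pow)


def sample_weibull_posterior(t, n_iter=4000, burn_in=1000, a0=1e-4, b0=1e-4,
                             c0=1e-4, d0=1e-4, step=0.1, seed=1):
    rng = random.Random(seed)
    n = len(t)
    sum_log_t = sum(math.log(x) for x in t)
    alpha, gamma = 1.0, 1.0                      # arbitrary starting point
    draws = []
    for it in range(burn_in + n_iter):
        # gamma | alpha is Gamma(shape n + c0, rate d0 + sum t_i^alpha);
        # random.gammavariate takes a scale, i.e. 1 / rate.
        s = sum(x ** alpha for x in t)
        gamma = rng.gammavariate(n + c0, 1.0 / (d0 + s))
        # alpha | gamma: random-walk Metropolis on log(alpha); the last two
        # terms are the Jacobian of the log transform.
        prop = alpha * math.exp(step * rng.gauss(0.0, 1.0))
        s_prop = sum(x ** prop for x in t)
        log_ratio = (log_cond_alpha(prop, gamma, n, sum_log_t, s_prop, a0, b0)
                     - log_cond_alpha(alpha, gamma, n, sum_log_t, s, a0, b0)
                     + math.log(prop) - math.log(alpha))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            alpha = prop
        if it >= burn_in:
            draws.append((alpha, gamma))
    return draws


# Synthetic TTF data (hypothetical): shape alpha = 0.8, gamma = 0.5,
# which corresponds to a Weibull scale of gamma ** (-1 / alpha).
true_alpha, true_gamma = 0.8, 0.5
data_rng = random.Random(42)
ttf = [data_rng.weibullvariate(true_gamma ** (-1.0 / true_alpha), true_alpha)
       for _ in range(300)]

draws = sample_weibull_posterior(ttf)
post_alpha = sum(a for a, _ in draws) / len(draws)
post_gamma = sum(g for _, g in draws) / len(draws)
print(f"posterior mean alpha={post_alpha:.3f}, gamma={post_gamma:.3f}")
```

The retained pairs can be summarised (mean, SD, interval) in the same way as the posterior summaries reported for the case study, although the numbers here come from synthetic data only.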
3.3 Bayesian Lognormal model for TTR
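Suppose the time to repair (TTR) data \({\text{t}} = \left( {{\text{t}}_{1} ,{\text{t}}_{2} , \ldots ,{\text{t}}_{\text{n}} } \right)^{\prime}\) for \({\text{n}}\) individuals are i.i.d., and each \({ \ln }\left( {{\text{t}}_{\text{i}} } \right)\) corresponds to a normal distribution, \({\text{N}}\left( {{\upmu},{\upsigma}^{2} } \right)\). Then \({\text{t}}_{\text{i}}\) follows a lognormal distribution with parameters \({\upmu}\) and \({\upsigma}^{2}\), and the p.d.f. and c.d.f. are given by Eqs. (9) and (10):
$$f\left( {t_{i} |\mu ,\sigma^{2} } \right) = \frac{1}{{t_{i} \sigma \sqrt {2\pi } }}\exp \left( { - \frac{{\left( {\ln t_{i} - \mu } \right)^{2} }}{{2\sigma^{2} }}} \right), \qquad (9)$$
$$F\left( {t_{i} |\mu ,\sigma^{2} } \right) = \Phi \left( {\frac{{\ln t_{i} - \mu }}{\sigma }} \right), \qquad (10)$$
where \(\Phi\) is the standard normal c.d.f.
Denote the observed data set as \({\text{D}}_{0} = \left( {{\text{n}},{\text{t}}} \right)\). Therefore, according to Eq. (9), the likelihood function for \({\upmu}\) and \({\upsigma}\) becomes
$$L\left( {\mu ,\sigma |D_{0} } \right) = \prod\limits_{i = 1}^{n} f\left( {t_{i} |\mu ,\sigma^{2} } \right) = \left( {2\pi } \right)^{ - n/2} \sigma^{ - n} \left( {\prod\limits_{i = 1}^{n} t_{i} } \right)^{ - 1} \exp \left( { - \frac{1}{{2\sigma^{2} }}\sum\limits_{i = 1}^{n} \left( {\ln t_{i} - \mu } \right)^{2} } \right). \qquad (11)$$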
In this study, we assume \({\upmu}\) to be a normal distribution denoted by \({\text{N}}\left( {{\text{e}}_{0} ,{\text{f}}_{0} } \right)\) as its prior distribution, written as \({\uppi}\left( {{\upmu}|{\text{e}}_{0} ,{\text{f}}_{0} } \right)\); we assume \({\upsigma}\) to be a gamma distribution denoted by \({\text{G}}\left( {{\text{g}}_{0} ,{\text{h}}_{0} } \right)\) as its prior distribution, written as \({\uppi}\left( {{\upsigma}|{\text{g}}_{0} ,{\text{h}}_{0} } \right).\) This means
Therefore, the joint posterior distribution can be obtained according to Eqs. (11)–(13) as
Then, the parameters’ full conditional distribution with Gibbs sampling can be written as
4 Case study
This section presents a case study; it explains the procedure, gives the results, and offers a discussion.
4.1 The procedure
The procedure applied in this case study to assess the system availability of the mine’s five balling drums has a total of seven steps, as described in Table 2.
4.2 Results
In this case study, the calculations are implemented with WinBUGS. Three Markov chains are constructed for each MCMC simulation. A burn-in of 1000 samples is used, with an additional 10,000 Gibbs samples for each Markov chain.
Vague prior distributions are adopted as follows:
- For the Bayesian Weibull model using TTF data:
$$\alpha \sim G\left( {0.0001,0.0001} \right),\quad \gamma \sim G\left( {0.0001,0.0001} \right)$$
- For the Bayesian lognormal model using TTR data:
$$\mu \sim N\left( {0,0.0001} \right),\quad \sigma \sim G\left( {0.0001,0.0001} \right).$$
After checking convergence [i.e. checking dynamic traces in Markov chains, determining time series and Gelman–Rubin–Brooks (GRB) statistics, and comparing MC error with standard deviation (SD)] (Lin 2014), we consider the following posterior distribution summaries for our models (see Tables 3, 4), including the parameters’ posterior mean, SD, Monte Carlo error (MC error), and 95% highest posterior density (HPD) interval.
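As an illustration of the multi-chain convergence check, the sketch below computes the basic Gelman–Rubin potential scale reduction factor for one parameter from several chains of equal length. WinBUGS reports a related, interval-based variant of this statistic, so this is an assumption-laden stand-in rather than the exact quantity used in the case study. The chains are synthetic and the function name is hypothetical.

```python
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    var_plus = (n - 1) / n * within + between / n  # pooled variance estimate
    return (var_plus / within) ** 0.5


rng = random.Random(3)
# Three synthetic chains that agree with each other (same distribution).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
# Three synthetic chains stuck around different values (not converged).
stuck = [[rng.gauss(shift, 1.0) for _ in range(2000)] for shift in (0.0, 2.0, 4.0)]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
print(f"R-hat mixed={rhat_mixed:.3f}, stuck={rhat_stuck:.3f}")
```

Values close to 1 suggest that the chains are sampling the same distribution; values well above 1 suggest that more iterations or a longer burn-in are needed.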
Using the results from Tables 3 and 4, we calculate the availability of individual balling drums in Table 5, where MTTF is the expected TTF under the Weibull model, \({\text{E}}\left[ {{\text{t}}_{\text{i}} |{\upalpha},{\upgamma}} \right]\), and MTTR is the expected TTR under the lognormal model, \({\text{E}}\left[ {{\text{t}}_{\text{i}} |{\upmu},{\upsigma}^{2} } \right]\).
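One way to carry parameter uncertainty through to availability is sketched below. It relies on two standard results: for the Weibull p.d.f. used in Sect. 3.2 the mean is γ^(−1/α)·Γ(1 + 1/α), and for a lognormal distribution the mean is exp(μ + σ²/2). Each posterior draw then gives one availability value, so availability is summarised as a distribution. The posterior draws here are simulated placeholders rather than the values in Tables 3 and 4, the interval is an equal-tailed percentile interval rather than an HPD interval, and all function names are hypothetical.

```python
import math
import random


def weibull_mttf(alpha, gamma):
    """Mean of the Weibull p.d.f. alpha*gamma*t^(alpha-1)*exp(-gamma*t^alpha)."""
    return gamma ** (-1.0 / alpha) * math.gamma(1.0 + 1.0 / alpha)


def lognormal_mttr(mu, sigma):
    """Mean of a lognormal distribution with log-mean mu and log-sd sigma."""
    return math.exp(mu + 0.5 * sigma ** 2)


def availability_draws(weibull_draws, lognormal_draws):
    """One availability value per pair of posterior draws."""
    out = []
    for (alpha, gamma), (mu, sigma) in zip(weibull_draws, lognormal_draws):
        mttf = weibull_mttf(alpha, gamma)
        mttr = lognormal_mttr(mu, sigma)
        out.append(mttf / (mttf + mttr))
    return out


def summarize(values):
    """Mean and equal-tailed 95% percentile interval."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[int(0.025 * (n - 1))]
    upper = ordered[int(0.975 * (n - 1))]
    return sum(ordered) / n, lower, upper


# Placeholder posterior draws (hypothetical, not the case study results).
rng = random.Random(11)
w_draws = [(rng.gauss(0.8, 0.03), rng.gauss(0.05, 0.005)) for _ in range(5000)]
l_draws = [(rng.gauss(1.0, 0.05), abs(rng.gauss(0.9, 0.05))) for _ in range(5000)]

a_mean, a_lo, a_hi = summarize(availability_draws(w_draws, l_draws))
print(f"availability mean={a_mean:.4f}, 95% interval=({a_lo:.4f}, {a_hi:.4f})")
```

Pairing draws by index is reasonable here because the TTF and TTR models are fitted separately, so their posteriors are treated as independent.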
According to Eq. (2), the system availability of the five balling drums is
4.3 Discussion
Compared to the traditional method of assessing availability in Eq. (1), the proposed approach extends the method to Eq. (17), where
Equation (17) shows the flexibility of assessing availability according to reality. For one thing, the parametric Bayesian models using MCMC make the calculation of posteriors more feasible. More importantly, however, parametric Bayesian models can be applied to predict TTF, TTR, and system availability in the future.
In this study, since the five balling drums are relatively new, the gamma distributions and normal distributions are selected as vague priors due to lack of prior information. This could be improved with more historical data/experience.
The system configurations could be extended to other more complex architectures (series, k-out-of-n, stand-by, multi-state, or mixed) by modifying Eq. (2).
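As a sketch of how other configurations change the system-level calculation, the example below computes series and k-out-of-n availability for independent units with possibly different availabilities. Independence is an assumption carried over from the parallel case, and stand-by or multi-state architectures would need a different model. The unit availabilities and function names are hypothetical.

```python
from math import prod


def series_availability(avails):
    """A series system is up only when every independent unit is up."""
    return prod(avails)


def k_out_of_n_availability(avails, k):
    """Probability that at least k of the independent units are up."""
    dist = [1.0]                      # dist[j] = P(exactly j units up so far)
    for a in avails:
        new = [0.0] * (len(dist) + 1)
        for j, p in enumerate(dist):
            new[j] += p * (1.0 - a)   # this unit is down
            new[j + 1] += p * a       # this unit is up
        dist = new
    return sum(dist[k:])


# Hypothetical unit availabilities, for illustration only.
units = [0.9, 0.8, 0.95]
a_series = series_availability(units)
a_parallel = k_out_of_n_availability(units, 1)   # 1-out-of-n is the parallel case
a_two_of_three = k_out_of_n_availability(units, 2)
print(f"series={a_series:.4f}, 2-of-3={a_two_of_three:.4f}, parallel={a_parallel:.4f}")
```

The parallel and series systems are the two extremes (k = 1 and k = n), so a k-out-of-n availability always lies between them.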
The data analysis reveals that for TTF data, the shape parameter of the Weibull distribution is less than 1. This implies a decreasing failure rate (as in the early stage of the bathtub curve), which is not consistent with typical experience of mechanical equipment. The TTF data include not only corrective maintenance but also preventive maintenance. In this case study, a high percentage of TTF work orders are for preventive maintenance. The decreasing trend also indicates that a possible way to improve TTF is to improve the preventive maintenance plan.
The seven steps in Table 2 can be grouped into stages: Steps 1 to 4 can be treated as the Plan stage, Steps 5 and 6 as the Do and Check stages, and Step 7 as the Act stage. The outputs from Step 7 could become input for Step 2 in the next calculation period. This means the seven steps follow the “PDCA” cycle and the results could be continuously improved.
5 Conclusions
This study proposes a parametric Bayesian approach for system availability assessment at the operational stage. MCMC is adopted to take advantage of both the analytical and the simulation methods.
In this approach, MTTF and MTTR are treated as distributions instead of being “averaged” by a point estimation. This better reflects the reality; in addition, the limitations of simulation data sample size are compensated for by MCMC techniques.
In the case study, TTF and TTR are determined using a Bayesian Weibull model and a Bayesian lognormal model. The results show that the proposed approach can integrate the analytical and simulation methods for system availability assessment and could be applied to other technical problems in asset management (e.g., other industries, other systems).
References
Brender DM (1968a) The Bayesian assessment of system availability: advanced applications and techniques. IEEE Trans Reliab 17(3):138–147
Brender DM (1968b) The prediction and measurement of system availability: a Bayesian treatment. IEEE Trans Reliab 17(3):127–138
Dekker R, Groenendijk W (1995) Availability assessment methods and their application in practice. Microelectron Reliab 35(9–10):1257–1274
Faghih-Roohi S, Xie M, Ng KM, Yam RC (2014) Dynamic availability assessment and optimal component design of multi-state weighted k-out-of-n systems. Reliab Eng Syst Saf 123:57–62
Khan MA, Islam H (2012) Bayesian analysis of system availability with half-normal life time. Qual Technol Quant Manag 9(2):203–209
Kuo W (1985) Bayesian availability using gamma distributed priors. IIE Trans 17(2):132–140
Lin J (2014) An integrated procedure for Bayesian reliability inference using Markov Chain Monte Carlo methods. J Qual Reliab Eng 2014:1–16
Marquez AC, Iung B (2007) A structured approach for the assessment of system availability and reliability using Monte Carlo simulation. J Qual Maint Eng 13(2):125–136
Marquez AC, Heguedas AS, Iung B (2005) Monte Carlo-based assessment of system availability. A case study for cogeneration plants. Reliab Eng Syst Saf 88:273–289
Ocnasu AB (2007) Distribution system availability assessment—Monte Carlo and antithetic variates method. In: 19th international conference on electricity distribution, Vienna
Raje D, Olaniya R, Wakhare P, Deshpande A (2000) Availability assessment of a two-unit stand-by pumping system. Reliab Eng Syst Saf 68:269–274
Sharma K, Bhutani R (1993) Bayesian analysis of system availability. Microelectron Reliab 33(6):809–811
Yasseri SF, Bahai H (2018) Availability assessment of subsea distribution systems at the architectural level. Ocean Eng 153:399–411
Zio E, Marella M, Podofillini L (2007) A Monte Carlo simulation approach to the availability assessment of multi-state system with operational dependencies. Reliab Eng Syst Saf 92:871–882
Acknowledgements
The motivation for the research originated from the project “Key Performance Indicators (KPI) for control and management of maintenance process through eMaintenance (In Swedish: Nyckeltal för styrning och uppföljning av underhållsverksamhet m h a eUnderhåll)”, which was initiated and financed by LKAB. The authors wish to thank Ramin Karim, Peter Olofsson, Mats Renfors, Sylvia Simma, Maria Rytty, Mikael From and Johan Enbak, for their support for this research in the form of funding and work hours.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Saari, E., Lin, J., Zhang, L. et al. System availability assessment using a parametric Bayesian approach: a case study of balling drums. Int J Syst Assur Eng Manag 10, 739–745 (2019). https://doi.org/10.1007/s13198-019-00803-y
DOI: https://doi.org/10.1007/s13198-019-00803-y