1 Introduction

Power systems face many uncertain factors, such as load variation, weather changes, and unexpected outages of power equipment, all of which bring considerable risk to system operation and control. Risk assessment has become an important tool for supporting decision making in power systems, and its accuracy depends greatly on the outage model of the power equipment.

The outage model describes the stochastic behavior of the power equipment under various factors, and there have been many research achievements in this area.

The Markov-process-based model is the most commonly used mathematical model that can be solved analytically [1, 2]. The state diagram of a Markov model can contain various states, such as normal, abnormal and outage, which together provide a complete description of the stochastic process of the equipment [3]. The Markov model is well suited to the case in which all factors that influence the equipment are random. The two-state Markov model is the simplest outage model; it describes the alternation between operation and failure and has been widely used in traditional reliability analysis [4]. To represent equipment behavior more accurately, the operating and repair states have been further divided into several sub-states. In [5–8], the outage model is represented by a state transition diagram that contains deterioration, inspection, maintenance and other states. The equipment is inspected and maintained periodically, and the optimal inspection or maintenance frequency is calculated to obtain the minimal cost or the maximal availability. Similar results can also be found in many other areas [9–11]. In [12, 13], the authors point out that the classical maintenance model may be inaccurate compared with the real world, especially when inspection is non-periodic, and they propose a new Markov state diagram to solve the problem, the basic idea of which is to divide the original deterioration process into several sub-processes. Besides the Markov models, there are other outage models that describe the deterioration and repair process of equipment in different ways, for instance the Kijima I and II models [14–17], which require considerably more computation.

Most of the existing literature on outage models focuses only on the steady state of the equipment; that is, the state probability of the equipment is constant and does not change with time. Such models are applicable to long-time-scale problems (usually several years or longer), such as reliability analysis, capacity expansion and network planning, in which the steady value of the state probability of power equipment is accurate enough. In short-term problems (usually several weeks or months), however, the use of a steady outage model may introduce significant error, as explained in the next section. Nevertheless, little of the literature addresses the special features of the outage model used in short-term problems.

Generally speaking, the outage model used in short-term problems has two distinguishing features.

1) Transient availability

The outage model used in short-term problems should be a transient model, which means that the state probabilities in the model are time-varying. In long-term problems, the transient process of the state probability of the equipment is usually neglected, since its duration is extremely short compared with the whole time scale. In short-term problems, however, the duration of the transient process is of the same order of magnitude as the whole time scale under consideration.

2) Coexistence of random factors with deterministic factors

In short-term problems, deterministic factors and random factors usually coexist, and both affect the behavior of the power equipment. In long-term problems, almost all factors, whether environmental or human, can be regarded as random because of the long time scale. In short-term problems, however, factors that depend on human intention, such as the maintenance schedule for certain power equipment in the next few weeks, are better treated as deterministic. Given the operating practice of power systems, the start and end times of such deterministic events should not change arbitrarily or randomly. In the traditional steady model, the effect of deterministic factors on the behavior of the equipment can hardly be considered, since their impact is not reflected in the steady value of the state probability of the equipment.

Based on the above analysis, a new outage model applicable to short-term problems is proposed in this paper. The model considers the transient probability and incorporates both random and deterministic factors. The paper is organized as follows. Section 2 compares the transient model with the steady model to show why the transient probability is needed. Section 3 presents the basic Markov model, which considers only random factors, as conventional models for long-term problems do. Section 4 adds the deterministic factors of short-term problems to the outage model. Section 5 compares the proposed model with the conventional model through several examples. Section 6 concludes the paper.

2 Comparison between transient model and steady model

First, a simple example is given to show the difference between the transient model and the steady model, which explains why research on the transient model is needed.

The most commonly used outage model in power systems is the two-state Markov model, which is shown in Fig. 1. State 0 represents the working state, and state 1 represents the outage state. λ is the failure rate and μ is the repair rate.

Fig. 1 Two-state Markov outage model

The Fokker-Planck equations of the equipment are given as follows, where \(P_{i}(t)\) is the probability that the equipment is in state i at time t.

$$\left\{ \begin{aligned} \frac{{{\text{d}}P_{0} \left( t \right)}}{{{\text{d}}t}}& = - \lambda P_{0} \left( t \right) + \mu P_{1} \left( t \right) \hfill \\ \frac{{{\text{d}}P_{1} \left( t \right)}}{{{\text{d}}t}}& = \lambda P_{0} \left( t \right) - \mu P_{1} \left( t \right) \hfill \\ \end{aligned} \right.$$
(1)

Note that (1) is a set of differential equations and that the state probabilities sum to 1 at any time. Once the initial state of the equipment is known, (1) can be solved to obtain the expressions of \(P_{i}(t)\). Suppose the equipment is working at time t = 0; then \(P_{i}(t)\) is given as follows.

$$\left\{ {\begin{array}{*{20}c} {P_{0} \left( t \right) = \frac{\mu }{\lambda + \mu } + \frac{\lambda }{\lambda + \mu }{\text{e}}^{{ - \left( {\lambda + \mu } \right)t}} } \\ {P_{1} \left( t \right) = \frac{\lambda }{\lambda + \mu } - \frac{\lambda }{\lambda + \mu }{\text{e}}^{{ - \left( {\lambda + \mu } \right)t}} } \\ \end{array} } \right.$$
(2)

The availability A(t) is defined as the probability that the equipment is in the working state at time t, and the unavailability U(t) as the probability that it is in the outage state at time t. In the two-state Markov model, \(A(t) = P_{0}(t)\) and \(U(t) = P_{1}(t)\).
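The closed-form expressions in (2) can be checked numerically. The following is a minimal sketch, not the authors' implementation: it evaluates (2) directly and compares the result with a forward-Euler integration of (1). The function names and the rates `LAM` and `MU` are hypothetical values chosen only for illustration.

```python
import math


def two_state_probabilities(lam, mu, t):
    """Closed-form P0(t), P1(t) of Eq. (2); equipment working at t = 0."""
    total = lam + mu
    decay = math.exp(-total * t)
    p0 = mu / total + lam / total * decay
    p1 = lam / total - lam / total * decay
    return p0, p1


def euler_two_state(lam, mu, t_end, steps=100000):
    """Integrate Eq. (1) with forward Euler as an independent check."""
    p0, p1 = 1.0, 0.0          # working state at t = 0
    h = t_end / steps
    for _ in range(steps):
        dp0 = -lam * p0 + mu * p1
        dp1 = lam * p0 - mu * p1
        p0, p1 = p0 + h * dp0, p1 + h * dp1
    return p0, p1


# Hypothetical rates in 1/day (assumed for illustration only).
LAM, MU = 0.01, 0.1
availability, unavailability = two_state_probabilities(LAM, MU, 10.0)
euler_p0, euler_p1 = euler_two_state(LAM, MU, 10.0)
print(availability, unavailability, euler_p0, euler_p1)
```

The two results should agree to within the discretization error of the Euler scheme, and A(t) + U(t) should equal 1 at every time.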

The outage model described by (1) can be called the transient model. Its characteristic feature is that the availability and unavailability of the equipment vary with time and that the expressions of the state probabilities contain exponential terms. In traditional analysis, however, usually only the steady model is considered. The steady model is the limit of the transient model as t → ∞. In this limit, the differential equations (1) reduce to the following algebraic equations.

$$\left[ {\begin{array}{*{20}c} 0 & 0 \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {P_{0} \left( t \right)} & {P_{1} \left( t \right)} \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} { - \lambda } & \lambda \\ \mu & { - \mu } \\ \end{array} } \right]$$
(3)

Solving (3) is much easier than solving (1), and the availability and unavailability obtained from (3) are A = μ/(λ + μ) and U = λ/(λ + μ), both of which are commonly used in traditional reliability analysis. It should be noted that the steady model is valid only under the premise that time tends to infinity.
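As a short worked step, writing out (3) together with the normalization condition \(P_{0} + P_{1} = 1\) gives the steady values directly, and they agree with the limit of (2) as t → ∞:

$$- \lambda P_{0} + \mu P_{1} = 0,\quad P_{0} + P_{1} = 1\quad \Rightarrow \quad P_{0} = \frac{\mu }{\lambda + \mu },\quad P_{1} = \frac{\lambda }{\lambda + \mu }$$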

The unavailability curves of the transient model and the steady model are shown in Fig. 2, where \(U_{\text{steady}}\) is the steady value of the unavailability.

Fig. 2 Curves of unavailability of the equipment

Fig. 2 shows that, at the beginning of the time period, the transient unavailability differs markedly from the steady unavailability. The period during which the transient value varies significantly with time can be called the transient period. Suppose the criterion for the end of the transient period is \(\left| {1 - U(t)/U_{\text{steady}} } \right| \le \varepsilon\), where ε is the threshold value; then the duration of the transient period can be calculated as follows.

$$T_{\text{transient}} \ge \frac{1}{\lambda + \mu }\ln \frac{1}{\varepsilon }$$
(4)
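The step from the criterion to (4) follows from (2). Since \(U(t) = P_{1}(t)\) and \(U_{\text{steady}} = \lambda /(\lambda + \mu )\), the ratio and the resulting bound are:

$$\frac{U\left( t \right)}{U_{\text{steady}} } = 1 - {\text{e}}^{ - \left( {\lambda + \mu } \right)t} \quad \Rightarrow \quad \left| {1 - \frac{U\left( t \right)}{U_{\text{steady}} }} \right| = {\text{e}}^{ - \left( {\lambda + \mu } \right)t} \le \varepsilon \quad \Leftrightarrow \quad t \ge \frac{1}{\lambda + \mu }\ln \frac{1}{\varepsilon }$$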

The duration of the transient period depends on the transition rates λ and μ. In power systems, the maintenance of important equipment, such as a transformer, may last several days or even several weeks. With a threshold value of ε = 10%, the transient period of the state probability of such equipment may therefore last several weeks.
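To give a feel for the magnitude, the sketch below evaluates the bound in (4) for one set of rates. The values are assumptions made only for illustration (a mean repair time of 14 days and a failure rate of 0.001 per day); they are not taken from the paper.

```python
import math


def transient_duration(lam, mu, eps):
    """Smallest t satisfying the criterion behind Eq. (4), in the time unit of the rates."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    return math.log(1.0 / eps) / (lam + mu)


# Hypothetical values, assumed for illustration only.
LAM = 0.001          # failures per day
MU = 1.0 / 14.0      # repairs per day (mean repair time of 14 days)
EPS = 0.10           # 10% threshold

duration_days = transient_duration(LAM, MU, EPS)
print(f"transient period: {duration_days:.1f} days")
```

Under these assumed rates the transient period is on the order of one month, which is comparable to the horizon of a monthly or quarterly study.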

Therefore, whether the transient model is needed in a system analysis depends on the duration of the transient period relative to the whole time scale under study. In long-term planning of the power system, where the time scale usually covers years, the transient period can be ignored and the steady model of the power equipment can be used. In some short-term studies, however, such as maintenance scheduling or risk assessment for the near future, the period under consideration is only the next month or the next quarter. In this case, the use of the steady model may introduce appreciable error, since the state probability of the power equipment is still in the transient period most of the time. The transient model should then be used to describe the short-term behavior of the power equipment more accurately.

3 Markov-based equipment outage model considering corrective maintenance

The factors that influence power equipment can be classified into two categories: random factors and deterministic factors. In this section, a Markov process model that describes the stochastic behavior of power equipment is constructed. Transient availability is considered here, corresponding to the first feature described in Section 1. In the next section, deterministic factors are added to the model to reflect the second feature.

Deterioration and failure are the two major random factors for power equipment. The maintenance carried out when a failure occurs is defined as Corrective Maintenance (CM) [18], the aim of which is to restore the equipment to an operable condition. CM should be treated as a random event, since neither its start time nor its duration is predictable.

Deterioration of power equipment is a very slow and irreversible process. Physically, the deterioration process of equipment is a non-Markov process, which means that the transition rates between states are time-varying. However, a non-Markov model can hardly be solved analytically. In traditional research, the deterioration process is therefore usually modeled as a multi-state Markov process so that it can be treated analytically.

The alternation between operation and failure can be expressed by a state transition diagram. A multi-state Markov model is shown in Fig. 3; it is the model most commonly used in the previous literature [13].

Fig. 3 Multi-state Markov equipment outage model

In Fig. 3, states 1 ~ N are the stages of deterioration and state N + 1 is the failure state caused by deterioration. \(\lambda_{ij}\) is the transition rate from state i to state j, and μ is the repair rate of CM after a failure caused by deterioration.

The differential equations of the model in Fig. 3 are shown as (5).

$$\left\{ \begin{aligned} \frac{{{\text{d}}\boldsymbol{P}}}{{{\text{d}}t}} &= \boldsymbol{P}\left[ {\begin{array}{*{20}l} { - \lambda_{12} } & {\lambda_{12} } & {} & {} & {} \\ {} & \ddots & \ddots & {} & {} \\ {} & {} & { - \lambda_{N - 1,N} } & {\lambda_{N - 1,N} } & {} \\ {} & {} & {} & { - \lambda_{N} } & {\lambda_{N} } \\ \mu & {} & {} & {} & { - \mu } \\ \end{array} } \right] \\ \boldsymbol{P} &= \left[ {\begin{array}{*{20}l} {P_{1} \left( t \right)} & {P_{2} \left( t \right)} & \cdots & {P_{N} \left( t \right)} & {P_{N + 1} \left( t \right)} \\ \end{array} } \right] \\ \end{aligned} \right.$$
(5)

In (5), \(P_{1}\) ~ \(P_{N}\) are the probabilities of deterioration states 1 ~ N and \(P_{N+1}\) is the probability of failure state N + 1. Suppose the initial state of the equipment at time t = 0 is \(S_{0}\) (\(S_{0}\) = 1, 2, …, N); the equations above can then be solved by the Laplace transform. The transient probability of each state is the sum of the steady value of the probability and several exponential terms that decay with time, as shown in (6).

$$P_{k}^{{S_{0} }} \left( t \right) = P_{k\infty } + \sum\limits_{i = 1}^{N} {L_{i}^{k} {\text{e}}^{{ - \tau_{i} t}} } \;\;\;\;k = 1,2, \ldots ,N + 1$$
(6)

where the superscript \(S_{0}\) is the initial state of the equipment; \(P_{k\infty }\) is the steady value of the probability of state k; \(L_{i}^{k}\) is the coefficient of the i-th exponential term and \(\tau_{i}\) is the corresponding damping exponent, both of which are obtained by solving (5). The details of the calculation are given in Appendix A.
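As a consistency check, consider the smallest instance N = 1, for which the model of Fig. 3 reduces to the two-state model of Section 2 with state 1 working, state 2 failed and \(\lambda_{1} = \lambda\). With \(S_{0} = 1\), (6) then contains a single exponential term and coincides with (2):

$$P_{1}^{1} \left( t \right) = \frac{\mu }{\lambda + \mu } + \frac{\lambda }{\lambda + \mu }{\text{e}}^{ - \left( {\lambda + \mu } \right)t} ,\quad P_{2}^{1} \left( t \right) = \frac{\lambda }{\lambda + \mu } - \frac{\lambda }{\lambda + \mu }{\text{e}}^{ - \left( {\lambda + \mu } \right)t}$$

so that \(P_{1\infty } = \mu /(\lambda + \mu )\), \(P_{2\infty } = \lambda /(\lambda + \mu )\), \(L_{1}^{1} = - L_{1}^{2} = \lambda /(\lambda + \mu )\) and \(\tau_{1} = \lambda + \mu\).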

Based on the expression of the transient probability of each state, the transient availability can be obtained as follows. In Fig. 3, states 1 to N are all working states, although with different degrees of deterioration. Hence, the availability of the equipment in Fig. 3 is the sum of the probabilities \(P_{1}\) to \(P_{N}\).

$$A^{{S_{0} }} \left( t \right) = \sum\limits_{k = 1}^{N} {P_{k}^{{S_{0} }} \left( t \right)}$$
(7)

The superscript \(S_{0}\) again denotes the initial state of the equipment. The transient availability is also a time-varying function, and it converges to the steady availability as time tends to infinity.
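Equations (5) and (7) can also be evaluated numerically without the closed form (6). The sketch below is one possible approach, not the procedure of Appendix A: it builds the transition-rate matrix of Fig. 3 and integrates (5) with a classical fourth-order Runge-Kutta scheme. All function names and the three-stage rates in the example are hypothetical; the parameters of Table 1 are not reproduced here.

```python
def build_generator(stage_rates, mu):
    """Rate matrix of Fig. 3. stage_rates[k] is the rate out of state k+1;
    the last entry leads into failure state N+1. CM returns to state 1."""
    n = len(stage_rates)
    q = [[0.0] * (n + 1) for _ in range(n + 1)]
    for k, rate in enumerate(stage_rates):
        q[k][k] = -rate
        q[k][k + 1] = rate
    q[n][0] = mu
    q[n][n] = -mu
    return q


def derivative(p, q):
    """Row-vector product P Q, the right-hand side of Eq. (5)."""
    size = len(p)
    return [sum(p[i] * q[i][j] for i in range(size)) for j in range(size)]


def rk4_step(p, q, h):
    """One classical Runge-Kutta step of size h."""
    k1 = derivative(p, q)
    k2 = derivative([x + 0.5 * h * k for x, k in zip(p, k1)], q)
    k3 = derivative([x + 0.5 * h * k for x, k in zip(p, k2)], q)
    k4 = derivative([x + h * k for x, k in zip(p, k3)], q)
    return [x + h / 6.0 * (a + 2.0 * b + 2.0 * c + e)
            for x, a, b, c, e in zip(p, k1, k2, k3, k4)]


def transient_state_probabilities(stage_rates, mu, s0, t_end, steps=2000):
    """State probabilities at t_end, starting from state s0 (1-based)."""
    q = build_generator(stage_rates, mu)
    p = [0.0] * (len(stage_rates) + 1)
    p[s0 - 1] = 1.0
    h = t_end / steps
    for _ in range(steps):
        p = rk4_step(p, q, h)
    return p


def transient_availability(stage_rates, mu, s0, t_end, steps=2000):
    """Eq. (7): sum of the probabilities of working states 1..N."""
    return sum(transient_state_probabilities(stage_rates, mu, s0, t_end, steps)[:-1])


# Hypothetical three-stage example, rates in 1/day (assumed values).
print(transient_availability([0.002, 0.005, 0.01], 0.05, 2, 100.0))
```

For a single deterioration stage the result can be compared with the closed form (2), which is a convenient way to validate the integrator before using it on larger models.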

4 Markov-based equipment outage model considering corrective and preventive maintenance

Besides CM, there is another kind of maintenance for power equipment, called Preventive Maintenance (PM). PM is carried out regularly on power equipment in operation. Its aim is to improve the working condition of the equipment even though no failure has yet occurred.

In power systems, power equipment is inspected periodically and, if necessary, PM is executed to improve its working condition. If the inspection shows that the equipment is in poor condition, PM may be scheduled within a short time (e.g., several days or weeks); otherwise PM may be scheduled after a long time (e.g., several months), or may not be needed at all. After PM, the equipment returns to the working state.

Figure 4 shows a simple example of a state transition diagram that considers inspection and maintenance in the long term [12, 13]. The meanings of 1 ~ N + 1, \(\lambda_{ij}\) and μ are the same as before, and the other symbols follow [12, 13]: \(I_{k}\) and \(M_{k}\) denote the inspection state and the maintenance state, respectively; \(\sigma_{k}\) is the inspection rate; \(\omega_{k}\) is the repair rate of planned maintenance; and \(\xi_{k}\) is the transition rate between \(I_{k}\) and \(M_{k}\).

Fig. 4 Simple example of Markov outage model considering inspection and PM

In Fig. 4, all maintenance, including PM and CM, is treated as a random event and modeled as states of the Markov process. However, as mentioned before, some factors that affect the behavior of equipment in short-term problems are strongly deterministic. If the period under consideration runs from the end of inspection to the end of PM, which may be a few days or several weeks, then risk assessment and maintenance scheduling in this period form a short-term problem that can be regarded as a part of the long-term model in Fig. 4. In this case PM, whose start and end times are predetermined according to the inspection result and other objective conditions, is a typical deterministic event.

The time axis of the inspection and PM in the short-term problem is shown in Fig. 5. M is the start time of PM and d is the duration of PM, both of which are deterministic.

Fig. 5 Time axis of the inspection and PM

Based on the above analysis, it is necessary to propose a new model which is applicable to the short term problem. Before introducing the new model, the following preconditions are given.

1) Equipment scheduled for PM is usually in poor working condition, and its failure rate may be higher than that of equipment in normal condition. During this short-term period the equipment may therefore suffer unexpected failures, so the outage model contains deterioration and failure states as usual.

2) The time after inspection is set to be the initial time (t = 0) of the short term period, and the working condition of the equipment at this time is known. The deterministic PM is scheduled in the near future according to the inspection result and other factors.

3) The states of the equipment at times 0, M and M + d are denoted \(S_{0}\), \(S_{M}\) and \(S_{M+d}\), respectively, and only \(S_{0}\) is known. The state \(S_{M}\) depends on the stochastic deterioration process, and the difference between \(S_{M}\) and \(S_{M+d}\) depends on the effect of PM.

4) As in the previous literature, it is assumed that PM moves the equipment back by one deterioration stage, as in the model of Fig. 4. If the equipment is in \(D_{1}\), it remains in \(D_{1}\) after PM. This assumption can easily be relaxed or generalized, and the analytic procedure remains similar [7].

5) Once an unexpected failure occurs, CM is executed on the equipment, and the equipment returns to \(D_{1}\) after CM, as shown in Fig. 3. This assumption can also be relaxed easily, and the analytic procedure remains similar.

Based on the above preconditions, the calculation method of transient availability of the outage model used in the short term problem is given below.

There are two possible cases to be considered in the short term problem.

4.1 Case A: no unexpected failure occurs before time M

The probability of this case can be calculated as follows.

$$P_{caseA} = 1 - H^{{S_{0} }} \left( M \right)$$
(8)

where \(H^{{S_{0} }} \left( t \right)\) is the cumulative distribution function of the equipment life and the superscript \(S_{0}\) denotes the initial state of the equipment. The detailed expression of \(H^{{S_{0} }} \left( t \right)\) is given in Appendix B.
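Appendix B gives the authors' expression for the life distribution. The sketch below is only one way such a quantity could be evaluated, under two assumptions that are not taken from the paper: that \(H^{S_{0}}(t)\) is the distribution of the first-passage time from state \(S_{0}\) to the failure state N + 1 along the chain of Fig. 3, and that the stage rates are pairwise distinct, in which case the sum of the stage sojourn times is hypoexponentially distributed. The function names and the rates are hypothetical.

```python
import math


def life_cdf(stage_rates, s0, t):
    """First-passage CDF from state s0 (1-based) to failure.
    Assumes pairwise distinct stage rates (hypoexponential distribution)."""
    rates = stage_rates[s0 - 1:]
    survival = 0.0
    for i, rate_i in enumerate(rates):
        weight = 1.0
        for j, rate_j in enumerate(rates):
            if j != i:
                weight *= rate_j / (rate_j - rate_i)
        survival += weight * math.exp(-rate_i * t)
    return 1.0 - survival


def case_probabilities(stage_rates, s0, m):
    """Eq. (8) and its complement: (P_caseA, P_caseB) for PM start time m."""
    h_m = life_cdf(stage_rates, s0, m)
    return 1.0 - h_m, h_m


# Hypothetical three-stage rates in 1/day and PM start time in days.
RATES = [0.002, 0.005, 0.01]
for initial_state in (1, 2, 3):
    print(initial_state, case_probabilities(RATES, initial_state, 50.0))
```

With such a model, a worse initial state leaves fewer stages before failure, so the probability of Case A decreases as \(S_{0}\) increases.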

In this case, the equipment keeps working during the period [0, M] and the PM will be implemented as usual during the period [M, M + d]. The equipment will return to operation after PM with better working condition.

Owing to the irreversibility of natural deterioration, the state of the equipment at time t (0 < t < M), denoted \(S_{t}\), can be any state from \(S_{0}\) to N. Given that the equipment keeps running during the period [0, M], the probability that the equipment is in state k at time t is a conditional probability, denoted \(P_{conk}^{{S_{0} }} \left( t \right),k = 1,2, \cdots ,N\). The expression of \(P_{conk}^{{S_{0} }} \left( t \right)\) is given in Appendix C. As mentioned before, the state \(S_{M+d}\) depends entirely on the state \(S_{M}\). Once the probability \(P_{conk}^{{S_{0} }} \left( M \right)\) is obtained, the probability of the equipment being in each state after PM, denoted \(P_{{S_{M + d} }}^{{S_{0} }}\), is therefore known as well; it is also given in Appendix C.

The availability of the equipment in Case A is then given below. Note that the availability from time 0 to time M equals 1 in this case, since the equipment is assumed to keep working from time 0 to time M.

$$A_{{caseA}}^{{S_{0} }} \left( {t,M} \right) = \left\{ \begin{array}{ll} 1, & t \in \left[ {0,M} \right) \\ 0, & t \in \left[ {M,M + d} \right) \\ \sum\limits_{{S_{{M + d}} = 1}}^{N} {A^{{S_{{M + d}} }} \left( {t - M - d} \right)P_{{S_{{M + d}} }}^{{S_{0} }} }, & t \in \left[ {M + d, + \infty } \right) \\ \end{array} \right.$$
(9)
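The piecewise structure of (9) translates directly into code. The sketch below is illustrative only: the post-PM state distribution `post_pm_probs` stands in for \(P_{S_{M+d}}^{S_{0}}\) of Appendix C, and `availability_from` stands in for the transient availability \(A^{S}(t)\) of (7). Both are supplied by the caller, and the toy values used in the example are hypothetical.

```python
import math


def case_a_availability(t, m, d, post_pm_probs, availability_from):
    """Eq. (9): availability in Case A at time t.

    post_pm_probs: dict mapping post-PM state -> probability (assumed input).
    availability_from: callable (state, elapsed_time) -> availability (assumed input).
    """
    if t < 0.0:
        raise ValueError("t must be non-negative")
    if t < m:
        return 1.0                      # no failure before M by assumption
    if t < m + d:
        return 0.0                      # equipment is out for PM
    elapsed = t - m - d
    return sum(prob * availability_from(state, elapsed)
               for state, prob in post_pm_probs.items())


# Toy inputs, hypothetical and for illustration only.
toy_probs = {1: 0.7, 2: 0.3}


def toy_availability(state, elapsed):
    """Placeholder for A^S(t): decays faster from a worse state."""
    return math.exp(-0.001 * state * elapsed)


for time_point in (10.0, 55.0, 60.0, 90.0):
    print(time_point, case_a_availability(time_point, 50.0, 10.0,
                                          toy_probs, toy_availability))
```

Passing the availability as a callable keeps the piecewise logic independent of how \(A^{S}(t)\) is computed, whether from the closed form (6) or from a numerical solution of (5).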

4.2 Case B: unexpected failure occurs before time M

The probability of this case can be calculated as below.

$$P_{caseB} = H^{{S_{0} }} \left( M \right)$$
(10)

In this case, the scheduled PM will not be implemented as usual since an unexpected contingency occurs. The CM should be carried out immediately after the failure and the original PM will be canceled. After CM, the state of the equipment will return to D 1.

The availability in this case is

$$A_{caseB}^{{S_{0} }} \left( {t,M} \right) = 1 - H_{tru}^{{S_{0} }} \left( {t,M} \right) + A^{1} \left( t \right)*w_{tru}^{{S_{0} }} \left( t \right)$$
(11)

where \(*\) denotes convolution; \(A^{1} \left( t \right)\) is the transient availability function with initial state \(S_{0}\) = 1; \(H_{tru}^{{S_{0} }} \left( {t,M} \right)\) is the truncated cumulative distribution function of the equipment life under the assumption that a failure occurs before time M; and \(w_{tru}^{{S_{0} }} \left( t \right)\) is the probability density function of the first renewal period of the equipment. The expressions of both are given in Appendix B.

Combining the two cases, the availability of the equipment in the new model is obtained as (12). The subscript “tru” indicates that the availability is “truncated” by the deterministic PM.
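The convolution term in (11) can be approximated numerically once the two functions are available. The sketch below uses the trapezoidal rule and assumes the usual one-sided definition \((f*g)(t) = \int_{0}^{t} f(t - s)g(s)\,{\text{d}}s\). The function name and the test functions in the example are hypothetical stand-ins, not the expressions of Appendix B.

```python
import math


def convolve_at(f, g, t, steps=2000):
    """Trapezoidal approximation of (f * g)(t) = integral_0^t f(t - s) g(s) ds."""
    if t <= 0.0:
        return 0.0
    h = t / steps
    total = 0.5 * (f(t) * g(0.0) + f(0.0) * g(t))   # end points s = 0 and s = t
    for k in range(1, steps):
        s = k * h
        total += f(t - s) * g(s)
    return total * h


# Stand-in functions (assumed): constant availability and an exponential density.
RATE = 0.02


def unit_availability(_):
    return 1.0


def exponential_density(s):
    return RATE * math.exp(-RATE * s)


# Convolving a constant 1 with a density returns the corresponding CDF.
approx = convolve_at(unit_availability, exponential_density, 50.0)
exact = 1.0 - math.exp(-RATE * 50.0)
print(approx, exact)
```

The check against an exponential distribution is only a sanity test of the quadrature; in the model itself the two arguments would be \(A^{1}(t)\) and \(w_{tru}^{S_{0}}(t)\).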

$$A_{tru}^{{S_{0} }} \left( {t,M} \right) = P_{caseA} A_{caseA}^{{S_{0} }} \left( {t,M} \right) + P_{caseB} A_{caseB}^{{S_{0} }} \left( {t,M} \right)$$
(12)
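Substituting (8), (9) and (10) into (12) makes the behavior before and during PM explicit:

$$A_{tru}^{{S_{0} }} \left( {t,M} \right) = \left\{ \begin{array}{ll} 1 - H^{{S_{0} }} \left( M \right) + H^{{S_{0} }} \left( M \right)A_{caseB}^{{S_{0} }} \left( {t,M} \right), & t \in \left[ {0,M} \right) \\ H^{{S_{0} }} \left( M \right)A_{caseB}^{{S_{0} }} \left( {t,M} \right), & t \in \left[ {M,M + d} \right) \\ \end{array} \right.$$

During the PM window the availability is therefore not zero; since \(A_{caseB}^{{S_{0} }}\) is an availability and cannot exceed 1, it is bounded above by \(H^{{S_{0} }} \left( M \right)\), the probability that a failure before M has cancelled the scheduled PM.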

Since this paper focuses on deterioration and on the maintenance that eliminates the damage caused by deterioration, failures caused by random environmental factors are not considered in Fig. 3. When environmental factors are considered, the analytic procedure of the new outage model is similar; it is given in Appendix D.

5 Numerical examples

Some practical examples are analyzed in this section to show the effect of deterministic PM on the outage model.

Take the transformer as an example. The short-term outage model of a transformer is a representative model that contains both random and deterministic factors. According to the IEEE standards [19], the states of a transformer are usually classified into four categories: normal, attentive, abnormal and fault. The normal, attentive and abnormal states are usually treated as working states. The parameters of the outage model shown in Fig. 3 are given in Table 1 [20]. The duration of PM is set to d = 10 days.

Table 1 Basic parameters of outage model

5.1 Effect of deterministic PM on random model

The first example shows the effect of deterministic PM on the random model. Suppose the initial state of the equipment is \(S_{0}\) = 2. Two models are built for comparison.

1) Model 1: the conventional Markov model in Fig. 4 with inspection and PM, both of which are treated as random events and represented as states in the state transition diagram of the Markov model. The transition rates of inspection and maintenance are set as follows: \(\sigma_{1}\) = 0.01 times/day, \(\sigma_{2}\) = 0.02 times/day, \(\sigma_{3}\) = 0.025 times/day, \(\xi_{1}\) = 10 times/day, \(\xi_{2}\) = 5 times/day, \(\xi_{3}\) = 4 times/day, \(\omega_{2}\) = 0.1 times/day, \(\omega_{3}\) = 0.1 times/day;

2) Model 2: the proposed model, which treats PM as a deterministic event. The start time of PM is set to M = 50 days.

The comparison of the transient availability curves for the two models is shown in Fig. 6.

Fig. 6 Transient availability curves of Model 1 and Model 2

The comparison in Fig. 6 leads to the following conclusions.

1) If PM is treated as a random event, as in Model 1, the transition between the working and maintenance states may occur at any time, which smooths the transient availability curve. As time passes, the transient process ends and the availability tends to its steady value. This model is meaningful in long-term problems, in which maintenance can be treated as a random event. When it is applied to short-term problems, however, it cannot describe the actual behavior of the power equipment.

2) Adding a deterministic event to a stochastic process greatly influences the availability given by the outage model. If PM is treated as a deterministic event, as in Model 2, the transient availability curve shows an abrupt step change, since PM is very likely to be implemented in a fixed period. It should be noted that the availability curve of Model 2 drops to a very low level in the PM period [M, M + d] but does not reach zero. This is because a failure may occur before M, which causes the scheduled PM to be cancelled.

3) The transient availability curve of Model 2 is much closer to the real situation in the short term, since it reflects the deterministic factors that exist in practice. The conventional Markov model, by contrast, is not suitable for the short-term maintenance scheduling problem, in which random and deterministic factors coexist.

5.2 Availability with different initial state

Equipment scheduled for PM is usually operating in an inferior state. Three models are built to show the impact of the initial state on the transient availability curves.

1) Model 1: the proposed model, which treats PM as a deterministic event. The start time of PM is set to M = 50 days and the initial state is \(S_{0}\) = 1, the normal state;

2) Model 2: the same as Model 1 except that the initial state is \(S_{0}\) = 2, the attentive state;

3) Model 3: the same as Model 1 except that the initial state is \(S_{0}\) = 3, the abnormal state.

The transient availability curves of the three models are shown in Fig. 7.

Fig. 7 Availability curves with different initial state

Figure 7 demonstrates the impact of different initial states on the transient availability curves. If the initial state of the equipment is normal, the availability is very close to 1 and an unexpected failure is very unlikely. If the inspection shows that the equipment is operating in a very inferior state, such as state 3, the probability that a failure occurs unexpectedly before the scheduled PM is much larger, which means that the PM is quite likely to be cancelled. For the transient availability curve of Model 3, the maximal unavailability in the next few days is nearly 0.2 and the probability that the PM is carried out as scheduled is only about 0.6. Therefore, equipment in worse working condition deserves more attention, and PM on such equipment should not be scheduled too late.

6 Conclusion

In short-term maintenance scheduling problems, random and deterministic factors coexist, and the transient state probability of the equipment should be considered. This paper focuses on the outage model used in short-term problems and proposes a Markov-based transient outage model that considers the effect of both random CM and deterministic PM. The transient availability function of the outage model is presented, with special emphasis on the impact of an unexpected failure occurring prior to PM. The results demonstrate that the addition of deterministic PM significantly influences the transient availability of the equipment.

The research discussed in this paper provides a new viewpoint on the outage model used in short-term problems. A model that considers the effect of both random and deterministic events is applicable to a wider range of problems. Future work will focus on applying the model to risk assessment and short-term maintenance schedule optimization in power systems.