# Condition-based maintenance policies under imperfect maintenance at scheduled and unscheduled opportunities


## Abstract

Motivated by the cost savings that can be obtained by sharing resources in a network context, we consider a stylized, yet representative, model for the coordination of maintenance and service logistics for a geographic network of assets. Capital assets, such as wind turbines in a wind park, require maintenance throughout their long lifetimes. Two types of preventive maintenance are considered: planned maintenance at periodic, scheduled opportunities, and opportunistic maintenance at unscheduled opportunities. The latter type of maintenance arises due to the network context: When an asset in the network fails, this constitutes an opportunity for preventive maintenance for the other assets in the network. So as to increase the realism of the model at hand and its applicability to various sectors, we consider the option of not-deferring and of deferring planned maintenance after the occurrence of opportunistic maintenance. We also assume that preventive maintenance may not always restore the condition of the system to ‘as good as new.’ By formulating this problem as a semi-Markov decision process, we characterize the optimal policy as a control limit policy (depending on the remaining time until the next planned maintenance) that indicates on the one hand when it is optimal to perform preventive maintenance and on the other hand when maintenance resources should be shared if an opportunity in the network arises. In order to facilitate managerial insights on the effect of each parameter on the cost, we provide a closed-form expression for the long-run rate of cost for any given control limit policy (depending on the remaining time until the next planned maintenance) and compare the costs (under the optimal policy) to those of suboptimal policies that neglect the opportunity for resource sharing. We illustrate our findings using data from the wind energy industry.

## Keywords

Markov decision processes · Condition-based maintenance · Opportunistic maintenance

## Mathematics Subject Classification

90B25 · 90C40 · 60K15 · 60J20

## 1 Introduction

High-value capital assets, such as energy systems (for example, wind turbines), medical systems (for example, interventional X-ray machines), lithography machines in semiconductor fabrication plants, and baggage handling systems at airports, require maintenance throughout their (long) lifetimes. Such capital assets are crucial to the primary processes of their users/operators, and unexpected failures may have very significant negative impacts and even life-threatening consequences. In order to avoid or to minimize failures, asset owners perform preventive maintenance activities, with the objective of retaining a system in, or restoring it to, a satisfactory operating condition. The costs of these maintenance activities, and of their respective unscheduled downtimes, represent one of the key drivers of an organization’s total costs. Such maintenance costs constitute up to 70% of the total value of the end product [4, 22], and this percentage is rapidly increasing [44]. Hence, there is a great incentive for asset owners to optimize their maintenance planning.

The most common maintenance practices are the so-called *corrective maintenance* and *planned maintenance*. The former, as the name suggests, proposes the repair of the asset upon failure, while the latter proposes a fixed service schedule for the field service engineers with the objective of ensuring that the asset operates correctly and of avoiding any unscheduled breakdown and downtime. The cost of planned maintenance is relatively low in comparison to that of corrective maintenance, due to its planned, anticipated nature. Planned maintenance is characterized by its scheduled downtimes (contrary to the unscheduled downtime experienced at a failure, which leads to a corrective maintenance) with fixed inter-scheduled instances, say at instances \(\tau ,2\tau ,3\tau ,\ldots \), (for example, \(\tau = 6\) months). Such instances constitute the *scheduled opportunities* of preventive maintenance.

In the context of a network of assets, such as a wind park or a network of hospitals in close geographic proximity (from the viewpoint of the service provider), there is a second type (in addition to the above scheduled instances) of opportunity to perform preventive maintenance. In the event that a failure occurs, its corrective maintenance instance can be viewed as an unscheduled opportunity for preventive maintenance for the other assets in the network. In these instances, *opportunistic maintenance* can take place, with the respective instances constituting the *unscheduled opportunities* of preventive maintenance. This form of network dependency can be viewed on two levels: (i) the economic dependency between the various systems of a network, and (ii) the structural degradation and failure dependencies. Similarly to planned maintenance, opportunistic maintenance has a lower cost in comparison to that of corrective maintenance.

Incorporating opportunistic maintenance may also affect the scheduling of planned maintenance, as it might be beneficial to defer the planned maintenance opportunity to take place after a period of length \(\tau \) after the occurrence of an opportunistic maintenance. This decision of *deferring or not the scheduling of planned maintenance* after the occurrence of opportunistic maintenance may have a positive or negative effect on the total costs.

In maintenance, it is often assumed that a maintenance activity is perfect, i.e., it restores the system to a state of ‘as good as new.’ However, this assumption may not be true in practice. For instance, a misidentification of the root cause of the (imminent) failure can lead to an erroneous repair that does not resolve the actual issue, or some minor repair activity (such as exchange of parts, changes or adjustment of the settings, software update, lubrication or cleaning, etc.; see [34]) may not restore the system to a state of ‘as good as new.’ In the above-mentioned cases, it is more reasonable to assume that the system is restored to a state between ‘as bad as old’ and ‘as good as new.’ This concept will be referred to as *imperfect maintenance*. Evidently, this assumption impacts the resulting cost. Hence, knowledge of how successful a maintenance activity is should not be ignored in the maintenance planning.

In this paper, we address the following questions:

- (i) What is the advantage of incorporating planned maintenance in comparison to exercising only corrective maintenance?
- (ii) What is the benefit of sharing resources in the network (in the form of incorporating opportunistic maintenance in addition to the planned maintenance)?
- (iii) What is the influence of deferring the planned maintenance after the occurrence of opportunistic maintenance?
- (iv) What is the influence of imperfect maintenance on the maintenance planning and on the costs (long-run rate of cost)?
- (v) When should preventive maintenance be performed (so as to minimize the long-run rate of cost)?

### 1.1 Main contributions

We consider a stylized, yet representative, model that incorporates the above-mentioned characteristics, and we prove the existence of the optimal maintenance policy and we derive its structure. Furthermore, we compute an explicit expression for the long-run rate of cost, which can be easily used by asset owners and service providers so as to gain further insights into their practice and so as to compute the cost-benefits of changing their maintenance practice. More concretely, the main contributions of the paper are threefold: (1) We consider a semi-Markov decision process that incorporates planned and opportunistic maintenance, as well as imperfect maintenance. From the analysis of the semi-Markov decision process stems the characterization of the optimal policy as a control limit policy (threshold) depending on the time until the next planned maintenance opportunity. Moreover, using this approach, we are able to derive a closed-form expression for this control limit. (2) Considering the class of control limit policies (depending on the remaining time until the next planned maintenance), we derive, using the theory of regenerative processes, an explicit expression for the long-run rate of cost. (3) We consider data from the wind energy industry and provide, based on these values, concrete answers to Questions (i)–(v) mentioned above. More specifically, we analyze the benefit of using planned and opportunistic maintenance compared to only corrective maintenance. We also analyze the influence of deferring planned maintenance after the occurrence of opportunistic maintenance. Finally, we also highlight the cost savings that can be attained by reducing the probability of an imperfect maintenance.

### 1.2 Outline of this paper

The remainder of this paper is structured as follows: In Sect. 2, we review the related literature. In Sect. 3, we describe in detail the model at hand, which captures the condition of the asset and which incorporates imperfect maintenance at scheduled and unscheduled maintenance opportunities. Subsequently, in Sect. 4, we characterize the structure of the optimal policy for condition-based maintenance using the average cost criterion, see Sect. 4.1, and we compute the long-run rate of cost for any policy with the same structure as the optimal policy (i.e., the class of control limit policies depending on the remaining time until the next planned maintenance), see Sect. 4.2. In Sect. 5, we permit the deferral of planned maintenance after the occurrence of opportunistic maintenance, and we compute the long-run rate of cost. A numerical illustration is provided in Sect. 6, where, based on data from the wind energy industry, we compare the long-run rate of cost for various policies, we show the effect of imperfect maintenance, and the effect of deferring planned maintenance. Finally, Sect. 7 contains concluding remarks and highlights directions for future research.

## 2 Literature review

Maintenance optimization models have been extensively studied in the literature. Optimal maintenance policies aim to provide optimal system reliability/availability and safety performance at lowest possible maintenance costs [27]. Due to the fast development of sensing techniques in recent years, the state of a capital asset can be monitored or inspected at a much lower cost and in a continuous fashion, which facilitates condition-based maintenance. Condition-based maintenance recommends maintenance actions based on information collected through online monitoring of the capital asset and it can significantly reduce maintenance costs by decreasing the number of unnecessary maintenance operations; see, for example, Jardine et al. [10], Peng et al. [26] and Lam and Banjevic [18]. The condition-based maintenance model that we propose builds on the delay time model proposed by Christer [6] and Christer and Waller [5]. We refer the reader to Baker and Christer [2], Christer [7] and Wang [38], and, more recently, Wang [39] for an overview on delay time models. Not only are delay time models well-known in the literature, but they also very frequently appear in practice.

Practice-based research with real diagnostic data, such as data related to the spectrometry of oil (for example, [16, 21]) and data related to vibrations (for example, [40]), showed that it is usually sufficient, and even preferable from a modeling and decision-making perspective, to consider only two operational states. The first state is the perfect state, in which the system lasts from newly installed to the point that a hidden defect has been identified. After the occurrence of a hidden defect in the system until the occurrence of a failure (which is typically referred to as the delay time), the system resides in the second state, also referred to as the satisfactory state. Such a classification of the operational states has the property that maintenance actions are initiated only when the system is degraded to the state that can actually lead to a direct failure, i.e., the satisfactory state, but not when the system is functioning perfectly, i.e., the perfect state. The vast majority of the literature on delay time models is restricted to numerical methods or approximations to solve the models at hand, due to their underlying complexity. A few recent exceptions are Maillart and Pollock [20], Kim and Makis [17] and Van Oosterom et al. [36], who study two-state systems under periodic inspection, partial observability, and postponed replacement, respectively, and provide analytical results regarding the structure of the optimal policy. However, none of them consider the option of resource sharing in the network (in the form of opportunistic maintenance), nor do they incorporate the notion of imperfect repair.

Most delay time model analyses assume that the system after a maintenance action is restored to a state of ‘as good as new.’ Contrary to this assumption, in imperfect maintenance it is assumed that, upon preventive maintenance, the system lies in a state somewhere between ‘as good as new’ and ‘as bad as old.’ This is first introduced by Nakagawa [23, 24] and is called the (*p*, *q*)-rule. Under the (*p*, *q*)-rule, the system is returned to an ‘as good as new’ state (perfect preventive maintenance) with probability *p* and it is returned to the ‘as bad as old’ state (minimal preventive maintenance) with probability \(q = 1 - p\) after preventive maintenance. Clearly, the case \(p = 0\) corresponds to having no preventive maintenance. Also, from a practical point of view, imperfect maintenance can describe a large set of realistic maintenance actions [27].
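The (*p*, *q*)-rule described above can be illustrated with a minimal Python sketch. The function name `state_after_pm` and the integer state labels are hypothetical choices made for this illustration only; the rule itself is as stated in the text: 'as good as new' with probability *p*, 'as bad as old' otherwise.

```python
import random

# Hypothetical labels for this sketch: 2 = 'as good as new', 1 = degraded.
PERFECT, SATISFACTORY = 2, 1


def state_after_pm(state: int, p: float, rng: random.Random) -> int:
    """Apply the (p, q)-rule to a single preventive maintenance action.

    With probability p the system becomes 'as good as new' (PERFECT);
    with probability q = 1 - p it stays 'as bad as old' (state unchanged).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return PERFECT if rng.random() < p else state


# Empirical check: the share of successful actions should be close to p.
rng = random.Random(42)
outcomes = [state_after_pm(SATISFACTORY, 0.6, rng) for _ in range(10_000)]
share_restored = outcomes.count(PERFECT) / len(outcomes)
```

Setting `p=1.0` recovers perfect maintenance, while `p=0.0` leaves the state untouched, which matches the remark that \(p = 0\) corresponds to having no preventive maintenance.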

When planning condition-based maintenance strategies, see, for example, Jardine et al. [10], Jardine and Tsang [11] and Prajapati et al. [28], a typical assumption in the literature is that the system at hand is monitored continuously and one can intervene and maintain the system at any given moment. However, due to accessibility reasons (for example, in the case of off-shore wind parks) or for cost reduction purposes, it is cost optimal and more practical to allow only for discrete time opportunities. The simplest among the discrete time opportunities are the periodic planned maintenance instances (also referred to as scheduled downs), with period, say, \(\tau \), that serve as a scheduled opportunity to do maintenance for a network of systems. Furthermore, unplanned maintenance instances (due to opportunistic maintenance) can be modeled as discrete instances occurring according to a multi-dimensional counting process.

For recent works related to opportunistic maintenance, the interested reader is referred to Zhu et al. [42, 43], Arts and Basten [1] and Kalosi et al. [14]. In Zhu et al. [43] and Zhu et al. [42], the authors consider a single-unit system and account for both scheduled and unscheduled opportunities. In these analyses, the authors model the age and the condition, respectively, of the system and derive, based on approximations, the long-run rate of cost under a given policy. In both papers, the arrivals of unscheduled opportunities are modeled according to a homogeneous Poisson process. This approximation is justified by the Palm–Khintchine theorem [15], which states that even if the failure times of some systems do not follow exponential distributions, the superposition of a sufficiently large number of independent renewal processes behaves asymptotically like a Poisson process. Arts and Basten [1] build further on Zhu et al. [42, 43], but they only consider scheduled maintenance opportunities (excluding unscheduled opportunities). Furthermore, Arts and Basten [1] assume that at a scheduled opportunity, the system is restored to a perfect condition (i.e., \(p=1\)), while at a failure they assume that the system is restored to a state which is stochastically identical to the state just prior to the system’s failure. In a recent conference paper, Kalosi et al. [14] looked at a model with both planned and unplanned maintenance opportunities, at which the system is restored to a perfect condition, showing some preliminary results that a control limit policy (depending on the remaining time until the next planned maintenance) is optimal.

In contrast to Arts and Basten [1] and to Zhu et al. [42, 43], in which the long-run rate of cost is computed for a given policy, we first characterize the structure of the optimal policy explicitly and thereafter, for the optimal policy class, we compute the long-run rate of cost. Furthermore, we include both scheduled and unscheduled maintenance opportunities. In contrast to Kalosi et al. [14], we extend the model by incorporating the (*p*, *q*)-rule, making it more generic and realistic. Moreover, we are the first to analyze the influence of deferring planned maintenance and we illustrate the financial effects of the maintenance policy in a realistic context using data stemming from the wind industry.

## 3 Model description

We consider a system whose condition is classified into three states: *perfect*, *satisfactory* and *failed*. We shall refer to the state of perfect condition as state 2, the state of satisfactory condition as state 1 and the failure state as state 0. Furthermore, we assume that as soon as a system failure occurs, the system is instantaneously replaced by an ‘as good as new’ system. So, in the mathematical formulation of the model, we may assume, due to the instantaneous replacement at failure, that the model evolves between only states 1 and 2. The system spends an exponential amount of time with rate \(\mu _i\) in state *i*, \(i\in \{1,2\}\). The above model formulation implies that initially the system starts in state 2 (perfect state), then after an exponential amount of time with rate \(\mu _2\), the system deteriorates and the condition of the system goes to state 1 (satisfactory state). The system spends an exponential amount of time with rate \(\mu _1\) in state 1, after which a failure occurs. At a failure, the system is instantaneously replaced by an ‘as good as new’ system and the condition is restored to 2 (perfect state). A schematic evolution of the condition of the component and the corresponding times of transitions is depicted in Fig. 1.
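As a small illustration of the dynamics just described, the following Python sketch computes two elementary quantities for the system without any preventive maintenance: the mean time between failures and the long-run fraction of time spent in state 1. The function names and the numerical rates are hypothetical and chosen only for this example.

```python
def mean_cycle_length(mu1: float, mu2: float) -> float:
    """Expected time between consecutive failures without preventive maintenance.

    A cycle is a sojourn in state 2 (mean 1/mu2) followed by a sojourn in
    state 1 (mean 1/mu1); replacement at failure is instantaneous.
    """
    return 1.0 / mu2 + 1.0 / mu1


def fraction_of_time_in_state_1(mu1: float, mu2: float) -> float:
    """Long-run fraction of time spent in the satisfactory state (state 1)."""
    return (1.0 / mu1) / mean_cycle_length(mu1, mu2)


# Illustrative rates (per year); hypothetical values, not taken from the paper.
mu1, mu2 = 0.5, 0.25
cycle = mean_cycle_length(mu1, mu2)              # 4 years in state 2 + 2 years in state 1
share_1 = fraction_of_time_in_state_1(mu1, mu2)  # 2 years out of every 6
```

The fraction of time in state 1 simplifies to \(\mu_2/(\mu_1+\mu_2)\), which is the share of time during which preventive maintenance could avert a failure.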

We assume that we have two types of opportunities at which we can perform preventive maintenance (PM) before failure: the scheduled and the unscheduled opportunities. The scheduled opportunities correspond to pre-arranged opportunities occurring according to a fixed schedule. These opportunities can be attributed to either service/maintenance agreements or to regulation imposition checks. We assume that the scheduled opportunities occur at epochs \(\tau ,2\tau ,3\tau ,\ldots \), with \(\tau >0\). This is also in accordance with what happens in practice as maintenance actions, once planned, are typically not rescheduled. The unscheduled opportunities correspond to random opportunities triggered by failures of other systems in close proximity. We assume that these unscheduled opportunities occur according to a Poisson process at rate \(\lambda \).

The unscheduled and scheduled opportunities, abbreviated by USO and SO, respectively, serve as opportunities to perform preventive maintenance. Such preventive maintenance is assumed to cost less than a corrective maintenance (CM) upon failure, which costs \(c_{\text {cm}}\). Moreover, incorporating a planning perspective, we may assume that the preventive maintenance cost at an SO, \(c_{\text {pm}}^{\text {so}}\), is less than or equal to the corresponding cost at a USO, say \(c_{\text {pm}}^{\text {uso}}\), that is \(0<c_{\text {pm}}^{\text {so}}\le c_{\text {pm}}^{\text {uso}}<c_{\text {cm}}\) (however, we also extend our analysis to the case \(c_{\text {pm}}^{\text {so}}> c_{\text {pm}}^{\text {uso}}\)). Following the (*p*, *q*)-rule of Nakagawa [23, 24], we assume that after preventive maintenance a system is returned to the ‘as good as new’ state with probability \(p\in (0,1]\) and returned to the ‘as bad as old’ state (i.e., the amount of time left until the failure has not altered) with probability \(q=1-p\).

Overview of abbreviations

| Abbreviation | Meaning |
| --- | --- |
| PM | Preventive maintenance |
| CM | Corrective maintenance |
| USO | Unscheduled opportunity |
| SO | Scheduled opportunity |
| SC | State change |

## 4 Optimal policy

The goal of this section is twofold: We first characterize the structure of the optimal average cost condition-based maintenance policy. We then derive an explicit form for the long-run rate of cost per time unit for any given policy that has the same structure as the optimal policy.

### 4.1 Average cost criterion

The state of the system is described by a triple \((i,j,t)\in {\mathcal {S}}\). The first element *i* indicates the condition of the system. The second element *j* indicates the type of epoch: if \(j=\text {SC}\), then the condition of the system is about to change and there is no decision associated with this epoch, while if \(j=\text {SO}\) or \(j=\text {USO}\), the epoch is a decision moment at a scheduled opportunity (SO) or an unscheduled opportunity (USO), respectively. Finally, the third element *t* indicates the remaining time until the next SO. Note that if \(j=\text {SO}\), then \(t=0\). Including the remaining time until the upcoming SO in the state description renders the model inhomogeneous, and for this reason we use techniques that stem from semi-Markov decision processes. Although this inclusion complicates the analysis, it permits us to prove that there is an optimal policy in the class of deterministic stationary policies, cf. Propositions 1 and 3. At each decision epoch (depending on the values of \((i,j,t)\in {\mathcal {S}}\)), we can choose to perform preventive maintenance or to do nothing, or, in case of a failure, to perform corrective maintenance (CM); that is, \({\mathcal {A}} =\{\text {perform PM, do nothing, perform CM}\}\), where \({\mathcal {A}}\) represents the overall action space.

### Proposition 1

For the model at hand, there exists a deterministic stationary policy that is optimal under the average cost criterion.

A formal version of the above proposition, cf. Proposition 3, and its proof can be found in Appendix A, together with a full formal definition of the model in the context of semi-Markov decision processes. In addition to the theoretical validation that the above proposition offers on the existence and nature of the optimal maintenance policy, in the following theorem we compute the optimal policy.

### Theorem 1

For USOs, Theorem 1 establishes a control limit policy depending on the remaining time until the next SO: If the residual time until the next SO is smaller than \({\hat{t}}\), then it is optimal to not take the opportunity to perform preventive maintenance in state 1. This is intuitive in the sense that the urgency for preventive maintenance in state 1 at a USO should decrease as the cheaper opportunity at an SO is approaching.

Note that in the special case when preventive maintenance costs at SOs and USOs are equal, the optimal policy reduces to a stationary control limit policy, which is shown in Proposition 2.

### Proposition 2

Under the assumption that \(c_{\text {pm}}^{\text {so}}=c_{\text {pm}}^{\text {uso}}=c_{\text {pm}}>0\) and given the imperfect preventive maintenance probability \(1-p\in [0,1)\), the optimal policy under the average cost criterion is as follows. For state 2, do nothing. For state 1, perform preventive maintenance at both SOs and USOs if \( \mu _1 c_{\text {cm}}> (\mu _1+\mu _2) \frac{c_{\text {pm}}}{p}\), and do nothing otherwise.

### Proof

The proof of this proposition is identical in structure to the proof of Case (i) in the proof of Theorem 1, and for this reason it is omitted. \(\square \)
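The decision rule of Proposition 2 is a single inequality, so it can be evaluated directly. The sketch below is a minimal illustration: the function names are hypothetical, the rates and the corrective cost are borrowed from the wind energy example of Sect. 6.1, and the common preventive cost of 1000 is an assumption made here to satisfy the equal-cost condition.

```python
def pm_in_state_1_is_optimal(mu1: float, mu2: float, c_cm: float,
                             c_pm: float, p: float) -> bool:
    """Inequality of Proposition 2 (equal PM costs at SOs and USOs)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return mu1 * c_cm > (mu1 + mu2) * c_pm / p


def break_even_p(mu1: float, mu2: float, c_cm: float, c_pm: float) -> float:
    """Value of p at which both sides of the inequality coincide.

    A result above 1 means the inequality fails for every admissible p.
    """
    return (mu1 + mu2) * c_pm / (mu1 * c_cm)


# Rates and c_cm as in Sect. 6.1; c_pm = 1000 is an illustrative assumption.
wind = dict(mu1=0.31, mu2=0.31, c_cm=300_000, c_pm=1_000)
decide_reliable = pm_in_state_1_is_optimal(**wind, p=0.6)      # PM pays off
decide_unreliable = pm_in_state_1_is_optimal(**wind, p=0.005)  # PM does not pay off
p_star = break_even_p(**wind)
```

For these values the break-even probability is well below 0.01, which suggests that, when corrective maintenance is this expensive, preventive maintenance in state 1 remains attractive even if it succeeds only rarely.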

One could also argue that the cost for preventive maintenance at a USO is actually less than the cost at an SO since there is already a cost attached to the opportunity at hand (for example, service engineers are already at a wind park and they can, at a small extra cost, repair other systems in close proximity as well). In this case, the optimal control policy also reduces to a stationary control limit policy, which is described in Theorem 2.

### Theorem 2

Under the assumption that \(c_{\text {pm}}^{\text {so}}>c_{\text {pm}}^{\text {uso}}\) and given the imperfect preventive maintenance probability \(1-p\in [0,1)\), the optimal policy under the average cost criterion is as follows. For state 2, do nothing. For state 1, perform preventive maintenance at a USO if \( \mu _1 c_{\text {cm}} > (\mu _1+\mu _2)\frac{ c_{\text {pm}}^{\text {uso}}}{p} \), and do nothing otherwise; perform preventive maintenance at an SO if \(\mu _1 c_{\text {cm}} > (\mu _1+\mu _2 )\frac{c_{\text {pm}}^{\text {so}}}{p}+ \lambda ({c_{\text {pm}}^{\text {so}}}-c_{\text {pm}}^{\text {uso}})\), and do nothing otherwise.

### Proof

See Appendix D. \(\square \)
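The two inequalities of Theorem 2 can be evaluated side by side. The following Python sketch is only an illustration: `PmDecision` and `pm_decisions_cheaper_uso` are hypothetical names, and the numerical values are invented for the example rather than taken from the paper.

```python
from typing import NamedTuple


class PmDecision(NamedTuple):
    at_uso: bool  # perform PM in state 1 at an unscheduled opportunity?
    at_so: bool   # perform PM in state 1 at a scheduled opportunity?


def pm_decisions_cheaper_uso(mu1: float, mu2: float, lam: float, c_cm: float,
                             c_so: float, c_uso: float, p: float) -> PmDecision:
    """Evaluate the two inequalities of Theorem 2 (requires c_so > c_uso)."""
    if not c_so > c_uso:
        raise ValueError("Theorem 2 assumes c_so > c_uso")
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    at_uso = mu1 * c_cm > (mu1 + mu2) * c_uso / p
    at_so = mu1 * c_cm > (mu1 + mu2) * c_so / p + lam * (c_so - c_uso)
    return PmDecision(at_uso=at_uso, at_so=at_so)


# Hypothetical values: an expensive failure versus a relatively cheap one.
common = dict(mu1=0.31, mu2=0.31, lam=2.0, c_so=3_000, c_uso=1_000, p=0.6)
decision_high = pm_decisions_cheaper_uso(c_cm=300_000, **common)
decision_low = pm_decisions_cheaper_uso(c_cm=10_000, **common)
```

In the second scenario the rule recommends maintenance at USOs but not at SOs: the extra term \(\lambda (c_{\text {pm}}^{\text {so}}-c_{\text {pm}}^{\text {uso}})\) raises the bar for using the more expensive scheduled opportunity when cheaper unscheduled ones arrive frequently.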

### 4.2 Long-run rate of cost per time unit

In the previous section, we characterized the structure of the optimal policy using the average cost criterion. This policy can be viewed as a control limit policy, with the control limit depending on the time until the next SO. In this section, we consider such a policy and we compute the long-run rate of cost per time unit. More concretely, we consider a policy under which in state 2 we do not perform preventive maintenance (i.e., we do nothing), and in state 1 we always perform preventive maintenance at SOs and we perform preventive maintenance at USOs if the remaining time till the next SO is greater than \({\tilde{t}}\), for some given value \({\tilde{t}}\in (0,\tau )\). The results obtained in this section are directly applicable to the results of Sect. 4.1, by setting \({\tilde{t}}=t^*\), cf. Theorem 1.

For the computation of the long-run rate of cost per time unit, we employ the theory of regenerative-like processes, also called stationary-cycle processes, described in Section 2.19 of Serfozo [33]. For this purpose, we consider the inter-regeneration times created by the SOs \(\{\tau , 2\tau , 3\tau , \ldots \}\). For the cost computation, we assume that, at the SOs, the system is in state 1 or 2 according to a stationary probability \(p_1(0)\) and \(p_2(0)\), respectively. The long-run rate of cost per time unit is calculated as the expected total cost incurred between consecutive SOs divided by \(\tau \).

Let \(p_i(t)\) be the probability that the system is in state \(i \in \{1,2\}\) given that the time until the next SO is \(t\in [0,\tau )\). Then the long-run rate of cost per time unit for this control limit policy (depending on the remaining time until the next planned maintenance) for any given time threshold is given in the next theorem.

### Theorem 3

### Proof

The expected total cost incurred in one cycle consists of three parts (cf. Eq. (2)), which are related to the expected cost associated with preventive maintenance at SOs, with preventive maintenance at USOs and with corrective maintenance, respectively. It is now sufficient to derive \(p_i(t)\) for \(t\in [0,\tau )\), \(i \in \{1,2\}\).

At time *t*, we are in state 1 either due to a transition from state 2 with infinitesimal probability \(\mu _2\,\mathrm{d}t\), or because we have remained in state 1 with infinitesimal probability \(1-(\mu _1+\lambda p)\,\mathrm{d}t\). Subtracting \(p_1(t+\mathrm{d}t)\) from both sides of Eq. (5), some straightforward computations yield
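The balance argument in this step of the proof can be written out as follows. This is a hedged reconstruction from the verbal description only, not the authors' displayed equations: it assumes that *t* denotes the remaining time until the next SO (so that \(t+\mathrm{d}t\) is the earlier time point) and that *t* lies in the range where preventive maintenance is performed at USOs, which is what produces the rate \(\lambda p\).

```latex
% Reconstruction under the stated assumptions; t is the remaining time until the next SO.
\begin{align*}
p_1(t) &= \mu_2\,\mathrm{d}t\; p_2(t+\mathrm{d}t)
        + \bigl(1-(\mu_1+\lambda p)\,\mathrm{d}t\bigr)\, p_1(t+\mathrm{d}t), \\
p_1(t) - p_1(t+\mathrm{d}t) &= \mu_2\,\mathrm{d}t\; p_2(t+\mathrm{d}t)
        - (\mu_1+\lambda p)\,\mathrm{d}t\; p_1(t+\mathrm{d}t), \\
-\frac{\mathrm{d}}{\mathrm{d}t}\,p_1(t) &= \mu_2\, p_2(t) - (\mu_1+\lambda p)\, p_1(t).
\end{align*}
```

The last line follows by dividing by \(\mathrm{d}t\) and letting \(\mathrm{d}t\rightarrow 0\). Under the same assumptions, the term \(\lambda p\) would be absent for remaining times at which USOs are not used for preventive maintenance.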

#### 4.2.1 Special cases

In the case of only *scheduled opportunities*, which corresponds to the case \({\tilde{t}}\rightarrow \tau \) or, equivalently, to the case \(\lambda \rightarrow 0\), the probabilities \(p_i(t)\) for \(i \in \{1,2\}\) are derived from the system of linear equations in (7) plus the normalizing condition, i.e., \(p_1(t)+p_2(t)=1\) for all \(t\in [0,\tau )\). This yields

In the case of *perfect maintenance*, i.e., in the case \(p=1\), the boundary condition at the SOs imposed by the policy and the imperfect maintenance in the proof of Theorem 3 reduces to \(\lim \limits _{t\rightarrow \tau ^-}p_1(t)=0\), as immediately after an SO the system is restored to state 2 with probability 1. This enables us to explicitly solve the system of linear Eqs. (6) and (7), yielding

In the case of only *unscheduled opportunities*, which is equivalent to considering \(\tau \rightarrow \infty \), the condition of the system can be fully described using a double descriptor \({\mathcal {S}}=\left\{ (i,j):\ i\in \{1,2\}, \ j\in \{\text {SC},\text {USO}\}\right\} \) which is independent of time, and thus the new model formulation falls into the framework of regular Markov decision processes. It can be easily shown that, for state 2, the optimal policy is to do nothing, and, for state 1, the optimal policy is to perform preventive maintenance if \(\frac{(\mu _1+\mu _2)c_{\text {pm}}^{\text {uso}}}{p} < \mu _1 c_{\text {cm}}\) and to do nothing otherwise. Furthermore, under the optimal policy the average long-run rate of cost is equal to

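In the case of only *corrective replacements*, the long-run rate of cost is equal to \(c_{\text {cm}}\,\mu _1\mu _2/(\mu _1+\mu _2)\), that is, the corrective cost \(c_{\text {cm}}\) divided by the expected time \(1/\mu _1+1/\mu _2\) between consecutive failures.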
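For the case of only unscheduled opportunities, the threshold condition stated above can be checked numerically against a direct cost computation. The sketch below is not the paper's expression: it is a stationary analysis of the two-state chain under the assumption, made here, that every preventive maintenance attempt at a USO costs \(c_{\text {pm}}^{\text {uso}}\) whether or not it succeeds. The function name and the cost values are hypothetical.

```python
def cost_rate_uso_only(mu1: float, mu2: float, lam: float,
                       c_cm: float, c_uso: float, p: float) -> float:
    """Long-run cost rate when PM is attempted at every USO found in state 1.

    Two-state continuous-time Markov chain: 2 -> 1 at rate mu2, and
    1 -> 2 at rate mu1 (failure) plus lam * p (successful PM).
    Assumption of this sketch: each PM attempt costs c_uso, successful or not.
    """
    frac_state_1 = mu2 / (mu1 + mu2 + lam * p)
    return frac_state_1 * (mu1 * c_cm + lam * c_uso)


mu1 = mu2 = 0.31
c_cm, lam, p = 300_000, 2.0, 0.6

results = []
for c_uso in (2_000, 200_000):  # a cheap and a (hypothetically) very expensive PM
    with_pm = cost_rate_uso_only(mu1, mu2, lam, c_cm, c_uso, p)
    # lam = 0 switches the opportunities off: only corrective replacements remain.
    without_pm = cost_rate_uso_only(mu1, mu2, 0.0, c_cm, 0.0, p)
    rule_says_pm = (mu1 + mu2) * c_uso / p < mu1 * c_cm
    results.append((with_pm, without_pm, rule_says_pm))
```

In both scenarios the direct comparison of the two cost rates agrees with the threshold condition, which is a useful sanity check on the assumption about how imperfect attempts are charged.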

## 5 Deferring planned maintenance

In this section, we consider the case in which, upon a successful maintenance activity (preventive, at an SO or at a USO, or corrective), the upcoming planned maintenance is deferred for a period of length \(\tau \), i.e., at the instances of successful maintenance the remaining time until the next SO is set equal to \(\tau \). We are interested in computing the long-run rate of cost under deferred maintenance and, in Sect. 6.3, in using the results of this section and of the previous sections to investigate the economic benefits of deferring planned maintenance.

Analogously to the analysis of Sect. 4.2, we derive the long-run rate of cost using renewal theory; see, for example, [31, Proposition 7.3, page 433]. In this case, we consider the renewal points to be the instances at which there was a successful maintenance activity, i.e., the SOs or USOs at which the preventive maintenance was perfect, or the epochs at which corrective maintenance is performed. Note that the underlying stochastic process that governs the condition of the system regenerates after each successful maintenance activity. That is, after each successful maintenance activity the underlying stochastic process is in state 2 with probability 1. The long-run rate of cost per time unit for a policy in the class of optimal policies is given in the next theorem. As the expressions appearing in the theorem do not simplify upon further computations, we choose to present them in the form of probabilities and expectations associated with the exponential distribution, as these expressions are straightforward (though cumbersome to compute) and shed insight on each of the individual events participating in the final expression, cf. Eq. (8).

### Theorem 4

*Y* is given by

*x* occurs, and it is zero otherwise, \(T_{\mu _1}\sim \text {Exp}(\mu _1)\), \(T_{\lambda p}\sim \text {Exp}(\lambda p)\), \({\mathbb {P}}[\,\cdot \,]={\mathbb {E}}[\mathbb {1}_{\{\cdot \}}]\) for all events in Eqs. (14)–(16), and \(C\!L\overset{d}{=}C\!L'\).

### Proof

See Appendix E. \(\square \)

## 6 Numerical results

Using the results and the analyses of the previous sections, we illustrate in this section, through a few well-chosen examples, the effect of the various parameters on the long-run rate of cost. In these examples, we investigate the financial advantage of the optimal policy when compared to other (suboptimal) policies. Furthermore, we highlight the financial benefit of perfect maintenance by comparing the long-run rate of cost for the perfect maintenance model (\(p=1\)) to that of the imperfect maintenance model (\(p\in (0,1)\)). Here, we also show the influence of imperfect maintenance on the maintenance planning. In addition, we illustrate the change introduced by deferring planned maintenance after the occurrence of a successful maintenance. To illustrate the financial effects in a realistic context and to connect our analysis with practice, we use values and data stemming from the wind industry.

### 6.1 Comparison of the optimal policy to suboptimal policies

In this section we compute, in the context of the wind industry example, the long-run rate of cost under the optimal policy and examine how it is affected by varying the parameters \(\tau \), \(\lambda \) and \(c_{\text {pm}}^{\text {uso}}\) one at a time, while keeping all other parameters fixed. To determine the values used in the numerical computations of this section, we consider the gearbox of a wind turbine. Statistics from a recent field study by Ribrant and Bertling [29] on Swedish wind parks in the period 1997–2005 showed that the gearbox is the most critical unit of a wind turbine. The notion of criticality is determined by the fact that a failure of the gearbox leads to the highest downtime when compared to all other wind turbine components, but also by the fact that this component has the highest failure rate among all wind turbine components [29, 34, 35]. Due to its extended downtime after a failure (which is captured in the corresponding maintenance cost), the corrective cost of a gearbox is relatively high compared to preventive maintenance costs; see, for example, Nilsson and Bertling [25]. Based on the values reported in the aforementioned studies, we set \(c_{\text {cm}}=300{,}000\), \(c_{\text {pm}}^{\text {so}}=1000\), \(\mu _2=0.31\), \(\mu _1=0.31\) and \(p=0.6\). With these values, the long-run rate of cost (in euros per year) when only corrective replacements are performed is equal to 46,500. Furthermore, motivated by the wind industry practice, we choose three different values for \(\tau \), that is \(\tau \in \{0.25, 0.5, 1\}\) (years). Next, we consider three different values for \(c_{\text {pm}}^{\text {uso}}\), i.e., \(c_{\text {pm}}^{\text {uso}} \in \{ 2000, 3000, 4000 \}\). Finally, with regard to \(\lambda \), we consider four different values, i.e., \(\lambda \in \{0.5, 1, 2, 4\}\).
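The corrective-only figure of 46,500 euros per year quoted above can be checked with a short renewal-reward computation. The sketch below is illustrative and not taken from the paper: it assumes that the lifetime is the sum of two independent exponential sojourns (rate \(\mu _2\) out of state 2, then rate \(\mu _1\) out of state 1) and that a corrective replacement is instantaneous; the function name `corrective_only_cost_rate` is hypothetical.

```python
# Parameter values quoted in the text for the gearbox example.
c_cm = 300_000.0   # corrective maintenance cost (euros)
mu_2 = 0.31        # assumed: rate of leaving state 2 (per year)
mu_1 = 0.31        # assumed: rate of leaving state 1 (per year)


def corrective_only_cost_rate(c_cm, mu_1, mu_2):
    """Long-run cost rate when the only action is replacement at failure.

    Assumes a lifetime equal to the sum of two independent exponential
    sojourns (rates mu_2 and mu_1) and instantaneous replacement, so that
    each failure closes a renewal cycle costing c_cm.
    """
    mean_lifetime = 1.0 / mu_2 + 1.0 / mu_1  # expected cycle length (years)
    return c_cm / mean_lifetime              # expected cost / expected length


rate = corrective_only_cost_rate(c_cm, mu_1, mu_2)
print(f"{rate:,.0f} euros per year")
```

Under these assumptions the mean lifetime is \(2/0.31 \approx 6.45\) years, which reproduces the value stated in the text.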

*p* and it is typically assumed that after a maintenance the component is restored to a perfect state. This policy is denoted by \(\pi '_{\text {opt}}\).

Table 2: Long-run rate of cost (in euros per year) varying \(\lambda \), \(\tau \) and \(c_{\text {pm}}^{\text {uso}}\), while keeping all other parameters fixed, for four policies

| \(c_{\text {pm}}^{\text {uso}}\) | \(\lambda \) | \(\tau = 0.25\), \(\pi _{\text {uso}}\) | \(\tau = 0.25\), \(\pi _{\text {so}}\) | \(\tau = 0.25\), \(\pi _{\text {opt}}\) | \(\tau = 0.25\), \(\pi '_{\text {opt}}\) | \(\tau = 0.5\), \(\pi _{\text {uso}}\) | \(\tau = 0.5\), \(\pi _{\text {so}}\) | \(\tau = 0.5\), \(\pi _{\text {opt}}\) | \(\tau = 0.5\), \(\pi '_{\text {opt}}\) | \(\tau = 1\), \(\pi _{\text {uso}}\) | \(\tau = 1\), \(\pi _{\text {so}}\) | \(\tau = 1\), \(\pi _{\text {opt}}\) | \(\tau = 1\), \(\pi '_{\text {opt}}\) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | 0.5 | 31,674 | 7624 | 7193 | 7208 | 31,674 | 12,927 | 11,627 | 11,647 | 31,674 | 20,301 | 17,134 | 17,156 |
| 2000 | 1 | 24,139 | 7624 | 6815 | 6840 | 24,139 | 12,927 | 10,583 | 10,614 | 24,139 | 20,301 | 14,855 | 14,886 |
| 2000 | 2 | 16,522 | 7624 | 6183 | 6221 | 16,522 | 12,927 | 9007 | 9049 | 16,522 | 20,301 | 11,794 | 11,828 |
| 2000 | 4 | 10,368 | 7624 | 5258 | 5307 | 10,368 | 12,927 | 7023 | 7067 | 10,368 | 20,301 | 8469 | 8498 |
| 3000 | 0.5 | 31,842 | 7624 | 7230 | 7255 | 31,842 | 12,927 | 11,687 | 11,725 | 31,842 | 20,301 | 17,224 | 17,265 |
| 3000 | 1 | 24,393 | 7624 | 6883 | 6927 | 24,393 | 12,927 | 10,691 | 10,751 | 24,393 | 20,301 | 15,010 | 15,068 |
| 3000 | 2 | 16,863 | 7624 | 6304 | 6372 | 16,863 | 12,927 | 9188 | 9267 | 16,863 | 20,301 | 12,034 | 12,100 |
| 3000 | 4 | 10,778 | 7624 | 5456 | 5543 | 10,778 | 12,927 | 7294 | 7375 | 10,778 | 20,301 | 8800 | 8855 |
| 4000 | 0.5 | 32,011 | 7624 | 7266 | 7299 | 32,011 | 12,927 | 11,748 | 11,800 | 32,011 | 20,301 | 17,314 | 17,374 |
| 4000 | 1 | 24,648 | 7624 | 6951 | 7009 | 24,648 | 12,927 | 10,799 | 10,883 | 24,648 | 20,301 | 15,164 | 15,248 |
| 4000 | 2 | 17,203 | 7624 | 6424 | 6513 | 17,203 | 12,927 | 9368 | 9479 | 17,203 | 20,301 | 12,274 | 12,368 |
| 4000 | 4 | 11,189 | 7624 | 5653 | 5766 | 11,189 | 12,927 | 7565 | 7677 | 11,189 | 20,301 | 9132 | 9208 |

In Table 2, we observe, across all instances, that incorporating planned maintenance significantly reduces costs compared to only corrective maintenance, and that costs can be reduced even further by adding opportunistic maintenance. Intuitively, due to the cost structure, performing only planned maintenance at SOs can yield a considerably lower long-run rate of cost than performing only opportunistic maintenance at USOs. Finally, if we compare \(\pi _{\text {opt}}\) with \(\pi '_{\text {opt}}\) we do not, despite the low value for *p*, observe significant differences. From an operational management perspective, this implies that, if decision makers have no knowledge of the value of *p*, and given a cost structure similar to that of the gearbox case, assuming perfect maintenance will result in a long-run rate of cost that is close to optimal regardless of the true value of *p*. This holds as long as the preventive maintenance cost (at both opportunities) is very small in comparison to the corrective maintenance cost, as is the case for the gearbox. As a rule of thumb, one can compute the expected number of maintenances (planned or opportunistic) required for a successful preventive maintenance, use it to compute the long-run rate of preventive maintenance cost (approximately of the order \(\max \{c_{\text {pm}}^{\text {so}},c_{\text {pm}}^{\text {uso}}\}/p\)), and compare the result with the corrective cost. If the corrective cost is significantly higher, then one may assume that there is no significant difference between \(\pi _{\text {opt}}\) and \(\pi '_{\text {opt}}\), and consequently no significant difference in the values of the optimal policies under imperfect and perfect maintenance. In the next section, we investigate the savings that can be obtained by improving the performance of a repair when a decision maker has some knowledge regarding the value of *p*.
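The rule of thumb above can be sketched numerically. The snippet assumes that successive preventive maintenance attempts succeed independently with probability *p*, so that the number of attempts until the first success is geometric with mean \(1/p\); the helper names are hypothetical. It also computes the relative gap between \(\pi '_{\text {opt}}\) and \(\pi _{\text {opt}}\) for three entries copied from Table 2 (\(c_{\text {pm}}^{\text {uso}}=4000\), \(\lambda =4\)).

```python
def expected_pm_attempts(p):
    """Mean number of attempts until the first success (geometric distribution)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return 1.0 / p


def pm_cost_per_success(c_pm_so, c_pm_uso, p):
    """Rule-of-thumb order of magnitude: max{c_pm_so, c_pm_uso} / p."""
    return max(c_pm_so, c_pm_uso) * expected_pm_attempts(p)


# Gearbox values quoted in the text.
c_cm, c_pm_so, p = 300_000.0, 1000.0, 0.6
for c_pm_uso in (2000.0, 3000.0, 4000.0):
    cost = pm_cost_per_success(c_pm_so, c_pm_uso, p)
    print(f"c_pm_uso={c_pm_uso:.0f}: preventive cost per success ~ {cost:,.0f} "
          f"({cost / c_cm:.2%} of c_cm)")

# (pi_opt, pi'_opt) pairs from Table 2 for c_pm_uso = 4000 and lambda = 4,
# keyed by tau.
table_entries = {0.25: (5653, 5766), 0.5: (7565, 7677), 1.0: (9132, 9208)}
gaps = {tau: (approx - opt) / opt for tau, (opt, approx) in table_entries.items()}
for tau, gap in gaps.items():
    print(f"tau={tau}: relative gap of pi'_opt over pi_opt = {gap:.2%}")
```

For the gearbox values the preventive cost per successful maintenance stays a small fraction of \(c_{\text {cm}}\), and the gaps computed from these three table entries are at most about 2%, in line with the observation that \(\pi _{\text {opt}}\) and \(\pi '_{\text {opt}}\) perform similarly in this cost structure.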

### 6.2 Influence of imperfect maintenance

*p* and let \(C(\pi ^{(p)}_{\text {opt}})\) denote the long-run rate of cost when the policy is \(\pi ^{(p)}_{\text {opt}}\). To demonstrate the effect of *p* in the rate of cost, we compute the relative difference in the cost of not having a perfect preventive maintenance as a function of *p*. This relative difference is denoted by \(\delta (p)\) and is equal to

The optimal policy \(({\tilde{t}})\), denoted by \(t^{1}\) and \(t^{2}\) under the first and second cost structure respectively, is equal to \(t^{1}\approx 0.08\) and \(t^{2}\approx 0.39\) in the case of perfect repairs. In Fig. 4, where we plot \(t^{1}\) and \(t^{2}\) as a function of *p*, we observe the following regarding the influence of *p* on the maintenance planning. If the preventive maintenance cost (at both opportunities) is very small compared to the cost of corrective maintenance, the order of the total preventive maintenance cost incurred until a successful preventive maintenance compared to the corrective maintenance cost is still maintained. Therefore, the maintenance planning changes little with *p*: the optimal policy is to almost always perform preventive maintenance at USOs for all values of \(p\in [0.5,1]\). This also explains the small discrepancy between \(\pi _{\text {opt}}\) and \(\pi '_{\text {opt}}\) in Table 2. The situation is different for the second cost structure, where the maintenance planning changes substantially as a function of *p*. Whereas in the perfect case the optimal policy is to perform preventive maintenance at a USO if the residual time until the next SO is larger than 0.39, for \(p \lessapprox 0.83\) it is optimal to never perform preventive maintenance at a USO. Here, the order of the total preventive maintenance cost incurred until a successful preventive maintenance compared to the corrective maintenance cost is not maintained.
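The control limit rule discussed above can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `pm_at_uso`, the integer encoding of the states, and the use of an infinite control limit to encode "never maintain at a USO" are assumptions made for illustration. The rule assumed is that preventive maintenance is performed at a USO only in the degraded state (state 1) and only if the residual time until the next SO exceeds the control limit.

```python
import math


def pm_at_uso(state, residual_time_to_so, control_limit):
    """Decide whether to perform preventive maintenance at a USO.

    Assumed rule: act only in the degraded state (state 1) and only when
    the time left until the next scheduled opportunity exceeds the
    control limit.
    """
    return state == 1 and residual_time_to_so > control_limit


NEVER = math.inf  # an infinite control limit encodes "never maintain at a USO"

# Second cost structure with perfect repairs: control limit of about 0.39.
print(pm_at_uso(1, 0.45, 0.39))   # residual time above the limit
print(pm_at_uso(1, 0.20, 0.39))   # next SO is close, so wait for it
print(pm_at_uso(1, 0.45, NEVER))  # regime in which USOs are never used
print(pm_at_uso(2, 0.45, 0.39))   # state 2: nothing to repair
```

Encoding the "never" regime as an infinite limit keeps a single decision rule for all values of *p*, so that a change in *p* only moves the threshold.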

Also, in the opposite cost structure, i.e., \(c_{\text {pm}}^{\text {uso}}<c_{\text {pm}}^{\text {so}}\) (similar examples can be found for \(c_{\text {pm}}^{\text {uso}}=c_{\text {pm}}^{\text {so}}\)), the maintenance planning can be influenced significantly by the imperfect repair probability. For instance, consider the setting with \(\mu _1=1.1, \mu _2 =0.9\), \(c_{\text {pm}}^{\text {so}} = 4500\), \(c_{\text {pm}}^{\text {uso}}=4000\), \(c_{\text {cm}}=10{,}000\), and \(\lambda =0.5\). In case of perfect repairs (i.e., \(p=1\)), the optimal policy is to perform preventive maintenance in state 1 at both SOs and USOs, and to do nothing otherwise (cf. Theorem 2). However, if \(0.72 \lessapprox p \lessapprox 0.83\), the optimal policy is to only perform preventive maintenance at USOs, and if \(p \lessapprox 0.72\), then the optimal policy is to never perform PM. This example illustrates the influence of the imperfect repair probability on the maintenance planning.

### 6.3 Deferring of planned maintenance

In this section, we illustrate the change introduced by the action of deferring planned maintenance after the occurrence of a successful maintenance in three numerical examples that relate to the wind industry, the lithography industry, and to an artificially created example.

## 7 Conclusion

In this paper, we considered the maintenance policy for a three-state component degrading over time with corrective replacements at failures and preventive replacements at both scheduled and unscheduled opportunities under imperfect repair. By formulating this problem as a semi-Markov decision process, we were able to characterize the structure of the optimal maintenance policy as a control limit policy, where the control limit depends on the time until the next planned maintenance opportunity. Using this approach, a closed-form expression for the optimal control limit was derived. Within this class of control limit policies, we derived, using the theory of regenerative processes, an explicit expression for the long-run rate of cost. Using a similar approach based on renewal theory, we derived an expression for the long-run rate of cost in the case when planned maintenance is deferred after the occurrence of a successful opportunistic maintenance.

A cost comparison with suboptimal policies illustrated the benefits of optimizing the maintenance policy. Specifically, it was found that incorporating planned maintenance can significantly reduce costs compared to only corrective maintenance, and that costs can be reduced even further by adding opportunistic maintenance. Moreover, numerical results indicate that the extent of the impact of the perfect repair probability on the optimal policy depends on the underlying cost structure. It was also shown that substantial cost savings can be obtained by improving the perfect repair probability. Finally, our numerical examples indicate that the deferral of planned maintenance after the occurrence of a successful opportunistic maintenance may either increase or decrease the total cost.

There are a number of extensions and topics for future research. The most important direction is to consider the network dependency on the level of the structural degradation and failure dependencies, i.e., to consider a multi-dimensional process that captures the degradation of the various assets in the network. Such a future direction would be particularly interesting in the case of a small number of assets for which the Poisson approximation for the opportunistic maintenance may not be accurate. In addition, another very interesting research direction would be to consider a more general model in which the condition of the system degrades through \(N>2\) states. Next, in this analysis we have assumed that the condition of the system is fully observable. However, in many real applications, condition monitoring data such as spectrometric oil data or vibration data give only partial information about the underlying state of the system. From this perspective, it would be interesting to extend the model at hand to a partially observable model in which the condition monitoring data are stochastically related to the true system state. Finally, the results in this paper are valid for systems with hypo-exponentially distributed lifetimes. Future research could relax this assumption by considering a phase-type lifetime distribution.

## Notes

### Acknowledgements

The authors gratefully acknowledge the contribution of S. Kalosi in the early stages of the preparation of the work. The authors would like to thank M. Barbieri, J. Korst, and V. Pronk (all Philips Research), and O. J. Boxma and G. J. van Houtum (both Eindhoven University of Technology) for their time and advice in the preparation of this work. The work of C. Drent is supported by the Data Science Flagship framework, a cooperation between the Eindhoven University of Technology and Philips. The work of S. Kapodistria is supported by the NWO Gravitation Project ‘NETWORKS’ of the Dutch government.

## References

- 1. Arts, J., Basten, R.: Design of multi-component periodic maintenance programs with single-component models. IISE Trans. **50**(7), 606–615 (2018)
- 2. Baker, R.D., Christer, A.H.: Review of delay-time OR modelling of engineering aspects of maintenance. Eur. J. Oper. Res. **73**(3), 407–422 (1994)
- 3. Bhattacharya, R.N., Majumdar, M.: Controlled semi-Markov models under long-run average rewards. J. Stat. Plan. Inference **22**(2), 223–242 (1989)
- 4. Bevilacqua, M., Braglia, M.: The analytic hierarchy process applied to maintenance strategy selection. Reliab. Eng. Syst. Saf. **70**(1), 71–83 (2000)
- 5. Christer, A., Waller, W.: Delay time models of industrial inspection maintenance problems. J. Oper. Res. Soc. **35**(5), 401–406 (1984)
- 6. Christer, A.H.: Modelling inspection policies for building maintenance. J. Oper. Res. Soc. **33**(8), 723–732 (1982)
- 7. Christer, A.H.: Developments in delay time analysis for modelling plant maintenance. J. Oper. Res. Soc. **50**(11), 1120–1137 (1999)
- 8. Feinberg, E.A.: Constrained semi-Markov decision processes with average rewards. Math. Methods Oper. Res. **39**, 257–288 (1994)
- 9. Hernández-Lerma, O., Lasserre, J.B.: Further topics on discrete-time Markov control processes, vol. 42. Springer, Berlin (2012)
- 10. Jardine, A.K.S., Lin, D., Banjevic, D.: A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mech. Syst. Signal Process. **20**(7), 1483–1510 (2006)
- 11. Jardine, A.K.S., Tsang, A.H.C.: Maintenance, replacement, and reliability: theory and applications. CRC Press, Boca Raton (2005)
- 12. Jaśkiewicz, A.: An approximation approach to ergodic semi-Markov control processes. Math. Oper. Res. **54**(1), 1–19 (2001)
- 13. Jaśkiewicz, A.: On the equivalence of two expected average cost criteria for semi-Markov control processes. Math. Oper. Res. **29**(2), 326–338 (2004)
- 14. Kalosi, S., Kapodistria, S., Resing, J.A.C.: Condition-based maintenance at both scheduled and unscheduled opportunities. In: Scarf, P., Wu, S., Do, P. (eds.) Proceedings of the 9th IMA International Conference on Modelling in Industrial Maintenance and Reliability (2016), ISBN: 978-0-905091-31-0. arXiv:1607.02299
- 15. Khinchin, A.Y.: Sequences of chance events without after-effects. Theory Probab. Appl. **1**(1), 1–15 (1956)
- 16. Kim, M.J., Jiang, R., Makis, V., Lee, C.G.: Optimal Bayesian fault prediction scheme for a partially observable system subject to random failure. Eur. J. Oper. Res. **214**(2), 331–339 (2011)
- 17. Kim, M.J., Makis, V.: Joint optimization of sampling and control of partially observable failing systems. Oper. Res. **61**(3), 777–790 (2013)
- 18. Lam, J.Y.J., Banjevic, D.: A myopic policy for optimal inspection scheduling for condition based maintenance. Reliab. Eng. Syst. Saf. **144**, 1–11 (2015)
- 19. Lippman, S.A.: On dynamic programming with unbounded rewards. Manag. Sci. **21**(11), 1225–1233 (1975)
- 20. Maillart, L.M., Pollock, S.M.: Cost-optimal condition-monitoring for predictive maintenance of 2-phase systems. IEEE Trans. Reliab. **51**(3), 322–330 (2002)
- 21. Makis, V., Wu, J., Gao, Y.: An application of DPCA to oil data for CBM modeling. Eur. J. Oper. Res. **174**(1), 112–123 (2006)
- 22. Mobley, R.K.: An Introduction to Predictive Maintenance. Elsevier, Amsterdam (2002)
- 23. Nakagawa, T.: Optimum policies when preventive maintenance is imperfect. IEEE Trans. Reliab. **28**(4), 331–332 (1979a)
- 24. Nakagawa, T.: Imperfect preventive-maintenance. IEEE Trans. Reliab. **28**(5), 402–402 (1979b)
- 25. Nilsson, J., Bertling, L.: Maintenance management of wind power systems using condition monitoring systems—life cycle cost analysis for two case studies. IEEE Trans. Energy Convers. **22**(1), 223–229 (2007)
- 26. Peng, Y., Dong, M., Zuo, M.J.: Current status of machine prognostics in condition-based maintenance: a review. Int. J. Adv. Manuf. Technol. **50**(1–4), 297–313 (2010)
- 27. Pham, H., Wang, H.: Imperfect maintenance. Eur. J. Oper. Res. **94**(3), 425–438 (1996)
- 28. Prajapati, A., Bechtel, J., Ganesan, S.: Condition based maintenance: a survey. J. Qual. Maint. Eng. **18**(4), 384–400 (2012)
- 29. Ribrant, J., Bertling, L.: Survey of failures in wind power systems with focus on Swedish wind power plants during 1997–2005. IEEE Trans. Energy Convers. **22**(1), 167–173 (2007)
- 30. Ross, S.M.: Average cost semi-Markov decision processes. J. Appl. Probab. **7**(3), 649–656 (1970)
- 31. Ross, S.M.: Introduction to Probability Models. Academic Press, Cambridge (2014)
- 32. Schäl, M.: On the second optimality equation for semi-Markov decision models. Math. Oper. Res. **17**(2), 470–486 (1992)
- 33. Serfozo, R.: Basics of Applied Stochastic Processes. Springer, Berlin (2009). 2nd printing, 2012 edition
- 34. Spinato, F., Tavner, P.J., Van Bussel, G.J.W., Koutoulakos, E.: Reliability of wind turbine subassemblies. IET Renew. Power Gener. **3**(4), 387–401 (2009)
- 35. Tavner, P.J., Xiang, J., Spinato, F.: Reliability analysis for wind turbines. Wind Energy Int. J. Prog. Appl. Wind Power Convers. Technol. **10**(1), 1–18 (2007)
- 36. Van Oosterom, C., Elwany, A., Çelebi, D., Van Houtum, G.J.: Optimal policies for a delay time model with postponed replacement. Eur. J. Oper. Res. **232**(1), 186–197 (2014)
- 37. Vega-Amava, O., Luque-Vásquez, F.: Sample-path average cost optimality for semi-Markov control processes on Borel spaces: unbounded costs and mean holding times. Appl. Math. **27**(3), 343–367 (2000)
- 38. Wang, W.: Delay time modelling. In: Kobbacy, K.A.H., Murthy, D.N.P. (eds.) Complex System Maintenance Handbook, pp. 345–370. Springer, Berlin (2008)
- 39. Wang, W.: An overview of the recent advances in delay-time-based maintenance modelling. Reliab. Eng. Syst. Saf. **106**, 165–178 (2012)
- 40. Yang, M., Makis, V.: ARX model-based gearbox fault detection and localization under varying load conditions. J. Sound Vib. **329**(24), 5209–5221 (2010)
- 41. Yushkevich, A.A.: On semi-Markov controlled models with an average reward criterion. Theory Probab. Appl. **26**(4), 796–803 (1982)
- 42. Zhu, Q., Peng, H., Timmermans, B., Van Houtum, G.J.: A condition-based maintenance model for a single component in a system with scheduled and unscheduled downs. Int. J. Prod. Econ. **193**, 365–380 (2017)
- 43. Zhu, Q., Peng, H., Van Houtum, G.J.: An age-based maintenance policy using the opportunities of scheduled and unscheduled system downs. Beta report, Eindhoven University of Technology (2016)
- 44. Zio, E., Compare, M.: Evaluating maintenance policies by quantitative modeling and analysis. Reliab. Eng. Syst. Saf. **109**, 53–65 (2013)

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.