1 Introduction

The units or systems such as machines used in construction, chemical plants, power plants, heavy electrical and mechanical engineering, parts of vehicles, etc., are often subject to shocks in the course of their operation. These shocks may be assumed to appear at random points in time according to a point process and each shock causes some random amount of damage to the operating unit. The unit or system may fail at some sudden shock or it may withstand the shocks until the total damage caused by the shocks exceeds a critical level. The latter one is often encountered in practical situations and can be studied using a cumulative damage model (Zhao and Nakagawa 2018). In this model, the damage caused in the form of crack growth, creep, fatigue, wear, etc., is accumulated until it becomes greater than a pre-specified threshold level. Some real life scenarios where this model turns out to be very helpful are discussed in the following.

Crack in a vehicle axle caused by overload, jerk, etc., grows as long as it is above a certain depth and the axle breaks after that. Scarf et al. (1996) used a stochastic model under periodic inspection to study crack growth. Stochastic models were applied to study fatigue damage of materials by Sobczyk (1987) and Sobczyk and Spencer (1992). The electric power of a cell-phone battery, initially stored by chemical energy, is weakened by normal functionalities of a cell-phone and is subject to frequent calls leading to accumulated damage or energy loss (Bhuyan and Dewanji 2017a). Similarly, as a result of frequent updation of a database system, un-accessed data accumulates as garbage and the system collapsed as soon as it exceeds the tolerance level (Nakagawa 2007, p-131). As for example, the mailbox becomes full as a result of accumulation of emails over time and the account fails to receive any further email (Bhuyan and Dewanji 2017a). Keeping the unit or system functional until its failure may turn out to be cost-ineffective and lead to hazardous situations. If the axle of an automobile breaks in the course of its journey, then it may cost in terms of human lives, the goods it carries and extra money to repair. It creates a havoc among the users when servers in large systems such as banks, railways, online application programmes, etc., become unresponsive which often happens due to garbage created inside the database. Failure of units in nuclear power plants has proven its fatality in some events in the recent past. Hence, there is a need for preventive maintenance of the units before failure occurs (Nakagawa 2005).

There has been ample research on the optimum replacement strategy assuming cumulative damage model with a constant strength or threshold level (Zhao and Nakagawa 2018, Ch-2). See also Taylor (1975), Zuckerman (1977), Chikte and Deshmukh (1981), and Zhao et al. (2013) for work on replacement policies under similar damage accumulation models. All these works have assumed constant strength which may not be realistic in many practical situations. Zhou et al. (2016) proposed periodic preventive maintenance for leased equipment with continuous internal degradation and stochastic external shock damage. An operating unit is affected by human errors, material quality and operating conditions, etc., and the unit’s capacity to withstand damage due to shocks may decrease as its operating time increases (Satow et al. 2000). Hence, the strength of a unit may reasonably be described by a deterministic curve which is decreasing in time. Recently, computation and estimation of reliability under such cumulative damage model has been considered by Bhuyan and Dewanji (2017a; b).

In this article, we have discussed the replacement policies for the cumulative damage model having strength that is continuously non-increasing over time. In principle, we introduce a quantity called ‘expected cost per unit time’ for each set of replacement (design) variables and minimize the same over the design variables to obtain the ’optimal’ replacement policy. Note that this expected cost per unit time depends on the distributions of the successive shock arrival times and also of the corresponding damages and the deterministic strength degradation curve in addition to the different cost components and the design variables (See Sects. 2 and 3 for details). The computation of this expected cost per unit time is, however, often very challenging even for constant strength. Even if the distribution functions of both inter-arrival time between successive shocks and damage due to each shock possess closure property under convolution, the expression for the expected cost per unit time involves integrals and infinite sums, numerical evaluation of which is difficult. Complexity of computation increases if closed form expressions for the convolution of the associated distribution functions are not available and/or the strength is time dependent (see Sect. 3). In order to avoid such difficulty, Nakagawa (1976) and Endharta and Yun (2014) assumed constant strength and independent and identically distributed (iid) Exponential distributions for the successive inter-arrival times (that is, the successive shock arrivals follow a homogeneous Poisson process) and damages so that the related convolutions follow the respective Gamma distributions. See also Satow et al. (2000), however, for linearly decreasing strength curve. In this article, we propose a simulation based method for evaluation of the expected cost per unit time which provides flexibility in choosing the distribution functions for both inter-arrival time between successive shocks and damage due to each shock. Therefore, the domain of application of the proposed method is much wider.

In the next section, we discuss the preliminaries which include the notation and assumptions regarding the proposed modeling framework. In Sect. 3, we present the mathematical formulations for the basic replacement policies with different optimization criteria. Section 4 deals with the different computational methods and the issues therein. Some numerical results for different choices of the damage and inter-arrival time distributions, strength degradation for the unit, etc., are presented in Sect. 5. In Sect. 6, we consider some generalizations of the damage distribution and present some numerical results in those cases. We illustrate the proposed method using real case studies in Sect. 7. Finally, we conclude the paper with some remarks in Sect. 8.

2 Preliminaries

We assume that the operating unit starts working at time 0 and its initial damage level is 0. As time progresses, it is subject to shocks and suffers from some amount of damage due to each shock. These damages caused by the successive shocks are accumulated over time. Let N(t) represent the number of shocks by time t. It is assumed that the shocks arrive according to a renewal process. Let \(X_{1}, X_{2},\ldots \) be the sequence of independent and identically distributed random variables which denote the inter-arrival times between successive shocks having the common distribution function \(F(\cdot )\). Then \(S_{j} = \sum _{i=1}^{j} X_{i},~j \ge 1\), represents the arrival time of the jth shock and has the distribution function \(F^{(j)}(\cdot )\), where \(F^{(j)}(\cdot )\) is the j-fold convolution of \(F(\cdot )\) with itself. The successive damages \(W_{1}, W_{2},\ldots \) are assumed to be independent and identically distributed and also independent of the shock arrival process N(t) (that is, the \(X_i\)’s). Let \(W_{j},~j\ge 1\), have a common distribution function \(G(\cdot )\). Then the total damage at the jth shock will have the distribution function \(G^{(j)}(\cdot )\), the j-fold convolution of \(G(\cdot )\) with itself.

The strength of the unit is described by K(t) which is continuous and decreasing in time t. Note that, under the present stress-strength interface, there are two different types of failure modes, either due to strength degradation at or below the existing level of accumulated stress, or due to arrival of a shock resulting in the increased stress exceeding or equaling the strength at that time (See Bhuyan and Dewanji 2017b). Then a unit fails when its strength reduces to zero even if no shock arrives by that time. One needs to consider corrective replacement of the unit with a new one immediately after failure. According to the existing basic replacement policies, the unit is preventively replaced before failure at a planned time T \((0<T<t_0=\inf \{t:K(t)=0\})\), or a shock number N \((N=1,2,\ldots )\), or a damage level Z \((0<Z<K(0))\), whichever occurs first; otherwise it is replaced at failure (corrective replacement). In our work, we have adopted the basic replacement policies with an additional condition \(Z \le K(T)\) so that the damage level Z has some relevance in deciding the replacement policies. If the total damage at the Nth shock exceeds the pre-specified damage level Z, or the strength at that time of shock arrival, then it is assumed that the replacement of the unit is due to damage, or failure, as is the case, instead of the shock number N. This assumption is reasonable if both the replacement costs, due to damage Z and due to failure, are higher than that due to shock number N, in order to safeguard the worse situation. Similarly, if the total damage at the Nth shock exceeds both the damage level Z and the strength at that time of shock arrival, we assume that the replacement is due to the failure, since that is presumably the most expensive of the three.

Let us denote the probabilities that the unit is replaced at scheduled time T, shock number N, damage level Z and at failure, by \(p_T\), \(p_N\), \(p_Z\) and \(p_K\), respectively. We assume that all replacements are instantaneous. There is cost associated with each replacement with the cost of corrective replacement being higher than those of the preventive replacement. If \(c_{T}\), \(c_{N}\), \(c_{Z}\), \(c_{K}\) are the costs incurred from replacement at time T, shock number N, damage level Z and at failure, respectively, then \(c_{K}\) is higher than each of \(c_{T}\), \(c_{N}\) and \(c_{Z}\). The expected cost for replacement can be obtained as a function of the design variables T, N and Z, denoted by \({\tilde{C}}(T, N, Z)\), which upon division by the mean time to replacement gives the expected replacement cost per unit time, termed as the ‘expected cost rate’ for brevity. The expected cost rate as defined above is known as ‘long run mean cost’ in the context of renewal process theory which requires the process to be regenerative, or renewed, after each replacement, preventive or corrective. In the stress-strength interference leading to the cumulative damage model, this regenerative or renewal property does not hold in general, since the shock arrivals may not start anew after each replacement. Nevertheless, if the shock arrival is modeled by a homogeneous Poisson process (HPP), it behaves like starting anew at each replacement time due to the memoryless property; therefore, the whole stress-strength interference starts anew at each replacement time ensuring the renewal property. For non-HPP shock arrivals, one can think of three alternatives. First, depending on the situation, the shock arrivals may be linked with the functioning of the device, like a particular type of stressful uses [See both the real examples in Bhuyan and Dewanji (2017a)], in which case the renewal property at each replacement time is a clear consequence. Secondly, one may be interested in the case of only the first replacement in which case the expected cost rate may be interpreted as the average cost per unit time until the first replacement (Rafiee et al. 2015). Thirdly, the system can only undergo a limited number of replacements in practice after which the system becomes outdated. Noting that only the first shock arrival time after a replacement has a different (in fact, residual life) distribution, the expected cost rate may be taken as an approximation to the ‘long run mean cost’ under this non-renewal point process and, hence, a reasonable objective function to minimize. Notwithstanding this difficulty associated with the definition of the expected cost rate, we henceforth consider this as the objective function in view of the above discussion. Shue et al. (2019) considered a similar objective function, termed as ‘long term expected cost per unit time’, for optimization of two replacement policies in k-out-of-n systems. See also Lee and Cha (2018), Zhao et al. (2018) and Eryilmaz (2017) among others for consideration of similar objective function to obtain optimal replacement policies in different contexts and modeling scenarios.

3 Optimal replacement policies

As described in the previous section, a preventive replacement is to be carried out at a planned time T, or at a shock number N, or at a damage level Z, whichever occurs first. As in Satow et al. (2000), we first consider these three design variables T, N and Z one at a time and consider the corresponding expected cost rates as the objective function to minimize. However, the expressions for the expected cost rates are different because of the time-dependent strength degradation. Thereafter, we deal with all these three variables simultaneously. For this purpose, we derive the expected cost rates for replacement separately as a function of T, N and Z and then all taken together. In the following, for the ease of understanding, we present simply the respective expressions for the expected cost rates with some reference to the materials in the Appendix, where details of the derivations are presented.

We first discuss the preventive replacement of the unit only at a planned time T. The unit is replaced either at T or at failure, whichever occurs first. There is no replacement at the Nth shock or the cumulative damage reaching Z. As discussed in the previous section, we assume that the replacement is corrective rather than preventive, if failure happens at time T. Then, the expected cost rate \(C_{1}(T)\), when the unit is replaced either at T or at failure, can be obtained, dividing (B1) by (B2), as

$$\begin{aligned} C_{1}(T) = \frac{c_{K} - \left( c_{K} - c_{T}\right) \left[ {\bar{F}}(T) + \sum _{j=1}^{\infty } \left[ F^{(j)}(T) - F^{(j+1)}(T) \right] G^{(j)}(K(T)) \right] }{\int _{0}^{T}{\bar{F}}(t)dt + \sum _{j=1}^{\infty } \int _{0}^{T} \left[ F^{(j)}(t) - F^{(j+1)}(t)\right] G^{(j)}(K(t)) dt}. \end{aligned}$$
(1)

When \(K(T)=K\), for the case of constant strength, Eq. (1) simplifies to that of Eq. (3.11) of Nakagawa (2007, p-42).

Next we consider the case when the operating unit is replaced either at the planned shock number N or at failure, whichever occurs first. There is no replacement at a planned time T or due to reaching a damage level Z. As discussed before, we assume that the replacement is corrective rather than preventive, if failure happens at the arrival of the Nth shock. The expected cost rate \(C_{2}(N)\) for replacement is, dividing (B3) by (B4), given by

$$\begin{aligned} C_{2}(N) = \frac{c_{K} - \left( c_{K} - c_{N}\right) \int _{0}^{\infty } G^{(N)}(K(s)) dF^{(N)}(s)}{\mu _{F} + \sum _{j=1}^{N-1} \int _{0}^{\infty } \left[ F^{(j)}(t) - F^{(j+1)}(t)\right] G^{(j)}(K(t)) dt}. \end{aligned}$$
(2)

When \(K(s)=K\), for the case of constant strength, Eq. (2) simplifies to that of Eq. (3.20) of Nakagawa (2007, p-44).

Now we consider the problem of replacement at a planned cumulative damage level Z or at failure, whichever occurs first. There is no replacement at the planned time T or at the Nth shock. Here, the expected cost rate for replacement, denoted by \(C_{3}(Z)\), is obtained, dividing (B5) by (B6), as

$$\begin{aligned} C_{3}(Z)= & {} c_{K} - \left( c_{K} - c_{Z}\right) \Big [ \int _{0}^{T_{0}} \left[ G(K(s)) - G(Z)\right] dF(s) \nonumber \\&+ \sum _{j=1}^{\infty } \int _{0}^{T_{0}} \int _{0}^{Z} \left[ G(K(s) - x) - G(Z - x) \right] dG^{(j)}(x) dF^{(j+1)}(s) \Big ]\nonumber \\&\Bigg / \Big [ \mu _{F} + \sum _{j=1}^{\infty } \int _{0}^{\infty } \left[ F^{(j)}(t) - F^{(j+1)}(t)\right] G^{(j)}({\tilde{K}}(t)) dt \Big ], \end{aligned}$$
(3)

where \(T_0\) and \({{\tilde{K}}}(t)\) are as defined in the Appendix A.3. When \(K(s)=K\), for the case of constant strength, Eq. (3) simplifies to that of Eq. (3.24) of Nakagawa (2007, p-45).

Finally, we consider preventive replacement under simultaneous consideration of T, N and Z. Replacement of the unit takes place at a planned time T, shock number N, at a damage level Z, or at failure, whichever occurs first. As discussed before, if the cumulative damage at the Nth shock exceeds Z as well as the strength at that time, we assume that the replacement is corrective, since that is more expensive compared to preventive replacement. The expected cost rate of replacement in this case, denoted by C(TNZ), is obtained, dividing (B7) by (B8), as

$$\begin{aligned} C(T, N, Z)= & {} \Bigg [c_{K} - \left( c_{K} - c_{T}\right) \Big [{\bar{F}}(T) + \sum _{j=1}^{N-1} G^{(j)}(Z)\left\{ F^{(j)}(T) - F^{(j+1)}(T) \right\} \Big ] \nonumber \\&- \left( c_{K} - c_{N}\right) F^{(N)}(T) G^{(N)}(Z) - \left( c_{K} - c_{Z}\right) \Big \{\int _{0}^{T} \left[ G(K(s)-G(Z))\right] dF(s) \nonumber \\&+ \sum _{j=1}^{N-1} \int _{0}^{T} \int _{0}^{Z} \left[ G(K(s) - x) - G(Z - x) \right] dG^{(j)}(x) dF^{(j+1)}(s)\Big \} \Bigg ] \nonumber \\&\big / \Bigg [\int _{0}^{T}{\bar{F}}(t)dt + \sum _{j=1}^{N-1} G^{(j)}(Z) \int _{0}^{T} \left[ F^{(j)}(t) - F^{(j+1)}(t)\right] dt\Bigg ]. \end{aligned}$$
(4)

When \(K(s)=K\), for the case of constant strength, Eq. (4) simplifies to that of Eq. (3.8) of Nakagawa (2007, p-42).

Our objective is to find the optimum choices of T, N and Z which minimize the respective expected cost rates, under the four different design considerations described above, in the corresponding design space. Theoretically optimizing the expected cost rates leads to complicated expressions and requires imposing more conditions which are practically less important. Moreover, no analytical solution of the optimum replacement policy is available, even if the damage distribution and that of the inter-arrival times possess closure property under convolution like, for example, the Exponential distribution. See Nakagawa (2007, Ch-3) for details. Thus, there is a need to go for numerical investigation for finding an approximation of the optimum replacement policy, denoted by \({\hat{T}}\), \({\hat{N}}\), and \({\hat{Z}}\), respectively. Henceforth, we refer to this approximation as the ‘optimum replacement policy’ although this is only approximately optimal. The methods and the issues associated with this investigation are discussed in the following section.

4 Computational issues

As remarked in Sect. 1, the computation of the expected cost rates, given by Eqs. (1), (2), (3), and (4), is extremely challenging. The expressions for expected cost rates involve infinite sums or infinite integrals, or both in some cases. Evaluation of the integrals can be carried out using numerical integration. Any standard software package equipped with numerical integration (e.g.- integrate in the package R) can be used for this purpose. The details of the algorithm and its precision are discussed in Piessens et al. (1983). On the other hand, the infinite sums can be approximated by taking large number, say 10,000, of terms and ignoring the terms after that. Evaluation of the expected cost rate using this approach is computationally challenging but feasible if both of the inter-arrival time between successive shocks and damage due to each shock follow Exponential distributions. However, as mentioned before, analytical solution of the optimum replacement policy is not available even for this simple scenario. The complexity of computation increases if the distribution functions do not have closure property under convolution (e.g., Weibull, Log-normal, etc.). To address this difficulty in numerically obtaining the expected cost rates in other situations, we resort to a method of simulation, as described below, to obtain the expected cost rates approximately.

In this method, the whole process of shock arrivals and accumulation of damages as against the degradation of strength is virtually created. For a fixed T, N and Z, the proposed algorithm gives as output one realization each for the time to replacement \(T_{R}\) and a variable \(I_{R}\) indicating whether the replacement is due to failure or due to one of \(N,\ T\) and Z taking values 0, 1, 2 and 3, respectively. The mean time to replacement and the probabilities of replacement can be estimated by simulating a large number, say 10,000, of realizations of \(T_{R}\) and \(I_{R}\). The algorithm for simulating a realization for each of \(T_{R}\) and \(I_{R}\) is given below :

  • \(X_{0} = 0\), \(W_{0} = 0\); For \(i = 1, 2, \ldots \),

  • Step 1 Simulate \(X_{i} \sim F(\cdot )\) and \(W_{i} \sim G(\cdot )\);

  • Step 2 Calculate \(S_{i} = \sum _{j=0}^{i} X_{j}\) and \(L_{i} = \sum _{j=0}^{i} W_{j}\);

  • Step 3 If \(L_{i} < {\tilde{K}}(S_{i})\), then next i (i.e. repeat Step 1 and Step 2); else, if \(L_{i-1} \le {\tilde{K}}(S_{i}) < L_{i}\), then set \({\tilde{T}} = S_{i}\); else, find \({\tilde{T}} = t\) by solving \({\tilde{K}}(t) = L_{i-1}\);

  • Step 4 If \(N({\tilde{T}}) > N\), then \(T_{R} = \min \left\{ T, S_{N} \right\} \) and \(I_{R} = 11(T_{R}=S_{N})+21(T_{R}=T)\);

    else, if \({\tilde{T}} < T_{0}\) and \(L_{i} < K(S_{i})\), then \(T_{R} = \min \left\{ {\tilde{T}}, T \right\} \) and \(I_{R} = 21(T_{R}=T)+31(T_{R}={\tilde{T}})\);

    else, if \({\tilde{T}} < T\), then \(T_{R} = {\tilde{T}}\) and \(I_{R} = 0\);

    else, \(T_{R} = T\) and \(I_{R} = 2\).

Now, the expected cost rate C(TNZ) given in (4) is approximated as \({\hat{C}}(T,N,Z)=\frac{c_{T}{\bar{I}}_{T} + c_{Z}{\bar{I}}_{Z}+ c_{N}{\bar{I}}_{N}+ c_{K}(1-{\bar{I}}_{T}-{\bar{I}}_{Z}-{\bar{I}}_{N}) }{{\bar{T}}_{R}}\), where \({\bar{I}}_{T}\), \({\bar{I}}_{Z}\), \({\bar{I}}_{N}\) and \({\bar{T}}_{R}\) are mean of \(I_{T}=1(I_{R}=2)\), \(I_{Z}=1(I_{R}=3)\), \(I_{N}=1(I_{R}=1)\) and \(T_{R}\), respectively, based on 10000 simulated observations. The proposed algorithm evaluates the expected cost rate as a function of T, N and Z which can then be minimized for finding the optimal values of T, N and Z for replacement. The optimal replacement, while considering one of T, N and Z (See Sect. 3), can be determined by minimizing \(\hat{C_{1}}(T)={\hat{C}}(T, \infty ,\infty )\), \(\hat{C_{2}}(N)={\hat{C}}(\infty ,N,\infty )\), and \(\hat{C_{3}}(Z)={\hat{C}}(\infty ,\infty ,Z)\), respectively. The minimum expected cost rate can be obtained by using the methods of grid search, simulated annealing, etc. Note that the approximated expected cost rate obtained by the aforementioned simulation algorithm may consist of some local minimums. The method of simulated annealing has been implemented to escape a local minimum with certain probability in order to search for the global minimum. Interested readers can see Kirkpatrick et al. (1983) and Dowsland (1995) for more details on simulated annealing. Since the number of design parameters is small, sequential grid search also preforms efficiently. It is important to note that the domain of application of the proposed simulation method is much wider providing flexibility in choosing both the distribution functions for inter-arrival time between successive shocks and damage due to each shock.

5 Numerical results

The computations have been done under different distributional assumptions with several sets of values for the associated parameters, different strength degradation and the cost incurred from replacement at failure. In all of the computations, the costs incurred from preventive replacements at T, N or Z are assumed to be 1, i.e. \(c_{T} = c_{N} = c_{Z} = 1\). The inter-arrival time between successive shocks has been assumed to follow (i) Exponential distribution with mean \(1 / \lambda \), denoted by \(Exp(\lambda )\), and (ii) Log-normal distribution with Normal parameters \(\mu \) and \(\sigma \), denoted by \(LN(\mu , \sigma )\), with mean being \(\exp \left( \mu +\frac{1}{2}\sigma ^{2}\right) \). The distribution functions for the damage caused by each shock has been assumed to be either (i) Exponential with mean \(1 / \mu \), denoted by \(Exp(\mu )\), or (ii) Weibull with scale parameter \(\alpha \) and shape parameter \(\beta \), denoted by \(Wei(\alpha , \beta )\), with mean damage being \(\alpha \Gamma \left( 1+\frac{1}{\beta }\right) \). The strength degradation curve K(t) is assumed to be exponential, linear or constant over time.

In Table 1, we present the optimum values \({\hat{T}}\), \({\hat{N}}\) and \({\hat{Z}}\) which minimize the approximate expected cost rates \({\hat{C}}_{1}(T)\), \({\hat{C}}_{2}(N)\) and \({\hat{C}}_{3}(Z)\), respectively, along with the corresponding minimum expected cost rates. Then, in Table 2, we present the optimum values \({\hat{T}}\), \({\hat{N}}\) and \({\hat{Z}}\) by minimizing the approximate expected cost rate \({\hat{C}}(T, N, Z)\) as a function of T, N and Z, along with the corresponding minimum expected cost rate. In Table 3, a different set of cost components (\(c_T=0.5, c_N=1.5, c_Z=1.0\) and \(c_K=6\)) is considered for the optimum values \({\hat{T}}, {\hat{N}}\) and \({\hat{Z}}\), corresponding to simultaneous optimization as in Table 2, to study the impact of differential cost component. When both F and G are Exponential, then one can compute the expected cost rates, given by Eqs. (1), (2), (3), and (4), directly (as remarked at the beginning of Sect. 4) and the optimization results are obtained by implementing the grid search method. This method, used only when both F and G are Exponential, is termed as ‘Direct: GS’ in the tables (GS meaning ‘grid search’). Otherwise, when the expected cost rates are approximated by simulation, optimization results are obtained by using the grid search and/or the simulated annealing algorithm, termed as ‘Approx: GS’ and ‘Approx: SA’ (SA meaning ‘simulated annealing’), respectively, in the tables. These latter two methods have sometime been used for comparison even when both F and G are Exponential. Note that the ‘Direct: GS’ method is similar to the approach proposed by Endharta and Yun (2014). It is clearly seen that, when both F and G are Exponential, the results based on the approximate expected cost rates using the simulation method are similar to those obtained by the direct method (See Tables 1, 2, 3). As expected, one can observe that the optimal values \({\hat{T}}\), \({\hat{N}}\) and \({\hat{Z}}\) decrease as cost of corrective replacement \(c_{K}\) increases (See Table 1). Also, as expected, the optimal values of T, N and Z in Table 2 are larger compared to those in Table 1, since the condition of replacement in Table 2 is more stringent (any of the design variables T or N or Z exceeds the respective threshold). Interestingly, the minimum expected cost rate is smaller for the simultaneous optimization of T, N and Z compared to those of individual cases, as expected, since the domain of minimization is smaller in the individual cases. Note that, in Table 3, \({\hat{N}}\) and \({\hat{Z}}\) are larger compared to those in Table 2, as expected, since the costs \(c_N\) and \(c_Z\) are higher. So, it conforms with the natural trend that, if a cost component is higher, the corresponding threshold tends to be higher to safeguard against that cost.

Table 1 Optimal \({\hat{T}}\), \({\hat{N}}\), \({\hat{Z}}\) and the corresponding minimal expected cost rates \(C_{1}({\hat{T}})\), \(C_{2}({\hat{N}})\) and \(C_{3}({\hat{Z}})\) with \(c_{T} = c_{N} = c_{Z} = 1\). Means of the relevant distributions are given in parentheses
Table 2 Optimal \({\hat{T}}\), \({\hat{N}}\), \({\hat{Z}}\) and the corresponding minimal expected cost rate \(C({\hat{T}}, {\hat{N}}, {\hat{Z}})\) with \(c_{T} = c_{N} = c_{Z} = 1\). Means of the relevant distributions are given in parentheses
Table 3 Optimal \({\hat{T}}\), \({\hat{N}}\), \({\hat{Z}}\) and the corresponding minimal expected cost rate \(C({\hat{T}}, {\hat{N}}, {\hat{Z}})\) with \(c_{K}=6, c_{T} =0.5 , c_{N} =1.5 , c_{Z} =1 \). Means of the relevant distributions are given in parentheses

6 Some generalizations

In this section, we consider some generalizations in the assumption related to the successive damage distributions which may be more realistic in some situations. The damages due to shocks may be either dependent or independent but not identically distributed. As we move on to these generalized scenarios, the computational difficulty associated with the direct method also increases. In such situations, the simulation method turns out to be more effective. The algorithm for simulation remains similar to that described in Sect. 4 except for the damage distributions for simulating the \(W_i\)’s which change accordingly. The optimal values of T, N and Z and the corresponding minimum expected cost rates are evaluated in the same manner.

6.1 Independent but non-iid damage distributions

Here we assume that the damages caused by the successive shocks may be independent but not identically distributed. For instance, there may be situations where the successive shocks cause damages which are stochastically larger than those due to the preceding ones. Note that when the damages \(X_{1}, X_{2},\ldots \) are independent but not identically distributed, then \(Y = \sum _{i = 0}^{n} X_{i}\) under these assumptions may not fall into any known class of distributions. As mentioned before, there are several difficulties in evaluating the expected cost rates directly since the expressions are not in a closed form. Interestingly, the algorithm for the simulation method remains the same except that the successive damages are now generated from the non-identical distributions and can be easily implemented.

The optimal values of T, N and Z and the corresponding minimum values of expected cost rates \(C_{1}(T)\), \(C_{2}(N)\) and \(C_{3}(Z)\) under different distributional assumptions are presented in Table 4. The shocks are assumed to arrive according to a renewal process, i.e. the inter-arrival time between successive shocks are iid with a common distribution function \(F(\cdot )\). We have chosen the inter-arrival time distribution to be (i) Exponential distribution with mean \(1 / \lambda \), denoted by \(Exp(\lambda )\), (ii) Log-normal distribution with Normal parameters \(\mu \) and \(\sigma \), denoted by \(LN(\mu , \sigma )\). Unlike the case of iid damages, here it is assumed that the damage due to ith shock has a distribution function \(G_{i}(\cdot )\). The choices for \(G_{i}(\cdot )\) are (i) Gamma with scale parameter \(\theta _{i}\) and shape parameter \(\delta \), denoted by \(Ga(\theta _{i}, \delta )\), with mean being \(\delta \theta _{i}\) or (ii) Weibull with scale parameter \(\alpha _{i}\) and shape parameter \(\beta \), denoted by \(Wei(\alpha _{i}, \beta )\). The computations are done for the cases when the strength of the system K(t) is decreasing with time both exponentially and linearly. As before, the values of \(c_{T}\), \(c_{N}\) and \(c_{Z}\) are kept unchanged, i.e. \(c_{T} = c_{N} = c_{Z} = 1\), and different choices for the costs incurred from replacement at failure have been considered.

Under similar distributional assumptions, we have calculated the optimum values \({\hat{T}}\), \({\hat{N}}\) and \({\hat{Z}}\) corresponding to the minimum value of the expected cost rate C(TNZ). The results are presented in Table 5.

Table 4 Optimal \({\hat{T}}\), \({\hat{N}}\), \({\hat{Z}}\) and the corresponding minimal expected cost rates \(C_{1}({\hat{T}})\), \(C_{2}({\hat{N}})\) and \(C_{3}({\hat{Z}})\) for independent but not identically distributed damages using the Approx: GS method. Means of the relevant distributions are given in parentheses
Table 5 Optimal \({\hat{T}}\), \({\hat{N}}\), \({\hat{Z}}\) and the corresponding minimal expected cost rate \(C({\hat{T}}, {\hat{N}}, {\hat{Z}})\) for independent but not identically distributed damages. Means of the relevant distributions are given in parentheses

6.2 Dependent damage distribution

In order to model dependent damages, a multivariate damage distribution needs to be considered. We consider a model in which the damage \(W_{i}\) due to the ith shock can be expressed as \(W_{i} = Z_{0} + Z_{i}\), where \(Z_0\) is a random variable representing the minimum damage that arrival of a shock can cause to the unit and \(Z_i\) is the additional damage caused by the ith shock depending on its severity, etc.. Then the successive damages \(W_{1}, W_{2},\ldots \) become dependent because of the common minimum damage \(Z_{0}\). If the minimum damage \(Z_{0}\) and the additional damages \(Z_{i}\)’s are assumed to be independent \(Ga(\theta _{i}, 1)\) random variables for \(i = 0, 1, 2,\ldots \), then the joint distribution of \(W_{1},\ldots ,W_{n}\), for given n, is known as the Cheriyan and Ramabhadran’s multivariate Gamma distribution (Kotz et al. 2000). The distribution function of the cumulative damage \(U = \sum _{i=1}^{n}W_{i}\) under these assumptions do not fall into any known class of distributions. As we have frequently mentioned, there are several other difficulties in evaluating the expected cost rates since the expressions are not in a closed form. By using the simulation method, we can overcome these complications while having less computational burden. In this dependent modeling, in particular, the generation of successive damages is simple due to the additive form of the \(W_i\)’s. The objective, similar to the previous cases, is to find the optimal values of T, N and Z, which result in minimum expected cost rates.

In the following illustrations, as before, we consider the shocks to arrive according to a renewal process with inter-arrival time distribution being (i) Exponential distribution with mean \(1 / \lambda \), denoted by \(Exp(\lambda )\), and (ii) Log-normal distribution with Normal parameters \(\mu \) and \(\sigma \), denoted by \(LN(\mu , \sigma )\). The dependent damages are assumed to follow Cheriyan and Ramabhadran’s multivariate Gamma distribution with parameters \(\theta _{0}\) and \(\theta _{j} = \theta \) for all \(j = 1, 2,\ldots \), denoted by \(MVGa(\theta _{0}, \theta )\), with mean damage equal to \(\theta _{0} + \theta \). The strength of the operating unit can be either exponentially or linearly degrading and the assumptions on the costs incurred from preventive replacement of the unit remains same. The expected cost rate C(TNZ) is also minimized as a function of T, N and Z taken simultaneously. The computational burden in the simulation method does not increase much because of the dependent damages. The numerical results for finding \({\hat{T}}\), \({\hat{N}}\) and \({\hat{Z}}\), separately and simultaneously, are presented in Tables 6 and 7, respectively.

Table 6 Optimal \({\hat{T}}\), \({\hat{N}}\), \({\hat{Z}}\) and the corresponding minimal expected cost rates \(C_{1}({\hat{T}})\), \(C_{2}({\hat{N}})\) and \(C_{3}({\hat{Z}})\) for dependent damage distributions using the Approx: GS method. Means of the relevant distributions are given in parentheses
Table 7 Optimal \({\hat{T}}\), \({\hat{N}}\), \({\hat{Z}}\) and the corresponding minimal expected cost rate \(C({\hat{T}}, {\hat{N}}, {\hat{Z}})\) for dependent damages. Means of the relevant distributions are given in parentheses

7 Case studies

In Sect. 1, we have discussed some application in database management systems for its efficient operation. It is a common practice among email users to forward emails automatically from various email accounts to a preferred email account for ease of operation. In this process, users normally do not clean the mailbox of the secondary email accounts. As a result of accumulation of emails over time, the secondary mailbox becomes full and the account fails to receive any further email. Bhuyan and Dewanji (2017a) collected data on 22 such identical systems and observed failure time (in hours) data and the number of emails received up to the time of failure. The mailbox limit (that is, the strength of the system s(t)) is kept fixed at 5 MB. In a preliminary data analysis, the average number of arriving shocks seems to increase with time in a linear fashion. Therefore, we assume that emails arrive according to a homogeneous Poisson process and the estimated mean inter-arrival time is 3 hours 27 minutes. We find that the Log-normal distribution fits the size (in MB) of the successive emails well and the corresponding parameter estimates are \({\hat{\mu }}=-7.32\) and \({\hat{\sigma }}=3.16\) with mean 97.57 KB. The optimal values of T, N, and Z are obtained by minimizing \(\hat{C_{1}}(T)\), \(\hat{C_{2}}(N)\), and \(\hat{C_{3}}(Z)\), respectively, and plotted against \(c_K\) in Figs. 1, 2, 3, keeping the other cost components \(c_T, c_N\) and \(c_Z\) for replacement fixed at unity. As expected, the optimal value of T decreases as \(c_{K}\) increases and sharp decline is observed up to \(c_{K}=5\) . Similar patterns are also observed for optimal values of N and Z. The expected cost rates corresponding to optimal replacement strategies are plotted against \(c_{K}\) in Fig. 4. It is observed that the optimal strategy based on Z is better compared to the same based on T and N with respect to the expected cost rate. As for illustration by considering all the design parameters together, the optimal replacement strategy \(({\hat{T}},{\hat{N}},{\hat{Z}})=(708.89,183,3.86)\) and the associated expected cost rate \(3.82\times 10^{-3}\) are obtained by minimizing \({\hat{C}}(T,N,Z)\) for \(c_K=2\) with \(c_T=c_N=c_Z=1\).

Fig. 1
figure 1

The optimal T for mailbox experiment keeping cost of replacement fixed at unity

Fig. 2
figure 2

The optimal N for mailbox experiment keeping cost of replacement fixed at unity

Fig. 3
figure 3

The optimal Z for mailbox experiment keeping cost of replacement fixed at unity

Fig. 4
figure 4

Comparison of expected cost rates corresponding to optimal T, N, Z

As discussed before, the electric power of a dry cell or battery, initially stored by chemical energy, is weakened by continuous oxidation process and is subject to frequent use leading to accumulated damage or energy loss. Similar phenomenon happens in cell-phone battery after each and every recharge. Once fully charged, cell-phone battery looses its energy over time due to normal functionalities of the cell-phone in the switch-on mode and frequent incoming and outgoing calls leading to accumulated damage or energy loss. Bhuyan and Dewanji (2017a) analysed data on 11 identical cell-phone batteries based on failure time (in hours) data and the number of calls (incoming and outgoing) up to the time of failure. We assume that incoming and outgoing calls take place according to a homogeneous Poisson process with the estimated rate \({\hat{\lambda }}=0.29\). We also assume that \(K(t)=A\exp (-Bt)\) and the damages due to successive calls follow iid Gamma distribution. For identifiability, the initial strength A is fixed at 100. See Bhuyan and Dewanji (2017a) for more details. Nevertheless, we consider the estimated scale parameter \({\hat{\theta }}=1.54\) and shape parameter \({\hat{\delta }}=0.193\) of the Gamma damages and \({\hat{B}}=0.041\) to carry out our analysis. The optimal values \({\hat{T}}\) and \({\hat{N}}\) are obtained by minimizing \(\hat{C_{1}}(T)\) and \(\hat{C_{2}}(N)\), respectively, and plotted against \(c_K\) in Figs. 5, 6, keeping the cost of replacement fixed at unity. As expected, \({\hat{T}}\) decreases as \(c_{K}\) increases. A sharp decline is observed for \({\hat{N}}\) up to \(c_{K}=5\). Note that the accumulated energy consumption due to incoming and outgoing calls are not observable. Therefore, we do not provide any optimal replacement strategy based on Z. The expected cost rates corresponding to the replacement strategies based on T and N are plotted against \(c_{K}\) in Fig. 7. It is observed that the optimal strategy based on T is much better compared to that based on N. Now, considering these two design parameters together, the optimal replacement strategy \(({\hat{T}},{\hat{N}})=(73.41\, {\text {h}}, 28)\) and the associated expected cost rate \(1.458\times 10^{-2}\), are obtained by minimizing \({\hat{C}}(T,N,\infty )\) for \(c_K=2\) with \(c_T=c_N=1\).

Fig. 5
figure 5

The optimal T for cell-phone battery experiment keeping cost of replacement fixed at unity

Fig. 6
figure 6

The optimal N for cell-phone battery experiment keeping cost of replacement fixed at unity

Fig. 7
figure 7

Comparison of expected cost rates corresponding to optimal T and N

8 Concluding remarks

The cumulative damage model with strength degradation unlike that with a fixed strength is more common and realistic. However, the replacement problem under such model with decreasing strength has not yet been addressed. The unit is preventively replaced before failure at a scheduled time T, shock number N and a damage level Z whichever occurs first and correctively replaced at failure. Under this replacement policy, we have obtained the expressions for the expected cost rates of replacement at T, N and Z individually, or all taken together. These expressions are not in closed form which makes it extremely difficult to analytically derive the optimum policy. Besides, evaluating the convolutions of the distribution functions itself is a complicated process. In this work, probably for the first time, the computational issues associated with the replacement problem for cumulative damage model with degrading strength has been discussed. We have proposed a simulation algorithm for evaluating the expected cost rates. The method of simulation reduces the computational burden while providing room for a wider range of distributional choices. We have also considered some generalized cases where the damages caused by shocks can either be dependent or independent but not identically distributed. In fact, even for a general (that is, non-renewal) point process modeling for the shock arrivals, the simulation method can be readily implemented as long as the shock arrival process can be simulated.

In many real life scenarios, shocks appear from multiple sources thereby causing damages with different distributions depending on the source of the corresponding shock. One can then ideally model the damage distribution corresponding to a shock to have a mixture distribution. This mixture damage distribution does not generally have closed form expression for convolutions and, therefore, finding optimal replacement strategy is computationally difficult. The proposed algorithm can handle such cases easily. Furthermore, the proposed algorithm is readily generalised to find optimal strategy based on Z and/or N as a function of time, which may be useful in real implementations under dynamic stress-strength interference.

Again, in many real situations, initial strength or its path of deterioration over time is random. Sometimes, deterioration of strength over time is due to various environmental causes changing stochastically at every instant. Another possible scenario is that the strength of the operating unit degrades in a non-monotonic fashion. The unit may go through some auto-repairing process that will cause some ups and downs in its strength (Ebrahimi and Ramallingam 1993). Evaluation of the expected cost rates for replacement in those cases are complicated which adds to the reasons why simulation method should be preferred over other competing methods.

Note that, in view of obtaining the optimal T, N and Z for preventive replacement, one can also determine, at least in theory, the expected time for corrective replacement at failure in the presence/absence of any of the four preventive replacement plans. This may be of interest to the industry concerned in view of the huge cost involved with a system failure. One can design a simulation algorithm for this purpose in line of that described in Sect. 4. The assumption of immediate replacement used in the present work may not be a reasonable one in practice. Incorporation of delayed replacement will require not only the modeling of the distribution of duration of replacement, but also the information on the cost of down time for replacement due to loss of profit, service, etc., during this period of random length. This may be theoretically challenging even when all this information is available. A simulation based method may be the solution in such a complicated scenario.