Introduction

Maintenance costs account for 15–70% of production costs (Alrabghi and Tiwari 2015), including capacity losses and repair costs. Most research deals with scheduling problems either without preventive maintenance or with preventive maintenance (PM) under the assumption that no breakdown can happen (Guo et al. 2007). However, in practical situations, preventive maintenance does not put an end to breakdowns.

When interruptions cause setups, when processes must restart after an interruption, or when materials are spoiled if the process is not completed, the start time of the preventive maintenance actions must be flexible. A periodic maintenance policy in which the maintenance intervals are flexible is called the periodic flexible maintenance policy. In this paper, the flexible maintenance interval and the periodic flexible maintenance policy are denoted by FI and PFM, respectively. A schematic PFM is illustrated in Fig. 1.

Fig. 1 A schematic PFM

Chen addressed single-machine scheduling problems with given FIs, using the mean flow time (Chen 2006b) and the makespan (Chen 2008) as criteria. These two studies showed that the problems of scheduling jobs and maintenance activities within FIs are strongly NP-hard, and developed some efficient heuristics. Chen also presented mathematical programming formulations for single- and parallel-machine cases with a single flexible maintenance on each machine and total tardiness as the criterion (Chen 2006a). Xu and Yin (2011) considered the online version of the single-machine scheduling problem with given FIs. They proved that, under the makespan criterion, the classical list scheduling algorithm is the best possible approximation algorithm. Jin et al. (2009) tackled a machine scheduling problem with random failures, which includes positioning maintenance activities in predefined FIs. They proposed a mixed continuous-discrete genetic algorithm to minimize the total weighted expected completion time. Sbihi and Varnier (2008) addressed a machine scheduling problem in which the maximum allowed continuous working time between two maintenance activities is predefined. They proposed a heuristic and a branch and bound algorithm to minimize the maximum tardiness. Cui and Lu addressed the joint single-machine and flexible maintenance scheduling problem with the makespan as the criterion. They assumed that there is a maximum allowed interval between any pair of consecutive PMs (Cui and Lu 2017). Wang et al. addressed a machine scheduling problem in which the processing times of jobs increase as the machine speed degrades, and maintenance activities restore the machine to its normal rate. They assumed that the time between each pair of consecutive maintenance activities cannot be longer than a pre-specified threshold (Wang et al. 2018). Mosheiov et al. designed approximation algorithms for two-machine flow shop and open shop scheduling problems in which a flexible maintenance activity must be performed on one of the machines. They assumed that the start time of the maintenance activity must be within a given interval (Mosheiov et al. 2018). Zhang et al. tackled the problem of scheduling maintenance activities and jobs in a nonidentical parallel-machine environment. They simultaneously considered the makespan criterion, the expected costs of performing preventive maintenance, and the expected costs of stochastic failures, and designed a metaheuristic approach to find the Pareto optimal solutions (Zhang et al. 2019).

The optimization of maintenance plans by simulation has gained increasing attention in recent years (Alrabghi and Tiwari 2015). Greater simplicity and higher flexibility have made simulation models preferable to analytical and mathematical models in maintenance optimization problems (Alrabghi and Tiwari 2015). The most frequently reported approach to simulation in scheduling studies involving maintenance activities is discrete event simulation (Alrabghi and Tiwari 2015). To name a few, the readers are referred to the following studies: selection of operational variables and production schedule (Zhang et al. 2013b), maintenance scheduling in a multi-component production system (Arab et al. 2013), robust and stable production and preventive maintenance scheduling when the machine is subject to failure (Cui and Lu 2013), and production and preventive maintenance scheduling when the machines' failure rates are not constant over time (Mokhtari and Dadgar 2015).

According to our extensive literature review, all of the relevant previous studies assume that the timing of the flexible maintenance intervals is known and fixed in advance, while no published paper has studied the problem of establishing PFM which includes determination of time between FIs and length (i.e., flexibility) of each FI. This paper proposes a simulation–optimization approach to this novel problem.

The rest of the paper is structured as follows. Section 2 explains the problem more formally and introduces the notations. Section 3 derives the estimated optimal due dates for any arbitrary sequence of jobs when the PFM is known. These values are used to evaluate the alternative maintenance policies. Section 4 describes the proposed simulation–optimization approach. For the optimization part, two mixed continuous-discrete metaheuristic approaches are suggested. Section 5 compares the performance of these algorithms and explains the potential advantage of flexibility in maintenance intervals. Finally, Sect. 6 concludes the paper.

Problem definition

This paper tackles the problem of periodic flexible maintenance planning. Flexible maintenance refers to the case where maintenance intervals are longer than maintenance action times. A PFM includes two variables: the FI length and the time between two subsequent FIs. The maintenance policy affects production and delivery performance. Without precise consideration of the production environment and its parameters, maintenance plans can neither be established efficiently nor even evaluated. In this paper, a single-machine production environment is considered. The assumptions are:

The machine is subject to random failures

The time between random failures follows a Weibull distribution

Corrective maintenance (CM) is minimal and does not change the age of the machine

Preventive maintenance brings the state of the machine back to the as good as new state

All jobs are available to process at time zero

The minimal repair time follows an exponential distribution

The jobs are nonresumable, and the process of the interrupted jobs at failure times must be repeated

Preventive maintenance is not allowed to interrupt the processing of the jobs

The machine is shut down while corrective or preventive maintenance is being performed

Each preventive maintenance is performed at the latest possible point in its interval

Setup times are included in the processing times and are sequence independent

Main decision variables are:

TFI: length of each FI

δFI: coefficient which determines TFI

TF2F: length of time between the end of one FI to the start of the subsequent FI

TFI is defined as follows:

$$T_{{\rm FI}} = \delta_{\rm FI} M$$

where M is the required time to perform a preventive maintenance action. Figure 2 shows these main decision variables.

Fig. 2 The description of the main decision variables

As shown in this figure, when δFI = 1, maintenance intervals are not flexible. Since an FI must be long enough to contain a PM action, δFI ≥ 1. Moreover, auxiliary decision variables are defined in Fig. 3.
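As a rough illustration of how the two main decision variables lay out the maintenance windows on the time axis, the following Python sketch lists the first few FIs. The function name is hypothetical, and the sketch assumes that the first FI opens TF2F time units after time zero, which the text above does not state.

```python
def flexible_intervals(delta_fi, m, t_f2f, count):
    """Return (start, end) pairs of the first `count` flexible intervals.

    Assumption (not stated in the paper): the first FI opens t_f2f time
    units after time zero.
    """
    if delta_fi < 1:
        raise ValueError("delta_fi must be at least 1")
    t_fi = delta_fi * m          # T_FI = delta_FI * M
    period = t_fi + t_f2f        # distance between consecutive FI starts
    return [(t_f2f + k * period, t_f2f + k * period + t_fi)
            for k in range(count)]


windows = flexible_intervals(delta_fi=2.0, m=10.0, t_f2f=100.0, count=3)
```

With δFI = 1 every window shrinks to exactly the PM duration M, which is the inflexible case.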

Fig. 3 The nomenclature of the auxiliary decision variables

The exponential distribution is widely used to model both the repair times and the times between breakdowns. For instance, the following studies have used the exponential distribution for these purposes: machine scheduling with deteriorating jobs (Cai et al. 2011), flexible flow shop scheduling with sequence-dependent setup times (Gholami et al. 2009), and job shop rescheduling regarding new job arrivals and machine breakdowns (Zhang et al. 2013a). However, under the exponential distribution, the failure rate remains constant over time. In such a situation, performing preventive maintenance seems unnecessary (Lu 2015). In practical settings, however, the failure rate typically increases, for instance because mechanical parts wear out, so preventive maintenance is used to reduce the risk of unexpected machine failures. The Weibull distribution can model the time between failures in increasing failure rate situations (Montgomery and Runger 2010). Therefore, in the current paper, it is assumed that the time to perform the corrective maintenance follows an exponential distribution with mean μR, whereas, similar to Cui and Lu (2013) and Jin et al. (2009), the time between failures is modeled by a Weibull distribution.
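The two distributional assumptions can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `shape`, `scale`, and `mean_repair` are hypothetical and are not mapped to the paper's parameter symbols, the machine age is taken to be the operating time since the last PM, and the numerical values are made up. Because minimal repair leaves the age unchanged, the next failure is drawn from the Weibull distribution conditioned on survival up to the current age.

```python
import math
import random


def next_failure_age(age, shape, scale, rng):
    """Sample the machine age at the next failure, given its current age.

    With minimal repair the age is unchanged by a repair, so the next
    failure follows the Weibull distribution conditioned on survival
    up to `age` (inverse-transform sampling).
    """
    u = 1.0 - rng.random()                      # uniform on (0, 1]
    return scale * ((age / scale) ** shape - math.log(u)) ** (1.0 / shape)


def repair_time(mean_repair, rng):
    """Sample an exponentially distributed minimal repair time."""
    return rng.expovariate(1.0 / mean_repair)


rng = random.Random(42)
ages = []
age = 0.0                                       # as good as new after a PM
for _ in range(5):
    age = next_failure_age(age, shape=2.0, scale=500.0, rng=rng)
    ages.append(age)
repairs = [repair_time(50.0, rng) for _ in range(5)]
```

Resetting `age` to zero after each PM would reproduce the as-good-as-new assumption in a full simulation run.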

Figure 4 shows the notations used for sets, indices, and parameters.

Fig. 4 The nomenclature of sets, indices, and parameters

Total cost is the most popular and pragmatic objective for maintenance optimization (Alrabghi and Tiwari 2015). The costs are not limited to maintenance actions but also include capacity loss and missed due date penalties (Alrabghi and Tiwari 2015). Thus, the objective function in this paper is the minimization of the total due date and tardiness costs of jobs, the expected costs of the preventive and corrective maintenance, and the undesirability of the uncertainty of the preventive maintenance start times (see expression (1)).

$$\sum\nolimits_{j \in J} {\gamma_{j} d_{j} } + \sum\nolimits_{j \in J} {\beta_{j} E\left[ {T_{j} } \right]} + W_{\text{PM}} E\left[ {N_{\text{PM}} } \right] + W_{\text{CM}} E\left[ {N_{\text{CM}} } \right] + W_{\text{FI}} M\left( {\delta_{\text{FI}} - 1} \right)E\left[ {N_{\text{PM}} } \right]$$
(1)

where E[·] denotes the mathematical expectation.

Estimation of optimal due dates

In order to calculate job-related costs of any PFM, a reasonable sequence of jobs with determined due dates should be available. In this paper, metaheuristic approaches are used to search for good quality sequences. In any arbitrary sequence of jobs with known PFM, the estimated optimal due dates of jobs are calculated as follows.

Theorem 1

Consider an arbitrary sequence of jobs and a given PFM. Consider the following estimate of the expectations in expression (1), based on the times to failure and repair times generated in the simulation runs.

$$\sum\nolimits_{j \in J} {\gamma_{j} d_{j} } + \sum\nolimits_{j \in J} {\beta_{j} \bar{T}_{j} } + W_{PM} \bar{N}_{PM} + W_{CM} \bar{N}_{CM} + W_{FI} M\left( {\delta_{FI} - 1} \right)\bar{N}_{PM}$$
(2)

where \(\bar{X}\) denotes the sample mean of the stochastic variable X.
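A minimal Python sketch of how expression (2) could be evaluated from simulation output is given below. The function and argument names are hypothetical, the tardiness of a job in a run is taken as max(0, completion time − due date), and the numbers in the usage lines are made up for illustration.

```python
from statistics import mean


def estimated_total_cost(due_dates, completions, gamma, beta,
                         n_pm, n_cm, w_pm, w_cm, w_fi, m, delta_fi):
    """Sample estimate of the objective, following expression (2).

    completions[j][r] is the completion time of job j in run r;
    n_pm[r] and n_cm[r] are the PM and CM counts observed in run r.
    """
    job_cost = 0.0
    for j, d in enumerate(due_dates):
        mean_tardiness = mean(max(0.0, c - d) for c in completions[j])
        job_cost += gamma[j] * d + beta[j] * mean_tardiness
    mean_pm, mean_cm = mean(n_pm), mean(n_cm)
    maintenance_cost = w_pm * mean_pm + w_cm * mean_cm
    flexibility_cost = w_fi * m * (delta_fi - 1.0) * mean_pm
    return job_cost + maintenance_cost + flexibility_cost


cost = estimated_total_cost(
    due_dates=[10.0, 20.0],
    completions=[[8.0, 12.0], [18.0, 26.0]],   # two jobs, two runs
    gamma=[1.0, 1.0], beta=[2.0, 4.0],
    n_pm=[1.0, 1.0], n_cm=[0.0, 2.0],
    w_pm=3.0, w_cm=5.0, w_fi=1.0, m=10.0, delta_fi=2.0,
)
```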

For job j, there are \(N_{s}\) completion times, one per simulation run. Suppose that the completion times of job j are sorted in ascending order so that \(C_{j,1}\) represents the smallest completion time and \(C_{j,N_{s}}\) represents the largest one. By setting \(d_{j}\) equal to \(C_{j,k_{j}}\), where \(k_{j}\) is calculated through expression (3) and \(C_{j,0} = 0\), the estimated objective function (2) is minimized.

$$k_{j} = \hbox{max} \,\left( {0\,,\,\left\lceil {{{N_{s} (\beta_{j} - \gamma_{j} )} \mathord{\left/ {\vphantom {{N_{s} (\beta_{j} - \gamma_{j} )} {\beta_{j} }}} \right. \kern-0pt} {\beta_{j} }}} \right\rceil \,} \right)$$
(3)
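Before the proof, the rule in expression (3) can be checked numerically on a small example. The Python sketch below uses hypothetical function names and made-up completion times; it computes the due date given by the theorem and compares it with a brute-force search over the candidate set consisting of zero and all sampled completion times.

```python
import math


def optimal_due_date(completions, gamma, beta):
    """Due date minimising gamma*d + beta*mean(tardiness) over the sample.

    Follows expression (3): k = max(0, ceil(Ns*(beta - gamma)/beta)) and
    d = C_(k), the k-th smallest completion time, with C_(0) = 0.
    """
    ordered = sorted(completions)
    n_s = len(ordered)
    k = max(0, math.ceil(n_s * (beta - gamma) / beta))
    return 0.0 if k == 0 else ordered[k - 1]


def sample_cost(d, completions, gamma, beta):
    """Per-job part of expression (2) for a candidate due date d."""
    tardiness = sum(max(0.0, c - d) for c in completions) / len(completions)
    return gamma * d + beta * tardiness


samples = [31.0, 25.0, 40.0, 28.0, 36.0]          # made-up completion times
d_star = optimal_due_date(samples, gamma=1.0, beta=4.0)
# Brute-force check over the candidate set {0} plus all completion times.
best = min([0.0] + samples, key=lambda d: sample_cost(d, samples, 1.0, 4.0))
```

Here Ns = 5 and (β − γ)/β = 0.75, so k = ⌈3.75⌉ = 4 and the due date is the fourth smallest completion time. When γ ≥ β, the rule returns a due date of zero.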

Proof

Since the maintenance policy and the sequence of jobs are known, \(\bar{N}_{\text{PM}}\), \(\bar{N}_{\text{CM}}\), and δFI are fixed. So, the last three terms of expression (2) are constant and do not depend on the due dates. Further, the due date of each job affects only its own costs and is irrelevant to the cost components of other jobs. So, any due date value of job j which minimizes expression (4) is optimal.

$$\,\gamma_{j} d_{j} + \beta_{j} \overline{{T_{j} }} = \gamma_{j} d_{j} + \left( {{{\beta_{j} } \mathord{\left/ {\vphantom {{\beta_{j} } {N_{s} }}} \right. \kern-0pt} {N_{s} }}} \right)\sum\limits_{r = 1}^{{N_{s} }} {T_{j,r} }$$
(4)

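Let \(d_{j}^{*}\) be such an optimal due date. First, it is proved that \(d_{j}^{*} = C_{j,m}\) for some \(m \in \{0, \ldots, N_{s}\}\), where \(C_{j,0} = 0\). Assume that \(C_{j,m-1} \le d_{j}^{*} \le C_{j,m}\) for some \(m \in \{1, \ldots, N_{s}\}\). Let \(\Delta = d_{j} - C_{j,m - 1}\). Hence, the tardiness values \(T_{j,r}\) are calculated as follows: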

$$T_{j,r} = 0,\quad r = 1, \ldots ,m - 1$$
$$T_{j,r} = \left( {C_{j,r} - C_{j,m - 1} } \right) - \Delta ,\quad r = m, \ldots ,N_{s}$$

So, expression (4) can be rewritten as

$$\gamma_{j} \left( {C_{j,m - 1} + \Delta } \right) + \left( {{{\beta_{j} } \mathord{\left/ {\vphantom {{\beta_{j} } {N_{s} }}} \right. \kern-0pt} {N_{s} }}} \right)\sum\limits_{r = m}^{{N_{s} }} {\left( {\left( {C_{j,r} - C_{j,m - 1} } \right) - \Delta } \right)}$$
(5)

Expression (5) is a linear function of Δ. Hence, its minimum occurs either at \(\Delta = 0\) or at \(\Delta = C_{j,m} - C_{j,m-1}\), which results in \(d_{j}^{*} = C_{j,m - 1}\) and \(d_{j}^{*} = C_{j,m}\), respectively. In either case, \(d_{j}^{*}\) coincides with one of the completion times of job j.

Now, it is shown that \(d_{j}^{*} = C_{{j,k_{j} }}\) where kj is computed through expression (3). This is proved using the small perturbation technique introduced by Panwalkar et al. (1982). Suppose that the optimal due date of job j is located at \(C_{j,m}\). Shifting the due date of job j to the left by Δ units of time changes expression (4) by

$$- \gamma_{j} \Delta + \left( {{{\beta_{j} } \mathord{\left/ {\vphantom {{\beta_{j} } {N_{s} }}} \right. \kern-0pt} {N_{s} }}} \right)\sum\limits_{r = m}^{{N_{s} }} \Delta = - \gamma_{j} \Delta + \left( {{{\beta_{j} } \mathord{\left/ {\vphantom {{\beta_{j} } {N_{s} }}} \right. \kern-0pt} {N_{s} }}} \right)\left( {N_{s} - m + 1} \right)\Delta$$
(6)

Shifting the due date of job j to the right by Δ units of time changes expression (4) by

$$\gamma_{j} \Delta - \left( {{{\beta_{j} } \mathord{\left/ {\vphantom {{\beta_{j} } {N_{s} }}} \right. \kern-0pt} {N_{s} }}} \right)\sum\limits_{r = m + 1}^{{N_{s} }} \Delta = \gamma_{j} \Delta - \left( {{{\beta_{j} } \mathord{\left/ {\vphantom {{\beta_{j} } {N_{s} }}} \right. \kern-0pt} {N_{s} }}} \right)\left( {N_{s} - m} \right)\Delta$$
(7)

Since \(d_{j}\) is optimal, neither shift can reduce the cost, so expressions (6) and (7) are both nonnegative, which yields

$$m = \hbox{max} \left\{ {0,\left\lceil {{{N_{s} \left( {\beta_{j} - \gamma_{j} } \right)} \mathord{\left/ {\vphantom {{N_{s} \left( {\beta_{j} - \gamma_{j} } \right)} {\beta_{j} }}} \right. \kern-0pt} {\beta_{j} }}} \right\rceil } \right\}.$$
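The step from expressions (6) and (7) to this closed form can be spelled out as follows, assuming Δ > 0 and βj > 0. Dividing both nonnegativity conditions by Δ and solving for m gives

$$- \gamma_{j} + \frac{\beta_{j}}{N_{s}}\left( N_{s} - m + 1 \right) \ge 0 \;\Longleftrightarrow\; m \le \frac{N_{s}\left( \beta_{j} - \gamma_{j} \right)}{\beta_{j}} + 1$$

$$\gamma_{j} - \frac{\beta_{j}}{N_{s}}\left( N_{s} - m \right) \ge 0 \;\Longleftrightarrow\; m \ge \frac{N_{s}\left( \beta_{j} - \gamma_{j} \right)}{\beta_{j}}$$

so m is an integer in an interval of length one whose left end is \(N_{s}(\beta_{j} - \gamma_{j})/\beta_{j}\), and the ceiling of that left end always satisfies both bounds. When βj ≤ γj the left end is nonpositive and only a right shift from \(C_{j,0} = 0\) is possible, which is the case captured by the max operator.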

Description of proposed simulation–optimization approach

In this paper, a simulation–optimization approach is used to establish PFMs. The optimization part tries to find better solutions in each iteration. The simulation part is used to estimate the quality of the generated solutions. This approach is depicted in Fig. 5.

Fig. 5 Flowchart of the simulation–optimization approach to PFM

Every solution is encoded as a string with two parts. As shown in Fig. 6, the first part indicates the sequence of jobs, while the second part determines the maintenance policy. Note that without knowing the sequence of jobs, the calculation of expression (2) is impossible.

Fig. 6 Encoding of solutions as a two-section array

The first part of each solution array is combinatorial, while the second part is continuous. Hence, the optimization part requires a mixed continuous-discrete algorithm. Such an encoding was used by Jin et al. (2009) for integrated job sequencing and preventive maintenance scheduling within predefined flexible intervals. Sections 4.1 and 4.2 describe two proposed continuous-discrete metaheuristic approaches.
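The two-part encoding can be pictured with a small Python data structure. This is a sketch only: the class and field names are hypothetical, the continuous part is assumed to hold δFI and TF2F (the two PFM variables named in Sect. 2), and the positivity requirement on TF2F is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Solution:
    """Two-part solution: a job sequence plus the maintenance policy."""
    sequence: List[int]   # discrete part: a permutation of job indices
    delta_fi: float       # continuous part: flexibility coefficient, >= 1
    t_f2f: float          # continuous part: time between consecutive FIs

    def is_feasible(self, delta_ub):
        """Check the permutation and the bounds on the continuous part."""
        is_perm = sorted(self.sequence) == list(range(len(self.sequence)))
        return is_perm and 1.0 <= self.delta_fi <= delta_ub and self.t_f2f > 0


candidate = Solution(sequence=[2, 0, 3, 1], delta_fi=1.8, t_f2f=350.0)
```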

Mixed continuous-discrete ant colony optimization

Ant colony optimization (ACO) has been applied successfully to various academic and practical combinatorial optimization problems (Dorigo and Stützle 2004). As an example close to the problem under study, Berrichi et al. (2010) applied ACO to a bi-objective production and maintenance scheduling problem. In the original discrete ACO, the search is guided by laying more pheromone on components of good quality solutions. Socha and Dorigo (2008) extended ACO to continuous domains. In this variation, the search is directed by an archive of solutions, which guides the generation of new continuous solutions based on normal probability density functions (PDFs). In the mixed ACO (denoted by ACOco-cn), both approaches are joined. For the discrete part, the MAX–MIN rank-based ant system is used. The pseudocode of ACOco-cn is shown in Fig. 7.
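The archive-guided sampling of the continuous part can be sketched in Python as follows. This is not the pseudocode of Fig. 7, which is not reproduced here; it follows the general scheme of ACO for continuous domains, and the function name, the parameters `q` and `xi`, the archive contents, and the clipping step are all assumptions made for illustration.

```python
import math
import random


def sample_from_archive(archive, q, xi, rng):
    """Draw one continuous vector guided by a quality-sorted archive.

    archive[0] is the best stored solution. Rank-based Gaussian weights
    pick a guide; each coordinate is then sampled from a normal PDF
    centred on the guide, with spread set by the archive's dispersion.
    """
    k = len(archive)
    weights = [math.exp(-(rank ** 2) / (2.0 * (q * k) ** 2))
               for rank in range(k)]            # rank 0 = best solution
    guide = rng.choices(archive, weights=weights, k=1)[0]
    sample = []
    for i, centre in enumerate(guide):
        spread = xi * sum(abs(s[i] - centre) for s in archive) / (k - 1)
        sample.append(rng.gauss(centre, spread))
    return sample


# Hypothetical archive of (delta_FI, T_F2F) pairs, best first.
archive = [[1.5, 300.0], [2.0, 280.0], [1.2, 350.0]]
rng = random.Random(1)
draw = sample_from_archive(archive, q=0.5, xi=0.85, rng=rng)
delta_fi = min(max(draw[0], 1.0), 4.0)   # clip to [1, assumed upper bound]
```

A small `q` concentrates the choice of guide on the best archived solutions, while a larger `xi` widens the sampling around the guide.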

Fig. 7 Pseudocode of ACOco-cn

Mixed continuous-discrete particle swarm optimization

Particle swarm optimization (PSO) directs the search of each particle (i.e., a solution) by moving its position toward its own best experience as well as the best position of the overall swarm (Poli et al. 2007). PSO was initially proposed to address continuous domain problems (Kennedy and Eberhart 1995). However, many successful applications of PSO to discrete domains have been reported. For instance, Kashan and Karimi (2009) proposed a discrete PSO to minimize the makespan in a parallel-machine scheduling problem. Following Liu et al. (2010) and Tasgetiren et al. (2004), in the current paper, a ranked-order-value rule is used to transform the array of continuous position values into a feasible sequence of jobs. In this approach, the jobs are sequenced according to their position values so that the job with the minimum position value is sequenced first. Thus, in the mixed PSO (denoted by PSOco-cn), both approaches are combined. As shown in Fig. 8, the PSOco-cn algorithm is the same as the original PSO except that the discrete part is interpreted using the ranked-order-value rule.
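The ranked-order-value rule described above amounts to sorting the job indices by their position values. A minimal Python sketch, with a hypothetical function name and made-up position values, is:

```python
def ranked_order_value(positions):
    """Map continuous position values to a job sequence.

    The job with the smallest position value is scheduled first, the
    job with the second smallest next, and so on.
    """
    return sorted(range(len(positions)), key=lambda job: positions[job])


positions = [0.73, -1.20, 2.05, 0.10]      # one value per job (made up)
sequence = ranked_order_value(positions)   # job indices in processing order
```

Because Python's sort is stable, jobs with equal position values keep their index order, so every position vector maps to exactly one feasible sequence.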

Fig. 8 Pseudocode of PSOco-cn

Numerical analysis

In this section, the performance of ACOco-cn and PSOco-cn is first compared, and the better one is suggested for application. An important question is whether flexibility in maintenance intervals can reduce total costs significantly. Obviously, the answer depends on the problem parameters. However, in this section, the average effect of flexibility on total costs over a wide range of parameters is presented. In the numerical analysis, the parameters of the problem are randomly selected from the following ranges.

$$\begin{aligned} & n \in \left[ {20,100} \right],p_{j} \in \left[ {20,100} \right],\delta_{\text{FI}}^{\text{UB}} \in \left[ {1,4} \right],\gamma_{j} \in \left[ {1,5} \right],\beta_{j} \in \left[ {1,5} \right],W_{\text{PM}} \in \left[ {1,5} \right],W_{\text{CM}} \in \left[ {3,10} \right],W_{\text{FI}} \in \left[ {1,3} \right], \\ & M \in \left[ {10,50} \right],\mu_{R} \in \left[ {10,100} \right],K_{F} \in \left[ {200,1000} \right],\lambda_{F} \in \left[ {2,6} \right] \\ \end{aligned}$$

Further, the simulation sample size is Ns = 1000. Moreover, the stopping criterion for every simulation–optimization process is a run time of 10 s.

Selection of optimization algorithm

To compare the performance of the proposed metaheuristic approaches, 50 instances of five problem sizes were generated randomly using the abovementioned parameter ranges. The following expression is used to compare the total costs:

$${\text{difference}} = \frac{{{\text{TC}}_{\text{PSO}} - {\text{TC}}_{\text{ACO}} }}{{\left( {{{\left( {{\text{TC}}_{\text{PSO}} + {\text{TC}}_{\text{ACO}} } \right)} \mathord{\left/ {\vphantom {{\left( {{\text{TC}}_{\text{PSO}} + {\text{TC}}_{\text{ACO}} } \right)} 2}} \right. \kern-0pt} 2}} \right)}}$$
(8)

where TC is the estimated total cost calculated through expression (2).
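Expression (8) can be computed per instance as in the Python sketch below. The total costs are made-up numbers for illustration only, the function names are hypothetical, and the paired t statistic is included merely as one common way of summarising such paired values.

```python
from statistics import mean, stdev


def relative_difference(tc_pso, tc_aco):
    """Expression (8): cost gap relative to the mean of the two costs."""
    return (tc_pso - tc_aco) / ((tc_pso + tc_aco) / 2.0)


def paired_t_statistic(differences):
    """t statistic for H0: the mean of the paired differences is zero."""
    n = len(differences)
    return mean(differences) / (stdev(differences) / n ** 0.5)


# Made-up total costs for four instances, for illustration only.
pso_costs = [1050.0, 980.0, 1210.0, 1105.0]
aco_costs = [1000.0, 970.0, 1150.0, 1100.0]
diffs = [relative_difference(p, a) for p, a in zip(pso_costs, aco_costs)]
t_stat = paired_t_statistic(diffs)
```

A positive difference means that the PSO-based solution is more expensive than the ACO-based one on that instance.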

Figure 9 shows the results of the paired t-test comparing total costs, performed in Minitab, for all problem sizes.

Fig. 9 Paired-t hypothesis test for differences between total costs

The lower bounds of all confidence intervals are positive, i.e., the total costs found by PSOco-cn are significantly higher, so ACOco-cn outperforms PSOco-cn. Figure 10 shows the confidence intervals for the percentage difference in total costs.

Fig. 10 Confidence intervals for differences in total costs

Evaluation of the effect of flexibility on total costs

This subsection asks to what extent the total costs change if flexibility is not allowed in the start times of PM actions. Although the answer depends on the problem parameters, the wide range of parameters can represent an average view.

Due to the superiority of ACOco-cn over PSOco-cn, the former is used for the remainder of this numerical study. An inflexible maintenance policy occurs when the length of the preventive maintenance intervals equals the time required to perform a PM. According to the notations, this requires that δFI = 1, which is achieved by setting \(\delta_{\text{FI}}^{U} = 1\).

Several problem instances are randomly generated with the parameter ranges introduced at the beginning of this section. Each problem is solved by ACOco-cn twice: first with flexibility allowed (in which \(\delta_{\text{FI}}^{U} = 4\)) and then with flexibility forbidden (i.e., \(\delta_{\text{FI}}^{U} = 1\)). Figure 11 depicts the percentage reduction in total costs and the best found flexibility coefficient (δFI) for 100 randomly generated instances.

Fig. 11 Percentages of reduction in total costs resulting from allowed flexibility in maintenance intervals

As shown in Fig. 12, over this wide range of parameters, flexibility reduces total costs by 2% on average.

Fig. 12 Paired-t test for cost reduction as a result of flexibility

Concluding remarks

Machine parts usually wear out while working. Hence, preventive maintenance is required in order to reduce the risk of unexpected failures. When the jobs are nonresumable, maintenance intervals should be flexible in order to prevent undesirable machine idle times. All the previous studies considering flexible maintenance intervals assume that the flexible maintenance intervals are given. For the first time, this paper presents a holistic approach to determine the time between flexible maintenance intervals and the length of each maintenance interval in order to minimize the maintenance and production costs. Two mixed continuous-discrete metaheuristic approaches equipped with discrete event simulation were proposed. Numerical studies were used to compare the quality of solutions found by proposed approaches. Finally, the average possible improvement of total costs as a result of flexibility of maintenance intervals on a wide range of parameters was reported.