1 Introduction

The main target of companies whose production depends on the effectiveness of their repairable systems is to maximize the availability of those systems. When a repairable system is not available, it enters an unproductive phase (Boliang et al 2019) in which resources are not only not generated but are actually consumed until the system's available state is recovered. To enhance system availability, techniques such as adding redundant devices or scheduling preventive maintenance tasks have been widely analyzed. However, the simultaneous consideration of both techniques has not received sufficient attention.

In previous studies (Cacereño et al 2021a, b), the authors of the present paper considered the simultaneous optimization of the design (by including redundant devices) and the maintenance (by including preventive maintenance tasks) of a system taken as a case study. This approach finds maintenance strategies particularized to specific structural design alternatives, avoiding the need to design the maintenance strategy only after the structural design has been fixed. The result is a compact maintenance strategy fully customized to the structural design. The authors coupled the Non-dominated Sorting Genetic Algorithm II (NSGA-II) (Deb et al 2002) and Discrete Event Simulation (DES) to solve this problem. Each solution proposed by the Multi-objective Evolutionary Algorithm is evaluated through Discrete Event Simulation, a technique used to describe the behavior of the system by building the system Functionability Profile, a concept presented by Knezevic (1996). Both availability and operational cost were considered as objective functions from a multi-objective point of view. As an Evolutionary Optimization Algorithm, NSGA-II is considered a standard state-of-the-art solver for finding feasible solutions to real-world multi-objective optimization problems (Emmerich and Deutz 2018).

To deepen the understanding of the methodology and improve its efficiency, it is thoroughly explored, exploited and analyzed in the present paper. The contributions are as follows:

  • Since the resolution and optimization of real-world problems with evolutionary algorithms/metaheuristics is a current hot topic (Osaba et al 2021), the analysis of state-of-the-art multi-objective algorithms for solving real-world engineering problems, such as the one addressed here, remains a research focus. Therefore, as a case study, the methodology is applied to a containment spray injection system for which five state-of-the-art Multi-objective Evolutionary Algorithms are used (covering the reference algorithms of the three selection paradigms: dominance-based selection, indicator-based selection and aggregation-based selection). The target is to analyze the efficacy of these state-of-the-art Multi-objective Evolutionary Algorithms under two crossover strategies, the Simulated Binary Crossover (SBX) (Deb and Agrawal 1995) and Differential Evolution (DE) (Storn and Price 1997). Their performances are compared in detail using the Hypervolume and statistical significance tests (including post-hoc analysis when necessary), demonstrating the success of achieving a set of non-dominated solutions and determining the features of the best-performing methods. Next, the performance results are compared with some recently developed algorithms. The benefit of using the methodology with the indicator-based selection algorithm SMS-EMOA is demonstrated.

  • The methodology carries out a single Discrete Event Simulation to emulate the behavior of the system over its whole mission time. When Evolutionary Algorithms have been coupled with Discrete Event Simulation, some authors use several discrete simulations to emulate the behavior of the system, whereas others employ a single Discrete Event Simulation, so a gap exists in determining the effect of this choice. In the present paper, the power of the Multi-objective Evolutionary Algorithms is used to minimize unavailability and operational cost simultaneously while considering automatic design of the system. The single discrete simulation procedure is further compared in the case study with a Monte Carlo Simulation (analyzed sampling sizes of 10, 100 and 1000) where the minimal extreme value is taken as representative of the achieved distribution (either minimal unavailability, minimal operational cost, or minimal weighted unavailability-operational cost, equivalent to minimal Manhattan distance). A hypothesis test is presented in which the proposed methodology ranked first and statistically significant differences were found when an equivalent total number of fitness evaluations was run. Insights into the competitiveness and computational efficiency of the methodology are given.

  • The methodology is applied to two more complex industrial systems (with up to 36 devices), supporting its scalability and generalization. Evidence of the benefit of automatic selection of devices is given in comparison with systems in which all devices are mandatory, and two types of chromosome codification related to the automatic selection of devices are compared in the largest industrial application case, showing performance benefits.

Summarizing the main contributions of this manuscript, the authors have developed a thorough study of the multi-objective problem of simultaneously optimizing system availability and system cost by defining the maintenance strategy (through the maintenance times) and alternative designs (through automatic selection of system devices), coupling discrete event simulation with a single sample size and multi-objective evolutionary algorithms: (a) a study of the performance of several state-of-the-art multi-objective evolutionary algorithms is presented (8 algorithms); (b) a study of the influence of sampling size is presented, demonstrating the benefits of the single sample size proposed here; (c) a set of three reliability problems of increasing complexity is solved with the above proposal, showing its capability and scalability.

The paper is organized as follows: Sect. 2 explores the related literature. Section 3 summarizes the methodology and Sect. 4 introduces multi-objective optimization using Evolutionary Algorithms. Section 5 presents the case study. Its results are presented and discussed in Sect. 6. In Sect. 7, the methodology is applied to more complex systems, and finally, Sect. 8 presents the conclusions.

2 Literature review

Optimization is useful in practically all areas of life. Engineers must find optimal solutions to achieve the best performance when solving engineering problems. Optimization has been widely used in many fields of engineering (aeronautical, civil engineering, electrical networks, transport, logistics, etc.), and its limits lie only in the engineer's imagination. Hence, when complex problems must be solved, the employment of optimization methods is a suitable course of action. Optimization is particularly useful when the number of potential solutions is high and achieving the best solution is very difficult; instead of the best solution, some sufficiently good solutions can be obtained (Simon 2013). Systems reliability optimization has been widely studied; however, because of technological advances, increases in system complexity and consumer demand (among other aspects), it is an ever-changing and developing problem (Coit and Zio 2019).

One strategy commonly used to improve the availability of repairable systems consists of adding redundant devices. Including a redundant device in a system increases the number of alternative paths, so the probability of the system remaining in an available state is improved. Including redundant devices in a system requires the modification of its design. Several approaches have been used to achieve optimal system designs, such as Dynamic Programming (Fyffe et al 1968), Integer Programming (Misra and Sharma 1991) or Nonlinear Programming (Tillman et al 1977). However, the use of Evolutionary Algorithms has been gaining importance due to their power when multiple objectives must be handled. In this sense, several authors considered using Genetic Algorithms (Zoulfaghari et al 2014; Ghorabaee et al 2015), the Non-dominated Sorting Genetic Algorithm II (NSGA-II) (Greiner et al 2003; Kayedpour et al 2017; Sharifi et al 2019; Chambari et al 2021), the Ant Colony Algorithm (Zhao et al 2007), the Artificial Bee Colony Algorithm (Jiansheng et al 2014) or Particle Swarm Optimization (Samanta and Basu 2018).

Another strategy to improve the availability of repairable systems consists of applying preventive maintenance tasks. The main reasons why a continuously operating system stops are a failure (after which a recovery time associated with corrective maintenance tasks is required) or a scheduled shutdown to perform preventive maintenance tasks. When a preventive maintenance task is performed, the unproductive phase is better controlled than when repairs have to be performed because of a failure. Therefore, it is important to identify the optimum moment at which to stop the system and perform a preventive maintenance task. This should be done before the occurrence of the failure but as close as possible to it, in order to maximize the total system available time. Various approaches have been employed to schedule preventive maintenance tasks, such as Integer Programming (Kralj and Petrovic 1995), Mixed Integer Linear Programming (Charest and Ferland 1993; Fathollahi-Fard et al 2021) or Evolutionary Algorithms, where several authors employed Genetic Algorithms (An et al 2020; Bressi et al 2021; Wang et al 2021), the Non-dominated Sorting Genetic Algorithm II (NSGA-II) (Piasson et al 2016; Zang and Yang 2021), the Ant Colony Algorithm (Berrichi et al 2010) or the Bee Colony Algorithm (Li et al 2014). In all of the above scenarios it is possible to improve the availability of repairable systems either through design modification or through the maintenance strategy. However, such a modification or strategy will likely have consequences in terms of operational costs.

Although both the design and the preventive maintenance strategy of a system influence its performance and affect each other, their joint management has not yet received significant attention. There are relatively few works in which the design and preventive maintenance strategy of technical systems have been jointly optimized. Levitin and Lisnianski (1999) presented the first formulation of the joint redundancy and maintenance optimization problem for multi-state systems, using a Genetic Algorithm as the optimization technique. Nourelfath et al (2012) formulated a joint redundancy and imperfect preventive maintenance planning optimization model based on Markov processes and the universal moment generating function, in order to evaluate availability and cost for multi-state systems using Genetic Algorithms and Tabu search. Bei et al (2017) presented an approach to designing the configuration of a multiple-component system and determining a maintenance plan under uncertain future stress exposure, using a two-stage stochastic programming model. However, when a new system is designed, Discrete Event Simulation arises as a powerful modeling technique that allows complex systems to be analyzed much more accurately thanks to a more realistic representation of their behavior in practice.

Using Evolutionary Algorithms coupled with Discrete Event Simulation has produced good results in the reliability field. Regarding design optimization, Cantoni et al (2000) presented an approach coupling a Single-objective Evolutionary Algorithm and Monte Carlo simulation for optimal plant design. Marzaguerra et al (2005) proposed a similar approach from a multi-objective perspective in which the presence of uncertainty is considered. Regarding optimal maintenance through the preventive maintenance strategy, Tan and Kramer (1997) proposed a general framework for preventive maintenance optimization that combines Monte Carlo simulation with a Genetic Algorithm, and Oyarbide-Zubillaga et al (2008) determined the optimal preventive maintenance frequencies for multi-equipment systems using Discrete Event Simulation and the Non-dominated Sorting Genetic Algorithm II (NSGA-II). Recently, Azevedo et al (2020) proposed a multi-objective approach to model a replacement policy problem applicable to equipment that undergoes failures with several levels of severity, using a Multi-objective Genetic Algorithm coupled with Discrete Event Simulation. However, although coupling Multi-objective Evolutionary Algorithms and Discrete Event Simulation performed well when the design optimization considered corrective maintenance tasks (Lins and Droguet 2009; Lins and López 2011), only a few studies considering preventive maintenance tasks have been developed. A study to mention was developed by Galván et al (2007), in which a methodology for Integrated Safety System Design and Maintenance Optimization based on a bi-level evolutionary process was proposed; they coupled the Non-dominated Sorting Genetic Algorithm II and Monte Carlo Simulation in order to achieve the optimum design and surveillance test intervals. Therefore, no studies covering both corrective and preventive maintenance, in the sense of finding the optimum period of time at which to carry out preventive maintenance tasks, had been developed before the one proposed by the authors of the present paper (Cacereño et al 2021a, b). A deeper study of the methodology is presented here. In particular, the performance of different types of Multi-objective Evolutionary Algorithms is thoroughly studied. Furthermore, the influence of the sampling size, i.e., the number of simulations conducted to describe the behavior of the system, is analyzed. Finally, the methodology is applied to more complex systems to demonstrate its flexibility.

3 Methodology and description of the proposed model

3.1 Extracting availability and cost from the functionability profile

Reliability is an intrinsic characteristic of systems and is related to the way in which they have been designed and built. Maintainability can be intrinsic, when it is related to design conditions (a part that is difficult to access will be more complex to maintain), or extrinsic, e.g., when it is related to available spares or the human teams who must perform maintenance operations. Whereas Reliability is a concept related to the Time To Failure, Maintainability is a concept related to the Time To Repair. Availability relates these two concepts to define the way in which the system is able to achieve the function for which it was designed over a period of time. Availability can be computed through the unconditional failure w(t) and repair v(t) intensities, as explained by Andrews and Moss (2002). A device continuously subjected to the failure and repair process has a failure probability in the time interval \([t,t+dt)\), given that it was working at \(t=0\), represented by w(t)dt. Two situations lead to failure in \([t,t+dt)\): the device works continuously from 0 to t until the first failure occurs in \([t,t+dt)\) (the probability of this is f(t)dt, where f(t) is the failure density function), or the device fails in \([t,t+dt)\) but this is not the first failure. In the second situation, the device has experienced one or more repairs prior to the failure and the last one was carried out in the interval \([u,u+du)\) (the probability of this is \(v(u)du \times f(t-u)dt\)). The repair time u can occur at any point between 0 and t, so adding all possibilities gives Eq. 1.

$$\begin{aligned} w(t)dt = f(t)dt + \int _{0}^{t}f(t-u)v(u)du\,dt \end{aligned}$$
(1)

A repair can only occur in \([t,t+dt)\) if a failure has occurred in some interval \([u,u+du)\) prior to t. The probability of this is \(g(t-u)dt \times w(u)du\), where w(u)du is the probability of failing in \([u,u+du)\) given that the device was working at \(t=0\), and \(g(t-u)dt\) is the probability of repair in \([t,t+dt)\) given that the device has been in the failed state since its last failure in \([u,u+du)\) and was working at \(t=0\), g(t) being the repair density function. Since u can vary between 0 and t, Eq. 2 is obtained.

$$\begin{aligned} v(t)dt = \int _{0}^{t}g(t-u)w(u)du\,dt \end{aligned}$$
(2)

Canceling dt from Eqs. 1 and 2 yields the simultaneous integral equations defining the unconditional failure and repair intensities, shown in Eqs. 3.

$$\begin{aligned} \begin{aligned}&w(t) = f(t) + \int _{0}^{t}f(t-u)v(u)du \\&v(t) = \int _{0}^{t}g(t-u)w(u)du \end{aligned} \end{aligned}$$
(3)

Unavailability (Q(t)) is the complement of availability (A(t)). Eq. 4 shows how to compute Q(t) from Eqs. 3.

$$\begin{aligned} Q(t) = \int _{0}^{t}[w(u)-v(u)]du \end{aligned}$$
(4)
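As an illustrative numerical sketch (not part of the original formulation), Eqs. 3 and 4 can be approximated on a discrete time grid with a left-rectangle rule; the exponential failure and repair densities and the rates used below are assumptions chosen only for the example.

    import numpy as np

    # Minimal sketch: discretize Eqs. 3 (coupled renewal-type integrals) and 4.
    # The exponential densities and their rates are illustrative assumptions.
    lam, mu = 1.0e-3, 1.0e-1          # assumed failure and repair rates (1/h)
    dt, horizon = 0.5, 2000.0         # time step and analysis horizon (h)
    t = np.arange(0.0, horizon, dt)
    f = lam * np.exp(-lam * t)        # failure density f(t)
    g = mu * np.exp(-mu * t)          # repair density g(t)

    w = np.zeros_like(t)              # unconditional failure intensity w(t)
    v = np.zeros_like(t)              # unconditional repair intensity v(t)
    for i in range(len(t)):
        w[i] = f[i] + dt * np.dot(f[i:0:-1], v[:i])   # first line of Eqs. 3
        v[i] = dt * np.dot(g[i:0:-1], w[:i])          # second line of Eqs. 3

    Q = dt * np.cumsum(w - v)         # Eq. 4: unavailability Q(t)
    print(Q[-1], lam / (lam + mu))    # tends to the steady-state unavailability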

When a device has exponential failure and repair intensities (constant failure and repair rates), its availability can be found by solving Eqs. 3 for w(t) and v(t), which can be done using Laplace transforms. In this case, Eq. 5 is obtained, so the steady-state availability can be computed using the Mean Time To Failure (MTTF) and the Mean Time To Repair (MTTR).

$$\begin{aligned} A = \frac{{{\textrm{MTTF}}}}{{{\textrm{MTTF}}} + {\textrm{MTTR}}} \end{aligned}$$
(5)
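For illustration, with hypothetical values of MTTF = 1000 h and MTTR = 10 h, Eq. 5 gives \(A = 1000/(1000+10) \approx 0.990\).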

When a device does not have exponential failure and/or repair intensities, finding the device availability by solving Eqs. 3 for w(t) and v(t) is very hard, so a simulation approach is suitable. In this paper, the system availability is characterized through simulation by using the system Functionability Profile. Technical systems are developed to fulfill specific functions, so “functionality” is an important feature related to the system's capacity to achieve its mission. The system must not only achieve its mission but also satisfy a set of requirements called “satisfactory features” (e.g., volume or flow). Moreover, it is necessary to specify the “operation conditions” under which the system must be able to operate (e.g., temperature or humidity). These three aspects come together under the umbrella concept of “Functionability Profile” introduced by Knezevic (1996), defined as the inherent capacity of the system to achieve the required function under specific features when used as specified. In general, all systems achieve their function at the beginning of their lives. However, irreversible changes take place over time and variations in the system behavior occur. The deviation of these variations from the satisfactory features reveals the occurrence of a system failure, which causes a transition from the operation state to the failure state. After failing, if the system is repairable, its capacity to fulfill the required function can be recovered through recovery activities or corrective maintenance.

Aside from corrective maintenance activities, systems can require additional tasks to keep them in operation. These are generally less complex and are called preventive maintenance activities, or maintenance prior to failure. From the Functionability Profile's point of view, the states of a repairable system fluctuate between operation and failure over the course of the mission time. The shape of these fluctuations is called the Functionability Profile, as it tracks the states over the whole mission time. An example of a Functionability Profile is shown in Fig. 1. Functionability Profiles depend on the operation times (either Time To Failure or Time To Start following a scheduled Preventive Maintenance activity) \((t_{f1}, t_{f2},..., t_{fn})\) and recovery times (either Time To Repair after a failure or Time To Perform a Preventive Maintenance activity) \((t_{r1},t_{r2},..., t_{rn})\). When the Functionability Profile is set to logical 1, the device is considered to be operating. Conversely, when the Functionability Profile is set to logical 0, the device is considered to be stopped (it may be under repair after a failure or under maintenance). It can be deduced from Fig. 1 that, after an operation time, a recovery time is needed.

Fig. 1: Functionability profile of a device (or system)

Users need to be sure that the system Functionability Profile satisfies the desired function. Hence, users are interested in the shape of the system Functionability Profile, with special emphasis on the periods in which the system is available. As previously mentioned, availability is tightly related to Functionability Profiles: availability is characterized by the relationship between the system operation times and the total mission time. In the present paper, the model used to compute availability relies on the following assumptions:

  • At any point in time, each device is in either the “operation” state or the “failed” state.

  • The devices are independent of each other.

  • A repair starts just after the failure of the device.

  • A repair returns the device to the as-good-as-new state.

The system is able to fulfill its purpose during the operation times \(t_{fi}\), so its availability over the mission time can be evaluated using Eq. 6.

$$\begin{aligned} A = \frac{\displaystyle \sum _{i=1}^n{t_{fi}}}{\displaystyle \sum _{i=1}^n{t_{fi}+\sum _{j=1}^mt_{rj}}} \end{aligned}$$
(6)

where n is the total number of operation times, \(t_{fi}\) is the i-th operation time in hours (Time To Failure or Time To Start following a scheduled Preventive Maintenance activity), m is the total number of recovery times and \(t_{rj}\) is the j-th recovery time in hours (due to repair or preventive maintenance activity). Therefore, availability is a variable with a value between 0 (minimum availability) and 1 (maximum availability), and availability and unavailability sum to 1.
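As a minimal illustration of Eq. 6 (the function name and the example times are assumptions, not values from the case study), availability can be computed directly from the lists of operation and recovery times:

    def availability(operation_times, recovery_times):
        """Eq. 6: ratio of total operation time to total mission time (hours)."""
        up = sum(operation_times)       # sum of the t_fi
        down = sum(recovery_times)      # sum of the t_rj
        return up / (up + down)

    # Hypothetical example: three operation periods and three recovery periods.
    print(availability([4000, 3500, 4200], [12, 30, 8]))   # ~0.9957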

Operation and recovery times behave as random variables, so they are not known in advance. If a historical record of both types of times is compiled and a statistical analysis is performed, operation and recovery times can be defined as probability density functions through their respective parameters. Those functions can follow a specific typology (e.g., Exponential, Weibull, Normal). Several databases on the market (OREDA 2009; CCPS 1998) supply characteristic parameters for such functions, so operation and recovery times can be characterized for the different failure modes of system devices.

When systems are operating, earnings are generated in relation to their availability. Conversely, when systems have to be recovered, an economic cost is incurred to return them to the operation state. In this paper, the economic cost is a variable directly associated with recovery times. Such recovery times are related to corrective and preventive maintenance activities, and their costs can be computed by Eq. 7.

$$\begin{aligned} C = \sum _{i=1}^{q}{cc_{i}}+\sum _{j=1}^{p}{cp_{j}} \end{aligned}$$
(7)

where C is the system operational cost quantified in economic units, q is the total number of corrective maintenance activities, \(cc_i\) is the cost of the i-th corrective maintenance activity, p is the total number of preventive maintenance activities and \(cp_j\) is the cost of the j-th preventive maintenance activity. Maintenance activity costs depend on the respective fixed quantities per hour (corrective and preventive), so the global cost is directly related to recovery times. Preventive maintenance activities are scheduled shutdowns, so their recovery times will be shorter and cheaper than recovery times due to corrective maintenance activities (for reasons such as willing and trained personnel or available spare parts). If long recovery times are to be avoided, preventive maintenance activities must be carried out ideally just before the failure, or otherwise as close as possible to it. Therefore, the system's nominal Functionability Profile (which represents the continuous cycle of failure and recovery through corrective maintenance over the mission time) must be modified by including preventive maintenance activities for the system devices, with the aim of maximizing system availability and minimizing the cost due to recovery times.
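Similarly, a minimal sketch of Eq. 7 is given below; the per-hour cost rates and the example durations are illustrative assumptions rather than the values of the case study.

    def operational_cost(corrective_hours, preventive_hours,
                         cc_rate=1.0, cp_rate=0.25):
        """Eq. 7: total cost of corrective plus preventive recovery times.

        cc_rate and cp_rate are assumed fixed costs per hour (economic units/h).
        """
        cc = sum(h * cc_rate for h in corrective_hours)   # corrective costs, cc_i
        cp = sum(h * cp_rate for h in preventive_hours)   # preventive costs, cp_j
        return cc + cp

    print(operational_cost([12, 30], [8, 8, 8]))  # hypothetical recovery times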

3.2 Building functionability profiles to compute the objective functions

In order to optimize the system design and its preventive maintenance strategy, it is necessary to characterize both the system availability and the cost from the system Functionability Profile. The system Functionability Profile is built from the Functionability Profiles of the system devices, which are built using Discrete Event Simulation. To this end, information about how to characterize the operation \((t_{fi})\) and recovery \((t_{rj})\) times is needed. Operation times are composed of Times To Failure (TF) and Times To Start following a Preventive Maintenance activity (TP), while recovery times are composed of Times To Repair after failure (TR) and Times To Perform a Preventive Maintenance Activity (TRP). In the present paper, the failure modes of each device are grouped into a single failure mode. The Functionability Profiles of the system devices are built by generating random times obtained from the respective probability density functions for times to failure (TF) and times to repair (TR). These Functionability Profiles are then modified by including Times To Start following a Preventive Maintenance task (TP) and Times To Perform a Preventive Maintenance task (TRP), which are characterized based on limits supplied by expert judgment. In order to build the Functionability Profile of a device, each Time To Failure (TF) generated for the device must be compared with the Time To Start following a scheduled Preventive Maintenance activity (TP) related to that device. This information is identified through each solution provided by the Multi-objective Evolutionary Algorithm (each individual of the population). Depending on what happens first, the failure or the Preventive Maintenance activity, a Time To Repair (TR) or a Time To Perform a Preventive Maintenance activity (TRP), respectively, is used to complete that section of the Functionability Profile. The process (shown both in Fig. 2 and in pseudo-code in Algorithm 1) is explained below:

  1. The system mission time (life cycle) is fixed; the process then continues for each device.

  2. The device Functionability Profile (PF) is initialized.

  3. The Time To Start following a scheduled Preventive Maintenance activity (TP) proposed by the Multi-objective Evolutionary Algorithm (previously limited by the minimum \((TP_{min})\) and maximum \((TP_{max})\) set values) is extracted from the respective decision variable of the individual being evaluated, and a Time To Perform a Preventive Maintenance activity (TRP) is randomly generated within the previously fixed minimum \((TRP_{min})\) and maximum \((TRP_{max})\) limits.

  4. An operation Time To Failure (TF) is randomly generated within the previously fixed minimum \((TF_{min})\) and maximum \((TF_{max})\) limits, with reference to the failure probability density function of the device.

  5. If \(TP < TF\), the preventive maintenance activity is performed before the failure. In this case, as many logical “ones” as TP units, followed by as many logical “zeros” as TRP units, are added to the device Functionability Profile. Each time unit represented in this way (whether a logical “one” or a logical “zero”) is equivalent to one hour of real time. The cost is computed by multiplying TRP by the preventive maintenance cost per hour and adding the result to the global preventive maintenance cost \((cp_{j})\).

  6. If \(TP > TF\), the failure occurs before the preventive maintenance activity can be carried out. In this case, according to the repair probability density function of the device, the Time To Repair (TR) is randomly generated within the previously fixed minimum \((TR_{min})\) and maximum \((TR_{max})\) limits. Then, as many logical “ones” as TF units, followed by as many logical “zeros” as TR units, are added to the device Functionability Profile. Each time unit represented in this way (whether a logical “one” or a logical “zero”) is equivalent to one hour of real time. The cost is computed by multiplying TR by the corrective maintenance cost per hour and adding the result to the global corrective maintenance cost \((cc_{i})\).

  7. Steps 4 to 6 are repeated until the end of the device mission time.

  8. Steps 2 to 7 are repeated until a Functionability Profile has been built for every device.

  9. After building all the device Functionability Profiles, the system Functionability Profile is obtained according to the logical structure of the system. Several techniques can be used for this, depending on the complexity of the system, such as logical operators or fault trees.

Once the system Functionability Profile is built, the values of the objective functions can be computed using Eq. 6 (which gives the system availability in relation to the time in which the system is operating and being recovered) and Eq. 7 (which gives the system operational cost depending on the cost of the time units devoted to corrective or preventive maintenance).
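A minimal Python sketch of the procedure above is given next for a single device with an exponential Time To Failure and uniformly sampled recovery times; the distributions, limits and cost rates are illustrative assumptions, and the full methodology builds one such profile per device before combining them through the system logic (steps 8 and 9).

    import random

    def device_profile(mission, TP, tf_limits, tr_limits, trp_limits, lam,
                       cc_rate=1.0, cp_rate=0.25):
        """Build one device Functionability Profile (steps 2-7) as a 0/1 list.

        TP is the time to start preventive maintenance proposed by the optimizer.
        Sampling choices (exponential TF, uniform TR/TRP) are assumptions."""
        profile, cost = [], 0.0
        while len(profile) < mission:
            # Step 4: sample a Time To Failure within the fixed limits.
            TF = min(max(random.expovariate(lam), tf_limits[0]), tf_limits[1])
            if TP < TF:                                   # Step 5: maintain first
                TRP = random.uniform(*trp_limits)
                profile += [1] * int(TP) + [0] * int(TRP)
                cost += int(TRP) * cp_rate
            else:                                         # Step 6: failure first
                TR = random.uniform(*tr_limits)
                profile += [1] * int(TF) + [0] * int(TR)
                cost += int(TR) * cc_rate
        return profile[:int(mission)], cost               # Step 7: cut at mission

    # Hypothetical device: failure rate 1e-4/h, PM every 5000 h, 10-year mission.
    pf, c = device_profile(87600, 5000, (1, 87600), (8, 24), (4, 8), 1e-4)
    A = sum(pf) / len(pf)                                 # availability, Eq. 6
    print(A, c)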

Fig. 2: Building Functionability Profiles

Algorithm 1: Build the system Functionability Profile (PF)

4 Multi-objective evolutionary optimization approach

The optimization methods used in this paper belong to the Evolutionary Algorithms (EA) paradigm. This kind of method uses a population of individuals with a specific size. Each individual is a multidimensional vector called a chromosome, which represents a possible candidate solution to the problem. The vector components are called genes or decision variables. Extended information on Evolutionary Optimization Algorithms was supplied by Simon (2013). In the present paper, a real-world engineering multi-objective problem is addressed, in which a set of non-dominated solutions constitutes the set of equally optimum final designs. Evolutionary algorithms are population-based search methods that have been established as state-of-the-art methods to solve this kind of multi-objective design optimization problem; see, e.g., reference books such as Coello et al (2007) and Deb (2001), or more recent works such as Coello (2015) and Emmerich and Deutz (2018). Among their advantages are that they are stochastic global optimizers, that no special requirements are imposed on the fitness function, and that they are able to attain the whole non-dominated set of solutions in a single run; among their disadvantages, the most critical is the need to execute a high number of fitness function evaluations (at least in the order of thousands). In this work, each individual in the population consists of a string of real numbers in which the system design alternatives and the periodic times to start a preventive maintenance activity for each device included in the system design are codified, as detailed later in the case study scenario. Optimization problems can minimize or maximize one or more objectives. In most cases, real-world problems require optimizing several objectives at the same time, and these objectives are frequently in conflict with each other. Such problems are called “multi-objective” problems and their solutions form a set representing the best compromises among the objectives (the Pareto-optimal set) (Coello 2015; Emmerich and Deutz 2018). These kinds of problems are described by Eq. 8 (considering a minimization problem) (Simon 2013).

$$\begin{aligned} \displaystyle \min _xf(x)= \displaystyle \min _x[f_1(x), f_2(x),..., f_k(x)] \end{aligned}$$
(8)

In optimization, when problems are defined in this way, the k functions must be simultaneously minimized. In the present paper, the objective functions are, on the one hand, the system availability (an objective to maximize, mathematically expressed by Eq. 6, which is equivalent to minimizing unavailability) and, on the other hand, the operational cost (an objective to minimize, mathematically expressed by Eq. 7). Optimization problems are usually subject to constraints. In this case, the problem is subject to constraints on the maximum and minimum values of the Times To Failure, Times To Repair, Times To Start following a scheduled Preventive Maintenance activity and Times To Perform a Preventive Maintenance activity. Classical optimization methods suggest converting the multi-objective optimization problem into a single-objective optimization problem by emphasizing one particular Pareto-optimal solution at a time. Due to their ability to find multiple Pareto-optimal solutions in a single run, a number of Multi-objective Evolutionary Algorithms (MOEAs) were subsequently proposed. Nowadays, Multi-objective Evolutionary Optimizers (EMO) can be classified into three groups (Emmerich and Deutz 2018; Greiner et al 2017):

  • Indicator-based selection EMO: methods based on a unary indicator to guide the search.

  • Decomposition/Aggregation-based selection EMO: methods based on decomposition of the search space, optimizing a set of scalarizing functions in parallel.

  • Dominance-based selection EMO: methods that use the concept of Pareto dominance as the basis of their selection (a minimal sketch of this dominance relation is given below).
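As a brief, illustrative sketch of the Pareto dominance relation mentioned in the last group (minimization of both objectives is assumed and the example values are hypothetical):

    def dominates(a, b):
        """True if solution a Pareto-dominates b (all objectives minimized)."""
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    # Hypothetical (unavailability, cost) pairs:
    print(dominates((0.001, 900.0), (0.002, 950.0)))  # True: better in both
    print(dominates((0.001, 990.0), (0.002, 950.0)))  # False: non-dominated pair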

In this paper, a thorough study of the use of Multi-objective Evolutionary Algorithms applied to the field of Reliability is conducted. Firstly, several representative EMOs of each of the selection criteria defined above are used to optimize a case study:

  • The S-Metric Selection Evolutionary Multi-objective Optimization Algorithm (SMS-EMOA) (Beume et al 2007), which uses the multi-objective selection based on Dominated Hypervolume, as representative of the indicator-based selection EMO.

  • The Multi-objective Evolutionary Algorithm Based on Decomposition (MOEA/D) (Zhang and Li 2007) and its extension (MOEA/D-DE) (Li and Zhang 2009), which uses differential evolution (DE) as an operator, as representatives of the Decomposition/Aggregation-based selection EMO.

  • The Non-dominated Sorting Genetic Algorithm II (NSGA-II) (Deb et al 2002) and the Generalized Differential Evolution (GDE3) (Kukkonen and Lampinen 2005), which use the Pareto dominance criterion to perform multi-objective optimization, as representatives of the dominance-based selection EMO.

Methods such as SMS-EMOA, MOEA/D and NSGA-II are state-of-the-art standard solvers for real-world multi-objective optimization problems (Emmerich and Deutz 2018). These methods use Simulated Binary Crossover (Deb and Agrawal 1995) to create new individuals. The study is then extended to methods that use Differential Evolution (Storn and Price 1997) to create new individuals, such as MOEA/D-DE. In addition, Kukkonen and Lampinen (2005) showed that GDE3 outperformed NSGA-II on a set of different types of test problems. The intention is to compare the performance of these two methods, which share the use of the Pareto dominance criterion, on a real-world problem. Therefore, these five Multi-objective Evolutionary Algorithms are used, seeking the joint optimization of the system design and its preventive maintenance strategy. Moreover, once the previous study is concluded, some recently developed algorithms are used to compare the performance results. These methods are:

  • The Adaptive Non-dominated Sorting Genetic Algorithm III (ANSGA-III) (Himanshu and Kalyanmoy 2014), which uses the Pareto dominance criterion to perform multi-objective optimization.

  • An approach that adapts weights during decomposition-based evolutionary multi-objective optimization (AdaW) (Li and Yao 2020).

  • The Efficient Large-Scale Multi-objective Optimization Based on a Competitive Swarm Optimizer (LMOCSO) (Tian et al 2020).

5 The case study

The case study consists of optimizing the design and preventive maintenance strategy of an industrial system based on two conflicting objectives: availability and operational cost. Maximum availability and minimum operational cost are desirable. The more that is invested in preventive maintenance, the greater the system availability; conversely, this policy increases unwanted cost, constituting a conflict between objectives. The proposed methodology is applied to a containment spray injection system (CSIS) of a nuclear power plant, whose simplified model is shown in Fig. 3; it is based on a case presented by Greiner et al (2003) and previously studied by the authors of the present paper (Cacereño et al 2021a, b). The model is formed by cut valves \((V_i)\) and impulsion pumps \((P_i)\). The CSIS mission is to inject borated water into the containment in order to remove radioactive contamination that may be released after a loss of coolant accident. In this case, the number of redundant devices is limited, as shown in Fig. 3.

Fig. 3: Case study: containment spray injection system (CSIS)

Table 1 Data set for system devices

Although the basic shape of the system was presented by Greiner et al (2003), the data used in the present paper were updated and previously used in Cacereño et al (2021a, 2021b). They are shown in Table 1, where:

  • Life Cycle = The system mission time (expressed in hours).

  • Corrective Maintenance Cost = The cost entailed in developing a repair activity to recover the system following a failure (expressed in economic units per hour).

  • Preventive Maintenance Cost = The cost entailed in developing a preventive maintenance activity (expressed in relation to the Corrective Maintenance Cost).

  • Pump \(\mathrm {TF_{min}}\) = The minimum operation Time To Failure for a pump without preventive maintenance (expressed in hours). It is considered that a failure of a pump cannot occur before this time.

  • Pump \(\mathrm {TF_{max}}\) = The maximum operation Time To Failure for a pump without preventive maintenance (expressed in hours). It is mandatory that the failure of a pump occurs before this time.

  • Pump \(\textrm{TF}\) \(\lambda \) = The failure rate for a pump, which follows an exponential failure distribution (expressed in 10\(^{-6}\) failures per hour).

  • Pump \(\mathrm {TR_{min}}\) = The minimum Time To Repair or duration of a corrective maintenance activity for a pump (expressed in hours).

  • Pump \(\mathrm {TR_{max}}\) = The maximum Time To Repair or duration of a corrective maintenance activity for a pump (expressed in hours).

  • Pump \(\textrm{TR}\) \(\mu \) = The mean for the normal distribution followed for the Time To Repair assumed for a pump (expressed in hours).

  • Pump \(\textrm{TR}\) \(\sigma \) = The standard deviation for the normal distribution followed for the Time To Repair assumed for a pump (expressed in hours).

  • Pump \(\mathrm {TP_{min}}\) = The minimum operation Time To Start following a scheduled Preventive Maintenance activity for a pump (expressed in hours). It is considered that a Preventive Maintenance activity for a pump is not necessary before this time.

  • Pump \(\mathrm {TP_{max}}\) = The maximum operation Time To Start following a scheduled Preventive Maintenance activity for a pump (expressed in hours). Delaying a Preventive Maintenance activity for a pump beyond this time is considered reckless; it should be done before this time.

  • Pump \(\mathrm {TRP_{min}}\) = The minimum Time To Perform a preventive maintenance activity for a pump (expressed in hours). It is the minimum time needed to develop the Preventive Maintenance activity for a pump.

  • Pump \(\mathrm {TRP_{max}}\) = The maximum Time To Perform a preventive maintenance activity for a pump (expressed in hours). It is the maximum time needed to develop the Preventive Maintenance activity for a pump.

  • Valve \(\mathrm {TF_{min}}\) = The minimum operation Time To Failure for a valve without preventive maintenance (expressed in hours). It is considered that a failure of a valve cannot occur before this time.

  • Valve \(\mathrm {TF_{max}}\) = The maximum operation Time To Failure for a valve without preventive maintenance (expressed in hours). It is mandatory that the failure of a valve occurs before this time.

  • Valve \(\textrm{TF}\) \(\lambda \) = The failure rate for a valve, which follows an exponential failure distribution (expressed in 10\(^{-6}\) failures per hour).

  • Valve \(\mathrm {TR_{min}}\) = The minimum Time To Repair or duration of a corrective maintenance activity for a valve (expressed in hours).

  • Valve \(\mathrm {TR_{max}}\) = The maximum Time To Repair or duration of a corrective maintenance activity for a valve (expressed in hours).

  • Valve \(\textrm{TR}\) \(\mu \) = The mean for the normal distribution followed for the Time To Repair assumed for a valve (expressed in hours).

  • Valve \(\textrm{TR}\) \(\sigma \) = The standard deviation for the normal distribution followed for the Time To Repair assumed for a valve (expressed in hours).

  • Valve \(\mathrm {TP_{min}}\) = The minimum operation Time To Start following a scheduled Preventive Maintenance activity for a valve (expressed in hours). It is considered that a Preventive Maintenance activity for a valve is not necessary before this time.

  • Valve \(\mathrm {TP_{max}}\) = The maximum operation Time To Start following a scheduled Preventive Maintenance activity for a valve (expressed in hours). Delaying a Preventive Maintenance activity for a valve beyond this time is considered reckless; it should be done before this time.

  • Valve \(\mathrm {TRP_{min}}\) = The minimum Time To Perform a preventive maintenance activity for a valve (expressed in hours). It is the minimum time needed to develop the Preventive Maintenance activity for a valve.

  • Valve \(\mathrm {TRP_{max}}\) = The maximum Time To Perform a preventive maintenance activity for a valve (expressed in hours). It is the maximum time needed to develop the Preventive Maintenance activity for a valve.

The data were obtained from specific literature (OREDA 2009), expert judgment (based on the professional experience of the Machinery & Reliability Institute (MRI), Alabama, USA) or mathematical relations. In this sense, the TR \(\sigma \) for valves and pumps were set in relation to the \(\mu \) of their respective normal distribution functions and their previously established \(\mathrm {TR_{min}}\). Regarding \(\mathrm {TR_{max}}\), it is known that 99.7% of the values of a normally distributed variable are included in the interval \(\mu \pm 3\sigma \); the interval was extended to \(\mu \pm 4\sigma \) to account for occasional more extreme values. As explained above, the optimization objectives consist of maximizing the system availability and minimizing the operational cost due to unproductive system phases (both when the system is being repaired and when it is being maintained). To do that:

  • It is necessary to establish the optimum period to perform a preventive maintenance activity for the system devices and

  • It is necessary to decide whether to include redundant devices such as P2 and/or V4 by evaluating design alternatives. Including redundant devices will improve the system availability but it will also increase the system operational cost.

From the optimization point of view, as explained before, Evolutionary Algorithms (EAs) use a population of individuals called chromosomes, which represent possible solutions to the problem. In this case, the chromosomes are strings of real numbers with 0 as the minimum value and 1 as the maximum value. Each string is codified as [\(B_1\) \( B_2\) \( T_1\) \(T_2\) \( T_3\) \( T_4\) \(T_5\) \(T_6\) \(T_7\) ], where the presence of the redundant devices P2 and V4 is defined by the decision variables \(B_1\) and \(B_2\), respectively, and the optimum time to start a preventive maintenance activity for each device is represented by the decision variables \(T_1\) to \(T_7\). However, these values have to be transformed to evaluate the objective functions:

  • Variables \(B_1\) and \(B_2\) are rounded to the nearest integer, so a value of 0 implies that the device is not included in the design and a value of 1 implies the opposite.

  • Variables \(T_1\) to \(T_7\) are scaled using Eq. 9, where \(TP_i\) is the true value of the parameter Time To Start following a scheduled Preventive Maintenance activity for the i-th system device, \( T_{i}\) is the value of the corresponding decision variable for the i-th system device and, finally, \(TP_{max_i}\) and \(TP_{min_i}\) are the limit values of the parameter TP for the i-th system device, with \(1 \le i \le 7\) (a minimal decoding sketch is given after Eq. 9).

$$\begin{aligned} TP_i = round(TP_{min_i}+T_{i}\cdot (TP_{max_i}-TP_{min_i})) \end{aligned}$$
(9)
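The following is a minimal decoding sketch of the transformation described above; the device limits used in the example are illustrative assumptions, not the values of Table 1.

    def decode(chromosome, tp_min, tp_max):
        """Decode [B1, B2, T1..T7] in [0, 1] into design bits and TP times (Eq. 9)."""
        b = [round(x) for x in chromosome[:2]]            # redundancy decisions
        tp = [round(lo + t * (hi - lo))                   # Eq. 9 scaling per device
              for t, lo, hi in zip(chromosome[2:], tp_min, tp_max)]
        return b, tp

    # Hypothetical TP limits (hours) for the seven devices:
    tp_min = [1000] * 7
    tp_max = [20000] * 7
    print(decode([0.8, 0.1, 0.5, 0.2, 0.9, 0.4, 0.6, 0.3, 0.7], tp_min, tp_max))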

The chromosomes are iteratively modified generation after generation during the evolutionary process (inspired by the Darwinian principle of survival of the fittest). The parameters used to configure the evolutionary process are shown in Table 2. Due to computational budget limitations (each independent run of the case study took an average of almost 4 days on a High Performance Computer, as shown in Sect. 6), the parameter combinations were restricted to the factors considered most relevant (shown in Table 2) for a proper balance between exploration and exploitation in our problem. The tested parameter values were chosen following general recommendations and common practice in the evolutionary algorithms field; a detailed explanation is given below. Five optimization methods were used as described above. Depending on the method, specific parameters must be set. The parameters are:

  • Mutation Probability (PrM): The expected number of genes mutating per chromosome. The central value is equivalent to 1/(number of decision variables). Two more probabilities, one above and one below the central value (1.5/(number of decision variables) and 0.5/(number of decision variables), respectively), have been set for the methods that use Simulated Binary Crossover (SMS-EMOA, MOEA/D, NSGA-II, ANSGA-III and AdaW).

  • Mutation Distribution (disM): The distribution index of polynomial mutation. This is set to the typical value of 20 for the present case study.

  • Crossover Probability (PrC): The probability of doing crossover when the Simulated Binary Crossover is used. The crossover operator has an impact on the creation of new individuals. It is set to 0.9 for the present case study.

  • Crossover Distribution (disC): This is the crossover distribution index when the Simulated Binary Crossover is used. It is 20 for the standard solvers and 30 for the recently developed algorithms (as in the original references).

  • Crossover Rate (CR): The crossover operator has the function of mixing the genetic information among chromosomes to create new individuals. In Differential Evolution, each gene is crossed (or not) depending on a probability variable referred to as the Crossover Rate. The typical value for the Crossover Rate is between 0.1 and 1.0 (Simon 2013). For the case study, the Crossover Rate parameter is set to 0.9 given that a large CR often speeds convergence (Storn and Price 1997).

  • Scale Factor (F): In Differential Evolution, the mutation operator alters the genes of a chromosome by adding a scaled difference vector, formed from two chosen chromosomes, to a third chromosome. The difference vector is scaled by the Scale Factor. The typical value for the Scale Factor is between 0.4 and 0.9 (Simon 2013). For the case study, values of 0.4, 0.5 and 0.6 are tested, as shown in Table 2 (a minimal sketch of the DE variation scheme using F and CR is given after this list).

  • Scalarizing function (Approach): The MOEA/D method decomposes a multi-objective optimization problem into different single-objective sub-problems by using a set of weight vectors and a scalarizing function. Typical scalarizing functions for MOEA/D include the weighted sum, Tchebycheff and Penalty-based Boundary Intersection (PBI). Following the results of Tanabe and Ishibuchi (2018), in which the Tchebycheff approach performed well on some two-objective problems, this approach is used for the present case study.

  • Probability of choosing parents locally \((\delta )\): A typical value that is commonly used (Li and Zhang 2009; Liu et al 2010; Zhang et al 2009) is 0.9.

  • Replacement mechanism (\(n_r\)): The replacement mechanism improves the quality of the population in terms of dominance and also maintains diversity. A high-quality offspring solution could replace most of the current solutions in its neighborhood (Liu et al 2010), which implies a decrease in diversity. The parameter \(n_r\) establishes the maximum number of solutions that can be replaced by a high-quality offspring. A proposed empirical rule (Zhang et al 2009) consists of taking \(n_r\) = 0.01\(\cdot \)N, where N is the population size (note that \(n_r\) must be an integer value).
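To illustrate how F and CR act in Differential Evolution, a minimal sketch of the common DE/rand/1/bin variation scheme is given below (it is not the exact implementation used in PlatEMO, and the population, gene bounds and parameter values are assumptions).

    import random

    def de_rand_1_bin(population, i, F=0.5, CR=0.9):
        """Create a trial vector for individual i (genes assumed bounded in [0, 1])."""
        r1, r2, r3 = random.sample(
            [k for k in range(len(population)) if k != i], 3)
        x, a, b, c = population[i], population[r1], population[r2], population[r3]
        j_rand = random.randrange(len(x))                 # at least one gene crosses
        trial = []
        for j in range(len(x)):
            if random.random() < CR or j == j_rand:
                v = a[j] + F * (b[j] - c[j])              # scaled difference vector
                trial.append(min(max(v, 0.0), 1.0))       # clip to decision bounds
            else:
                trial.append(x[j])
        return trial

    # Hypothetical population of 10 individuals with 9 genes, as in Sect. 5:
    pop = [[random.random() for _ in range(9)] for _ in range(10)]
    print(de_rand_1_bin(pop, 0))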

Table 2 Parameters for the optimization process (population sizes of 50, 100 and 150 were tested in every case)

Table 2 includes the parameters for the optimization process. Each method was executed using population sizes (N) of 50, 100 and 150 individuals. The population size plays a crucial role in maintaining the equilibrium between exploration and exploitation: excessively large populations can lead to slow convergence, whereas populations with too few individuals can lead to premature stagnation, converging to local optima (Michalewicz 1996; Goldberg 1989; González et al 2019). Nine different configurations of each of the five standard state-of-the-art methods were simulated and each configuration was executed 21 times (for statistical purposes), with a total of 10,000,000 evaluations used as the stopping criterion. Regarding the recently developed methods, nine different configurations were executed for ANSGA-III and AdaW, while three different configurations were executed for LMOCSO, since its population size is considered its main parameter.

Scale factors relative to the values of the objective functions were used in order to obtain a dispersed non-dominated front with one as the maximum value. The values were obtained through a practical approach in which the scale factors are extracted from the values of the objective functions when the optimization process starts. This approach is based on the assumption that the values of the objective functions will improve over the course of the evolutionary process. The scale factors were used as follows:

  • The scale factor used to compute the Cost was 1,700 economic units.

  • The scale factor used to compute the system unavailability was 0.003.

Finally, a two-dimensional reference point is needed to compute the Hypervolume. This point must cover the values bounded by the scale factors, which restrict the values of the objective functions to a maximum of one. The reference point was set to (2, 2). The software platform PlatEMO (Tian et al 2017) was used to optimize the case study. The open-source platform PlatEMO includes more than 160 Multi-objective Evolutionary Algorithms, more than 300 multi-objective test problems and several widely used performance indicators. In this case, the Design and Maintenance Strategy analysis software was developed and implemented into the platform to solve the problem described above.
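The following sketch illustrates how the scaled objectives and the (2, 2) reference point combine into the Hypervolume of a two-objective minimization front; the front points in the example are hypothetical, not results of the study.

    def hypervolume_2d(points, ref=(2.0, 2.0)):
        """Hypervolume of a 2-D non-dominated set (minimization), given a reference point."""
        pts = sorted(p for p in points if p[0] <= ref[0] and p[1] <= ref[1])
        hv, prev_y = 0.0, ref[1]
        for x, y in pts:                      # sweep along the first objective
            hv += (ref[0] - x) * (prev_y - y)
            prev_y = y
        return hv

    # Hypothetical scaled points: (unavailability / 0.003, cost / 1700)
    front = [(0.4, 0.9), (0.6, 0.7), (0.8, 0.5)]
    print(hypervolume_2d(front))              # 2.28 for this example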

6 Results and discussion

Due to the hardness of the problem, a general-purpose computing cluster was used in the optimization process. The cluster is composed of 28 computing nodes and one access (front-end) node. Each computing node consists of two Intel Xeon E5645 Westmere-EP processors with 6 cores each and 48 GB of RAM, allowing 336 executions to be run simultaneously.

Once the results were obtained, valuable information emerged:

  1. 1.

    Information related to the computational process is given in order to show the problem hardness and computational cost. It consists of the time taken for 21 executions of the 9 configurations related to each state-of-the-art method.

  2. 2.

    Box plots are given for the distribution of Hypervolume (HV) values (Zitzler et al 2003) achieved (over twenty-one executions) after the stopping criterion is met.

  3. 3.

    The values of the main measures obtained for the final evaluation are shown. These measures are the Average, Median, Minimum, Maximum and Standard Deviation values of the Hypervolume.

  4. 4.

    In order to establish the existence of significant differences among the performance of the configurations and the optimization methods, a rigorous statistical analysis is carried out. Friedman's test allows significant differences among the obtained results to be detected and, in that case, the null hypothesis \((H_0)\) to be rejected. Generally, once differences have been detected, a post-hoc test is carried out in order to find the specific pairwise comparisons that produce such differences. The p-value represents the smallest significance level that can result in the rejection of \(H_0\); it provides information about whether a statistical hypothesis test is significant (or not) and also indicates how significant the result is: the smaller the p-value (< 0.05), the stronger the evidence against the null hypothesis. The procedure for conducting multiple comparisons followed in this paper was described by García and Herrera (2008); however, some exceptions will be applied, as will be seen below (a minimal illustrative sketch of such a test is given after this list).

  5. 5.

    The Hypervolume is computed for the accumulated best non-dominated solutions obtained (the non-dominated front) from the state-of-the-art methods. These represent the best equilibrium solutions among the objectives, and the computational procedure is described in Fonseca et al (2006).
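As a minimal sketch of the statistical test mentioned in item 4, Friedman's test can be applied to the final Hypervolume values of each configuration; the arrays below are placeholder data (in the study each sequence would contain the 21 values of the independent runs), and scipy's friedmanchisquare is one common implementation.

    from scipy.stats import friedmanchisquare

    # One sequence of final Hypervolume values per configuration (placeholders).
    hv_per_config = [
        [2.39, 2.40, 2.38, 2.41],   # e.g. configuration ID1
        [2.35, 2.37, 2.36, 2.38],   # e.g. configuration ID3
        [2.38, 2.39, 2.37, 2.40],   # e.g. configuration ID8
    ]
    stat, p_value = friedmanchisquare(*hv_per_config)
    if p_value < 0.05:              # reject H0: at least one configuration differs
        print("significant differences, proceed to post-hoc pairwise tests", p_value)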

Table 3 Configurations, identifiers relationship, Hypervolume statistics and hypothesis test (Global)

Furthermore, for the global comparison of the state-of-the-art methods, the Hypervolume (HV) average value evolution (in twenty-one executions) and the non-dominated solutions are shown.

In order to make the paper easier to read, results of the individual comparison of the tested configurations of each algorithm are supplied in a supplementary material file. The comparative analysis of the best-performing cases among the state-of-the-art algorithms is included next in Subsection 6.1, and their overall optimum design results in Subsection 6.2. Subsection 6.3 shows the comparative analysis of the best-performing cases among the state-of-the-art and the more recently developed algorithms. Subsection 6.4 presents a discussion, where Subsection 6.4.1 analyses the effect of sampling size when coupling discrete event simulation and multi-objective evolutionary algorithms, and Subsection 6.4.2 quantifies the operational cost saved when using the optimum designs.

6.1 Comparing the standard state-of-the-art methods

As supplementary material, a detailed study of the convergence of the state-of-the-art Multi-objective Evolutionary Algorithms is supplied. From that study, the configurations with the best Average Ranks according to Friedman's test for each method were selected for the global comparison. These configurations are shown in Table 3 (columns 2 and 3).

The evolution of the Hypervolume Average values with the number of evaluations is shown in Fig. 4. It can be seen that the configuration with identifier ID1 (SMS-EMOA, population of 150 individuals and mutation probability of 0.5 genes per chromosome) presents the highest Hypervolume Average value at the end of the process. Moreover, box plots of the distribution of Hypervolume values at the end of the process are shown in Fig. 5. They represent the statistical information supplied in Table 3, where it can be seen that the configuration with identifier ID1 (SMS-EMOA, population of 150 individuals and mutation probability of 0.5 genes per chromosome) presents the highest Hypervolume Average value; the configuration with identifier ID8 (NSGA-II, population of 150 individuals and mutation probability of 0.5 genes per chromosome) presents the highest Hypervolume Median value; the configuration with identifier ID9 (GDE3, population of 50 individuals and F parameter of 0.5) presents the highest Hypervolume Maximum value; and the configuration with identifier ID2 (SMS-EMOA, population of 150 individuals and mutation probability of 1.5 genes per chromosome) presents both the highest Hypervolume Minimum value and the lowest Hypervolume Standard Deviation.

Fig. 4 Hypervolume Average (21 independent executions) vs. evaluations evolution

Fig. 5 Box plots of the final Hypervolume, identifiers (ID’s) as in Table 3

In order to establish whether any one of the ten configurations performs better than any other, a statistical significance hypothesis test was conducted. The Average Ranks computed through the Friedman’s test are shown in Table 3 (column 9). It can be seen that the configuration ID1 (SMS-EMOA as optimization method, population of 150 individuals and mutation probability of 0.5 gene per chromosome) produces the lowest Average Rank. Moreover, the computed p-value (8.2594\(\cdot \)10\(^{-11}\)) implies that the null hypothesis (\(H_0\)) can be rejected (p-value < 0.05), so it can be concluded that, under the studied conditions, some configurations perform better than others. In order to find the concrete pairwise comparisons that produce the differences, a post-hoc test was run. The Shaffer’s test was used to compare the configuration ID1, which produced the lowest Average Rank in the Friedman’s test, with the rest of the configurations. The results of these comparisons are shown in Table 5. It can be seen that, under the conditions of the experiment, the configuration ID1 performs better than the configurations ID3–ID4 (MOEA/D as optimization method) and ID6 (MOEA/D-DE as optimization method, population of 150 individuals and F parameter of 0.6), but it is not possible to establish that it performs better than any of the remaining ones.

The best accumulated non-dominated solutions obtained through the last generation of the evolutionary process for all executions and all configurations of each method were used to compute their respective accumulated Hypervolume values (as described in Fonseca et al (2006)). Such values are shown in Table 4. Furthermore, the best accumulated non-dominated solutions obtained from these fronts were used to compute the whole accumulated Hypervolume, whose value was 2.4179 and is shown in Table 4. As expected, the value is higher than 2.4087, the maximum accumulated value achieved after the evolutionary process for the SMS-EMOA method.

Table 4 Hypervolume Accumulated Value
Table 5 p-values from Shaffer’s test (Global)
Fig. 6 Accumulated non-dominated front (Global)

Table 6 Optimum solutions obtained from the evolutionary process

The final results of the analysis bring to light the better performance of the methods based on Indicators (SMS-EMOA) and Non-dominance (NSGA-II or GDE3) in comparison to the methods based on Decomposition (MOEA/D or MOEA/D-DE). However, the operator which creates new individuals does not appear to have a significant effect, since methods that use Simulated Binary Crossover (SMS-EMOA or NSGA-II) presented similar performance to a method that uses Differential Evolution (GDE3).

6.2 Overall optimum design results

The non-dominated solutions to the problem provided at the end of the evolutionary process for all executions, all configurations and all methods are shown in Fig. 6. All optimum solutions belonging to the obtained non-dominated front are shown in Table 6. Unavailability (Q) is shown as a fraction, Cost is shown in economic units, and the rest of the variables represent, for the respective devices, the optimum times (in hours) at which to schedule the preventive maintenance activities.

The solution with the lowest Cost (ID1) (861.38 economic units) presents the highest unavailability (0.002898). These values are followed by periodic optimum times (hours) measured from the moment at which the system mission time starts (the time to perform the preventive maintenance activity (TR) is not included). For solution ID1, no periodic optimum times to preventive maintenance are supplied for devices P2 and V4, because this design alternative does not include such devices. The opposite case shows the highest Cost (ID19) (1,698.62 economic units) and the lowest unavailability (0.000749). For solution ID19, periodic optimum times to perform preventive maintenance activities are supplied for all devices, because this design alternative includes devices P2 and V4. The remaining optimum solutions lie between these two extremes and can be seen in Table 6. The decision makers will need to decide which design is preferable taking into account their individual requirements. Depending on the application case, the company may have a cost threshold (e.g., due to financial budget constraints); observing the attained non-dominated solutions, the design with the best unavailability within that cost could be chosen. Alternatively, the company may have an unavailability threshold (e.g., due to a legal norm); observing the attained non-dominated solutions, the design with the best cost satisfying that unavailability could be chosen.
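As an illustration of these decision rules, the following minimal sketch selects a design from a set of (Cost, unavailability) pairs under either type of threshold. The intermediate point is hypothetical, while the two extremes correspond to solutions ID1 and ID19; the function names are assumptions.

```python
# Minimal sketch of the decision rules described above, applied to a
# hypothetical non-dominated set of (cost, unavailability) pairs.
front = [(861.38, 0.002898), (1250.0, 0.0012), (1698.62, 0.000749)]  # middle point is invented

def best_under_cost(front, cost_threshold):
    """Lowest-unavailability design whose cost stays within the budget."""
    feasible = [s for s in front if s[0] <= cost_threshold]
    return min(feasible, key=lambda s: s[1]) if feasible else None

def best_under_unavailability(front, q_threshold):
    """Cheapest design whose unavailability satisfies the norm."""
    feasible = [s for s in front if s[1] <= q_threshold]
    return min(feasible, key=lambda s: s[0]) if feasible else None

print(best_under_cost(front, 1300.0))            # -> (1250.0, 0.0012)
print(best_under_unavailability(front, 0.001))   # -> (1698.62, 0.000749)
```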

Moreover, the solutions have been clustered in Fig. 7 according to their final design. Solutions are shown in ascending order of Cost, from ID1 to ID19 and from left to right respectively. Solutions contained in Cluster 1 (solutions 1 to 2, see also Table 6) are those in which no redundant devices have been included in the design; in this case, the system exclusively contains devices placed in series. These solutions present the lowest Cost and the highest unavailability. Solutions contained in Cluster 2 (solutions 3 to 6, see also Table 6) are those in which a redundant valve has been included in the design as a parallel device. These solutions present higher Cost and lower unavailability than the solutions contained in Cluster 1. Solutions contained in Cluster 3 (solutions 7 to 14, see also Table 6) are those in which a redundant pump has been included in the design as a parallel device. These solutions present higher Cost and lower unavailability than the solutions contained in Clusters 1 and 2. Finally, solutions contained in Cluster 4 (solutions 15 to 19, see also Table 6) are those in which both a redundant valve and a redundant pump have been included in the design as parallel devices. These solutions present the highest Cost and the lowest unavailability.

Fig. 7 Clustered accumulated non-dominated front and design options

6.3 Comparing the best ordered standard state-of-the-art method and recently developed methods

As supplementary material, a detailed study about the convergence of some recently developed Multi-objective Evolutionary Algorithms is supplied. From such a study, the configurations with the best Average Ranks according to the Friedman’s test for each method were selected to be globally compared. Next, the best ordered configuration obtained from the study of the state-of-the-art methods is compared to the best ordered configurations achieved by the recently developed algorithms. The relationship between the configurations of the methods (where N represents the population size and PrM the mutation probability) and the identifiers of the configurations is shown in Table 7 (columns 2 and 3). Furthermore, statistical information is supplied in that Table (the best values in bold). Such information is represented in Fig. 8.

Fig. 8 Box plots of the final Hypervolume, the best ordered state-of-the-art method and recently developed methods, identifiers (ID’s) as in Table 7

Table 7 Configurations, identifiers relationship, Hypervolume statistics and hypothesis test

In order to establish whether any one of the configurations performs better than any other, a statistical significance hypothesis test was carried out. The Average Ranks computed through the Friedman’s test are shown in Table 7 (column 9). It can be seen that the configuration ID1 (SMS-EMOA as optimization method, population of 150 individuals and mutation probability of 0.5 gene per chromosome) produces the lowest Average Rank. Moreover, the computed p-value (8.240\(\cdot \)10\(^{-11}\)) implies that the null hypothesis (\(H_0\)) can be rejected (p-value < 0.05). Therefore, it can be concluded that, under the studied conditions, some configurations perform better than others. In order to find the concrete pairwise comparisons that produce such differences, a post-hoc test was conducted. The Shaffer’s test was used to compare the configuration ID1, which produced the lowest Average Rank in the Friedman’s test, with the rest of the configurations. The results of these comparisons are shown in Table 8. It can be seen that, under the conditions of the experiment, the configuration ID1 performs better than the configurations ID6–ID7 (LMOCSO as optimization method), but it is not possible to establish that it performs better than any of the remaining ones.

Table 8 p-values from Shaffer’s test (Global)

The final results of the analysis bring to light that the state-of-the-art SMS-EMOA method is highly competitive. It shows a better performance than a method such as LMOCSO. Furthermore, it is ranked best among all the compared methods according to the Friedman’s test.

6.4 Discussion

From Sect. 6.3, the configuration with the best rank according to the Friedman’s test was found (SMS-EMOA, population size of 150 individuals, and mutation rate of 0.5 gene per chromosome). It is taken as the reference for the proposed methodology. Therefore, such a configuration is selected to extend the analysis to more demanding applications in Sect. 7. Next, a discussion is opened regarding two interesting aspects: firstly, the effect of the sampling size when discrete event simulation and multi-objective evolutionary algorithms are combined, and secondly, the quantification of the economic cost savings when the methodology is used.

6.4.1 Discrete event simulation combined with multi-objective evolutionary algorithms: the effect of sampling size

The proposed methodology executes a single discrete event simulation per individual of the population to characterize the system behavior, and the objective functions are then evaluated. Here, the effect of varying the sampling size is analyzed with an equivalent number of fitness evaluations. Summarizing the procedure: the Functionability Profile of the system was built as many times as the sample size for each individual of the population, and the objective functions (availability and operational cost) were computed the same number of times. The configuration of the case study with the best average rank from the Friedman’s test (SMS-EMOA, population size of 150 individuals, and mutation rate of 0.5 gene per chromosome, see Tables 3 and 7, column 9) was taken as the reference (referred to as ‘direct SMS-EMOA’). In addition to this case (sample size equal to 1), sample sizes of 10, 100 and 1000 for each solution evaluated by the multi-objective evolutionary algorithm were tested. Since the purpose is equivalent (to attain the best non-dominated solutions), this procedure amounts to executing multiple simulations (as many as the chosen sample size) and taking a minimal extreme value as the representative of the distribution obtained. However, in multi-objective optimization the non-dominated direction (the non-dominated lower extreme value) of each solution is not known a priori and depends on its relative position with respect to the other non-dominated solutions. Therefore, three choices of the minimal extreme value were tested: 1) minimal unavailability, 2) minimal cost, or 3) minimal equally weighted unavailability–cost (which is equivalent to the Manhattan distance of both objectives). The proposed methodology (single sample size, direct SMS-EMOA) is compared to these nine combinations (three minimal extreme values with three sample sizes each) and to a standard random search as an optimization baseline; all cases share an equivalent total stopping criterion of 10,000,000 evaluations of the fitness functions and are executed in 21 independent runs each. The set of configurations is shown in Table 9.
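A minimal sketch of the compared evaluation strategies is given below. The `simulate` placeholder stands in for the discrete event simulation of the Functionability Profile and is assumed to return objectives already scaled to comparable magnitudes; all names and values are assumptions, not the implementation of the paper.

```python
# Minimal sketch of the evaluation strategies compared in this subsection.
import random

def simulate(solution):
    # placeholder for the DES evaluation; returns scaled (unavailability, cost)
    return random.uniform(0.0, 1.0), random.uniform(0.0, 1.0)

def evaluate(solution, sample_size=1, extreme="direct"):
    samples = [simulate(solution) for _ in range(sample_size)]
    if extreme == "direct":          # proposed methodology: a single simulation
        return samples[0]
    if extreme == "unavailability":  # minimal-unavailability representative
        return min(samples, key=lambda s: s[0])
    if extreme == "cost":            # minimal-cost representative
        return min(samples, key=lambda s: s[1])
    if extreme == "weighted":        # equally weighted (Manhattan) representative
        return min(samples, key=lambda s: s[0] + s[1])
    raise ValueError(extreme)

# The direct SMS-EMOA corresponds to evaluate(sol, 1, "direct"); the alternative
# configurations use sample sizes of 10, 100 or 1000 with one of the three extremes.
```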

Table 9 Configurations, Hypervolume statistics and Friedman’s test average ranks (multiple simulations)

Box plots of the distribution of the Hypervolume values at the end of the process are shown in Fig. 9. It can be seen that, as expected, the method based on random search (configuration ID10) presents the worst performance. Moreover, it can be seen that the configuration ID11 (which uses the direct SMS-EMOA) shows the highest Hypervolume median and average values. The configuration ID7 (which looks for minimum unavailability using 1000 evaluations of the objective functions per individual) presents the highest Hypervolume maximum value, and the configuration ID1 (which looks for minimum unavailability using 10 evaluations of the objective functions per individual) supplies the highest Hypervolume minimum value. These and other measures obtained are shown in Table 9.

Fig. 9 Box plots of the final Hypervolume (multiple simulations, identifiers as in Table 9)

In order to establish whether any one of the configurations performed better than any other, a statistical significance hypothesis test was conducted. The average ranks computed through the Friedman’s test are shown in Table 9. It can be seen that the configuration ID11, which uses the direct SMS-EMOA, produced the lowest average rank. Hence, after a similar number of evaluations, the direct SMS-EMOA achieved the first position in the hypothesis test. Moreover, the computed p-value (6.6712\(\cdot \)10\(^{-11}\)) implies that the null hypothesis (\(H_0\)) can be rejected (p-value < 0.05), so it can be concluded that, under the studied conditions, some configurations perform better than others. In order to find the concrete pairwise comparisons that produce such differences, a post-hoc test was run. The Shaffer’s test was used to compare the configuration ID11 with the rest of the configurations. The results of these comparisons are shown in Table 10.

Table 10 p-values from Shaffer’s test (multiple simulations)

The hypothesis test shows that the direct SMS-EMOA achieved the best average rank according to the Friedman’s test. Moreover, it shows significant differences with respect to some of the configurations (ID7, ID8 and ID10), so a better behavior is expected when the direct SMS-EMOA is used. Next, the configurations with the best average rank for each extreme studied (minimum unavailability, minimum cost and minimum weighted unavailability–cost) were selected to compare their results against the direct SMS-EMOA. Their non-dominated solutions are shown in Fig. 10. It can be seen that the maximum Hypervolume is covered when the direct SMS-EMOA is used (with a value of 2.3832). Finally, the accumulated non-dominated front from Fig. 10 is shown in Fig. 11. As expected, there are no solutions obtained by minimizing the unavailability (marked as \(\square \)) on the left side of Fig. 11, since the solutions on that side present the best cost, which is contrary to the search direction of that strategy. Conversely, there are no solutions obtained by minimizing the cost (marked as \(\triangle \)) on the right side of Fig. 11, since the solutions on that side present the best unavailability, which is contrary to obtaining more economical solutions. It can be seen how the solutions supplied by the direct SMS-EMOA (marked as \(\times \)) are spread along the whole non-dominated front.

In summary, the results of the sampling size analysis highlight the benefits of the proposed methodology, showing the positive synergy between discrete event simulation and multi-objective evolutionary algorithms, where a single simulation per individual in the fitness function evaluation is enough to attain very competitive results.

Fig. 10 Accumulated non-dominated solutions from the best configurations (multiple simulations)

Fig. 11 Accumulated non-dominated front from the best configurations (multiple simulations)

A further analysis of the non-dominated solutions of the previous results, based on their representative averages instead of the best values attained, was conducted: for each non-dominated solution at the final generation (after 10,000,000 evaluations) of each of the 21 independent executions of the previous experiments, 10,000 discrete simulations were executed and their objective functions were computed. Then, the unavailability and cost averages were used as representative values of the distribution of each solution. In this way, for each extreme non-dominated solution achieved previously, the center of its distribution was located. These central solutions are the solutions that would be achieved by executing a Monte Carlo simulation when considering the average as the representative value of the distribution. Box plots of the distribution of the Hypervolume values for each configuration are shown in Fig. 12. Statistical information regarding this experiment is shown in Table 11. It can be seen that the configuration ID3 (which looks for minimum weighted unavailability–cost using 100 evaluations of the objective functions per individual) presents both the best median and maximum Hypervolume values.

In order to establish whether any one of the configurations performed better than any other, a statistical significance hypothesis test was conducted. The average ranks computed through the Friedman’s test are shown in Table 11. It can be seen that the configuration ID3 (which looks for minimum weighted unavailability–cost using 100 evaluations of the objective functions per individual) produced the lowest average rank, while our methodology, the configuration ID11 (direct SMS-EMOA), was ranked third out of eleven configurations. Moreover, the computed p-value (9.2629\(\cdot \)10\(^{-11}\)) implies that the null hypothesis (\(H_0\)) can be rejected (p-value < 0.05), so it can be concluded that, under the studied conditions, some configurations perform better than others. In order to find the concrete pairwise comparisons that produce such differences, a post-hoc test was run. The Shaffer’s test was used to compare our methodology, ID11 (direct SMS-EMOA), with the rest of the configurations. The results of these comparisons are shown in Table 12. It can be seen that the configuration ID11 performs better than ID7, ID8, ID9 and ID10, while no other configuration outperforms our methodology. Moreover, the hypothesis test shows no significant differences between the direct SMS-EMOA and the configuration ID3, which obtained the best results according to the Friedman’s test. However, there is no way to know a priori which values of the sampling size and of the minimal extreme direction would be the best, nor what their influence on the optimization outcome would be. For example, if we focus on each value of the sampling size, we observe that for sampling size 10 (ID1 to ID3) the best ranked case was the minimum equally weighted unavailability–cost extreme (ID3), for sampling size 100 (ID4 to ID6) the best ranked case was the cost extreme (ID4), and for sampling size 1000 (ID7 to ID9) the best ranked case was the unavailability extreme (ID8). Therefore, depending on the sampling size, any of the minimal extreme directions could have been the best option according to the results of our experiment. Hence, there are many parameters to explore in very computationally expensive processes. By contrast, our methodology is parameter-free in this respect, since it uses a single sampling size and relies on the implicit management of the non-dominated solutions by the selection operator of the multi-objective evolutionary algorithm.

Fig. 12 Box plots of the Hypervolume from simulated centers (identifiers as in Table 11)

Table 11 Configurations, Hypervolume statistics and Friedman’s test average ranks from simulated centers
Table 12 p-values from Shaffer’s test from simulated centers

In summary, as shown by the results and discussion of the case study, the proposed methodology is a computationally efficient and robust approach (it does not depend on parameters such as the number of samples or the minimal search direction) compared with Monte Carlo simulation-based approaches when facing the multi-objective reliability optimization problem handled here.

6.4.2 Quantification of the operational cost saved

In order to evaluate the cost savings attained by the proposed methodology in the case study using the direct SMS-EMOA, a comparison with a standard random search was carried out. Each individual of the population (the design, i.e., the devices involved, and the maintenance strategy) was randomly generated and the objective functions were evaluated afterwards. The total number of solutions generated was equivalent to the stopping criterion of the direct SMS-EMOA (10,000,000) in each of the 21 independent executions. The configuration of the case study with the best average rank from the Friedman’s test (SMS-EMOA, population size of 150 individuals, and mutation rate of 0.5 gene per chromosome, as seen in Table 3, column 9) was taken as the reference. In both compared cases (direct SMS-EMOA and random search), the 21 independent executions were ordered by their Hypervolume values and the median case (11th in the ordering) was taken as the reference, as shown in Fig. 13.
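A minimal sketch of such a random-search baseline is given below. The decision-variable bounds, the chromosome dimensions and the placeholder evaluation are assumptions made only for illustration; the part that reflects the procedure described above is the generation of random individuals and the update of a non-dominated archive.

```python
# Minimal sketch of the random-search baseline, under assumed bounds and dimensions.
import random

def random_individual(n_design=2, n_periods=6, t_min=500.0, t_max=8000.0):
    design = [random.randint(0, 1) for _ in range(n_design)]            # include optional devices?
    periods = [random.uniform(t_min, t_max) for _ in range(n_periods)]  # maintenance periods (h)
    return design + periods

def dominates(a, b):
    # Pareto dominance for minimization of both objectives
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def random_search(evaluate, n_evaluations=10_000):
    archive = []                                        # non-dominated (objectives, individual)
    for _ in range(n_evaluations):
        ind = random_individual()
        obj = evaluate(ind)                             # (unavailability, cost)
        if any(dominates(o, obj) for o, _ in archive):
            continue                                    # candidate is dominated, discard
        archive = [(o, i) for o, i in archive if not dominates(obj, o)]
        archive.append((obj, ind))
    return archive

# usage with a placeholder evaluation (stands in for the single-simulation DES fitness)
if __name__ == "__main__":
    archive = random_search(lambda ind: (random.uniform(0, 1), random.uniform(0, 1)), 5_000)
    print(len(archive), "non-dominated solutions kept")
```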

Fig. 13 Non-dominated fronts from the direct SMS-EMOA and random search cases (median case out of 21 independent executions)

Fig. 14 Global non-dominated front from the median direct SMS-EMOA and median random search cases (taken from Fig. 13)

Table 13 Extreme solutions taken from Fig. 13

The Hypervolume covered by the methodology (2.2977) is higher than the Hypervolume covered by the random search (2.2385); hence, the non-dominated front obtained by our methodology is better than the non-dominated front obtained by the random search. From those sets, their joint non-dominated front is shown in Fig. 14, where all solutions except a single one (obtained by chance) were attained by our methodology.

In order to quantify the benefit of using the direct SMS-EMOA, characteristic solutions identified as ID1 and ID3 (taken from the direct SMS-EMOA) and ID2 and ID4 (taken from the random search) were chosen for comparison. These solutions are shown both in Table 13 and in Fig. 13. Comparing the solutions with the best cost from Table 13 (ID1 and ID2), it can be seen that the solution ID1 (achieved by the direct SMS-EMOA) is not only more economical but also more reliable than the solution ID2 (achieved by the random search); in terms of economic cost, the solution ID1 presents an improvement of 4%. The better the unavailability, the bigger the impact of the methodology in terms of cost benefits: comparing the solutions with the best unavailability from Table 13 (ID3 and ID4), it can be seen that the solution ID3 (achieved by the direct SMS-EMOA) is again not only more economical but also more reliable than the solution ID4 (achieved by the random search), with an economic cost 10% lower. Therefore, under the conditions of the experiment, using the direct SMS-EMOA produces a positive impact not only from the economic point of view but also from the availability point of view.
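For clarity, the percentage figures above correspond to the relative cost saving, computed here (the notation is ours) as
\[
\Delta C\,(\%) = \frac{C_{\mathrm{RS}} - C_{\mathrm{SMS}}}{C_{\mathrm{RS}}}\cdot 100,
\]
where \(C_{\mathrm{SMS}}\) and \(C_{\mathrm{RS}}\) denote the costs of the compared direct SMS-EMOA and random search solutions; the pairs (ID1, ID2) and (ID3, ID4) of Table 13 yield approximately 4% and 10%, respectively.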

7 Applications

Two applications are addressed in order to demonstrate the viability of applying the methodology. Application Case A consists of an extension of the main case study obtained by adding a second branch. Application Case B consists of a system with a more complex structure and a larger number and variety of devices.

7.1 Application case A: the case study with double branch

The first application consists of an extension of the case study, which is a basic model of a containment spray injection system. In this case, a second branch is included, as shown in Fig. 15. A similar structure was analyzed in Greiner et al (2003). As in the case study, it is necessary to establish the optimum period to perform a preventive maintenance activity for the system devices and to decide whether to include redundant devices such as the pumps P2 and P9 and the valves V4 and V11 by evaluating design alternatives. Hence, the number of components of Application Case A can vary from 10 to 14 automatically as an outcome of the evolutionary algorithm search. The chromosomes are codified as [\(B_1\) \( B_2\) \( B_3\) \( B_4\) \( T_1\) \(T_2\) \( T_3\) \( T_4\) \(T_5\) \(T_6\) \(T_7\) \( T_8\) \(T_9\) \( T_{10}\) \( T_{11}\) \(T_{12}\) \(T_{13}\) \(T_{14}\)], where the presence of the redundant devices P2, V4, P9 and V11 is defined by the decision variables \(B_1\), \(B_2\), \(B_3\) and \(B_4\) respectively, and the optimum Time To Start a Preventive Maintenance activity for each device is represented by the decision variables \(T_1\) to \(T_{14}\). These have to be transformed in order to evaluate the objective functions, as described in Sect. 3. This application was executed using the configuration that presented the best behavior in the case study, as shown in Table 14.
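As an illustration, the following is a minimal sketch of how such a chromosome can be decoded into a design and a maintenance schedule. The 0.5 threshold used to interpret the design variables \(B_1\)–\(B_4\), the labels of the non-optional devices and the function names are assumptions, not the exact implementation of the paper.

```python
# Minimal sketch of decoding an Application Case A chromosome [B1 B2 B3 B4 T1 ... T14].
OPTIONAL = {"P2": 0, "V4": 1, "P9": 2, "V11": 3}    # maps optional device -> index of B1..B4
DEVICES = ["V1", "P2", "V3", "V4", "V5", "V6", "V7", "P9", "V10", "V11",
           "V12", "V13", "V14", "V15"]               # illustrative labels for T1..T14

def decode(chromosome):
    b, t = chromosome[:4], chromosome[4:18]
    included = [d for d in DEVICES
                if d not in OPTIONAL or b[OPTIONAL[d]] >= 0.5]   # assumed 0.5 threshold
    maintenance = {d: t[i] for i, d in enumerate(DEVICES) if d in included}
    return included, maintenance     # design (10-14 devices) + maintenance periods (hours)

# usage with an illustrative chromosome (4 design bits + 14 maintenance periods)
chrom = [1, 0, 1, 0] + [4000.0] * 14
design, schedule = decode(chrom)
print(len(design), "devices included")   # -> 12 (P2 and P9 included, V4 and V11 excluded)
```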

Fig. 15 Application Case A: Double branch containment spray injection system (CSIS)

Table 14 Parameters configuration for the Application Cases A and B

The scale factors applied to the values of the objective functions, used with the purpose of achieving an equally dispersed non-dominated front with one as the maximum value of each objective, were as follows (a minimal scaling sketch is given after the list):

  • The scale factor used to compute the Cost was 4,500 economic units.

  • The scale factor used to compute the system unavailability was 0.00004.
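The following minimal sketch shows how these factors normalize the objectives before the Hypervolume is computed; the constant and function names are assumptions.

```python
# Minimal sketch of the objective scaling for Application Case A (values from the list above).
COST_SCALE = 4_500.0        # economic units
Q_SCALE = 0.00004           # system unavailability

def scaled_objectives(cost, unavailability):
    return cost / COST_SCALE, unavailability / Q_SCALE   # both roughly within [0, 1]
```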

Firstly, the experiment was conducted based on the proposed methodology. Additionally, to assess the effect of the automatic selection of devices, a second problem was run in which the structural design was based on the mandatory selection of all devices. Box plots of the distribution of the Hypervolume values at the end of the process are shown in Fig. 16. It can be seen that the configuration ID1, which uses the proposed methodology, presents the best median Hypervolume value. Statistical information regarding the Hypervolume reached over the 21 independent executions is shown in Table 15. It can be seen that the configuration ID1 presents the best statistics. Moreover, the configuration ID1 presented the best average rank according to the Friedman’s test, and the p-value achieved (4.592\(\cdot \)10\(^{-6}\)) establishes that the configuration ID1 performs better than the configuration ID2. The non-dominated solutions achieved by our methodology (marked as \(\times \)) are shown in Fig. 17. The accumulated Hypervolume computed in this case over the 21 independent executions reached a value of 3.1146. Moreover, the non-dominated solutions are detailed in Table 16. It can be seen that the devices P2, V4, P9 and V11 are not included in the design. The non-dominated solutions achieved with the mandatory selection (marked as \(\circ \)) are also shown in Fig. 17. The accumulated Hypervolume computed in this case over the 21 independent executions reached a value of 3.1004. It can be seen that the Hypervolume covered by the non-dominated solutions achieved by the proposed methodology is larger than the Hypervolume covered by the non-dominated solutions achieved with the mandatory selection of all devices; in addition, the former non-dominated solutions, identified as ID1, ID2 and ID3, dominate the latter. The methodology was thus able to find optimum non-dominated solutions for the system.

Fig. 16 Box plots of Hypervolume achieved from the configurations for the Application Case A (identifiers as in Table 15)

Table 15 Configurations, Hypervolume statistics and Friedman’s test average ranks for the Application Case A
Fig. 17 Accumulated non-dominated solutions and designs for the Application Case A

Table 16 Non-dominated solutions for the Application Case A (identifiers as in Fig. 17)

Finally, Fig. 18 shows the non-dominated solutions achieved both for the case study (marked as \(\times \)) and for the Application Case A (marked as \(\triangle \)). It can be seen that both fronts are complementary. The set of solutions of the non-dominated front achieved for the case study presents a lower cost and a higher unavailability than the set of solutions of the non-dominated front achieved for Application Case A. Application Case A is more reliable (lower unavailability) and more expensive (higher cost) because its solutions are composed of two parallel branches with more components than the case study; this is also why a larger economic investment is needed to maintain the system. The decision maker should determine whether the benefit of adopting designs with better unavailability justifies the increase in economic investment.

7.2 Application case B: an extended model for the containment spray system of a nuclear power plant

Application Case B is based on an industrial case presented in Galván et al (2007). It consists of a Pressurized Water Reactor (PWR) Containment Spray System, which is designed to provide particular and different functions inside the containment of a PWR, such as the borated water injection function. In the case study, a simplified model of this system was studied. In this case, an extended model is studied, as shown in Fig. 19.

The system consists of two separated trains, each one formed by centrifugal pumps and valves to control the flow of borated water from the Refueling Water Storage Tank. The main devices are the Single Valves (V1, V2 and V9), the Motor-Driven Pumps (P4 and P5), the Motor Operated Valves (M3, M6, M7, M8 and M10) and One Way Valves (NR11 and NR12). The aim is the simultaneous optimization of the system structural design (with automatic selection of devices) and its maintenance strategy with some considerations:

  • Each position may host a maximum of three redundant devices in parallel, so the maximum number of devices is thirty-six, as shown in Fig. 20,

  • The devices V1, P4, V9, M10, NR11 and NR12 are mandatory as is shown in Fig. 21,

  • When the device M8 is not included in the design, the line is considered a tube,

  • When the device P5 is not included in the design, the device M7 cannot be included,

  • The device M3 may be included in the design when the device V2 and/or the device P5 is/are included (these design rules are sketched in code after this list),

  • As in Application Case A, to assess the effect of the automatic selection of devices, a second problem was run in which the structural design was based on the mandatory selection of a minimum of one device per position.
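A minimal sketch of how these design rules can be checked over a candidate set of included devices is given below. The function name is an assumption, the rule limiting each position to three redundant devices is assumed to be enforced by the chromosome codification itself, and a missing M8 needs no check since the line is then simply treated as a tube.

```python
# Minimal sketch of the Application Case B design rules listed above,
# expressed as a validity check over the set of included devices.
MANDATORY = {"V1", "P4", "V9", "M10", "NR11", "NR12"}

def is_valid_design(included):
    included = set(included) | MANDATORY
    if "M7" in included and "P5" not in included:
        return False                      # M7 cannot be included without P5
    if "M3" in included and not ({"V2", "P5"} & included):
        return False                      # M3 requires V2 and/or P5
    return True

print(is_valid_design({"V2", "M3", "P5", "M7"}))   # True
print(is_valid_design({"M7"}))                     # False: M7 without P5
```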

As in the case study, the chromosome codification includes the period of time to perform a preventive maintenance activity for the system devices. Moreover, it is necessary to decide whether to include redundant devices by evaluating design alternatives; the number of devices of Application Case B may vary from 6 to 36 automatically as an outcome of the evolutionary algorithm search. In this application, two types of chromosome codification are taken into account and compared:

  • Long chromosome codification: formed by 66 decision variables, 30 for the design and 36 for the maintenance strategy. Six components are mandatory, so 30 design decision variables are necessary to decide whether the remaining devices are included in the design or not. The system may contain a maximum of 36 devices, so this is the number of decision variables for the preventive maintenance strategy,

  • Short chromosome codification: formed by 48 decision variables, 12 for the design and 36 for the maintenance strategy. In this case, the 12 design decision variables are scaled to integer values, so that each may take a maximum value of three. As in the previous case, the system may contain a maximum of 36 devices, so this is the number of decision variables for the preventive maintenance strategy.

In summary, in this section we solve four problems for Application Case B: long chromosome with the proposed methodology, short chromosome with the proposed methodology, long chromosome with mandatory selection of a minimum of one device per position, and short chromosome with mandatory selection of a minimum of one device per position. A decoding sketch of the two codifications is given below.
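The following minimal sketch contrasts the two codifications for the design part of the chromosome. The assignment of mandatory devices to positions, the variable ranges and the rounding rules are assumptions made only for illustration, not the exact codification of the paper.

```python
# Minimal sketch of the two design codifications for Application Case B
# (design part only; the 36 maintenance variables are handled as before).
N_POSITIONS = 12                                   # device positions, up to 3 devices each

def decode_long(design_vars):
    # 30 binary-like variables: positions holding a mandatory device contribute
    # 2 optional slots, the remaining positions contribute 3 (illustrative layout).
    counts, i = [], 0
    mandatory_flags = [1] * 6 + [0] * 6
    slots_per_position = [2] * 6 + [3] * 6
    for mandatory, slots in zip(mandatory_flags, slots_per_position):
        extra = sum(1 for v in design_vars[i:i + slots] if v >= 0.5)
        counts.append(mandatory + extra)
        i += slots
    return counts                                  # devices per position (0..3)

def decode_short(design_vars):
    # 12 variables, each assumed real-coded in [0, 3] and rounded to a count;
    # positions holding a mandatory device are clamped to at least one device.
    mandatory_flags = [1] * 6 + [0] * 6            # illustrative layout
    return [max(m, min(3, round(v))) for v, m in zip(design_vars, mandatory_flags)]

# Either decoder returns the number of devices per position, which together with
# the 36 maintenance periods defines a candidate design for evaluation.
```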

Fig. 18 Accumulated non-dominated solutions achieved for both the case study and the Application Case A

The data used are shown in Table 17. As in Application Case A, Application Case B was executed using the evolutionary multi-objective optimization configuration that presented the best performance in the case study, as shown in Table 14. The scale factors applied to the values of the objective functions, used with the purpose of achieving an equally dispersed non-dominated front with one as the maximum value of each objective, were as follows:

  • The scale factor used to compute the Cost was 6,000 economic units,

  • The scale factor used to compute the system unavailability was 0.00083.

Fig. 19 Application Case B: Base line

Fig. 20 Application Case B: The most complex possible design

Fig. 21 Application Case B: The most simple possible design

Box plots of the distribution of the Hypervolume values at the end of the process are shown in Fig. 22. It can be seen that the configuration ID3, which uses the short chromosome and the proposed methodology, presents the best median Hypervolume value. Statistical information regarding the Hypervolume reached over the 21 independent executions is shown in Table 18. It can be seen that the configuration ID3 presents the best average, median, minimum and standard deviation Hypervolume values, whereas the configuration ID1, which uses the long chromosome and the proposed methodology, presents the best maximum Hypervolume value. Moreover, the configuration ID3 presents the best average rank according to the Friedman’s test, and the p-value achieved (7.666\(\cdot \)10\(^{-11}\)) establishes that the configuration ID3 performs better than at least one other configuration. Finally, the Shaffer’s test was used to compare the configuration ID3 with the rest of the configurations. Statistically significant differences were found with respect to the configurations ID2 and ID4 (both with mandatory selection of devices), as shown in Table 19. Therefore, under the studied conditions, it can be seen that the best performance is achieved when the proposed methodology is used. Furthermore, the best rank according to the Friedman’s test is achieved when only one decision variable per position is used for the system design (the short chromosome).

Table 17 Data set for the Application Case B

The non-dominated solutions achieved when the long chromosome and our methodology are used (marked as \(\times \)) are shown in Fig. 23. The accumulated Hypervolume computed over the 21 independent executions is 3.7809. The non-dominated solutions achieved when the long chromosome and the mandatory selection of a minimum of one device per position are used (marked as \(\circ \)) cover an accumulated Hypervolume of 3.5196. The non-dominated solutions achieved when the short chromosome and the proposed methodology are used (marked as \(\square \)) cover an accumulated Hypervolume of 3.7702. Finally, the non-dominated solutions achieved when the short chromosome and the mandatory selection of a minimum of one device per position are used (marked as \(\triangle \)) cover an accumulated Hypervolume of 3.5230. In Fig. 24, the non-dominated solutions belonging to the joint non-dominated front of all configurations are extracted. The details of these solutions are shown in Table 20. Solutions L1, L2 and L3 were supplied when the long chromosome and the proposed methodology were used. Solutions S1, S2 and S3 were supplied when the short chromosome and the proposed methodology were used. Finally, the solution SM1 was achieved when using the short chromosome and the mandatory selection of devices. Each solution presents its cost, unavailability and periodic times to start a preventive maintenance activity for each device included in the design, as shown in Table 20. The design alternatives are shown in Figs. 25, 26, 27, 28, 29 and 30. It can be seen that solution L1 (the least expensive design) corresponds to the simplest design, as shown in Fig. 21. On the contrary, solution SM1, whose design is shown in Fig. 30, is the most reliable solution, with an unavailability equal to 0.0. This is due to the fact that the system was kept in the operating state throughout the mission time in the discrete simulation that describes its behavior. All four tested methods were able to achieve best solutions with the same unavailability value, as shown in the bottom right part of Fig. 23, although with slight differences in the cost attained.

Fig. 22 Box plots of Hypervolume achieved from the configurations for the Application Case B (identifiers as in Table 18)

Table 18 Configurations, Hypervolume statistics and Friedman’s test average rank for the Application Case B
Table 19 p-values from Shaffer’s test from Application Case B
Fig. 23 Accumulated non-dominated solutions achieved for the Application Case B

7.3 Discussion

The application of the proposed methodology to both application cases demonstrates that its generalization and scalability are viable. It has been possible to extend the methodology to more complex systems, and balanced unavailability–cost solutions could be found with automatic selection of the system devices. It was also interesting to compare the proposed methodology with the cases with mandatory selection of devices.

Fig. 24 Global non-dominated front achieved for the Application Case B (taken from Fig. 23)

In Application Case A, the proposed methodology avoids choosing designs with single devices located in parallel. Once the non-dominated solutions from the mandatory selection of devices were obtained, it could be seen that the non-dominated solutions achieved by the proposed methodology dominated them. Hence, solutions with single devices located in parallel are not optimal solutions; they were rejected by the Multi-objective Evolutionary Algorithm during the evolutionary process.

Table 20 Non-dominated front solutions for the Application Case B (identifiers as in Fig. 24)

Regarding Application Case B, two chromosome codifications (short and long) were explored. No statistically significant differences were found regarding the length of the chromosome; however, the short chromosome was ranked first by the Friedman’s test. It can be seen that solutions from both codifications belong to the accumulated non-dominated front. Nevertheless, generally speaking, solutions with better cost and worse unavailability were achieved with the long chromosome, whereas solutions with worse cost and better unavailability were achieved with the short chromosome. Therefore, depending on the features of the solutions to be achieved, one or the other codification could be preferable for solving complex problems with the proposed methodology. In any case, the proposed methodology with either of the two tested chromosome codifications was able to attain a set of non-dominated solutions distributed along the whole front, as shown in Figs. 22 (Hypervolume distributions) and 23.

The proposed methodology is a powerful tool for decision makers (e.g., a chief engineer or a company manager) in order to plan systems with simultaneously optimum maintenance cost and unavailability and automatic selection of system components. It makes it possible to attain an optimum set of non-dominated solutions with minimum cost and minimum unavailability: it can be seen as the set of solutions where, for each value of cost, the best unavailability is shown, or alternatively, the set of solutions where, for each value of unavailability, the best value of cost is shown.

8 Conclusions

In this paper, the coupling of Multi-objective Evolutionary Algorithms and Discrete Event Simulation is used to tackle simultaneously the optimization of systems design (based on a process of automatic selection of the structure of components and redundant devices) and of their maintenance strategy (based on the implementation of periodic preventive maintenance activities), while addressing the conflict between availability and operational cost. Coupling these techniques had previously been used to explore the problems separately but not simultaneously, when both the corrective and the preventive maintenance, the latter consisting in achieving the optimum period of time at which to carry out a preventive maintenance activity, are taken into account. The Multi-objective Evolutionary Algorithm gave rise to a population of individuals, each encoding one design alternative and one preventive maintenance strategy. Each individual represented a possible solution to the problem, which was then used to build and evaluate the system Functionability Profile through Discrete Event Simulation. The individuals evolved generation after generation until reaching the stopping criterion. This process was applied to a technical system in a case study in which the performance of five state-of-the-art Multi-objective Evolutionary Algorithms (SMS-EMOA, MOEA/D, MOEA/D-DE, NSGA-II and GDE3) was compared and a set of optimum non-dominated solutions was obtained. In conclusion, the use of Multi-objective Evolutionary Algorithms and Discrete Event Simulation to address the joint optimization of systems design and their maintenance strategy provides availability–cost balanced solutions to real-world problems where data based on field experience are used. Moreover, for the solved test problem, among the state-of-the-art Multi-objective Evolutionary Algorithms, the best performance lies in the use of methods based on the Hypervolume indicator (SMS-EMOA) and on the Pareto dominance relation (NSGA-II and GDE3) rather than methods based on Decomposition (MOEA/D and MOEA/D-DE). However, the operator used to create new individuals does not appear to have a relevant effect, since methods that use Simulated Binary Crossover (SMS-EMOA and NSGA-II) presented similar performance to a method that uses Differential Evolution (GDE3); the same holds for MOEA/D versus MOEA/D-DE. Furthermore, the SMS-EMOA method was ranked best according to the Friedman’s test. Next, the case study was solved by using some recently developed Multi-objective Evolutionary Algorithms, namely ANSGA-III, AdaW and LMOCSO, and their performances were compared with that of the SMS-EMOA method. Again, SMS-EMOA was ranked better than the rest and achieved a better performance than the LMOCSO method.

Fig. 25 L1 optimum design (Cost = 628.00 - unavailability = 1.5068\(\cdot \)10\(^{-4}\))

Fig. 26 L2 optimum design (Cost = 676.00 - unavailability = 1.4844\(\cdot \)10\(^{-4}\))

Fig. 27 S1 optimum design (Cost = 755.00 - unavailability = 1.4611\(\cdot \)10\(^{-4}\))

Fig. 28 L3 and S3 optimum designs (Cost = 798.00 - unavailability = 4.5662\(\cdot \)10\(^{-5}\), Cost = 1111.00 - unavailability = 2.7397\(\cdot \)10\(^{-5}\), respectively)

Fig. 29 S2 optimum design (Cost = 978.00 - unavailability = 3.1963\(\cdot \)10\(^{-5}\))

Fig. 30 SM1 optimum design (Cost = 1431.00 - unavailability = 0.0000)

Once the case study was solved, the best ranked configuration according to the Friedman’s test was identified and a discussion was opened. On the one hand, the effect of the sampling size and of its minimal extreme direction was analyzed; the results highlight the benefits of the proposed methodology, showing the positive synergy between discrete event simulation and multi-objective evolutionary algorithms, where a single simulation per individual in the fitness function evaluation is enough to attain very competitive results. These results are confirmed by an analysis based on the average values of both objective functions for each non-dominated solution of each compared configuration. The proposed methodology is a computationally efficient and robust approach (it does not depend on parameters such as the number of samples or the minimal search direction) compared with Monte Carlo simulation-based approaches when facing the multi-objective reliability optimization problem handled here. On the other hand, the economic benefits of using the methodology to determine the optimum structural design of the system and its maintenance strategy were quantified for the case study, lying in the estimated interval of 4-10%.

Finally, the scalability and generalization of the proposed methodology were demonstrated by applying it to two more complex applications, where the problems were satisfactorily solved and insights about a proper chromosome codification of the design components were obtained from the executed experiments.

In the future, it is proposed to extend the analysis to more complex problems in the reliability field. Other scenarios could be studied, such as the deterioration state space, imperfect repairs or dependencies between devices. Moreover, deeper research related to the type of decision variables in the chromosome (binary, integer, real) is proposed, in order to study their influence on the convergence of the search.