Multi-objective robust optimization using adaptive surrogate models for problems with mixed continuous-categorical parameters

Explicitly accounting for uncertainties is paramount to the safety of engineering structures. Optimization which is often carried out at the early stage of the structural design offers an ideal framework for this task. When the uncertainties are mainly affecting the objective function, robust design optimization is traditionally considered. This work further assumes the existence of multiple and competing objective functions that need to be dealt with simultaneously. The optimization problem is formulated by considering quantiles of the objective functions which allows for the combination of both optimality and robustness in a single metric. By introducing the concept of common random numbers, the resulting nested optimization problem may be solved using a general-purpose solver, herein the non-dominated sorting genetic algorithm (NSGA-II). The computational cost of such an approach is however a serious hurdle to its application in real-world problems. We therefore propose a surrogate-assisted approach using Kriging as an inexpensive approximation of the associated computational model. The proposed approach consists of sequentially carrying out NSGA-II while using an adaptively built Kriging model to estimate the quantiles. Finally, the methodology is adapted to account for mixed categorical-continuous parameters as the applications involve the selection of qualitative design parameters as well. The methodology is first applied to two analytical examples showing its efficiency. The third application relates to the selection of optimal renovation scenarios of a building considering both its life cycle cost and environmental impact. It shows that when it comes to renovation, the heating system replacement should be the priority.


Introduction
Engineering systems are nowadays systematically designed for optimal efficiency in a resourcescarce environment.The optimization process is most often carried out using computational models predicting the behavior of the system for any given configuration.While these models are becoming more and more accurate thanks to algorithmic advances and the availability of low-cost computing power, the parameters required to describe the system are not always accurately known.This is due to the inherent variability of the system parameters or simply to the designer's lack of knowledge.Regardless of their nature, uncertainties need to be accounted for in the design process as they affect the prediction made by the computational model and henceforth the design solution.Within the probabilistic framework, various techniques have been developed for optimization under uncertainties (Schuëller and Jensen, 2008;Beck et al., 2015;Lelievre et al., 2016).The optimization problem of interest in this paper comprises two components often treated separately in the literature, namely robust design optimization and multi-objective optimization.The first one, which directly deals with uncertainties, entails finding an optimal solution which is at the same time not sensitive to perturbations in its vicinity.This technique has been widely studied in the literature and state-of-the-art reviews can be found in Zang et al. (2005); Beyer and Sendhoff (2007), among others.Following the pioneering sigma-to-noise ratio laid by Taguchi and Phadke (1989), the most widely used metric of robustness combines the performance first-and second-order moments (Doltsinis and Kang, 2004;Beck et al., 2015).This naturally leads to a multi-objective optimization problem.More recently, quantiles of the performance function have been considered (Pujol et al., 2009;Razaaly et al., 2020;Sabater et al., 2021).
Owing to its simplicity and straightforward account of uncertainties through a unique scalar, this approach is considered in the present work.
The second component, i.e., multi-objective optimization, relates to the presence of multiple performance functions.Very often the objectives described by these functions are conflicting and a compromise needs to be found.The associated methods may be classified into a priori and a posteriori approaches, according to when the decision maker sets preferences among the different objectives.In the former, the designer reformulates the multi-objective problem into one or a series of single objectives by aggregation (Miettinen, 1999;De Weck, 2014;Ehrgott, 2005;Liang and Mahadevan, 2017).In contrast, in a posteriori methods no preference is set before the optimization is carried out.Trade-off solutions are found first and only then are the decision makers given possible design solutions to choose from (Marler and Arora, 2014).
Evolutionary algorithms are another class of methods that have emerged and have shown to be powerful and particularly suited for multi-objective optimization, as they possess several desirable characteristics (Zitzler et al., 2004;Emmerich and Deutz, 2018).These are basically metaheuristics that evolve a set of candidate designs by various mechanisms so as to converge to an approximation of the so-called Pareto front, i.e., a series of optimal design solutions describing a trade-off between competing objectives.Examples of such methods include the vector evaluated genetic algorithm (Schaffer, 1985), the multi-objective genetic algorithm (Fonseca and Flemming, 1993), the strength Pareto evolutionary algorithm 2 (SPEA2) (Zitzler and Thiele, 1999) the Pareto archived evolution strategy (Knowles and Corne, 2000) or the non-dominated sorting genetic algorithm II (NSGA-II) (Deb et al., 2002).The latter is considered in this work as it is efficient and easy to implement.
A commonly known drawback of evolutionary algorithms, and in particular of NSGA-II, is their high computational cost.This is even more problematic when the objective function stems from the propagation of uncertainties through a possibly expensive-to-evaluate computational model.Surrogate modelling is a proven solution to address these cost issues and has been widely used in both uncertainty quantification and design optimization fields.
A surrogate model is an inexpensive approximation of a model that aims at replacing it in analyses requiring its repeated evaluation.One of the most popular methods, namely Kriging a.k.a.Gaussian process (GP) modelling (Sacks et al., 1989;Santner et al., 2003;Rasmussen and Williams, 2006), is considered in this work.Kriging has been extensively used for both multi-objective and robust optimization problems, as the brief literature review in Section 4.2 shows.
Finally, the developments in this paper were motivated by an application related to the optimal selection of renovation scenarios for building renovation under uncertainties (Galimshina et al., 2020(Galimshina et al., , 2021)).The formulated optimization problem is not only multi-objective and robust but also contains a subset of parameters which are categorical.Categorical variables are characterized by the fact that they are composed of a discrete and finite set of values which do not have any intrinsic ordering.There are also often referred to as qualitative or nominal.In the case of building renovation, a typical example is the heating system, which may be selected from technical solutions such as oil, gas, heat pump, etc.To address this aspect, we consider slight adaptations of both NSGA-II and Kriging which allow us to use general implementation of both methods without resorting to new developments.
Based on these premises, the goal of this paper is to develop a general purpose multiobjective algorithm for robust optimization problems using an adaptive Kriging model and capable of handling mixed categorical-continuous variables.To achieve this, we propose formulating the multi-objective and robust optimization problem using quantiles of the performance functions, which are computed by Monte Carlo simulation.The resulting optimization problem is solved using NSGA-II and the computational cost is reduced by the use of an adaptively built Gaussian process model.Focus is put on the exploration/exploitation balance of this adaptive scheme, which allows us to solve complex problems with a relatively small computational cost.
The paper is organized as follows.In Section 2, the quantile-based formulation of the optimization problem is presented considering separately the robust and multi-objective components.Section 3 presents a nested level solution scheme which couples optimization and quantile estimation.Section 4 introduces Kriging to the framework following a short literature review.Section 5 presents the adaptations of the proposed methodology to handle mixed categorical-continuous variables.Finally, Section 6 presents two applications: an analytical example and a real case study which deals with the optimal renovation of a building under uncertainties using life cycle assessment.

Problem formulation
In this paper, we are interested in solving multi-objective and robust optimization problems.Even though robust design often involves minimizing two conflicting objectives (e.g., a measure of performance and a measure of dispersion), these two classes of problems are most often treated separately in the literature.As briefly mentioned in the introduction, the state-of-the-art in robust optimization comprises various formulations.In this work, we are interested in minimizing conservative quantiles of the objective function given the uncertainties in the input.Mathematically, the following problem is solved: where Q α , which represents the α-quantile of the cost function c and reads This quantile is minimized with respect to the design parameters d ∈ D ⊂ R M d , where D denotes the design space which generally defines lower and upper bounds of the design variables.
A set of c constraint functions {f j , j = 1, . . ., c} are additionally considered and are assumed to be simple and easy-to-evaluate analytical functions further restricting the design space.No other hard constraints are considered and to simplify the following developments, we will denote the subset of the design space that is feasible by In contrast, the cost function is assumed to result from the evaluation of a complex, and possibly expensive-to-evaluate, computational model M.This model takes as inputs the uncertain random parameters of the analysis, which are here split into two groups: the random where {α 1 , . . ., α m } ∈ [0, 1] m are the levels of the quantiles as introduced in Eq. ( 2).It is assumed here without loss of generality that the m objective functions are derived from the same computational model M.
3 Problem solution

Nested levels solution scheme
The direct solution of the optimization problem stated in Eq. ( 3) classically involves two nested levels.In the outer level, the design space is explored whereas in the inner one the quantiles corresponding to a given design choice are computed.Even though variancereduction simulation techniques can be used for the computation of quantiles (Glynn, 1996;Dong and Nakayama, 2017), we consider here crude Monte Carlo simulation (MCS).Due to the fact that relatively large values of α are considered for robust design optimization, MCS may indeed provide accurate estimates of the quantiles with relatively few sample points.
The first step in the inner level is to draw a Monte Carlo sample set for a given design choice d (i) : where x (j) and z (j) are realizations of X ∼ f X|d (i) and Z ∼ f Z , respectively.The computational model is then evaluated on each of these points and henceforth the corresponding cost function values are obtained.After ordering the latter, the quantiles corresponding to each output are eventually empirically estimated and denoted by q k d (i) .
By their very nature, quantiles are random variables and plugging Monte Carlo estimates in Eq. ( 1) or (3) would lead to a stochastic optimization problem.Solving such a problem may be cumbersome, and even more so in the presence of multiple objectives.To avoid dealing with the resulting issues, the concept of common random numbers is considered (Spall, 2003).This essentially translates into using the same stream of random numbers within iterations of the optimization problem.More specifically, the random variables x d (1) , x d (2) , x d (3) , ... generated at each iteration of the optimization procedure share the same seed.For the environmental variables, the same set of realizations of z (j) , j = 1, . . ., N is used throughout the optimization.Using this trick, the mapping i) becomes deterministic.As such, Eq. ( 3) can be simplified into This problem can then be solved using any classical multi-objective optimization solver.
Such problems are however ill-defined in a sense, as there is generally no single solution that minimizes all the objective functions simultaneously.Therefore, the traditional approach is to seek for a set of solutions representing a compromise.To define such solutions, a widely-used concept is that of Pareto dominance (Miettinen, 1999;Ehrgott, 2005).Given two solutions a, b ∈ D, a dominates b (denoted by a ≺ b) if and only if the following two conditions hold: The goal of the search is then to find solutions which are not dominated by any other feasible point.A set of such solutions is said to be Pareto optimal and defined as follows: The Pareto front Typically, a discrete approximation of the Pareto front is sought.As mentioned in the introduction, we will consider in this paper an elitist evolutionary algorithm, namely the non-dominated sorting genetic algorithm II (NSGA-II) (Deb et al., 2002).This algorithm is one of the most popular in the field of multi-objective optimization and is briefly described in Appendix A.
NSGA-II is a powerful algorithm for multi-objective optimization.However, as all evolutionary algorithms, its effectiveness comes at the expense of a high computational cost, in the order of hundreds to thousands of evaluations of the fitness function.On top of that, each evaluation of the fitness, i.e., the quantiles for a given design, requires thousands of evaluations of the computational model.The overall computational cost is then prohibitive, especially when the computational model is expensive (e.g., results from a finite element analysis).To alleviate this burden, we consider surrogate models, more precisely Kriging, as described in the next section.
4 Use of Kriging surrogates

Basics of Kriging
Several types of surrogate models have been used to reduce the computational cost of multiobjective evolutionary algorithms (Díaz-Manríquez et al., 2016).These surrogates are often embedded in an active learning scheme where they are iteratively updated so as to be especially accurate in areas of interest.Gaussian process modelling a.k.a.Kriging naturally lends itself to such approaches as it features a built-in error measure that can be used to sequentially improve its own accuracy.
Kriging (a.k.a.Gaussian process modelling) is a stochastic method that considers the model to approximate as a realization of a Gaussian process indexed by w ∈ R M and defined by (Sacks et al., 1989;Santner et al., 2003;Rasmussen and Williams, 2006) where β T f = p j=1 β j f j (w) is a deterministic trend described here in a polynomial form (universal Kriging) and Z (w) is a zero-mean stationary Gaussian process.The latter is completely defined by its auto-covariance function Cov [Z (w) , Z (w )] = σ 2 R (w, w ; θ), where σ 2 is the process variance, R is an auto-correlation function and θ are hyperparameters to calibrate.The auto-correlation function, which is also referred to as kernel, encodes various assumptions about the process of interest such as its degree of regularity or smoothness.
In this paper, we consider the Gaussian auto-correlation function, which is defined in its anisotropic form as The calibration of the Kriging model consists in training it on an experimental design (ED) and hence finding estimates of the three sets of hyperparameters β, θ, σ 2 .This can be achieved using methods such as maximum likelihood estimation or cross-validation (Santner et al., 2003;Bachoc, 2013;Lataniotis et al., 2018).Once the parameters are found, it is assumed that for any new point w, M (w) follows a Gaussian distribution N µ M , σ 2 M defined by its mean and variance as follows: where Furthermore, r (w) is a vector gathering the correlation between the current point w and the experimental design points and finally, u for convenience.
The prediction for any point w is then given by the mean µ M (w) in Eq. ( 9).On top of that, Kriging also provides a prediction variance σ 2 M (w) which is a built-in error measure that can be used to compute confidence intervals about the prediction.This variance is the main ingredient that has favored the development of adaptive methods such as Bayesian global optimization, as shown in the next section.

Robust optimization
In robust optimization, one aims at finding an optimal solution which shows little sensi-tivity to the various uncertainties in the system.This is generally achieved by minimizing a measure of robustness, e.g. the variance and/or mean or a quantile of the cost function (See Eq. ( 1)).The computation of the robustness measure coupled to the optimization incurs a prohibitive computational cost to the analysis.Hence, surrogate-assisted techniques, which are a means to reduce this cost, have received a lot of attention lately.However, due to the fact that the robustness measure that is minimized is only a by-product of the performance function which is usually approximated by a Kriging model, Bayesian optimization techniques are seldom considered.Instead, most of the literature focuses on non-adaptive schemes.In an early contribution, Chatterjee et al. ( 2019) performed a benchmark using several surrogate models for robust optimization.Lee and Park (2006) proposed approximating the logarithm of the variance of the performance function, which is then minimized using a simulated annealing algorithm.Some contributions however addressed the problem while considering active learning in a nested loop scheme, e.g., Razaaly et al. (2020) where a quantile or the mean of the response is minimized.In this work, the authors built local surrogate models for each design and used them to estimate the quantiles.Similarly, Ribaud et al. (2020) proposed to minimize the Taylor series expansions of the performance function's expectation and variance.They considered various configurations for which they used either the expected improvement (or its multi-points version) of the quantities of interest.Finally, Sabater et al. (2021) developed a method based on Bayesian quantile regression to which they added an infill sampling criterion for the minimization of the quantile.

Multi-objective optimization
The naive and direct approach to leverage the optimization CPU cost with Kriging is to calibrate the latter using the experimental design D and then to use it in lieu of the original model throughout the optimization process.This approach is however expensive as it would require an accurate Kriging model in the entire design space.A more elaborate technique has been developed under the framework of efficient global optimization (EGO, Jones et al. (1998)) where the experimental design is sequentially enriched to find the minimizer of the approximated function.This enrichment is made by maximizing the expected improvement (EI), which is a merit function indicating how likely a point is to improve the currently observed minimum given the Kriging prediction and variance.The EGO framework has shown to be very efficient and new merit functions have been proposed to include other considerations such as a better control of the exploration/exploitation balance and the inclusion of constraints (Schonlau et al., 1998;Bichon et al., 2008).The first extension for multi-objective optimization was proposed by Knowles (2005) through the ParEGO algorithm.In ParEGO, the expected improvement is applied on the scalarized objective function derived using the augmented Tchebycheff approach.By varying weights in the scalarization, a Pareto front can be found.Zhang et al. (2010) proposed a similar approach using both the Tchebycheff and weighted sum decomposition techniques with the possibility of adding multiple well-spread points simultaneously.However, such approaches inherit the pitfalls of the underlying decomposition methods.Keane (2006) proposed new extensions of EI that allows one to directly find a Pareto front without transforming the multi-objective problem into a mono-objective one.Another adaptation of a mono-objective criterion has been proposed by Svenson and Santner (2010) with their expected maximin improvement function.This is based on the Pareto dominance concept and involves a multi-dimensional integration problem which is solved by Monte Carlo simulation, except for the case when m = 2 for which an analytical solution is provided.Apart from such adaptations, some researchers have proposed new improvement functions directly built for a discrete approximation of the Pareto front.This includes the contribution of Shu et al. (2021) where a new acquisition function based on a modified hypervolume improvement and modified overall spread is proposed.Moving away from the EGO paradigm, Picheny (2015) proposed a stepwise uncertainty reduction (SUR) method for multi-objective optimization.Alternatively, the hypervolume measure of a set, which is the volume of the subspace dominated by the set up to a reference point, has been used to derive infill criteria.Namely, Emmerich et al. (2006) proposed a hypervolume-based improvement criterion.Originally based on Monte Carlo simulation, Emmerich et al. (2011) proposed a procedure to compute the criterion numerically.This method, however, does not scale well with the number of objective functions or samples in the approximated Pareto set.
Efforts have been put in reducing this computational cost.This includes the contributions of Couckuyt et al. (2014); Hupkens et al. (2015); Yang et al. (2019).They all involve solving numerically the underlying integration problems by partitioning the space into slices using various methods.Finally, another attempt at decreasing the cost has been made by Gaudrie et al. (2020) where the authors proposed to focus the search of the Pareto front to a subspace according to some design preference.By doing so, the size of the discrete Pareto set can be reduced and the CPU cost of estimating the expected hypervolume improvement altogether.
In this work, we rather use a two-level nested approach as described in Section 4.3.We do not consider a direct Bayesian global optimization approach for two reasons.First, as shown above, such techniques in the context of multi-objective optimization are computationally intensive.Second, and most of all, the functions to optimize are not the direct responses of the model that would be approximated by a surrogate because of the robustness component of the problem.More specifically, in Bayesian optimization, the cost function c is approximated by the GP model and minimized at the same time.In contrast, we consider here as objective function the quantile Q α (c; X (d) , Z).Using this quantile directly in Bayesian optimization would require approximating the mapping d → Q α (c; X (d) , Z).However, computing the quantile for a given value of d requires running a Monte Carlo simulation where the underlying expensive computational model M is repeatedly evaluated.The overall computational cost would therefore be prohibitive.For this reason, we resort to a nested two-level approach where the approximation of the computational model and the optimization problems are decoupled.

Motivation
In the previous section, we have very briefly reviewed the literature for multi-objective and robust optimization.The two topics were treated separately because, to the authors' best knowledge, very little to no research has been done for the combined problem.It is important to stress here that by "multi-objective and robust optimization" we disregard robust optimization methods that are formulated by minimizing the mean and the variance of a single performance function using a multi-objective optimization scheme.Such methods are often coined multi-objective robust optimization in the literature but are not the object of the present paper.We instead consider problems which are defined by multiple and conflicting performance functions, regardless of the robustness measure considered.
There have been some recent works combining multi-objective robust optimization as described here with Gaussian process modelling.Most notably, Zhang and Taflanidis (2019) proposed a framework for the solution of multi-objective problems under uncertainties using an adaptive Kriging model built in an augmented space.The latter is introduced as a means to combine the design and random variables space and hence to allow for the construction of a unique surrogate model that could simultaneously support optimization and uncertainty propagation (Kharmanda et al., 2002;Au, 2005;Taflanidis and Beck, 2008).Earlier works considering the augmented random space for optimization include Dubourg et al. (2011); Taflanidis and Medina (2014); Moustapha et al. (2016).
In this work, we also consider building a surrogate model in an augmented space, however using a two-level approach instead of direct Bayesian optimization as proposed in Zhang and Taflanidis (2019).The latter proposed to use the ε-constraint method to solve the multiobjective problem and devised a hybrid enrichment scheme enabled by the fact that their robustness measure is the expectation of the objective function.This is a more restrictive definition as the variance or dispersion within the performance function is not accounted for.Ribaud et al. (2020) estimated the expected improvement for the variance of the objective function by Monte Carlo simulation, which can be computationally expensive.
In this paper, we consider a nested level approach, hence allowing us to rely on a surrogate of the objective function itself rather than that of the robustness measure, herein the quantile.
The latter is then estimated using Monte Carlo simulation with the obtained surrogate and the resulting optimization problem is solved using an evolutionary algorithm, namely NSGA-II.Coupling the optimization and the surrogate model construction in the augmented space, an enrichment scheme is devised as described in Step 6 of the algorithm described in the next section.

Worflow of the proposed method
The workflow of the algorithm is detailed in Figure 1 and summarized in the following: 1. Initialization: The various parameters of the algorithm are initialized.This includes: Initialization: W, {W, Y}, η(g) q , Gmax, K, ...
Estimate the accuracy of the surrogate on the Pareto solutions using Eq. ( 12) Is η (g) q ≤ η(g) q ?
Enrich the experimental design sing Step 5.
Build outer surrogate models on the D Figure 1: Flowchart of the proposed approach.
• the augmented random space, which is defined as in Moustapha and Sudret (2019), where Z is the support of the random variables Z ∼ f Z and X is the design space extended to account for the variability in the extreme values of the design variables.
This confidence space, which is defined considering the cumulative distribution of the random variables at the lower and upper design bounds, respectively denoted by where that if there were no uncertainties on the design parameters, we would simply use i) , where the inputs  ) corresponding to one of the objective functions; • the NSGA-II related parameters and convergence threshold such as the maximum number of generations G (j) max , and • the enrichment parameters, such as the number of enrichment points K per iteration.
2. Surrogate model construction: m Kriging models, denoted by M (j) k , are built in the augmented space W using D, as described in Section 4.1.This corresponds to building a separate Kriging model for each of the m performance functions.

Optimization:
The NSGA-II algorithm is then run to solve the problem in Eq. ( 5) where the quantiles are computed using the surrogate model M (j) in lieu of the original model.Apart from the actual convergence criteria of NSGA-II, a maximum number of generations G (j) max is set.Its value is chosen arbitrarily low at the first iterations.The idea is that at first, emphasis is put more on the exploration of the design space.
Since NSGA-II starts with a space-filling LHS over the entire design space, the first generations are still exploring.By stopping the algorithm then, we can enrich the ED by checking the accuracy of the quantiles estimated by the early sets of design samples.This allows us to direct part of the computational budget for enrichment in areas pointing towards the Pareto front without skipping some intermediate areas of interest.

Accuracy estimation:
The local accuracy of the quantiles corresponding to the points in the current Pareto front F (j) is estimated by considering the variance of each of the m Kriging models.More specifically, the relative quantile error for each objective in every point of the Pareto set is estimated as follows: where q ± α k are upper and lower bounds of the k-th quantile estimated using the predictors µ M (j) k ± 1.96 σ M (j) k .These are not actual bounds per se but a "kind of" 95%confidence interval indicating how locally accurate are the Kriging models.Note that when the quantiles are close to 0, it is possible to replace the denominator in Eq. ( 12) by another normalizing quantity, such as the variance of the model responses at the first iteration.

Convergence check:
The convergence criterion is checked for all the points of the Pareto set with outliers filtered out to accelerate convergence.Samples with values larger than η 90 q k + 1.5 (η 90 q k − η 10 q k ) are considered to be outliers, where η 90 q k and η 10 q k are respectively the 90-th and 10-th percentile of the convergence criterion for the k-th objective.A usual definition of outliers is based on the interquartile range (McGil et al., 1978) but we consider here a more conservative definition by rather using the interdecile range.Convergence is assumed if the relative quantile errors η q k for all the points of the Pareto front (potential outliers excluded), as computed by Eq. ( 12), are below a threshold η(j) q .The latter can also be set adaptively so as to be loose in the initial exploratory cycles of the algorithm.The algorithm then goes to Step 8 if convergence is achieved or proceeds with the next step otherwise.6. Enrichment: K points are added per iteration.The enrichment is carried out in two steps.The idea is to add multiple points to the ED within a single iteration by splitting them into two sets.While the first set consists of the points that directly maximize the learning function, the second set is spread out evenly among the best candidates for enrichment.In practice, they are obtained as follows: (a) First set: For each objective that does not satisfy the convergence criterion, the point in the Pareto set with the largest error is identified and denoted by d max k .
The corresponding enrichment sample is then chosen as the one which maximizes the local Kriging variance, i.e.,: where q .This set is then reduced into K 2 evenly distributed points using Kmeans clustering.For each of these points, the corresponding enrichment sample is chosen as the one maximizing the Kriging variance among the samples used to compute the corresponding quantiles, similarly to Eq. (13).

Experimental design update:
The original computational model is then evaluated on the K enrichment points identified in the previous step, hence leading to K new pairs of samples D To accelerate the whole procedure, we propose to optionally build at each iteration, and for each objective, an outer surrogate model to be used by the optimizer.The corresponding experimental design reads: where q α k is the quantile estimated using the current surrogate model M (j) k .Since this surrogate is built using another surrogate model, it is not necessary to use active learning at this stage.Instead, we simply draw a unique and large space-filling experimental design of size n (j) out .To increase the accuracy around the Pareto front in the outer surrogate, the experimental design inputs are updated after each cycle considering two different aspects.
First, the samples in the Pareto set of the previous cycle are added to the space-filling design.
Second, the accuracy of the outer surrogate w.r.t. the inner one in estimating the quantiles in the Pareto front is checked after each cycle.The related error is monitored and calculated as follows: where µ q k is the quantile predicted by the outer surrogate model.If this error is larger than a threshold ηq,out the size of the ED for the outer surrogate is increased before the next cycle.

The case of categorical variables
The methodology presented in the previous section assumes that all variables are continuous.
However, in some cases, and particularly in the applications considered in this paper, some of the design variables may be categorical.Categorical variables are defined in a discrete and finite space and have the particularity that they are qualitative and cannot be meaningfully ordered.As such, the Euclidean distance metric does not apply to such variables.
However, NSGA-II, Kriging and K-means clustering rely on such a metric since it is used in the definition of the cross-over and mutation operators, and in the evaluation of the kernel function.We will thus consider adaptations of these methods to handle the mixed categorical-continuous problems treated in this paper.The main idea of these adaptations is to remain within the general implementation of these methods, so as to use existing tools without further developments.
To highlight the nature of the variables, we introduce the notations w = (w con , w cat ) where

NSGA-II with mixed continuous-categorical variables
Prompted by their reliance on heuristics, a large variety of both cross-over and mutation operators have been proposed in the literature throughout the years.The ones developed for NSGA-II, i.e., the simulated binary crossover (SBX) and polynomial mutation, are dedicated to continuous variables only (See Appendix A for more details).Various adaptations have been proposed but they may only be used for discrete variables as they involve numerical rounding.For operators dedicated to categorical variables, we look into those developed for binary genetic algorithm (Umbarkar and Sheth, 2015).Since GA allows for treating each variable separately, we first split the continuous and categorical variables into two groups.For the continuous variables, the original operators for NSGA-II are used without any modification.For the categorical variables, we consider two operators typically used in binary genetic algorithms, namely the one-point crossover operator and a simple random mutation.
In practice, let us consider two parents whose categorical components are denoted by w (1) cat and w (2) cat .One-point cross-over is achieved by randomly choosing an integer p such that 1 ≤ p < M cat which will serve as a cross-over point where the parents are split and (16) As for the mutation, the components to be mutated are simply replaced by another level of the same variables drawn with equi-probability.Let us assume that the component w offspring,(1) cat,j = q of an offspring has been selected for mutation.It is then replaced by drawing uniformly from the set { 1 , . . ., Mcat } \ q , where each level has a probability of 1/(M cat − 1) to be selected.It should be noted at this point that mutation is only performed with a probability of 1/M , meaning that on average only one variable (both categorical and continuous) is mutated per iteration.

Kriging with mixed continuous-categorical variables
One of the most important features of Kriging is the auto-correlation or kernel function, which encodes assumptions about the centered Gaussian process Z (w).Most Kriging applications rely on continuous variables and there is a rich list of possible kernel functions for such variables.Examples include the polynomial, Gaussian, exponential or Matérn kernels (Santner et al., 2003).Such kernels are built exclusively for quantitative variables and need some tedious adaptations or pre-processing such as one-hot encoding when it comes to qualitative variables.An alternative yet cumbersome technique is to build multiple models, one associated to each category.At an intermediate level, Tran et al. (2019) proposed using a Gaussian mixture based on clusters corresponding to combination of categorical variables.A more convenient way of handling mixed categorical-continuous variable problems is through the definition of adapted kernels.This is facilitated by an important property which allows one to build valid kernels by combining other valid kernels through operators such as product, sum or ANOVA (Roustant et al., 2020).Considering the product operator and splitting the input variables into their continuous and categorical components, a kernel can be obtained as follows: where k con and k cat are kernels defined on the space of continuous and categorical variables, respectively.
Thanks to this property, it is possible to build a different kernel for each variable separately.For the continuous variables, the traditional kernels can be used without any modification.As for the categorical variables, a few approaches have been proposed in the literature.
One of the earliest contributions is that of Qian et al. (2008) which however involved a tedious calibration procedure.Zhou et al. (2011) proposed an enhancement based on hypersphere decomposition with a much more simplified calibration procedure.However, they use the most generic parametrization approach which does not scale well with the overall number of categories b.
More parsimonious representations have been proposed, e.g., the so-called compound symmetry which assumes the same correlation among different levels of the same categorical variable.In this work, we consider this approach combined with a simple dissimilarity measure for each categorical variable, i.e., The corresponding general form of the compound symmetry kernel for one categorical variable is (Pelamatti et al., 2020): where 0 < c < 1.In this work, the same compound symmetry kernel is built but embedded within usual stationary kernels such as the Gaussian, exponential or Matérn correlation functions.Considering the Gaussian kernel for instance, the corresponding uni-dimensional kernel reads: Combining all dimensions, the following kernel is eventually obtained: This formulation allows us to build and calibrate the Kriging model using the same tools as for the continuous case, since the hyperparameters θ cat k are defined similarly to θ cont k and both can be calibrated in the same set-up.
Finally it should be noted, as also mentioned in Roustant et al. (2020), that despite using different constructions, this is nearly identical to the one proposed by Halstrup (2016), where the Gower distance, a distance in the mixed variable space, is embedded within the Matérn auto-correlation function.The only difference is that the Euclidean distance is used for the continuous part, instead of a measure based on the range of the variables.
This kernel is flexible enough and can be implemented easily within a generic software such as UQLab (Marelli and Sudret, 2014).However, as shown in Pelamatti et al. (2020), such kernels may be limited when there are a high number of levels or categories.In fact, a kind of stationary assumption is embedded in this construction as the kernel only depends on the difference between two levels, regardless of their actual values.More advanced techniques such as group kernels (Roustant et al., 2020) or latent variables Gaussian process (Zhang et al., 2020;Wang et al., 2021) have been proposed in the literature but they are not considered in this work.
Let us eventually note that the Gower distance measure is also used to compute distances in the K-means clustering algorithm in the enrichment scheme.In our implementation, the cluster centers are updated by calculating, in each cluster, the mean for the continuous variables and the mode for the categorical values.

Applications
In this section, we will consider three applications examples.The first two ones are analytical and will serve to illustrate and validate the proposed algorithm.The last one is the engineering application related to optimal building renovation strategies.
For each NSGA-II cycle, we consider a population size of L = 100 and a maximum number of iterations G max = 100.The probability of mutation and crossover are respectively set to 0.5 and 0.9.
To assess the robustness of the proposed methodology, the analysis is repeated 10 times   The hypervolume (here area) up to the reference point R is approximated using the trapezoidal rule for integration.By denoting the one obtained from the i-th repetition by i) , the resulting relative error measure reads where A ref is the hypervolume estimated using the reference solution.
To estimate the part of the error due to the outer surrogate model, we compute the same error again, but this time using the original model and the Pareto set.Denoting the corresponding hypervolume by A , the following relative error is estimated: 6.1 Example 1: 7-dimensional analytical problem The first example considered here is an analytical problem built for illustration and validation purposes.The original problem, referred to as BNH, is popular in benchmarks for multiobjective optimization and was introduced in Binh and Korn (1997).It is a two-dimensional problem which is however deterministic and only contains continuous variables.We therefore add two categorical variables and three random variables so as to build a multi-objective and robust optimization problem.The original problem, which reads is modified using the following two steps: • First, the two objective functions are modified to include two categorical variables d 3 and d 4 , which can take each three possible levels: • Then the random variables are added as follows: where Z 5 ∼ Lognormal(5, 0.5 2 ), Z 6 ∼ Lognormal(4, 0.4 2 ) and Z 7 ∼ Gumbel(1, 0.2 2 ).
In this notation, the two parameters are the mean and variance of Z 5 , Z 6 and Z 7 , respectively.
The final optimization problem therefore reads: For each of the 10 repetitions, the analysis is started with an initial experimental design of size n 0 = 3M = 21.We consider five values of the stopping criterion, namely ηq = {0.1,0.05, 0.03, 0.01, 0.001} (See Eq. ( 12)). Figure 3 shows the relative error on the hypervolumes (as in Eq. ( 23) and ( 24)) for each of these cases.As expected, the accuracy of the results increases as the convergence criterion is made tighter.This is however true only up to a given point.With ηq = 10 −3 , the criterion ∆ HV , which is based on the Pareto set keeps decreasing while it increases noticeably when considering ∆ HV which is directly computed using the estimated Pareto front.This discrepancy can be attributed to the outer surogate model that becomes less accurate, probably due to overfitting, as the number of samples sizes is increased.It should be reminded here that the outer surrogate is built using a unique experimental design whose size may increase together with the iterations.Finally, we note that the number of evaluations of the original model increases rapidly together with the threshold for convergence, as shown in Figure 4.  To further explore these results, we consider the run with the median accuracy at the threshold of ηq = 0.03. Figure 5 shows the convergence of the selected analysis with the boxplots representing the relative error on the quantile for each point of the Pareto fronts.
The green lines correspond to the worst quantile relative error after excluding outliers.The vertical dashed lines together with the triangular markers show where the algorithm would stop for various thresholds of the convergence criterion.After 20 iterations, the rate of convergence decreases considerably, which explains the large added number of model evaluations required to reach the target of ηq = 10 −3 .Figure 6 shows the corresponding Pareto front (Figure 6a) and set (Figure 6b) together with the reference ones.As expected, given the way the categorical variables were introduced, there are two combinations of the categorical variables in the Pareto set: (d 3 , d 4 ) = (2, 2) or (2, 3).The two fronts cover the same volume and are spread in a similar way.In the input space, we can also see that the solutions cover roughly the same area.For this highlighted example, convergence is achieved in 16 cycles with a total of 101 evaluations of the original model.This contrasts with the total of 5 • 10 7 model evaluations (100 iterations of NSGA-II with 100 samples per generation and 5, 000 samples for the estimation of the quantile for each design) required for the reference solution using a brute force approach.

Example 2: Analytical problem with discontinuous Pareto front
This second analytical example is adapted from Manson et al. (2021).The Pareto front for this example is concave and presents two discontinuities.Further it only features design variables (i.e., there is no environmental variables), some of which are random.This allows us to showcase the versatility of the proposed algorithm.The two deterministic cost functions read  We then add some random variables which we associate to the design variables d 1 and d 2 .
Both are assumed normal, i.e.
The initial experimental design is set to n 0 = 3 M = 9.For this example, the variations in accuracy due to tighter convergence criteria are not so noticeable, as can be seen in Figure 7.In fact, the resulting Pareto front is already accurate enough with ηq = 0.1.
It should be noted that some of the variability in estimating the relative error is due to the approximations inherent to computing the hypervolume using the trapezoidal rule for integration and the limited set of points it relies upon.This rapid convergence may also  be seen on the small increase in number of model evaluations and cycles to convergence as shown in Figures 8 and 9.
Finally, we show in Figure 10 the Pareto front and sets obtained for the median solution at

Example 3: Application to building renovation
This third example deals with building renovation, which is actually the application that has motivated this work.Because buildings are responsible for 40% of energy consumption and 36% of greenhouse gas emissions from energy in Europe, the European union has recently pledged to renovate 35 million buildings in the next 10 years (European Commission, 2020).
Building renovation is indeed an important lever since current buildings are not energyefficient but yet are expected for the most part to still stand by 2050.
Renovation thus needs to take into account the entire life cycle of the building, which may span over decades.This implies accounting for various uncertainties, be it in socio-  economic and environmental conditions or in the variability of the parameters of the selected renovation strategies.This can be achieved using life cycle analysis where two quantities of interest are often considered: the life cycle cost (LCC) and the life cycle environmental impact (LCEI).The former includes various costs such as the cost of production of new materials, the related cost of replacement or repair, the labor cost, etc.The latter refers to the overall greenhouse gas emissions over the entire life cycle of the building posterior to the renovation.
The stakeholders need to weigh these two quantities to decide which are the optimal renovation strategies for a given building while accounting for the various sources of uncertainty.
To this aim, robust multi-objective optimization may be used as a reliable way of exploring the extremely large design space (i.e., the combination of all possible choices available to the stakeholders).Using c 1 = LCC and c 2 = LCEI, the problem may be formulated as in Eq. ( 5) and the proposed methodology may be used to solve it.
As an application, we consider in this paper, a building situated in Western Switzerland and constructed in 1911 (See Figure 11).The LCA is carried out using a model developed in Galimshina et al. (2020).The computational model is implemented in Python and a single run lasts a few seconds.The model contains more than a hundred parameters.However, using expert knowledge and global sensitivity analysis, screening allowed us to reduce the input to 23 parameters, among which 6 are design parameters and 13 are environmental variables (Galimshina et al., 2020).The design parameters include 4 categorical variables as shown in Table 1, which lead to 3, 600 levels.This includes the 6 types of heating system: oil, gas, heat pump, wood pellets, electricity and district heating.Various types of walls and windows with different characteristics that are selected from a publicly available catalog are also considered.The remaining two design parameters are the insulation thickness associated to the selected walls and slabs.The environmental variables are all random and their distributions are shown in Table 2.They are split into three groups which pertain to the occupancy usage, economic parameters and renovation components variability.The analysis is initialized with an experimental design of size n 0 = 3M = 69 points drawn using an optimized Latin hypercube sampling scheme.The threshold for convergence is set to ηq = 0.03, which resulted in 57 cycles of the algorithm as shown in Figure 12.This corresponds to a total of 271 model evaluations.The Pareto front is shown in Figure 13.It features some discontinuities which are due to some noticeable changes in the properties of the categorical variables.The main driver to decrease both LCC and LCEI is the heating system.The upper part of the Pareto front corresponds to a heat pump which leads to small values of LCC and larger values of LCEI.This is in contrast with the wood pellets which correspond to the lower part of the Pareto front.For this example, we eventually select one solution, which is in the upper part and is highlighted by the red diamond in Figure 13.This choice reflects a preference on the cost with respect to the environmental impact.C val = d * , z (i) , i = 1, . . ., 500 , where d * is the selected solution.Figure 14 12.This value could be reduced by selecting a smaller value of ηq .This would however lead to a larger computational cost and analysis run time.

Conclusion
In this paper, we proposed a methodology for the solution of multi-objective robust optimization problems involving categorical and continuous variables.The problem was formulated using quantiles as a measure of robustness, the level of which can be set to control the desired degree of robustness of the solution.A nested solution scheme was devised, where optimization is carried out in the outer loop while uncertainty propagation is performed in the inner loop.This however results in a stochastic optimization problem which may be cumbersome to solve.The concept of common random numbers was introduced to approximate this stochastic problem by a deterministic one, which is much easier to solve.This allows us then to use a general-purpose multi-objective solver, namely the non-dominated sorting genetic algorithm II (NSGA-II).
To reduce the computational burden of this nested solution scheme, Kriging was introduced in the proposed framework using an adaptive sampling scheme.This is enabled by checking the accuracy of the quantile estimates in areas of interest, namely the area around the Pareto front.Finally, the proposed approach was adapted to account for mixed categorical-continuous parameters.Two validation examples were built analytically while considering all the characteristics of the problem at hand.
The methodology was then applied to the optimization of renovation scenarios for a building considering uncertainties in its entire life cycle post-renovation.This has shown that a driving parameter for such a renovation is the replacement of the heating system by either wood pellets or heat pump.This result was validated by running the original model around the selected solution, hence showing that the final surrogate model was accurate.
This methodology was eventually applied in a detailed analysis presented in Galimshina et al. (2021).

Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Replication of results
The results presented in this paper were obtained using an implementation of the proposed methodology within UQLab (www.uqlab.com)(Marelli and Sudret, 2014), a Matlab-based framework for uncertainty quantification.The codes can be made available upon request.
has indeed proved to accelerate convergence (Zitzler and Thiele, 2000).The general workflow of NSGA-II is illustrated in Figure 15.The following section details the important steps of NSGA-II as described in Algorithm 1: 1. Children generation: Following the initialization where the population P g is sampled, the first step in Algorithm 1 is binary tournament selection, which is used to pre-select good individuals for reproduction.The idea is to form a new population P g by selecting the best among two (or more) randomly selected individuals.The new population is then of the same size as the original but comprises repetitions.Individuals in this population constitute the basis for the generation of children using cross-over and mutation operators.The former consists in combining two parents to generate two offsprings.There exists a wide variety of cross-over operators (Umbarkar and Sheth, 2015) and NSGA-II uses the so-called simulated binary cross-over (SBX).Mutation is used to prevent premature convergence to a local minimum.Following a predefined probability, a new offspring is mutated, i.e., some of its components are randomly perturbed, thus allowing for more exploration of the design space.The polynomial mutation operator has been developed for NSGA-II.The resulting set of offsprings is denoted by Q g .

2.
Non-dominated sorting: Once the new population R g is generated by assembling P g and Q g , it is subjected to a recursive non-dominated sorting.This means that the set of all non-dominated individuals are identified and assigned rank one.This set, denoted by F 1 , is then set aside and non-dominated sorting is again performed on R g \ F 1 to find the front of rank two F 2 .The process is repeated until all individuals have been assigned a rank.The population for the next generation is then composed of the individuals of the first ranks whose cumulated cardinality is smaller than L.

Crowding distance assignment:
To reach L offsprings, it is necessary to select only a subset of the population in the following front.To do so, the crowding distance, which measures the distance of a given solution to the closest individuals in its rank, is used.Diversity in the Pareto front is hence enforced by choosing for the same rank solutions with the largest crowding distance.Once the remaining solutions are chosen according to their crowding distance ranking, the new population P g+1 is defined.
variables X ∼ f X|d represent the variability associated to the design parameters d whereas Z ∼ f Z are the so-called environmental variables which affect the system response without necessarily being controlled by the designer.Manufacturing tolerances (design) and loading (environmental) are typical examples of the two categories of uncertain parameters in the context of structural engineering design.As in the applications considered in this work, engineering systems sometimes require the simultaneous optimization of multiple quantities of interest.Considering a set of m cost functions denoted by {M k , k = 1, . . ., m}, the optimization problem of interest now becomes d * = arg min d∈S

i
are bounds on the design random variable space with respect to confidence levels of α d − i and α d + i .Note sampled using Latin hypercube sampling (LHS, McKay et al. (1979)) and N is the sample set used to compute the quantiles for the design point d max k .The number of added points is denoted by K 1 and corresponds to the number of objectives for which the convergence criterion is not respected.(b) Second set: The remaining K 2 = K − K 1 points are identified by first selecting all the design solutions in the Pareto front that produce errors larger than the threshold η(j) added to the current experimental design, i.e., D ← D ∪ D The outliers identified at Step 5, if any, are removed and the remaining points in the Pareto front and set are returned.
is a vector gathering the continuous variables while w cat ∈ R Mcat gathers the categorical parameters.Each component w catj can take one of b j values, called levels, and denoted by 1 , . . ., bj .Hence, there is a total of b = Mcat j=1 b j categories.
for the first two examples.The results are summarized using boxplots, where the central mark indicates the median, the bottom and top edges of the bars indicate the 25th and 75th percentiles, respectively, and the outliers are plotted using small circles.The reference solution is obtained by solving this problem using NSGA-II and without surrogates.The accuracy is assessed by evaluating the closeness of the obtained Pareto front with the reference one.The hypervolume, which is the volume of the space dominated by the Pareto front, is chosen as basis for comparison.It is often computed up to a given reference point as illustrated in Figure 2. In this work, the reference point is considered as the Nadir point, i.e., where D * ref is the Pareto set corresponding to the reference Pareto front F ref .

Figure 2 :
Figure 2: Illustration of the hypervolume as the red shaded area.The reference Pareto front F ref is repesented by the blue line while the Nadir point R is shown by the black dot.

( a )Figure 3 :
Figure 3: Example 1: Relative errors w.r.t. the reference hypervolume for various thresholds of the stopping criterion.

Figure 4 :
Figure 4: Example 1: Number of model evaluations for various thresholds of the stopping criterion.
(a) Convergence for c1 (b) Convergence for c2 5: Example 1: Relative error of the 90% quantiles of the costs c 1 and c 2 for the entire Pareto front at the end of each cycle.The upper convergence limit is shown by the continuous line.

Figure 6 :
Figure 6: Example 1: Comparison of the Pareto fronts and sets for the results with median relative hypervolume error and the reference solution.

( a )
Relative error w.r.t.surrogate model (b) Relative error w.r.t.original model

Figure 7 :
Figure 7: Example 2: Relative errors w.r.t. the reference hypervolume for various thresholds of the stopping criterion.

Figure 8 :
Figure 8: Example 2: Number of model evaluations for various thresholds of the stopping criterion.

Figure 9 :
Figure 9: Example 2: Relative error of the quantiles of the costs c 1 and c 2 for the entire Pareto front at the end of each cycle.The upper convergence limit is shown by the continuous line.

Figure 10 :
Figure 10: Example 2: Comparison of the Pareto fronts and sets for the results with median relative hypervolume error and the reference solution.

Figure 11 :
Figure 11: Building considered from renovation together with a few possible renovation scenarios (adapted from Galimshina et al. (2020)).

Figure 12 :
Figure 12: Example 3: Relative quantile error of the entire Pareto front at the end of each cycle.

Figure 14 :
Figure 14: Example 3: Original vs. predicted responses for a subset of the Monte Carlo set used to compute the quantiles at the selected solution.

Table 1 :
Design parameters selected for the building renovation application.Curly brackets {•} correspond to categorical variables.Square brackets [•] define the interval of interest for continuous variables.

Table 2 :
Distribution of the environmental variables for the building renovation application.
compares the prediction by the final surrogate model and the original response on this validation set.As expected, the surrogate model is able to correctly approximate the original model around the chosen solution.This is confirmed by comparing the normalized mean-square error (NMSE) and the 90%-quantile shown in Table4.Even though the Monte Carlo sample set size is reduced, the estimated quantiles allow us to have an idea of how accurate the surrogate models are.More specifically the relative errors of the quantiles of LCC and LCEI due to the introduction of the surrogates are approximately 0.2% and 0.4%, respectively.The NMSE for LCEI is slightly larger compared to that of LCC, which is consistent with the convergence history in Figure

Table 4 :
Validation of the selected solution.