Hypervolume Scalarization for Shape Optimization to Improve Reliability and Cost of Ceramic Components

In engineering applications one often has to trade off among several objectives, as, for example, the mechanical stability of a component, its efficiency, its weight, and its cost. We consider a biobjective shape optimization problem: maximizing the mechanical stability of a ceramic component under tensile load while minimizing its volume. Stability is thereby modeled using a Weibull-type formulation of the probability of failure under external loads. The PDE formulation of the mechanical state equation is discretized by a finite element method on a regular grid. To solve the discretized biobjective shape optimization problem we suggest a hypervolume scalarization, with which also unsupported efficient solutions can be determined without adding constraints to the problem formulation. We investigate the relation of the hypervolume scalarization to the weighted sum scalarization and to direct multiobjective descent methods. Since gradient information can be efficiently obtained by solving the adjoint equation, the scalarized problem can be solved by a gradient ascent algorithm. We evaluate our approach on a 2D test case representing a straight joint under tensile load.


Introduction
In engineering design one often follows multiple incompatible goals, for example, when maximizing reliability and efficiency while simultaneously minimizing cost. In practical applications, conflicting goals are often combined into one single objective function using a weighted sum of the criteria. The main advantage of this approach is its simplicity and its numerical stability. On the downside, the selection of meaningful weights is generally difficult. Moreover, relevant compromise solutions may be missed when the problem is non-convex. In this paper, we consider a biobjective shape optimization problem with the two objectives reliability and cost. To carry over the main advantages of the weighted sum and simultaneously overcome its main drawbacks, we suggest using parametric hypervolume scalarizations to efficiently generate compromise solutions. The scalarized subproblems are solved using a gradient ascent method that takes advantage of derivative information obtained from the numerical solution of the adjoint equation.

Shape optimization problems are typical examples of engineering problems where multiple conflicting objective functions have to be considered. In this paper, we focus on the maximization of the mechanical stability of a ceramic component under tensile load while simultaneously minimizing its volume (as a measure of cost and weight). We follow the modelling approach proposed in Doganay et al. (2019), where the mechanical stability is modelled using a Weibull formulation of the probability of failure. For a general introduction to shape optimization we refer to Haslinger and Mäkinen (2003). The main advantage of using a probabilistic formulation for the mechanical stability is that in this way the probability of failure can be assessed with a differentiable objective function. This paves the way for the application of efficient optimization methods using gradient information. We adopt the formulation suggested in Bolten et al. (2015, 2019) that is based on the probabilistic model of Weibull (1939). In our numerical implementation we follow a 'first discretize, then optimize' paradigm as suggested in Doganay et al. (2019). In this approach, a finite dimensional optimization problem is solved on a finite element grid subject to a discretized PDE-constraint that is induced by the mechanical state equation.

The minimization of the probability of failure of a component usually leads to large and hence costly components. This unwanted effect can be avoided by setting an upper bound on the admissible volume, see, for example, Bolten et al. (2019). However, such a constrained formulation of the shape optimization problem does not provide information on the trade-off between reliability and cost. Indeed, interpreting the volume of the component as a second and independent objective function allows for the computation of alternative shapes, thus supporting the decision maker with the selection of a most preferred solution with respect to both criteria. Following the approach of Doganay et al. (2019), we complement the minimization of the probability of failure of a component by the minimization of its volume in a biobjective shape optimization model.
For a general introduction into the field of multiobjective optimization we refer to Miettinen (1998) and Ehrgott (2005). Problems of multiobjective shape optimization have mainly been approached with evolutionary multiobjective optimization (EMO) algorithms, see, for example, Deb and Goel (2002); Chirkov et al. (2018). However, recently multiobjective PDE-constrained optimization problems have also been considered from an optimal control point of view in Iapichino et al. (2017). In this paper, we employ a scalarization-based approach to approximate the Pareto front of the biobjective shape optimization problem. Due to its simplicity and its numerical stability, the weighted sum method is a widely used scalarization method in this context. It combines all objective functions into one weighted sum objective, hence yielding a single objective counterpart problem without adding any additional constraints. However, the weighted sum method is quite limited in the context of non-convex problems. Indeed, by varying the weights only supported efficient solutions can be computed, i.e., Pareto optimal shapes for which the corresponding outcome vectors lie on the convex hull of the set of all feasible outcomes. Since relevant compromise solutions may thus be missed, we suggest using nonlinear scalarizing functions as, for example, the hypervolume scalarization. While it shares the property that it leads to single-objective counterpart problems without adding constraints to the problem formulation, it is not restricted to supported efficient solutions. In the biobjective case, the hypervolume scalarization yields a quadratic objective function that has a nice geometrical interpretation. We note that quadratic scalarizations are also considered in Fliege (2004); Dandurand and Wiecek (2016). The hypervolume scalarization is based on the hypervolume indicator that was introduced in Zitzler and Thiele (1999) as a measure to compare the performance of EMO algorithms.
The hypervolume indicator has since become an important measure in a large variety of applications, of which the fitness evaluation of solution sets (particularly in the context of EMO algorithms) and of individual solutions are only two examples. Given a reference point in the objective space, the hypervolume indicator of a solution set measures the volume that is (a) dominated by the solution set, and (b) dominates the reference point, cf. Figure 1. A formal definition is given in Section 3 below. We refer to Auger et al. (2009); Yang et al. (2019) as two examples from this active research field. The hypervolume indicator is also used as an objective function when searching for good representations of the Pareto front. Heuristic and exact solution approaches for this subset selection problem were discussed in Guerreiro et al. (2016) and in Bringmann et al. (2014); Kuhn et al. (2016), respectively. When restricting the search to individual solutions rather than solution sets, the hypervolume indicator can be interpreted as a nonlinear scalarizing function, see, for example, Hernandez et al. (2018); Touré et al. (2019); Schulze et al. (2020). This paper is organized as follows. In Section 2 a biobjective shape optimization model is introduced together with some fundamental properties of multiobjective optimization in the context of shape optimization. In Section 3 the hypervolume indicator is formally defined and its properties as a scalarizing function are discussed and compared to the weighted sum method and the multiobjective gradient descent approach. Section 4 is devoted to the details of the numerical implementation. In particular, a finite element discretization and an iterative ascent algorithm using smoothed gradients are presented. Furthermore, the choice of the reference point for the hypervolume scalarization is discussed. In Section 5 we consider, in a 2D case study, the optimal shape of a rod under tensile load.
The article is concluded in Section 6.

Biobjective Shape Optimization
In this section, we first provide a general formulation of multiobjective shape optimization problems (Section 2.1) and then focus on a particular engineering application, namely the biobjective shape optimization of a ceramic component under tensile load w.r.t. the two objectives mechanical stability and volume (Section 2.2).

Multiobjective Optimization
We consider multiobjective optimization problems occurring in engineering applications and particularly focus on multiobjective shape optimization problems. To keep the exposition general, we denote the set of all admissible shapes by O_ad and refer to individual shapes as Ω ∈ O_ad. Let J : O_ad → R^Q be a vector valued objective function comprising Q individual objective functions J_i : O_ad → R, i = 1, …, Q, and let J(Ω) = (J_1(Ω), …, J_Q(Ω))^⊤. Then the multiobjective shape optimization problem is given by

min_{Ω ∈ O_ad} J(Ω) = (J_1(Ω), …, J_Q(Ω))^⊤.

A shape Ω̄ ∈ O_ad is called efficient (or Pareto optimal) if there is no Ω ∈ O_ad with J_i(Ω) ≤ J_i(Ω̄) for all i = 1, …, Q and J(Ω) ≠ J(Ω̄). The objective vector J(Ω̄) = (J_1(Ω̄), …, J_Q(Ω̄))^⊤ of an efficient solution Ω̄ is called nondominated point or nondominated outcome vector. An efficient solution Ω̄ (respectively a nondominated point J(Ω̄)) is called supported efficient (or supported nondominated, respectively) if there is no point y ∈ conv{J(Ω) : Ω ∈ O_ad} dominating J(Ω̄). For an extensive introduction to multiobjective optimization see, e.g., Steuer (1986); Miettinen (1998); Ehrgott (2005).
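The dominance and efficiency notions above can be made concrete with a small sketch. The following helper functions (the names are ours, not from the paper) check componentwise dominance between outcome vectors and filter a finite sample of outcomes down to the nondominated ones, assuming minimization in all objectives:

```python
import numpy as np

def dominates(y, z):
    """True if outcome vector y dominates z, i.e. y <= z componentwise
    and y != z (minimization in all objectives)."""
    y, z = np.asarray(y, dtype=float), np.asarray(z, dtype=float)
    return bool(np.all(y <= z) and np.any(y < z))

def nondominated(points):
    """Filter a finite set of outcome vectors down to the nondominated ones."""
    pts = [np.asarray(p, dtype=float) for p in points]
    return [p for p in pts if not any(dominates(q, p) for q in pts if q is not p)]
```

For example, among the sample outcomes (1, 3), (2, 2), (3, 1) and (3, 3), only (3, 3) is dominated.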

Biobjective Optimization of Stability and Cost
In the biobjective case study described in this paper we consider a ceramic component under tensile load. The component is represented by a two-dimensional compact body with Lipschitz boundary, denoted by Ω ⊂ R² (in general, Ω ⊂ R^d, d = 2, 3). We assume that the boundary of Ω is composed of three parts, i.e.,

∂Ω = ∂Ω_D ∪ ∂Ω_N^fixed ∪ ∂Ω_N^free.

Here, ∂Ω_D is the part where the boundary is fixed, i.e., where Dirichlet conditions are satisfied. The tensile load acts at ∂Ω_N^fixed, which is also fixed and where Neumann conditions apply. The remaining parts of ∂Ω can be modified during the optimization and are denoted by ∂Ω_N^free. Furthermore, we assume that all reasonable shapes are contained in a (sufficiently large) bounded open set Ω̃ ⊂ R² that satisfies the cone property (see, e.g., Bolten et al. (2015); Doganay et al. (2019)). Using this notation, the set of admissible shapes can be defined as

O_ad := {Ω ⊂ Ω̃ : ∂Ω_D ⊂ ∂Ω, ∂Ω_N^fixed ⊂ ∂Ω, Ω satisfies the cone property},

see Figure 2 for an exemplary setting. A ceramic component under tensile load behaves according to the linear elasticity theory, see, e.g., Braess (2007). Its deformation under tensile loads is described by the state equation, which is an elliptic PDE given by

−div σ(u) = f in Ω,
u = 0 on ∂Ω_D,
σ(u) n = g on ∂Ω_N^fixed,
σ(u) n = 0 on ∂Ω_N^free.   (2.2)

Here, f ∈ L²(Ω, R²) denotes the volume force density and g ∈ L²(∂Ω_N^fixed, R²) denotes the surface loads. Moreover, u ∈ H¹(Ω, R²) describes the displacement field as the reaction to the given loads, the stress tensor is denoted by σ ∈ L²(Ω, R^{2×2}), and n(x) is the outward pointing normal vector at x ∈ ∂Ω (which is defined almost everywhere on ∂Ω). We refer to Doganay et al. (2019) for further details.

Fig. 2: Possible shape of a rod Ω under tensile load g.

Based on this formulation we consider the biobjective optimization of the mechanical stability and the cost of the component. Note that these two objective functions are usually conflicting, since larger components tend to have a higher stability. We use the same formulation as Doganay et al.
(2019) to assess the mechanical stability of the component via its probability of failure, which is to be minimized. Thus, our first objective function, J_1, is based on a probabilistic Weibull-type model as suggested in Bolten et al. (2015):

J_1(Ω) := ∫_Ω ∫_{S¹} ( (n^⊤ σ(Du(x)) n)⁺ / σ_0 )^m dn dx.   (2.3)

Here, m denotes the Weibull module that typically assumes values between 5 and 25, Du denotes the Jacobian matrix of u, σ_0 is a positive constant, and (t)⁺ := max{t, 0} denotes the positive part. The second objective function, J_2, models the material cost, which is approximated by the volume of the component Ω ∈ O_ad and which is also to be minimized:

J_2(Ω) := ∫_Ω 1 dx = vol(Ω).   (2.4)

All in all, we obtain a biobjective PDE-constrained shape optimization problem w.r.t. reliability and cost which is given by

min_{Ω ∈ O_ad} J(Ω) = (J_1(Ω), J_2(Ω))^⊤ subject to the state equation (2.2).   (2.5)

Hypervolume Scalarization
Given a biobjective optimization problem (2.5), the hypervolume indicator measures the hypervolume spanned by an outcome vector J(Ω) ∈ R² (or a set of outcome vectors J(S), respectively) and a predetermined reference point r ∈ R² (that should be selected such that it is dominated by all considered outcome vectors), see Figure 1. The value of the hypervolume indicator for a set of solutions S ⊂ O_ad and an appropriate reference point r ∈ R² can be written as

H(S) := vol( ⋃_{Ω ∈ S} ( (J(Ω) + R²₊) ∩ (r − R²₊) ) ),

where R²₊ = {y ∈ R² : y_i ≥ 0, i = 1, 2} denotes the positive orthant and vol denotes the two-dimensional Lebesgue measure. Note that the hypervolume indicator is only meaningful when the reference point r is chosen such that r_i ≥ J_i(Ω) for i = 1, 2 and for all Ω ∈ S. In the case of a single solution Ω ∈ O_ad the hypervolume indicator function simplifies to H(Ω) = (r_1 − J_1(Ω)) · (r_2 − J_2(Ω)), which corresponds to the area of the hypervolume rectangle defined by the corner points J(Ω) and r. While in EMO algorithms the hypervolume indicator is mainly used to evaluate the quality of already computed solutions (or solution sets, respectively), it can also be used as an objective function in a single-objective scalarization of (2.5) that aims at finding solutions that maximize the dominated hypervolume. For a given reference point r := (r_1, r_2) ∈ R², the hypervolume scalarization for (2.5) is formulated as

max H(Ω) = (r_1 − J_1(Ω)) · (r_2 − J_2(Ω))
s.t. J_1(Ω) ≤ r_1, J_2(Ω) ≤ r_2, Ω ∈ O_ad.   (3.1)

The constraints J_1(Ω) ≤ r_1 and J_2(Ω) ≤ r_2 ensure that H(Ω) is non-negative. They are redundant when all feasible outcome vectors lie below the reference point, i.e., when r is chosen such that J(Ω) ≤ r for all Ω ∈ O_ad. If this is not the case, however, these two constraints cannot be omitted in general, since also points that are strictly dominated by the reference point have positive hypervolume values and are thus potential candidates for maximizing H(Ω).
Note that even in this case it is sufficient to include one of these constraints, since then the other will be automatically satisfied at optimality due to the maximization of the product H(Ω) = (r_1 − J_1(Ω)) · (r_2 − J_2(Ω)). Moreover, when problem (3.1) is solved with a steepest ascent algorithm (cf. Section 4.2 below), starting with a shape Ω^(0) such that J(Ω^(0)) < r (i.e., J_1(Ω^(0)) < r_1 and J_2(Ω^(0)) < r_2), then the constraints J_1(Ω) ≤ r_1 and J_2(Ω) ≤ r_2 will always be satisfied during the course of the algorithm and can hence be omitted. We conclude that while some caution is necessary with regard to the choice of the reference point r, the constraints J_1(Ω) ≤ r_1 and J_2(Ω) ≤ r_2 can in most cases be omitted. If not stated otherwise, we will hence assume in the following that the scalarized problem shares the same feasible set as the original problem.
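To make the definition concrete, the biobjective hypervolume indicator of a finite set of outcome vectors can be computed by a sweep over the outcomes sorted by the first objective; a single outcome reduces to the rectangle area (r_1 − J_1(Ω)) · (r_2 − J_2(Ω)). The following is a minimal sketch (our own helper, not the implementation used in the paper); outcomes that do not dominate the reference point contribute nothing:

```python
def hypervolume_2d(points, r):
    """Hypervolume dominated by `points` and dominating reference point r
    (biobjective minimization). Sweep: sort by the first objective and
    accumulate horizontal slabs of the dominated region."""
    pts = sorted((p for p in points if p[0] <= r[0] and p[1] <= r[1]),
                 key=lambda p: p[0])
    hv, best_j2 = 0.0, r[1]
    for j1, j2 in pts:
        if j2 < best_j2:               # dominated outcomes add no new area
            hv += (r[0] - j1) * (best_j2 - j2)
            best_j2 = j2
    return hv
```

For a single outcome (1, 3) and r = (4, 4) this returns the rectangle area (4 − 1) · (4 − 3) = 3.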
The following result shows that under the above assumptions the hypervolume scalarization (3.1) always generates an efficient solution. This result is not new; however, we include a proof for the sake of completeness.
Theorem 1 An optimal solution of the hypervolume scalarization (3.1) is an efficient solution of the corresponding biobjective optimization problem (2.5).
Note that Theorem 1 also follows directly from the theory of achievement scalarizing functions, of which the hypervolume indicator is a special case (see, e.g., Wierzbicki (1986a,b); Miettinen (1998)). Indeed, since −H(Ω) is a strongly increasing function (i.e., J(Ω) ≤ J(Ω′) with J(Ω) ≠ J(Ω′) implies that −H(Ω) < −H(Ω′)), an optimal solution of max_{Ω ∈ O_ad} H(Ω) = min_{Ω ∈ O_ad} −H(Ω) is efficient for (2.5). Efficient numerical optimization procedures require gradient information to obtain fast convergence. In this situation, we can take advantage of the fact that the gradient of the hypervolume indicator can be determined from the gradients of the individual objective functions using the chain rule of differentiation:

∇H(Ω) = −(r_2 − J_2(Ω)) ∇J_1(Ω) − (r_1 − J_1(Ω)) ∇J_2(Ω).   (3.2)

Observe that the gradient vector of the hypervolume indicator is a negative weighted sum of the gradients of the individual objective functions.
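In a discretized setting where ∇J_1 and ∇J_2 are available, e.g., from the adjoint equation, this chain rule is a one-liner. A sketch with hypothetical gradient vectors (function and variable names are ours):

```python
import numpy as np

def grad_hypervolume(J, gradJ1, gradJ2, r):
    """Chain rule for H(Ω) = (r1 - J1(Ω)) (r2 - J2(Ω)):
    ∇H = -(r2 - J2) ∇J1 - (r1 - J1) ∇J2, a negative weighted sum."""
    a = r[0] - J[0]   # r1 - J1, first side of the hypervolume rectangle
    b = r[1] - J[1]   # r2 - J2, second side
    return -b * np.asarray(gradJ1, dtype=float) - a * np.asarray(gradJ2, dtype=float)
```

With J = (1, 2) and r = (3, 5), the implicit weights are b = 3 and a = 2.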

Hypervolume Scalarization Versus Weighted Sum Scalarization
Let λ = (λ_1, λ_2) ∈ R²₊ with (λ_1, λ_2) ≠ (0, 0) be a given weight vector. Then the weighted sum scalarization of problem (2.5) is given by

min_{Ω ∈ O_ad} W(Ω) := λ_1 J_1(Ω) + λ_2 J_2(Ω).   (3.3)

Note that the weights are usually normalized such that λ_1 + λ_2 = 1. We omit this normalization to simplify the notation in the following; this does not affect the properties of the weighted sum scalarization. Note also that under the assumption that the reference point in the hypervolume scalarization (3.1) is chosen sufficiently large, the feasible sets of the hypervolume scalarization (3.1) and the weighted sum scalarization (3.3) coincide. Thus, both scalarizations share the advantage that the feasible set of the original multiobjective optimization problem is not modified, i.e., no additional constraints have to be added to the problem formulation. However, since the level sets of the hypervolume indicator in the objective space are non-convex, non-supported efficient solutions can be determined using the hypervolume scalarization, which is not possible using the weighted sum scalarization. Formally, the level set of the hypervolume indicator in the objective space at level H(Ω) is given by

{y ∈ R² : y ≤ r, (r_1 − y_1) · (r_2 − y_2) = H(Ω)}.

The non-convexity of the hypervolume indicator as a function of the outcome vector J(Ω) is easy to see when rewriting H(Ω) as a quadratic form in y = J(Ω):

H(Ω) = y_1 y_2 − r_2 y_1 − r_1 y_2 + r_1 r_2 = ½ y^⊤ Q y − (r_2, r_1) y + r_1 r_2, with Q = [[0, 1], [1, 0]].

Since the matrix Q is indefinite, H is neither convex nor concave as a function of the outcome vector, and its level curves are hyperbolas. Thus, the hypervolume scalarization is not limited to supported efficient solutions. Even more so, by interpreting the reference point as a scalarization parameter, every efficient solution can be obtained as an optimal solution of a hypervolume scalarization with an appropriately chosen reference point.
Note that this is of rather theoretical interest, since this choice of the reference point assumes that the corresponding efficient solution is already known. Practical choices for reference points in the context of biobjective shape optimization are discussed in Section 4.3. In the following, let r = (r_1, r_2) be a fixed reference point and let Ω̄ ∈ O_ad be a strictly feasible solution of (3.1), i.e., J(Ω̄) < r. To interrelate hypervolume scalarizations with weighted sum scalarizations, we analyze the level sets and the gradient of the hypervolume indicator. To simplify the notation in the biobjective case, we denote the two sides of the hypervolume rectangle spanned by J(Ω̄) = (J_1(Ω̄), J_2(Ω̄)) and r by a(Ω̄) := r_1 − J_1(Ω̄) and b(Ω̄) := r_2 − J_2(Ω̄), respectively, see Figure 3. Now the hypervolume induced by J(Ω̄) can be rewritten as

H(Ω̄) = a(Ω̄) · b(Ω̄).   (3.4)

Using this notation, the contour line of the hypervolume indicator at the level H(Ω̄) can be described by the parameterization h_Ω̄ : (−∞, r_1) → R given by

h_Ω̄(J_1) = r_2 − (a(Ω̄) · b(Ω̄)) / (r_1 − J_1),   (3.5)

see Figure 3 for an illustration. Denoting the contour line by

h_Ω̄ := {(J_1, h_Ω̄(J_1)) : J_1 < r_1},   (3.6)

we can conclude that all points (J_1, h_Ω̄(J_1)) ∈ h_Ω̄ attain the same hypervolume indicator value as J(Ω̄). For example, the two highlighted rectangles in Figure 3 have the same area, and hence the same hypervolume indicator value. The derivative of h_Ω̄(J_1) w.r.t. J_1 can be computed as

h′_Ω̄(J_1) = −(a(Ω̄) · b(Ω̄)) / (r_1 − J_1)².

Evaluating this derivative at J_1 = J_1(Ω̄) yields

h′_Ω̄(J_1(Ω̄)) = −b(Ω̄) / a(Ω̄),

which is the slope of the tangent to the level set of the hypervolume indicator at the point J(Ω̄), see again Figure 3.
Note that this slope corresponds to the ratio of the two sides of the hypervolume rectangle, and thus also corresponds to the slope of the diagonal passing through two opposite corners of the current hypervolume rectangle as illustrated in Figure 3, given by the equation

J_2 = r_2 − (b(Ω̄) / a(Ω̄)) · (J_1 − J_1(Ω̄)).

As can be seen in Figure 3, the tangent to the level set of the hypervolume indicator at the point J(Ω̄) also corresponds to the contour line of the weighted sum scalarization W(Ω) with weights (λ_1, λ_2) = (b(Ω̄), a(Ω̄)) at this point. This contour line can be described by a function of J_1, similar to (3.5), as

w_Ω̄(J_1) = J_2(Ω̄) − (b(Ω̄) / a(Ω̄)) · (J_1 − J_1(Ω̄)).

The corresponding contour line is denoted by w_Ω̄ := {(J_1, w_Ω̄(J_1)) : J_1 ∈ R}. As a consequence, the contour lines of both scalarizations, the hypervolume indicator and the weighted sum function, have the same slope at the point J(Ω̄), and hence both scalarizations share the same direction of largest improvement. Indeed, if we consider the gradient ∇H(Ω̄) of the hypervolume indicator at the point J(Ω̄) (which is the steepest ascent direction, see (3.2)), we obtain

∇H(Ω̄) = −b(Ω̄) ∇J_1(Ω̄) − a(Ω̄) ∇J_2(Ω̄).   (3.7)

Similarly, the steepest descent direction for the weighted sum scalarization with weights (λ_1, λ_2) = (b(Ω̄), a(Ω̄)) is given by

−∇W(Ω̄) = −b(Ω̄) ∇J_1(Ω̄) − a(Ω̄) ∇J_2(Ω̄).

Hence, in an iterative algorithm the direction of steepest ascent of the hypervolume indicator (which is to be maximized) corresponds to the direction of steepest descent w.r.t. the weighted sum scalarization (which is to be minimized) with weights (λ_1, λ_2) = (b(Ω̄), a(Ω̄)). Note that these associated weights are determined by the ratio of the sides of the hypervolume rectangle.
The above analysis implies that if an iterative ascent algorithm for the hypervolume scalarization (3.1) is initialized with a strictly feasible starting solution Ω (0) ∈ O ad , i.e., J(Ω (0) ) < r, then moving in the direction of steepest ascent ensures that the next iterate Ω (1) is also strictly feasible irrespective of the step length, i.e., J 1 (Ω (1) ) < r 1 and J 2 (Ω (1) ) < r 2 . Hence, the constraints J 1 (Ω) ≤ r 1 and J 2 (Ω) ≤ r 2 can be omitted in (3.1) when starting with a strictly feasible starting solution. Note that, for any strictly feasible solution Ω ∈ O ad , the side lengths of the hypervolume rectangle a(Ω) and b(Ω) correspond to the slackness of the constraints J 1 (Ω) ≤ r 1 and J 2 (Ω) ≤ r 2 . This is reflected in the gradient direction ∇H(Ω) at J(Ω) that is composed of a weighted sum of the gradients of the individual objective functions ∇J 1 (Ω) and ∇J 2 (Ω). The ratio of the side lengths of the hypervolume rectangle defines the relative contribution of ∇J 1 (Ω) and ∇J 2 (Ω) to the ascent direction ∇H(Ω). If this ratio is very large (very small, respectively), which is the case when one side of the hypervolume rectangle is considerably larger than the other, then the opposite objective function is emphasized in the computation of ∇H(Ω).
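The correspondence between the hypervolume ascent direction and a weighted sum with the implicit weights (b(Ω), a(Ω)) can be checked numerically: the slope of the hypervolume contour line at J_1(Ω) must equal −b(Ω)/a(Ω). A small sketch under our own notation:

```python
def implicit_weights(J, r):
    """Weights of the weighted sum whose steepest descent direction matches
    the steepest ascent direction of the hypervolume indicator at J."""
    a, b = r[0] - J[0], r[1] - J[1]   # side lengths of the hypervolume rectangle
    return b, a

def contour_slope(J, r, eps=1e-6):
    """Central finite difference of the contour line h(J1) = r2 - a*b/(r1 - J1)
    at J1 = J1(Ω); should equal -b/a."""
    a, b = r[0] - J[0], r[1] - J[1]
    h = lambda j1: r[1] - a * b / (r[0] - j1)
    return (h(J[0] + eps) - h(J[0] - eps)) / (2 * eps)
```

For J = (1, 2) and r = (3, 5) the implicit weights are (3, 2) and the contour slope is −3/2.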

Hypervolume Scalarization Versus MO Descent Algorithms
Multiobjective gradient descent algorithms were proposed in Fliege and Svaiter (2000); Désidéri (2012); Fliege et al. (2019). In the minimization case, they can be interpreted as iterative optimization processes with search direction

d^(k) = −(ω_1^(k) ∇J_1(Ω^(k)) + ω_2^(k) ∇J_2(Ω^(k))), ω^(k) ∈ R²₊.

These weights are chosen adaptively to achieve the steepest descent w.r.t. all objectives in each iteration. This is in contrast to applying a single-objective gradient descent algorithm to the weighted sum scalarization (3.3), in which the weights are predefined and thus fixed throughout all iterations of the optimization procedure. We argue that performing gradient ascent steps w.r.t. the hypervolume indicator is somewhat similar to applying a multiobjective gradient descent algorithm: the weights ω_1^(k) = b(Ω^(k)) and ω_2^(k) = a(Ω^(k)) of the gradients of the individual objective functions are chosen adaptively, depending on the side lengths of the hypervolume rectangle induced by the current iterate, cf. (3.7). While each iterate Ω^(k) in a multiobjective gradient descent algorithm dominates all previous iterates Ω^(k−ℓ), ℓ = 1, …, k, i.e., J(Ω^(k)) ≤ J(Ω^(k−ℓ)), this is in general not the case when applying a single-objective ascent algorithm to the hypervolume scalarization or a single-objective descent algorithm to the weighted sum scalarization, respectively. Indeed, only improvements w.r.t. the respective scalar objective can be guaranteed, i.e., H(Ω^(k)) ≥ H(Ω^(k−ℓ)) and λ^⊤ J(Ω^(k)) ≤ λ^⊤ J(Ω^(k−ℓ)), respectively. This can be seen in Figure 3 when comparing the dominance cone J(Ω̄) − R²₊ (which contains the image of the next iterate in a multiobjective descent algorithm), the set of improving points for the hypervolume indicator given by h_Ω̄ − R²₊ = {y ∈ R² : y < r, (r_1 − y_1) · (r_2 − y_2) ≥ H(Ω̄)}, and the set of improving points for the weighted sum objective given by {y ∈ R² : λ_1 y_1 + λ_2 y_2 ≤ λ_1 J_1(Ω̄) + λ_2 J_2(Ω̄)} with (λ_1, λ_2) = (b(Ω̄), a(Ω̄)).
Since the dominance cone J(Ω) − R 2 + is a proper subset of the respective sets of improving points, multiobjective gradient descent algorithms are more restrictive regarding the choice of the next iterate. The weighted sum scalarization is in this regard the most flexible approach, allowing for the deterioration of individual objective functions as long as the weighted sum value improves. In this respect, we can say that the hypervolume indicator function compromises between the other two approaches.
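The nesting of the three sets of improving points can be illustrated with a toy example (the numbers are ours): moving from the outcome (2, 2) to (1, 2.5) improves the hypervolume w.r.t. the reference point r = (4, 4) and also the weighted sum with weights (2, 2), although the new point does not dominate the old one:

```python
def improves_dominance(y_new, y_old):
    """Admissible next iterate of a multiobjective descent method: y_new must dominate y_old."""
    return all(n <= o for n, o in zip(y_new, y_old)) and tuple(y_new) != tuple(y_old)

def improves_hypervolume(y_new, y_old, r):
    """Improvement w.r.t. the hypervolume scalarization with reference point r."""
    return (r[0] - y_new[0]) * (r[1] - y_new[1]) >= (r[0] - y_old[0]) * (r[1] - y_old[1])

def improves_weighted_sum(y_new, y_old, lam):
    """Improvement w.r.t. the weighted sum with weights lam."""
    return lam[0] * y_new[0] + lam[1] * y_new[1] <= lam[0] * y_old[0] + lam[1] * y_old[1]
```

Here the hypervolume grows from (4 − 2)(4 − 2) = 4 to (4 − 1)(4 − 2.5) = 4.5.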

Contour Lines of the Hypervolume Indicator
The regions of the Pareto front which can be attained using a scalarization method are determined by the level sets and contour lines of the scalarizing function. Consider, for example, the contour lines of the weighted sum scalarization (hyperplanes) and the weighted Chebyshev scalarization (hyperboxes with weight-dependent aspect ratio), respectively. While the former is restricted to supported nondominated points, the latter has the potential to determine all nondominated points, including the non-supported ones. The contour lines of the hypervolume indicator for a given reference point r ∈ R² are illustrated in Figure 4a. It can be seen that their shape strongly depends on their Hausdorff distance from the reference point. The closer the reference point is to a nondominated outcome vector in at least one component, the more the corresponding contour line resembles the contour line of a weighted ℓ∞ function, i.e., a weighted Chebyshev scalarization. On the other hand, the further the reference point is from a nondominated outcome vector (in all components), the more the contour line resembles that of an ℓ1 function, that is, a weighted sum scalarization with equal weights. More formally, considering again the parameterization of the contour line of the hypervolume indicator w.r.t. J_1, see (3.5), and assuming for now that J_1 < J_1(Ω̄) < r_1, we can see that

lim_{r → J(Ω̄)} h_Ω̄(J_1) = lim_{r → J(Ω̄)} ( r_2 − (a(Ω̄) · b(Ω̄)) / (r_1 − J_1) ) = J_2(Ω̄).

This implies that for r → J(Ω̄) the contour line h_Ω̄ converges point-wise to a horizontal line passing through r when J_1 < J_1(Ω̄). By using a parameterization of h_Ω̄ w.r.t. J_2 it can be shown that the contour line h_Ω̄ converges point-wise to a vertical line passing through r when J_2 < J_2(Ω̄). Similarly, when r = (r_1, r_1) (i.e., r_2 = r_1) and r_1 → ∞ we obtain for any fixed J_1 that

lim_{r_1 → ∞} h_Ω̄(J_1) = lim_{r_1 → ∞} ( r_1 − ((r_1 − J_1(Ω̄)) (r_1 − J_2(Ω̄))) / (r_1 − J_1) ) = J_1(Ω̄) + J_2(Ω̄) − J_1,

and hence h_Ω̄ converges point-wise to a line with slope −1 passing through J(Ω̄). The contour lines passing through a given point J(Ω̄), that are generated by different reference points, are illustrated in Figure 4b.
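Both limit cases can be checked numerically with the contour-line formula. In the sketch below (the example values are ours), the contour through J(Ω) = (1, 2) is evaluated at J_1 = 0.5, once for a reference point close to J(Ω) and once for a far-away reference point on the diagonal:

```python
def contour(J, r, j1):
    """Contour line of the hypervolume indicator through J for reference point r:
    h(j1) = r2 - a*b / (r1 - j1) with a = r1 - J1, b = r2 - J2."""
    a, b = r[0] - J[0], r[1] - J[1]
    return r[1] - a * b / (r[0] - j1)

# near reference point: contour is almost the horizontal line J2 = r2
near = contour((1, 2), (1.001, 2.001), 0.5)
# far reference point on the diagonal: contour approaches J2 = J1(Ω) + J2(Ω) - J1 = 2.5
far = contour((1, 2), (1e6, 1e6), 0.5)
```

Here `near` is close to 2.001 and `far` is close to 2.5, matching the two limits.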

Numerical Implementation
The hypervolume scalarization (3.1) of the biobjective shape optimization problem (2.5) is solved using an iterative ascent algorithm in the framework of a 'first discretize, then optimize' approach. The discretization of the shape optimization problem is based on a finite element approach (see, e.g., Braess (2007)). Consequently, the state equation (2.2) and the objective functions given in (2.3) and (2.4) are evaluated on a finite element grid. In this section details of the numerical implementation are presented including the discretization approach using structured finite element grids, the determination of ascent directions for the hypervolume scalarization, and the update scheme for the current shape. Finally, the choice of the reference point is discussed.

Finite Element Discretization
The component to be optimized (i.e., the current shape) is discretized via finite elements on a structured grid in the following way. Consider a rectangular domain Ω̃ with width Ω̃_l and height Ω̃_h that entirely covers the initial shape Ω^(0) (see Figure 2 for an illustration). The domain is discretized by a regular triangular grid denoted by T̃ with n_x and n_y grid points in x- and y-direction, respectively. The corresponding mesh sizes are given by m_x = Ω̃_l/(n_x − 1) and m_y = Ω̃_h/(n_y − 1). The mesh sizes should be chosen such that m_x ≈ m_y in order to avoid distorted finite elements. In a next step, as visualized in Figure 5, the boundary ∂Ω^(0) of the initial shape is superimposed onto the grid and the closest grid points are moved onto the boundary in a controlled way, such that degenerate elements do not occur. More precisely, it is guaranteed that the maximum angle condition (Babuska and Aziz (1976)) is satisfied. The adapted grid is denoted by T_0^∞. The computations are performed only on that part of the grid that represents Ω^(0), which is called the active grid and denoted by T_0, see Figure 5.
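The grid dimensions follow directly from the description above. A minimal helper (our own, with made-up example dimensions) computes the mesh sizes and their ratio, which should stay close to one to avoid distorted elements:

```python
def mesh_sizes(width, height, nx, ny):
    """Mesh sizes m_x = width/(nx - 1) and m_y = height/(ny - 1) of a regular
    grid with nx × ny points on a width × height rectangle, plus their ratio."""
    mx = width / (nx - 1)
    my = height / (ny - 1)
    return mx, my, mx / my
```

For a 6 × 2 domain with 61 × 21 grid points both mesh sizes are 0.1 and the ratio is exactly one.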
In the iterative optimization process (see Section 4.2 for more details) the active grid T_{k+1} of the next iteration, k = 0, 1, 2, …, representing the updated shape Ω^(k+1), is generated by starting again from T̃. Consequently, the connectivity is maintained throughout all iterations. Rather than going through all elements (as required in the first iteration), we now incorporate the information from the previous mesh T_k^∞. In the updating procedure it is only necessary to iterate over elements in a certain neighborhood of previous boundary elements and adapt them if necessary. Note that a complete re-meshing is necessary only when the next iterate Ω^(k+1) differs too much from Ω^(k), which is usually only the case when the update step exceeds the mesh size in at least one grid point. This property and the persistent connectivity lead to a significant speed-up compared to standard re-meshing techniques. As stated above, in most iterations the majority of the elements inside the component remain unchanged. This leads to an additional advantage over both standard re-meshing and mesh morphing approaches: in regions where elements are not changed, the entries of the governing PDE system are also not changed. Therefore, a full re-assembly of the system matrix and the right-hand side is avoided and only entries corresponding to modified elements have to be updated. In the following, we distinguish between the continuous shape Ω^(k) and its (approximate) representation on the finite element grid. The discretized shape, induced by the active grid T_k, is denoted by Ω^(k)_{n_x×n_y}. This implies that the gradient of the hypervolume indicator ∇H is evaluated w.r.t. the discretized shape, i.e., ∇H(Ω^(k)_{n_x×n_y}) is defined on the grid points given by T_k.

Iterative Ascent Algorithm
Ascent direction and update scheme. In the following we develop an iterative update scheme for the biobjective shape optimization problem (2.5). As stated above, the finite element discretization of the component is based on a regular grid, where only the grid points that are close to the boundary of the current shape Ω^(k) are adapted to the boundary of the component. All finite elements that are not adjacent to the boundary of the current shape Ω^(k) remain unchanged and form a subset of the regular grid. Whenever the component is updated, we thus want to modify the boundary points of the component without moving inner or outer grid points. For this purpose we consider the grid points lying on the boundary of the shape Ω^(k), or rather of its finite element representation Ω^(k)_{n_x×n_y}. Since these grid points are induced by the active grid T_k, we call them boundary points and denote them by ∂T_k. The discretized shape Ω^(k)_{n_x×n_y} in iteration k is updated by determining a search direction and calculating a step length. For a given search direction d^(k) and step length t_k, we define the update of the component by the expression

Ω^(k+1)_{n_x×n_y} = Ω^(k)_{n_x×n_y} ⊕ t_k · d^(k).   (4.1)

The operation ⊕ is defined as follows: First, the boundary points are moved by t_k · d^(k), resulting in the displaced boundary points ∂T_k + t_k · d^(k). Then these displaced points are fitted by cubic splines, resulting in the updated shape Ω^(k+1). Finally, the new active grid T_{k+1} is generated based on ∂Ω^(k+1) as detailed in Section 4.1. This defines Ω^(k+1)_{n_x×n_y} in (4.1). Note that the boundary points ∂T_{k+1} of iteration (k+1) do not have to coincide with ∂T_k, since T_{k+1} is generated based on ∂Ω^(k+1), which is defined by cubic splines. The search direction d^(k) is computed based on the gradient of the hypervolume indicator function ∇H(Ω^(k)_{n_x×n_y}). However, due to the discretization, ∇H(Ω^(k)_{n_x×n_y}) is usually highly irregular and needs to be smoothed to avoid zigzagging boundaries and overfitted spline representations of the next iterate.
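The '⊕' update can be sketched as follows. As a simplifying assumption, the cubic-spline fit and the subsequent grid generation are replaced by a plain arc-length re-sampling of the moved boundary polyline; function and variable names are ours:

```python
import numpy as np

def update_boundary(pts, d, t):
    """Move the boundary points pts (n×2 array) by t*d, then re-sample the
    moved polyline uniformly in arc length (a stand-in for the cubic-spline
    refit and re-gridding described in the text)."""
    moved = pts + t * d
    seg = np.linalg.norm(np.diff(moved, axis=0), axis=1)   # segment lengths
    s = np.concatenate([[0.0], np.cumsum(seg)])            # arc-length parameter
    s_new = np.linspace(0.0, s[-1], len(pts))
    return np.column_stack([np.interp(s_new, s, moved[:, k]) for k in range(2)])
```

For a straight boundary moved by a constant normal field, the update simply translates the boundary points.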
To obtain a smooth deformation field for the boundary points, we follow the approach suggested in Schmidt and Schulz (2009) and compute d (k) as a smoothed version of the discretized gradient ∇H(Ω (k) nx×ny ). The step length t k in direction d (k) is determined according to the Armijo rule (see, e.g., Geiger and Kanzow, 1999; Bazaraa et al., 2006). For given parameters σ, β ∈ (0, 1) we set

t k := max { β^ℓ : ℓ = 0, 1, 2, . . . , H(Ω (k) nx×ny ⊕ β^ℓ d (k) ) ≥ H(Ω (k) nx×ny ) + σ β^ℓ ⟨∇H(Ω (k) nx×ny ), d (k) ⟩ }. (4.3)

Algorithm 1: Iterative optimization scheme for hypervolume maximization
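The overall scheme can be illustrated on a drastically simplified toy problem. The sketch below is not the paper's FEM-based implementation: it replaces the shape by a single parameter x > 0 with objectives J(x) = (x, 1/x), both to be minimized, loosely mimicking the volume/probability-of-failure trade-off, uses the analytic gradient of the hypervolume in place of the adjoint-based one, and applies the Armijo rule for the step length. All function names are our own.

```python
def hypervolume(x, r):
    # Hypervolume dominated by J(x) = (x, 1/x) w.r.t. reference point r
    return (r[0] - x) * (r[1] - 1.0 / x)

def hypervolume_grad(x, r):
    # dH/dx = -(r2 - 1/x) + (r1 - x) / x**2
    return -(r[1] - 1.0 / x) + (r[0] - x) / x**2

def ascend(x, r, sigma=0.1, beta=0.5, max_iter=500, eps=1e-10):
    # Gradient ascent on the hypervolume with Armijo backtracking
    for _ in range(max_iter):
        g = hypervolume_grad(x, r)
        if abs(g) < eps:                 # (nearly) stationary: stop
            break
        h0, t = hypervolume(x, r), 1.0
        # Armijo rule: shrink t until a sufficient increase is achieved
        while x + t * g <= 0 or hypervolume(x + t * g, r) < h0 + sigma * t * g * g:
            t *= beta
            if t < 1e-12:                # no improving step found
                return x
        x += t * g
    return x
```

For r = (2, 2), the stationarity condition r 1 / x 2 = r 2 yields the efficient point x* = (r 1 / r 2 ) 1/2 = 1, which the ascent recovers numerically from any reasonable starting value.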
Re-scaling the search direction. In order to avoid re-meshing operations whenever possible, it is often advantageous to re-scale the search direction such that the grid T k+1 can be computed by performing a simple grid adaptation (cf. Section 4.1). Indeed, when the step t k = 1 is not feasible, for example due to a large gradient (e.g., resulting from large given loads), numerical problems arise when implementing Algorithm 1. To avoid such situations, the search direction d (k) is re-scaled in the following way: First, the norm ‖d (k) ij ‖ 2 is calculated for all grid points (i, j), i = 1, . . . , n x and j = 1, . . . , n y , where d (k) ij is the entry of d (k) for grid point (i, j). Since d (k) is only defined at the boundary points, d (k) ij is set to zero for grid points (i, j) that are not related to the boundary points. When max i,j ‖d (k) ij ‖ 2 is larger than a predefined percentage p max of the minimum mesh size min{m x , m y }, with p max < 1, then the search direction is re-scaled by the factor s(p max ) given by

s(p) := p · min{m x , m y } / max i,j ‖d (k) ij ‖ 2 .

By computing a step length t k ∈ (0, 1] it is then guaranteed that the boundary points do not move more than the mesh size, and a grid adaptation can be performed. Furthermore, numerical tests have shown that near an efficient solution the search direction d (k) becomes very small, thus slowing down the potential improvement while the unsmoothed gradient ∇H(Ω (k) nx×ny ) may still be large. Hence, the search direction is also re-scaled when max i,j ‖d (k) ij ‖ 2 is less than a given percentage p min ∈ (0, 1) of the mesh size. In this case, the search direction is multiplied by s(p min ).
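In code, the re-scaling might look as follows. This is a sketch under the assumption that the (smoothed) search direction is stored as a mapping from boundary grid points (i, j) to displacement vectors; the representation and function name are our own.

```python
def rescale_direction(d, mesh_size, p_min=0.1, p_max=0.9):
    """Re-scale the search direction so that the largest nodal step lies
    within [p_min, p_max] times the minimum mesh size.
    d maps boundary grid points (i, j) to displacement vectors (dx, dy);
    entries away from the boundary are implicitly zero."""
    m = max(((dx * dx + dy * dy) ** 0.5 for dx, dy in d.values()), default=0.0)
    if m == 0.0:
        return d
    if m > p_max * mesh_size:
        s = p_max * mesh_size / m   # shrink: keep the update below the mesh size
    elif m < p_min * mesh_size:
        s = p_min * mesh_size / m   # enlarge: avoid stagnation near an optimum
    else:
        return d
    return {k: (s * dx, s * dy) for k, (dx, dy) in d.items()}
```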
Grid refinement. In order to accelerate the convergence to the Pareto front, it may be useful to start the optimization on a coarse grid and later continue on a finer grid. Since for numerical reasons each update step is restricted to the mesh size, we expect that a coarse grid allows for larger steps in early stages of the optimization procedure. Moreover, the evaluation of objective function values and gradients on the coarse grid is generally faster. On the other hand, with a finer resolution of the grid the accuracy of the approximation of objective function values and gradients is improved. Thus, near an efficient solution large steps are not necessary and a finer grid may result in a better approximation of the shape.
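A minimal sketch of the transfer to a finer grid, assuming the discrete boundary is simply re-sampled at half the spacing; the actual procedure re-evaluates the cubic spline boundary on the refined grid, so this midpoint rule is only an illustrative stand-in.

```python
def refine_boundary(points):
    """Insert midpoints between consecutive boundary points: a simple
    stand-in for re-sampling the spline boundary at half the mesh size."""
    out = []
    for (px, py), (qx, qy) in zip(points, points[1:]):
        out.append((px, py))
        out.append(((px + qx) / 2.0, (py + qy) / 2.0))
    out.append(points[-1])
    return out
```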

Choice and Variation of the Reference Point
For the iterative ascent algorithm (Algorithm 1) a feasible starting solution Ω (0) nx×ny and a reference point r ∈ R 2 with r > J(Ω (0) nx×ny ) are needed as input. Since in general neither the Pareto front nor efficient shapes are known beforehand, the reference point can only be chosen based on a known feasible shape Ω (0) nx×ny that may actually be far from the Pareto front. To ensure that the hypervolume scalarization (3.1) is well defined and that the constraints J 1 (Ω) ≤ r 1 and J 2 (Ω) ≤ r 2 are redundant throughout the execution of Algorithm 1 (cf. Section 3 for a discussion of this issue), we select the reference point r such that

r = J(Ω (0) nx×ny ) + (δ 1 , δ 2 ) (4.5)

with positive parameters δ 1 , δ 2 > 0. The impact of the reference point on the hypervolume indicator was already discussed in Auger et al. (2009). They derive properties of optimal µ-distributions depending on the choice of the reference point, i.e., of representative subsets of the Pareto front of cardinality µ maximizing the joint hypervolume. Moreover, they derive an explicit lower bound for the reference point guaranteeing that, under appropriate conditions, the extreme solutions are contained in the representation. In order to take advantage of the fact that, at least theoretically, the complete Pareto front can be generated by varying the reference point in the hypervolume scalarization (3.1) (see Corollary 2), we suggest a systematic approach to approximate the Pareto front of problem (2.5) by repeatedly solving problem (3.1) with different reference points. The ultimate goal is the determination of a Pareto front generating reference set (PFG reference set for short). In order to formally introduce the concept of a PFG reference set, let Y ND ⊂ R 2 denote the Pareto front in the objective space and let Y PND denote the set of properly nondominated outcome vectors according to Geoffrion (1968) (briefly speaking, this is the set of all nondominated outcome vectors that have bounded trade-offs).
Moreover, let Y H (r) denote the optimal outcome vector obtained from solving problem (3.1), and let Y H (R) := ∪ r∈R Y H (r) denote the set of all optimal outcome vectors obtained over all reference points in the set R ⊆ R 2 , called reference set. Note that a PFG reference set always exists since, for example, R = Y ND is a valid selection, cf. Corollary 2. Note also that a PFG reference set may contain redundant reference points that can be omitted without losing the PFG property. From a practical point of view, we cannot expect to solve the hypervolume scalarization (3.1) to global optimality and thus only aim at an approximation of the Pareto front Y ND . To this end, we suggest to choose the reference set R based on varying values of δ = (δ 1 , δ 2 ) > (0, 0) in (4.5). More precisely, we suggest to set (δ 1 , δ 2 ) = (ξ, 1/ξ) with ξ > 0 in (4.5), i.e.,

R H := { J(Ω (0) nx×ny ) + (ξ, 1/ξ) : ξ > 0 }. (4.6)
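The construction of the reference set from a starting outcome vector via (δ 1 , δ 2 ) = (ξ, 1/ξ) can be sketched as follows. The listed ξ values and the starting objective vector (18456.28, 0.06) are those of the numerical tests in Section 5; the function name is our own.

```python
def reference_points(j0, xis):
    """Reference points r(xi) = J(Omega_0) + (xi, 1/xi) along a hyperbola
    anchored at the starting outcome vector j0."""
    return [(j0[0] + xi, j0[1] + 1.0 / xi) for xi in xis]

# Twelve values of xi as used in the numerical tests
xis = [0.0005 * k for k in range(1, 11)] + [0.006, 0.007]
refs = reference_points((18456.28, 0.06), xis)
```

By construction, all initial hypervolume rectangles spanned by J(Ω (0)) and r(ξ) have area δ 1 · δ 2 = ξ · (1/ξ) = 1.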

Definition 3 A set R ⊆ R 2 is called a Pareto front generating (PFG) reference set if Y ND ⊆ Y H (R), and a properly Pareto front generating (pPFG) reference set if Y PND ⊆ Y H (R).
Since J(Ω (0) nx×ny ) < r(ξ) holds componentwise for every ξ > 0, the Pareto front lies completely 'below' the reference set. For reference points from the reference set R H the result of Theorem 1 can be strengthened.
Lemma 4 An optimal solution of the hypervolume scalarization (3.1) w.r.t. a reference point from the set R H is a properly efficient solution of the corresponding biobjective optimization problem (2.5).
Proof Consider an optimal solution of the hypervolume scalarization (3.1) y = J(Ω) ∈ Y H (r) for an arbitrary reference point r ∈ R H . Since J(Ω (0) nx×ny ) < r we know that for this reference point H(Ω (0) nx×ny ) > 0, and hence also H(Ω) > 0. Therefore, the hypervolume rectangle induced by y and r must have positive side lengths a(Ω) > 0 and b(Ω) > 0. This implies that y is locally optimal for a weighted sum scalarization with weights λ 1 = b(Ω) > 0 and λ 2 = a(Ω) > 0 since this has the same cone of improving directions as the hypervolume indicator in this point (see Section 3.1 for a detailed derivation). It follows that y ∈ Y PND .
Note that Lemma 4 is in line with the results of Auger et al. (2009) who showed that, irrespective of the choice of the reference point, nondominated points with unbounded trade-off are not part of what they call 'optimal µ-distributions', that is, of subsets of points on the Pareto front that maximize the joint hypervolume. Note also that by varying ξ > 0 different reference points are obtained that lie along a hyperbola in the objective space, see Figure 7 for an illustration. Indeed, for two parameter values 0 < ξ 1 < ξ 2 we have that r 1 (ξ 1 ) < r 1 (ξ 2 ) and r 2 (ξ 1 ) > r 2 (ξ 2 ). Hence, the hypervolume rectangle spanned by J(Ω (0) nx×ny ) and r(ξ 1 ) cannot be contained in the hypervolume rectangle spanned by J(Ω (0) nx×ny ) and r(ξ 2 ) and vice versa. However, for all associated hypervolume rectangles it holds that a(Ω (0) nx×ny ) · b(Ω (0) nx×ny ) = 1. While we cannot guarantee that the set R H is a PFG reference set in general, we argue that the chances are high that the reference set R H induces solutions with varying objective values. This is validated by the numerical experiments described in Section 5 below. From a theoretical perspective, we argue that at least for convex problems the reference set R H is pPFG, i.e., the properly efficient set can be generated by varying the reference points in R H .
Theorem 5 Suppose that the biobjective optimization problem (2.5) is convex, i.e., all efficient solutions of (2.5) are supported efficient solutions. Then Y H (R H ) = Y PND .
Proof First, recall that Lemma 4 implies that Y H (R H ) ⊆ Y PND . Thus, it remains to be shown that Y PND ⊆ Y H (R H ). Now let y = J(Ω) ∈ Y PND be properly efficient. Then there exist weights λ 1 , λ 2 > 0 such that Ω is optimal for the associated weighted sum scalarization (3.3), see, e.g., Ehrgott (2005); Geoffrion (1968). Now define a rectangle with sides a := λ 2 > 0 and b := λ 1 > 0 and let r := J(Ω) + (a, b). It is easy to see that the hypervolume rectangle spanned by J(Ω) and r is precisely the rectangle with side lengths a and b. From the discussion in Section 3.1 we can conclude that Ω is optimal for the hypervolume scalarization (3.1) with reference point r since in this situation the set of improving points w.r.t. the hypervolume indicator is completely contained in that of the weighted sum (which is empty in this case). See Figure 8 for an illustration of the respective contour lines. Moreover, J(Ω) is the unique optimal outcome vector of the hypervolume scalarization (3.1) in this case. Since this property only relies on the ratio b/a of the side lengths of the hypervolume rectangle, the reference point can be moved along the half-line starting at J(Ω) and passing through r until it intersects the set R H . This intersection point exists and can be determined as the positive solution ξ of the system r(ξ) = J(Ω) + t (a, b), t > 0. Eliminating t leads to the quadratic equation ξ 2 + c ξ − a/b = 0, where the constant c depends only on J(Ω), J(Ω (0) nx×ny ) and the ratio a/b. We obtain two solutions for ξ, namely ξ 1,2 = −c/2 ± (c 2 /4 + a/b) 1/2 . Since a/b > 0 we have that (c 2 /4 + a/b) 1/2 > |c/2| and hence exactly one of the two solutions for ξ is positive, irrespective of the sign of c. This solution yields the sought reference point r(ξ) ∈ R H , and we can conclude that Y PND ⊆ Y H (R H ).
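The positive solution for ξ can be computed directly. Assuming, consistently with the expression (c 2 /4 + a/b) 1/2 > |c/2| above, that ξ 1,2 are the two roots of the quadratic ξ 2 + c ξ − a/b = 0, a small numerical check with illustrative values of a, b and c (which in the proof depend on the problem data) reads:

```python
def xi_positive(a, b, c):
    """Positive root of xi**2 + c*xi - a/b = 0.
    Since a/b > 0, the square-root term exceeds |c/2|, so exactly one
    root is positive irrespective of the sign of c."""
    root = (c * c / 4.0 + a / b) ** 0.5
    return -c / 2.0 + root
```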
Note that even though Theorem 5 is restricted to convex problems, the set R H may be pPFG also for non-convex problems. This depends, on the one hand, on the degree of non-convexity and, on the other hand, on the distance of the reference set R H from the Pareto front in the objective space, where the latter is determined by the image of the starting solution J(Ω (0) nx×ny ).

Numerical Tests
We consider a 2D test case of a ceramic rod consisting of beryllium oxide (BeO). The rod is 0.6 m long and has a height of 0.1 m. It is fixed on one side while tensile load is applied on the other side, modeling a horizontal load transfer. On the fixed boundary Ω D of the rod the displacement field u(x) is zero, i.e., here Dirichlet boundary conditions hold, cf. (2.2). The part Ω N fixed denotes the boundary on which the surface loads g are applied, i.e., where Neumann boundary conditions hold. The remaining parts of the boundary, denoted by Ω N free , can be modified according to the optimization objectives, whereas the boundaries Ω D and Ω N fixed are fixed during the optimization. In particular, the corner points cannot be moved. In Figure 2 the rod and the decomposition of its boundary into Ω D , Ω N fixed and Ω N free are illustrated. As we consider the same ceramic material BeO as in Doganay et al. (2019), we use the same material parameter setting in order to evaluate the state equation (2.2) and the PoF objective (2.3). In particular, Young's modulus is set to E = 320 GPa, Poisson's ratio to ν = 0.25, the ultimate tensile strength to 140 MPa and the Weibull parameter to m = 5. The surface loads, acting on Ω N fixed , are set to g = (10^8, 0) Pa and the volume force density is set to f = (0, 1000) Pa. To speed up the numerical evaluation of the state equation, an improved discretization approach based on regular grids is employed, see Section 4.1 for more details and Figure 9 for an illustration of the considered shape. In contrast to Doganay et al. (2019), volume forces that model, e.g., gravity, are included in the numerical evaluation. We choose a bent rod as shown in Figure 2 as the initial shape (i.e., as the starting solution) for the optimization that is implemented using Algorithm 1. Note that this shape is clearly not efficient.
Indeed, in this situation the efficient shapes are straight rods of different width, trading off between small probabilities of failure and high volumes on the one hand, and high probabilities of failure and low volumes on the other hand. This simple setting has the advantage that the optimization results can be validated and compared with the 'true' Pareto front. The rectangular domain Ω̂ that contains all admissible shapes O ad is given by a rectangle of width Ω̂ l = 0.7 m and height Ω̂ h = 0.35 m. The initial shape is placed in the interior of this rectangle. Moreover, the whole setting is slightly rotated so that the boundaries of the initial shape can be represented by interpolating cubic splines. The domain Ω̂ is discretized by a triangular grid with n x × n y grid points as described in Section 4.1. Hereby, we determine the number of grid points such that a prescribed mesh size of m x = m y ≈ 0.02 m is realized. In our setting, this results in a discretization with 36 × 18 grid points, see Figure 9. Note that this shape representation has more degrees of freedom than the B-spline representation in Doganay et al. (2019). This significantly increases the flexibility of our approach. In the numerical tests reported below, we use the Armijo-rule (4.3) with parameters β = 0.5 and σ = 0.1. Moreover, the smoothed gradient d (k) is adaptively scaled such that max i,j ‖d (k) ij ‖ 2 is within 10% to 90% of the mesh size, i.e., p min = 0.1 and p max = 0.9, as explained in Section 4.2. The optimization process is restricted to 500 iterations, i.e., K = 500. To avoid tiny step sizes, the Armijo-rule is restricted to 30 iterations, i.e., ℓ ∈ {0, . . . , 30}. Whenever the Armijo rule does not yield a sufficient improvement for reasonable step sizes, the algorithm stops. Moreover, the algorithm stops when the Frobenius norm of the unscaled search direction d (k) is lower than ε = 10 −10 .
Twelve exemplary reference points r(ξ) are selected from the pPFG reference set R H (see (4.6)), with ξ ∈ {0.0005, 0.001, 0.0015, . . . , 0.005, 0.006, 0.007}. Algorithm 1 is implemented in Python 3.6 and all tests are run under openSUSE 15.0 on an Intel Core i7-8700 with 32 GB RAM.

Results
The twelve optimization runs with varying reference points yield twelve different mutually nondominated outcome vectors. The resulting approximation of the Pareto front is shown in Figure 10. As was to be expected, the optimized shapes resemble a straight rod with varying thickness. The objective vector of the initial shape Ω 36×18 is (18456.28, 0.06), which is actually far from the Pareto front. Several exemplary shapes are shown in Figure 10. In addition to the illustration of the respective shapes, the stresses in the direction of the tensile load acting on the finite elements are depicted as arrows at the respective center points. Note that the diagonal pattern of the arrows is due to the rotated (not axis-parallel) grid structure. It turns out that the iterates evolve similarly for different optimization runs. For about 50 iterations the hump of the rod decreases quickly, and the Armijo rule often returns a sufficient improvement already for t k = 1. Thus, the shape converges quickly to a nearly straight joint, however, usually with a wavy boundary, and its probability of failure improves considerably. In later iterations the changes are less visible and the shape converges slowly to a straight joint, the thickness of which depends on the chosen reference point. During the optimization process, the volume decreases only slowly as the wavy boundary gets smoother. The optimization runs require between 292 and 486 iterations with one exception: for the reference point r(0.0005) the optimization process stops at the maximum of 500 iterations. In all other cases the algorithm terminated because the Armijo rule did not yield a sufficient improvement. The solutions from the coarse 36 × 18 grid were used as starting solutions on a refined 71 × 36 grid with mesh size m = 0.01 m. Exemplary results for the reference points r(0.001) and r(0.003) are shown by red crosses in Figure 10. The respective starting solutions, which are now evaluated on a 71 × 36 grid, are illustrated by red circles in the same figure.
The results are smoother, see Figure 11, but require significantly more computational time. Nevertheless, in this test case setting the coarse grid approximation already obtained considerably good results, which are not dominated by the corresponding results on the finer grid. Moreover, numerical tests show that shapes representing very thin or very thick straight rods, respectively, induce numerical difficulties due to the fixed boundary parts and, particularly, their corner points. Thick rods suffer from overfitted spline representations of their boundaries. Thin rods, on the other hand, lead to distorted finite elements at the spiky corner points.

Johanna Schultes et al.

Fig. 11: Comparison of the coarse and fine grid solution obtained for reference point r(0.003): (a) solution Ω * 36×18 evaluated on a 71 × 36 grid; (b) solution Ω * 71×36

Conclusion
The hypervolume indicator function is a well-suited scalarization approach for multiobjective shape optimization problems, since it does not require an explicit handling of additional constraints. In contrast to the weighted sum approach, which shares this advantage, it is also capable of computing unsupported efficient solutions. The nonlinear dependency on the individual objective functions is not a critical issue in this context since the individual objective functions are already nonlinear. Furthermore, the gradient of the hypervolume can be easily computed based on the gradients of the objective functions. We show that for convex problems, a non-trivial reference set generating the set of properly nondominated outcome vectors (pPFG reference set) can be determined without prior knowledge of the Pareto front. We validate the approach on a case study from engineering design where the shape of a horizontal bar under tensile load is to be optimized w.r.t. reliability and cost. Clearly, there are other possibilities for defining pPFG reference sets. We will test and analyze different choices of reference sets in the future. Moreover, the question whether and how non-trivial pPFG reference sets can be defined for non-convex problems (without knowing the Pareto front beforehand) will be analyzed. From an engineering point of view, 3D shape optimization is still a challenge that needs further attention.