Smoothing inertial method for worst-case robust topology optimization under load uncertainty

We consider a worst-case robust topology optimization problem under load uncertainty, which can be formulated as a minimization problem of the maximum eigenvalue of a symmetric matrix. The objective function is nondifferentiable where the multiplicity of maximum eigenvalues occurs. Nondifferentiability often causes some numerical instabilities in an optimization algorithm such as oscillation of the generated sequence and convergence to a non-optimal point. We use a smoothing method to tackle these issues. The proposed method is guaranteed to converge to a point satisfying the first-order optimality condition. In addition, it is a simple first-order optimization method and thus has low computational cost per iteration even in a large-scale problem. In numerical experiments, we show that the proposed method suppresses oscillation and converges faster than other existing methods.


Introduction
Recently, topology optimization considering robustness has been attracting much attention in real-world engineering. There are several approaches to consider the robustness of a structure, such as a probabilistic approach (Dunning and Kim 2013) and a worst-case approach (Takezawa et al. 2011). We consider the latter approach in this paper. See, e.g., Kanno (2020), for the difference between these concepts.
In topology optimization of continua, Takezawa et al. (2011) formulate a worst-case compliance minimization problem under load uncertainty as a minimization problem of the maximal eigenvalue of a symmetric matrix. The maximum eigenvalue is nonsmooth (i.e., nondifferentiable) where the multiplicity of eigenvalues occurs. Therefore, this problem is categorized as a nonsmooth optimization problem. Takezawa et al. (2011) use the directional derivative of the maximum eigenvalue where the multiplicity of eigenvalues occurs and apply the method of moving asymptotes (MMA) (Svanberg 1987) to the problem. Although it converges fine in many cases, it oscillates and does not converge if the multiplicity of eigenvalues occurs at an optimal solution as mentioned in Thore (2022). This fact signifies the importance of the convergence guarantee in this type of nonsmooth problem. Holmberg et al. (2015) reformulate the problem into a nonlinear semidefinite programming (NSDP) problem. Then, utilizing the Cholesky factorization, they solve it with a standard nonlinear solver (the interior-point method of IPOPT (IPOPT 2022)). By this approach, they can avoid nonsmoothness and guarantee the convergence to a solution satisfying the first-order optimality condition. However, the interior-point method has a large computational cost per iteration in a large-scale problem because it requires (approximate) Hessian information and a solution of linear equation (e.g.,Eq. (13) in Wächter and Biegler (2006) for computation of search directions) at every iteration. Even if we use approximate Hessian, the computational cost per iteration of the general-purpose interior-point method becomes larger, especially in large-scale nonlinear optimization, than simple gradient descent based methods. A topology optimization problem can be very large-scale when considering 3D structural design, and thus, an efficient method for largescale problems is needed. There are other approaches to worst-case robust structural optimization, e.g., Cherkaev (2004, 2008).
Eigenvalue optimization problems appear in other applications of topology optimization such as vibration and buckling problems (Díaaz and Kikuchi 1992;Kočvara 2002;Ohsaki et al. 1999;Yamada and Kanno 2016). The influence of multiplicity of eigenvalues at an optimal solution has been extensively studied in the field of structural optimization (Seyranian et al. 1994). Eigenvalue optimization has also been extensively studied outside of the field of structural optimization. It is closely related to semidefinite programming (Lewis and Overton 1996;Helmberg and Rendl 2000). It also appears in the field of control engineering (Apkarian et al. 2008). However, many studies focus on convex or unconstrained cases, thus they are not applicable to the eigenvalue optimization in topology optimization where the objective function is nonconvex and the volume constraint is applied in most cases.
A smoothing method is a nonsmooth optimization framework, which utilizes a smooth (i.e., differentiable) approximation of a nonsmooth function called a smoothing function. A smoothing function has a parameter that controls the approximation accuracy; a smoothing function converges to the original nonsmooth function when we take the limit of the parameter. The advantage of smoothing methods is that, by using smooth approximation, we can utilize welldeveloped smooth optimization methods. The basic idea of smoothing methods has a long history (Bertsekas 1975;Zang 1980), and there are many types of smoothing methods depending on the update scheme of a smoothing parameter (Nesterov 2007;Chen 2012;Bian and Wu 2021) and the smooth optimization methods they are based on: e.g., smoothing projected gradient method (Zhang and Chen 2009), smoothing augmented Lagrangian method (Xu et al. 2015), etc.
A smooth approximation has been utilized in some structural (topology) optimization problems such as a problem with stress constraints (Yang and Chen 1996), a dynamic problem (Torii and de Faria 2017), and a buckling problem (Ferrari and Sigmund 2019). However, they solve an approximated problem with a fixed smoothing parameter, and thus, the obtained solution is not the optimal solution of the original nonsmooth problem, but rather a solution of an approximated problem. In this paper, we adopt a smoothing method based on Chen (2009), Chen (2012) which updates a smoothing parameter at each iteration so that the convergence to the optimal solution of the original problem is guaranteed.
Recently, simple first-order optimization methods, which only require the first-order derivative and the value of the objective function and do not require solutions of complicated subproblems, have been attracting much attention, especially in the machine learning literature. They have low computational cost per iteration, and thus are suitable for large-scale problems. There are many researches on accelerating the convergence of simple firstorder methods (d'Aspremont et al. 2021;Ghadimi and Lan 2016;Li and Lin 2015;Nesterov 1983;Ochs et al. 2014). Recently, these optimization algorithms have been applied to some topology optimization problems (Li and Zhang 2021;Kanno 2021, 2023). Beck (Beck 2017) gives various examples of simple first-order methods.
In this paper, we propose a smoothing method for solving a worst-case topology optimization problem. It is a simple first-order method and suited for large-scale problems. It has a convergence guarantee to a solution satisfying the first-order optimality condition of the original problem and suppresses the oscillation caused by nonsmoothness at the optimal solution. We propose an inertial technique based on Ochs et al. (2014) to accelerate the smoothing method and discuss the parameter setting (the smoothing parameter and the stepsize parameters) of the proposed method for better convergence in our problems. Compared to existing methods, MMA and NSDP, the proposed method consists of simple and comprehensible update formula without solutions of subproblems nor linear equations at each iteration. Therefore, the proposed method is easy to implement. In the numerical experiments, we compare the proposed method with two existing methods, MMA and NSDP approaches. We show that the proposed method converges faster and stably without oscillation. Moreover, we show that the globally convergent version of MMA (GCMMA) (Svanberg 2002), which has the convergence guarantee only for a smooth problem, possibly fails in this kind of nonsmooth problems. This paper is organized as follows. Sect. 2 provides the fundamentals of worst-case topology optimization. Sect. 3 provides the fundamentals of smoothing methods. In Sect. 4, we propose a smoothing method for worst-case topology optimization. We discuss the implementation details, the inertial technique and parameter settings. In Sect. 5, we show the results of numerical experiments. Finally, some concluding remarks are provided in Sect. 6.
We use the following notations. The norm ‖ ⋅ ‖ and the inner product ⟨⋅, ⋅⟩ denote the Euclidean norm and inner product of vectors, respectively, throughout the paper. The vectors 0 and 1 have all components equal to 0 and 1, respectively.

3
Page 3 of 16 82 2 Worst-case topology optimization Consider a worst-case compliance minimization problem (Takezawa et al. 2011) shown in Fig. 1. It is an extension of the conventional compliance minimization problem in topology optimization. The basic problem setting is the same as the ones in Andreassen et al. (2011), Bendsøe and Sigmund (2004), i.e., we use the SIMP (solid isotropic material with penalization) based density method, the density filtering and the conventional finite element discretization. The design variable of an optimization problem is the density vector denoted by x ∈ ℝ n . The feasible set of optimization problems is written by where v is a constant vector with positive components and V 0 > 0 is the designated upper limit of the structural volume.
The conventional compliance minimization, which aims to maximize the stiffness of a structure, can be written as follows (see, e.g., Andreassen et al. (2011)): where f ∈ ℝ m is the constant external load vector, m is the number of degrees of freedom of the nodal displacements, K(⋅) ∈ ℝ m×m is the global stiffness matrix, K(Hx) −1 denotes the inverse of the global stiffness matrix with argument Hx , and H ∈ ℝ n×n is the filtering matrix to prevent mesh dependency (Bourdin 2001. Using the filtering matrix, v = H T 1 in (1) when each finite element has the same volume. The global stiffness matrix is defined by where E 0 ≫ E min > 0 are constants, x e is the e-th component of x , p > 1 is the SIMP penalty parameter, and K e (e = 1, … , n) is the local stiffness matrix with unit Young's modulus which is a constant symmetric matrix. Problem (2) has a nonconvex objective function and linear constraints.
To consider the worst-case compliance, we replace the objective function of (2) with the maximum compliance over the uncertainty set of the external load. To control the uncertainty of load, we set f = Af , where f ∈ ℝ d is the uncertain load vector and A ∈ ℝ m×d is a constant matrix. Note that the dimension of uncertainty d is normally much less than the dimension of the global stiffness matrix m. We consider the uncertainty ‖f ‖ = 1 , which is equivalent to ‖f ‖ ≤ 1 because the compliance is a convex function with respect to f , and a convex function attains its maximum at an extreme point of a bounded closed convex feasible set (Rockafellar 1970). If we want to consider an ellipsoidal uncertainty set, we only need to change the components of A, not ‖f ‖ = 1 . The worst-case compliance minimization problem is defined as follows (Takezawa et al. 2011): where C(x) = A T K(Hx) −1 A is a symmetric positive semidefinite matrix. As C(x) ∈ ℝ d×d is a symmetric matrix, Takezawa et al. (2011) show that the problem (3) is equivalent to the following minimization problem of the maximum eigenvalue of C(x): The maximum eigenvalue is nonsmooth (nondifferentiable) with respect to x where the multiplicity of eigenvalues occurs, and thus, problem (4) is classified as a nonsmooth optimization problem. The derivative of the maximum eigenvalue without multiplicity (derivative at a differentiable point) is calculated as follows (Takezawa et al. 2011): where max is the eigenvector corresponding to the maximum eigenvalue (Although max is dependent on x , we omit the argument (x) for simplicity), Z ∈ ℝ m×d is the solution of the linear equation and h ei is the (e, i)th component of the filtering matrix H. The solution of the linear equations (6) corresponds to the finite element analysis (FEA) and the objective function can also be calculated using the solution Z with the formula C(x) = A T Z . Note that in the case of the multiple maximum eigenvalues, only the subgradient and the directional derivative are available. By (5), we see that the derivative of the objective function with respect to the design variable is always nonpositive as Therefore, we set the volume constraint as an equality constraint as in (1), because the volume constraint is always active at an optimal solution even if we set it as an inequality constraint. ( Although the maximum eigenvalue is nonsmooth, it is a locally Lipschitz function and hence almost everywhere differentiable by Rademacher's theorem (Rockafellar and Wets 1998). This means that, normally, the sequence generated by an optimization algorithm does not hit a nondifferentiable point. Therefore, we can apply a normal (smooth) optimization algorithm to our problem, because the gradient exists almost everywhere. However, the sequence generated by a smooth optimization algorithm can oscillate or converge to a non-optimal point because of nonsmoothness. 1 Indeed, Takezawa et al. (2011) apply MMA (Svanberg 1987) to the problem considering the directional derivative, and although it converges fine in many cases, it oscillates if the multiplicity of eigenvalues occurs at an optimal solution. Therefore, it is important to consider the convergence guarantee, especially in a nonsmooth optimization problem.
One way to avoid nonsmoothness is a nonlinear semidefinite programming (NSDP) approach. Holmberg et al. (2015) show that the minimization problem of the maximum eigenvalue (4) is equivalent to the following NSDP problem: where z ∈ ℝ is an auxiliary variable and W ⪰ 0 denotes that W is a positive semidefinite matrix. Furthermore, Holmberg et al. (2015) put zI − C(x) = LL T , add each component of the triangular matrix L to the design variables, and solve the problem by the interior-point method of nonlinear programming (IPOPT). However, the interior-point method is not suited for a very large-scale problem such as a topology optimization problem because of its large computational cost per iteration.
To tackle the above issues, we propose a smoothing method for worst-case topology optimization which (i) has the convergence guarantee and suppresses the oscillation caused by nonsmoothness and (ii) has low computational cost per iteration even in a large-scale problem.

Subgradient and optimality condition
In a nonsmooth optimization problem, the objective (and/ or the constraint) function may be nondifferentiable at an optimal solution. Therefore, to consider the optimality condition, we use the Clarke subdifferential (Clarke 1990; Rockafellar and Wets 1998), which is an extension of the subdifferential to a nonconvex function. For a locally Lipschitz function f ∶ ℝ n → ℝ (not to be confused with the load vector f which only appears in the precious section), the Clarke subdifferential at x ∈ ℝ n is defined by where conA denotes a convex hull of a set A , In this paper, we aim to find a point satisfying the firstorder optimality condition called a Clarke stationary point. For a nonsmooth optimization problem with a locally Lipschitz continuous (nonsmooth) objective function f ∶ ℝ n → ℝ and a nonempty closed convex feasible set D ⊂ ℝ n , the Clarke stationary point is a point x * ∈ D satisfying for some g ∈ f (x * ) . The above condition becomes 0 ∈ f (x * ) when D = ℝ n and ∇f (x * ) = 0 when additionally f is differentiable. Thus, a Clarke stationary point is a natural extension of the stationary point to a nonsmooth constrained problem. Note that the condition (8) is valid only when D is convex. We need to consider a more complicated optimality condition based on the KKT (Karush-Kuhn-Tucker) condition when the constraints are nonconvex (Xu et al. 2015).

Smoothing function
In a smoothing method, we use a parameterized smooth approximation called a smoothing function f (⋅; ) ∶ ℝ n → ℝ of a nonsmooth function f ∶ ℝ n → ℝ . The parameter > 0 controls the degree of the approximation; a smoothing function f (x; ) converges to the original nonsmooth function f (x) when ↓ 0 as shown in Fig. 2. More precisely, a function f (⋅; ) is called a smoothing function of f if it is continuously differentiable for all > 0 , and it satisfies In addition, the gradient consistency of a smoothing function is required in most cases to guarantee the convergence of a smoothing method. Note that ∇f (x; ) denotes the gradient of f (⋅; ) ∶ ℝ n → ℝ with respect to x for each > 0 .
Construction of (the Chen-Mangasarian) smoothing functions satisfying the conditions above for various nonsmooth functions (max function, absolute value function, composite functions of those, etc.) can be found in Chen (2012). There are other types of smooth approximations; however, it may not be straightforward to check whether a smooth approximation which is not the Chen-Mangasarian type satisfies the gradient consistency (9).

Smoothing method
The smoothing method is a nonsmooth optimization framework which utilizes a smoothing function. By using a smoothing function, we can utilize well-developed smooth optimization methods. There are many variants of smoothing methods based on the underlying smooth optimization methods, e.g., the steepest descent method (Chen 2012), the projected gradient method (Zhang and Chen 2009) and the augmented Lagrangian method (Xu et al. 2015). Therefore, smoothing methods can be applied to a wide variety of problem settings if a smoothing function of the objective (and constraint) function is available. See Chen (2012) for construction of smoothing functions of various nonsmooth functions.
There are two approaches with respect to a smoothing parameter in smoothing methods, i.e., a fixed smoothing parameter and an adaptive smoothing parameter. Consider a nonsmooth optimization problem (7). Nesterov's smoothing method (Nesterov 2005(Nesterov , 2007 for a convex optimization problem uses a fixed smoothing parameter, i.e., it solves a smoothly approximated optimization problem with fixed > 0 instead of the original problem (7). A smoothing parameter > 0 is determined beforehand depending on the desired accuracy. As it solves an approximated problem, the algorithm does not converge to a solution of the original (10) Minimize x∈Df (x; ) nonsmooth problem (7). However, it converges to a point close to the solution for sufficiently small . The same kind of technique as a smoothing method with a fixed smoothing parameter is often used in structural optimization (Yang and Chen 1996;Torii and de Faria 2017;Ferrari and Sigmund 2019).
In contrast, Zhang and Chen (2009) and Chen (2012) introduce a smoothing method with an adaptive smoothing parameter. It solves the original nonsmooth problem (7) utilizing f (x; k ) where k may change at each iteration. Two advantages of adjusting a smoothing parameter at each iteration are as follows: (i) The generated sequence converges to an optimal solution of the original problem, not of an approximated problem. (ii) It may converge faster by utilizing a smoothing function with a better property at first. If a smoothing parameter is too large, the smoothing function is far from the original function. However, if we want a solution with better accuracy, we need to set a smoothing parameter very small, which makes the smoothing function illconditioned (it has large changes in its gradient and is close to a nonsmooth function). Therefore, it may be more efficient to start with a sufficiently large smoothing parameter and gradually decrease it. Considering the above advantages, we adopt a smoothing method with an adaptive smoothing parameter.

Smoothing projected gradient method
To treat convex constraints, the smoothing projected gradient method is introduced by Zhang and Chen (2009). For a nonsmooth constrained optimization problem (7), the smoothing projected gradient method uses the projected gradient method (Goldstein 1964) as a smooth optimization subroutine. The design variable is updated by for k = 0, 1, 2, … , where Π D ∶ ℝ n → D is the projection operator onto D, i.e., Π D (x) ∶= arg min y∈D ‖x − y‖ and k > 0 is an appropriate step size chosen by, e.g., the Armijo backtracking. Then the smoothing parameter is updated by where ∈ (0, 1) , > 0 and ‖ 1 k (x k+1 − x k )‖ is an optimality measure, which coincides with the norm of the gradient ‖∇f (x k ; )‖ in the unconstrained case D = ℝ n .
We can derive k → 0 from (12). Assume k =̄> 0 after a finite number of iterations, then the projected gradient update for a fixed objective function f (⋅;̄) with an appropriate stepsize leads to ‖ 1 k (x k+1 − x k )‖ → 0 . This contradicts the fact that k is fixed after finite iterations. Therefore, the  (12) is satisfied infinitely many times, which leads to k → 0 . This property is essential to guarantee the convergence to an optimal solution of the original problem. Technically, the smoothing projected gradient method has the following convergence guarantee: any accumulation point of {x k | k ∈ ℕ ∪ {0}, k+1 = k } generated by the smoothing projected gradient method becomes a Clarke stationary point. See Zhang and Chen (2009) for the proof.

Smoothing method for worst-case topology optimization
In this section, we explain the details of the implementation of the smoothing method for problem (4). Although our algorithm is based on the smoothing projected gradient method (Zhang and Chen 2009), we propose some techniques to make it more efficient.

Smoothing function and projection
For simplicity, we omit C(⋅) in the argument of the eigenvalue functions and write, for example, max (x) instead of max (C(x)) . Moreover, we denote the ith largest eigenvalue by i (x) ( max (x) coincides with 1 (x) ). A smoothing function of max can be written as follows: 2 We use the formulation in the second equality to avoid numerical instability caused by large exponents. This smoothing function ̃m ax (x; ) satisfies the gradient consistency (9), which is an important property to guarantee the convergence of smoothing methods.
The gradient of the smoothing function of the maximum eigenvalue is The gradient of the i-th eigenvalue ∇ i (x) can be calculated based on (5) with the eigenvector i corresponding to i instead of max . If there exists a multiplicity of the maximum eigenvalue, we can choose any maximum eigenvector and calculate ∇ max (x) by (5). Note that in this case, the gradient of max (x) does not exist, but ∇̃m ax (x; ) can be computed by the formulae in (5) and (14).

Remark 1
The p-norm ‖a‖ p (p > 1) is a smooth approximation of max{|a 1 |, … , |a d |} . As all the eigenvalues are nonnegative in our problem, we may use ‖ ‖ p as a smooth approximation of max (except for = 0 ) where is a vector with ith component equal to i . However, we use the smoothing function (13) in this paper, because the theoretical studies such as the gradient consistency are more readily available in the literature (Chen 2012;Nesterov 2007).
The projection Π S (x) onto the feasible set (1) can be calculated by where max{0, ⋅} and min{1, ⋅} are operated to a vector componentwise and ∈ ℝ is the solution of a piecewise linear equation We can compute efficiently by, e.g., the bisection method. See Kanno (2021, 2023) for more details about the projection. Therefore, we can apply the smoothing projected gradient method to problem (4). In the following sections, we propose some techniques to make the method more efficient.

Inertial technique
The smoothing projected gradient method is a very simple method. We can use the inertial (acceleration) technique proposed in Ochs et al. (2014) to accelerate the convergence and we call the proposed method the smoothing inertial projected method. For the general problem setting (7), the update scheme becomes where k , k > 0 are stepsize parameters and k (x k − x k−1 ) is called an inertial term. The computational cost per iteration (14) ∇̃m ax (x; ) is almost the same as the smoothing projected gradient method. Although there is no theoretical improvement in the convergence rate, we show a faster convergence than the smoothing projected gradient method in numerical experiments.
There are many variants of acceleration techniques (Li and Lin 2015;Ghadimi and Lan 2016;Carmon et al. 2018) known as accelerated gradient methods. However, some algorithms require evaluations of the objective value (and gradient) several times at each iteration. This means several FEAs are required and the computational cost per iteration becomes large. Thus, we use a simple scheme by Ochs et al. (2014). An accelerated version of smoothing methods have been studied for some specified problems in Bian and Wu (2021), Wang and Chen (2022).

Stepsize and smoothing parameters
To guarantee the convergence, we need to choose stepsize and smoothing parameters appropriately. The smoothing parameter k for the smoothing inertial projected gradient method is updated by where ∈ (0, 1) , > 0 , and is the proximal residual (Ochs et al. 2014), which is the optimality measure for smoothly approximated problem (10), i.e., r(x; ) = 0 if and only if x is a stationary point of (10). The update rule of the smoothing parameter (17) is modified from (12) to guarantee the convergence when the inertial term is added. Similarly in various smoothing methods (Zhang and Chen 2009;Chen 2012;Xu et al. 2015), to guarantee the convergence, the smoothing parameter is updated when the optimality measure for a smoothly approximated problem becomes small enough. By using this strategy, we may combine smoothing methods with GCMMA or other convergent optimization algorithms, although the convergence proof may not be straightforward.
Stepsize parameters must be chosen such that ‖r(x k ; k )‖ → 0 when k is unchanged. Thereby, (17) is satisfied infinitely many times and k → 0 is achieved. We can follow the same stepsize rule as Ochs et al. (2014) to achieve ‖r(x k ; k )‖ → 0 when k is unchanged. The parameter L k > 0 satisfying the following descent condition plays an important role for the convergence 3 : By using such L k , stepsize parameters for (16) can be determined as follows Ochs et al. (2014): and a 1 ≥ a 2 > 0 are constant parameters. Note that when k = 1∕L k and k ≡ 0 , the algorithm becomes the smoothing projected gradient method.
The Lipschitz constant L of the gradient of f (⋅; k ) defined by always satisfies the descent condition (18) (see, e.g., Beck (2017) for a proof). If L can be easily calculated for each k , we can use stepsizes (19) with L k = L . However, L is hard to estimate beforehand in our problem. In addition, a constant stepsize is often inefficient compared with choosing smaller L k , which means a larger stepsize, satisfying the descent condition (18) at some iterations.
We use the backtracking to find L k satisfying (18). We start with an initial value s k for L k and gradually increase it by multiplying some constant > 1 until (18) is satisfied. The check of the descent condition (18) requires the evaluation of the objective value ( x k+1 in the left-hand side of (18) depends on L k ). To reduce the number of evaluations of the objective value (the number of FEAs), we estimate the initial value for the backtracking procedure s k for k = 1, 2, … by where L min is a small positive constant to avoid the numerical instability and ensure the boundedness of L k Kanno 2021, 2023). For k = 0 , we cannot use (21), thus choose sufficiently large s 0 = L 0 > 0 . Although the initial estimate (21) is a very simple estimate based on (20), it works well in our numerical experiments. Indeed, it is often the case that the initial estimate (21) itself satisfies the descent condition (18), and no additional evaluation of the objective value (FEA) is needed in numerical experiments.
Note that, in practice, when the change ‖x k+1 − x k ‖ or k becomes too small after a large number of iterations, the descent condition (18) will not be satisfied because of numerical error. In that case, we need to just terminate the iteration.
Based on all of the above discussions, the algorithm of the smoothing inertial projected gradient method for the worstcase topology optimization is described in Algorithm 1. A MATLAB code for Algorithm 1 is provided on Appendix B. The condition k < for small > 0 can be used for the stopping criterion. However, our preliminary numerical experiments demonstrate that the number of iterations required to satisfy k < heavily depends on , , and 0 > 0 . Development of a practical stopping criterion is our future work. Each iteration of Algorithm 1 consists of vector additions, scalar multiplications, projections, and eigenvalue calculation of a small-size matrix other than FEA, and thus, the computational cost per iteration is small. Now we can guarantee the convergence of the proposed method. We state the convergence theorem in a more general setting than the worst-case topology optimization. We consider a nonsmooth optimization problem (7) and assume the following: (i) The objective function f ∶ ℝ n → ℝ is locally Lipschitz continuous. (ii) There exists a smoothing function f (⋅; ) of f such that ∇f (⋅; ) is Lipschitz continuous for all > 0 and satisfy the gradient consistency (9). (iii) The feasible set D is nonempty, closed, and convex, and the projection onto D is computationally tractable. In our problem (4), where f = max (C(⋅)) and D = S defined in (1), the above assumptions hold. Under the above assumptions, the following theorem holds. The proof is shown in Appendix A.
Theorem 1 (Convergence of the smoothing inertial projected gradient method) Let K = {k ∈ ℕ ∪ {0} | k+1 = k } . Every accumulation point of the subsequence {x k } k∈K generated by the smoothing inertial projected gradient method (16) with the stepsize parameters (19) and x 0 ∈ D is a Clarke stationary point of problem (7).

Remark 2
The convergence guarantee in Theorem 1 is weaker than saying " {x k } converges to a Clarke stationary point." Indeed, there is a possibility of {x k } to oscillate between multiple (infinitely many) Clarke stationary points (this also applies to other smoothing gradient methods (Zhang and Chen 2009;Chen 2012;Xu et al. 2015)). To rule out the possibility of oscillations in nonconvex optimization, we need stronger assumptions and more complicated arguments (see, e.g., Bolte et al. (2014)), which is beyond the scope of this paper. In our numerical experiments, no oscillation is observed.

Numerical experiments
All the experiments have been conducted on MacBook Pro (2019, 1.4 GHz Quad-Core Intel Core i5, 8 GB memory) and MAT-LAB R2022b. The MATLAB code for topology optimization is based on Andreassen et al. (2011) and Ferrari and Sigmund (2020). The following values are common in all the experiments: E 0 = 1 , E min = 10 −3 , and the SIMP penalty parameter p = 3 . The construction of the local stiffness matrix is the same as Ferrari and Sigmund (2020). The Poisson ratio in the stiffness matrix is 0.3. The filter radius used for the density filter is 0.02 times the number of finite elements in the horizontal direction. The initial point of each algorithm is x 0 = (V 0 ∕n)1 . The parameters of the proposed method are as follows: L 0 = 1000 , L min = 10 −3 , = 1.5 , a 1 = 0.1 , a 2 = 10 −6 , 0 = 10 , = 0.99 and = 0.1.

Study of smoothing method
Before comparing the proposed method with the existing methods, we show the effectiveness of the smoothing method and proposed inertial technique. In this subsection, we consider a 2D L-shaped beam in Fig. 1 with the number of design variables n = 7500 and the volume fraction V 0 ∕n = 0.4 . For discretization of 2D L-shaped beam, we mesh a square domain and fix the elements in the upper right corner to void.

Effectiveness of adaptive smoothing parameter
To show the effectiveness of adjusting the smoothing parameter at each iteration by (17), we compare the smoothing inertial projected gradient method with adaptive, heuristic, and fixed smoothing parameters. "Adaptive" corresponds to the proposed update scheme (17). "Heuristic" updates the smoothing parameter every 40 iterations as k = 10, 1, 10 −1 , 10 −2 , 10 −3 . This kind of continuation scheme is often used in structural optimization. Two fixed smoothing parameters are a small value and a large value: k ≡ 10 −2 and k ≡ 10 2 . The objective values at each iteration and the obtained designs after 200 iterations are shown in Figs. 3 and 4, respectively. Fig. 3 shows that too large leads to convergence to a solution with a large objective value because the approximation of the objective function is inaccurate. Too small leads to slow convergence because only a small stepsize is acceptable to satisfy the descent condition (the smoothing function becomes illconditioned with small ). Indeed, Fig. 4 shows that the methods with small have not converged in 200 iterations. Also, the heuristic continuation scheme exhibits slow convergence because of a sudden change of the smoothing parameter. In contrast, adapting the smoothing parameter by (17) leads to fast and stable convergence to a better solution as shown in Figs. 3 and 4. With this adaptive update scheme, we do not need to set the update scheme manually like the conventional continuation scheme.

Effectiveness of inertial technique
We compare the smoothing method with the inertial technique, SmoothingIPG in (16), and the smoothing method without the inertial technique, SmoothingPG in (11). We also add the conventional projected gradient method with a fixed stepsize, 4 PG, The update rule of the variable is as follows: which does not have the convergence guarantee in a nonsmooth optimization problem. The objective values at each iteration are shown in Fig. 5. Fig. 5 shows the acceleration of the convergence by the inertial technique. Also, the projected gradient with a fixed stepsize oscillates and converges slowly because of the nonsmoothness of the objective function. This supports the effectiveness of the smoothing method.

Comparison with existing methods
We compare the proposed method with two existing methods, the MMA approach by Takezawa et al. (2011) which applies MMA to the original nonsmooth problem (no convergence guarantee), and the NSDP approach by Holmberg et al. (2015). The implementation of MMA is based on Svanberg (1987Svanberg ( , 2022. The implementation of NSDP is based on the open-source MATLAB code fminsdp (Thore 2013) and the L-BFGS formula with 6 correction pairs is used for Hessian approximation in the interior-point method. We also show that GCMMA (Svanberg 2002) does not work in this nonsmooth problem, because GCMMA does not have the convergence guarantee in the problem which lacks the assumption of twice continuous differentiability of the objective and constraint functions.

2D L-shaped beam
We consider the same problem setting as in Sect. 5.1 where the multiplicity of eigenvalues occurs at an optimal  Figs. 6,7,8,9. The initial design 1 has uniform density. The initial design 2 is obtained by sampling each density from the uniform distribution over [0, 1] and projecting it onto the feasible set. The figures hereafter (Figs. 10,11,12,13,14,and 15) are results with the initial design 1. The two eigenvalues at each iteration of the three algorithms are shown in Figs. 10, 11, 12, and13. To see the change of order of the ith eigenvalues, and ′ are labeled by the directions of the eigenvectors. The average of computational time per iteration and the number of FEAs (6) for 200 iterations with different problem size n are shown in Figs. 14 and 15. Note that a practical computational time of an optimization algorithm heavily depends on a stopping criterion. Two existing methods (Takezawa et al. 2011;Holmberg et al. 2015) do not provide stopping criteria for their algorithms, and thus, it is difficult to compare the practical computational time and the number of FEAs required for each algorithm to converge. Therefore, we compare the performance for fixed number of iterations.
Experiments with two different initial designs show similar convergence results in Figs. 6, 7, 8 and 9. Different designs suggest that the problem has multiple local optimal solutions. As shown in Figs. 6, 7, 8, and 9 and 12, MMA oscillates and does not converge to an appropriate solution because of nonsmoothness. Figs. 6-7 and 13 show that GCMMA does not oscillate because it uses a line-search-like algorithm to ensure a sufficient decrease of the objective value at each iteration (hence globally convergent for a sufficiently smooth problem). However, its convergence is very slow, and its computational cost per iteration and numbers of FEAs are very large as shown in Figs. 14 and 15 because the line-search-like algorithm works poorly with nonsmoothness (discontinuous change of gradient). In contrast, the proposed method and NSDP converge fine and obtain clear solutions because they have the convergence guarantees. Figs. 10, 11, 12 and 13  (6)). GCMMA is omitted as it conducts much larger number of FEAs (1930 with n = 7500) show that the maximum eigenvalue switch and the objective value oscillate especially in MMA. Although Fig. 10 suggests that the proposed method is still in the middle of convergence for 200 iterations as the eigenvalues do not multiply, Figs. 6, 7, 8 and 9 show it converges to a reasonable design faster than the others. Moreover, the computational time per iteration of the proposed method is shorter than the others as shown in Fig. 14. The computational time per iteration of NSDP increases faster than the proposed method because of more complicated algorithm with solutions of linear equations. Figs. 14 and 15 suggest that the computational cost of the optimization procedure of NSDP excluding FEA is not negligible as it conducts single FEA at each iteration in most cases and still has higher computational cost per iteration. Less rapid increase of the computational cost per iteration of the proposed method suggests it works well even for larger problems. The proposed method solves a nonsmooth problem directly and utilizes the problem structure; hence, it is expected to be more efficient than other methods. Note that Fig. 15 also suggests that the estimation (21) is good enough and no additional evaluation of the objective value (FEA) to adjust the stepsize is needed in many cases.

3D cantilever
We consider a 3D setting where the number of columns of C(x) is d = 3 and the triple multiplicity of eigenvalues occurs near an optimal solution as shown in Fig. 16. We construct such a problem setting by utilizing symmetry. The compliance caused by a load in X-direction is exactly the same as the one caused by a load in Y-direction because of the symmetry. This suggests two of three eigenvalues are always the same at a point corresponding to a symmetric structure. Moreover, by adjusting the magnitude of uncertainty in the Z-direction, we can construct the case where the triple multiplicity occurs (compliance becomes constant for all the directions of the load in the uncertainty set). Fig. 17 shows that the triple multiplicity of eigenvalues occurs in iterations of MMA. To see the change of order of the ith eigenvalues, , ′ , and ′′ are labeled by the directions of the eigenvectors. The number of design variables is n = 27000 and the volume fraction is V 0 ∕n = 0.1.
The objective values at each iteration and the obtained designs after 300 iterations are shown in Figs. 18 and 19, respectively.
Although NSDP and MMA are less oscillatory, Figs. 18 and 19 show the similar results as in 2D L-shaped beam. Drastic oscillations of MMA for first few iterations lead to slow convergence.

Conclusion
We proposed a smoothing method for the worst-case topology optimization under load uncertainty. It consists of simple update scheme and is easy to implement. It has a low computational cost per iteration even in a large-scale problem, and has the convergence guarantee to a solution satisfying the first-order optimality and converges fast suppressing oscillation. In a nonsmooth optimization problem,  Obtained designs after 300 iterations (3D cantilever). Each objective value is shown in parenthesis. SmoothingIPG is denoted by S-IPG for short the convergence guarantee is especially important to obtain an optimal solution properly and efficiently; otherwise, the sequence becomes oscillatory or converges to a non-optimal point. Moreover, the proposed method exploits the problem structure, e.g., use of projection and smooth approximation to specific types of objective and constraints, and therefore, it is often more efficient than the general-purpose nonlinear optimization algorithm such as the interior-point method.
There may be room for developing more efficient optimization algorithms specifically designed for eigenvalue optimization problems. In contrast, one of the advantages of the smoothing method is that it is simple and can be applied to various nonsmooth optimization problems. It can be combined with many optimization algorithms and treated nonlinear constraints.
In future work, we extend the idea and apply the smoothing method to other types of nonsmooth structural (topology) optimization problems such as vibration and buckling problems. In these examples, the dimension d of the matrix of which the eigenvalue is optimized is very large. Therefore, an additional strategy may be necessary to reduce the computational cost of the smooth approximation. Moreover, we may need other smoothing methods such as smoothing augmented Lagrangian method (Xu et al. 2015) to treat nonlinear constraints. Moreover, development of a practical optimality measure and a stopping criterion for this kind of nonsmooth nonconvex optimization problems is required for a practical use of smoothing methods.