Smoothing inertial method for worst-case robust topology optimization under load uncertainty

Nishioka, Akatsuki; Kanno, Yoshihiro

doi:10.1007/s00158-023-03543-7

Smoothing inertial method for worst-case robust topology optimization under load uncertainty

Research Paper
Open access
Published: 24 March 2023

Volume 66, article number 82, (2023)
Cite this article

Download PDF

You have full access to this open access article

Structural and Multidisciplinary Optimization Aims and scope Submit manuscript

Smoothing inertial method for worst-case robust topology optimization under load uncertainty

Download PDF

1548 Accesses
1 Citation
Explore all metrics

Abstract

We consider a worst-case robust topology optimization problem under load uncertainty, which can be formulated as a minimization problem of the maximum eigenvalue of a symmetric matrix. The objective function is nondifferentiable where the multiplicity of maximum eigenvalues occurs. Nondifferentiability often causes some numerical instabilities in an optimization algorithm such as oscillation of the generated sequence and convergence to a non-optimal point. We use a smoothing method to tackle these issues. The proposed method is guaranteed to converge to a point satisfying the first-order optimality condition. In addition, it is a simple first-order optimization method and thus has low computational cost per iteration even in a large-scale problem. In numerical experiments, we show that the proposed method suppresses oscillation and converges faster than other existing methods.

Stability constraints for geometrically nonlinear topology optimization

Article Open access 05 December 2023

Stochastic Sensitivity Analysis for Robust Topology Optimization

Non-gradient Robust Topology Optimization Method Considering Loading Uncertainty

Article 10 August 2021

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Recently, topology optimization considering robustness has been attracting much attention in real-world engineering. There are several approaches to consider the robustness of a structure, such as a probabilistic approach (Dunning and Kim 2013) and a worst-case approach (Takezawa et al. 2011). We consider the latter approach in this paper. See, e.g., Kanno (2020), for the difference between these concepts.

In topology optimization of continua, Takezawa et al. (2011) formulate a worst-case compliance minimization problem under load uncertainty as a minimization problem of the maximal eigenvalue of a symmetric matrix. The maximum eigenvalue is nonsmooth (i.e., nondifferentiable) where the multiplicity of eigenvalues occurs. Therefore, this problem is categorized as a nonsmooth optimization problem. Takezawa et al. (2011) use the directional derivative of the maximum eigenvalue where the multiplicity of eigenvalues occurs and apply the method of moving asymptotes (MMA) (Svanberg 1987) to the problem. Although it converges fine in many cases, it oscillates and does not converge if the multiplicity of eigenvalues occurs at an optimal solution as mentioned in Thore (2022). This fact signifies the importance of the convergence guarantee in this type of nonsmooth problem. Holmberg et al. (2015) reformulate the problem into a nonlinear semidefinite programming (NSDP) problem. Then, utilizing the Cholesky factorization, they solve it with a standard nonlinear solver (the interior-point method of IPOPT (IPOPT 2022)). By this approach, they can avoid nonsmoothness and guarantee the convergence to a solution satisfying the first-order optimality condition. However, the interior-point method has a large computational cost per iteration in a large-scale problem because it requires (approximate) Hessian information and a solution of linear equation (e.g., Eq. (13) in Wächter and Biegler (2006) for computation of search directions) at every iteration. Even if we use approximate Hessian, the computational cost per iteration of the general-purpose interior-point method becomes larger, especially in large-scale nonlinear optimization, than simple gradient descent based methods. A topology optimization problem can be very large-scale when considering 3D structural design, and thus, an efficient method for large-scale problems is needed. There are other approaches to worst-case robust structural optimization, e.g., Cherkaev and Cherkaev (2004, 2008).

Eigenvalue optimization problems appear in other applications of topology optimization such as vibration and buckling problems (Díaaz and Kikuchi 1992; Kočvara 2002; Ohsaki et al. 1999; Yamada and Kanno 2016). The influence of multiplicity of eigenvalues at an optimal solution has been extensively studied in the field of structural optimization (Seyranian et al. 1994). Eigenvalue optimization has also been extensively studied outside of the field of structural optimization. It is closely related to semidefinite programming (Lewis and Overton 1996; Helmberg and Rendl 2000). It also appears in the field of control engineering (Apkarian et al. 2008). However, many studies focus on convex or unconstrained cases, thus they are not applicable to the eigenvalue optimization in topology optimization where the objective function is nonconvex and the volume constraint is applied in most cases.

A smoothing method is a nonsmooth optimization framework, which utilizes a smooth (i.e., differentiable) approximation of a nonsmooth function called a smoothing function. A smoothing function has a parameter that controls the approximation accuracy; a smoothing function converges to the original nonsmooth function when we take the limit of the parameter. The advantage of smoothing methods is that, by using smooth approximation, we can utilize well-developed smooth optimization methods. The basic idea of smoothing methods has a long history (Bertsekas 1975; Zang 1980), and there are many types of smoothing methods depending on the update scheme of a smoothing parameter (Nesterov 2007; Chen 2012; Bian and Wu 2021) and the smooth optimization methods they are based on: e.g., smoothing projected gradient method (Zhang and Chen 2009), smoothing augmented Lagrangian method (Xu et al. 2015), etc.

A smooth approximation has been utilized in some structural (topology) optimization problems such as a problem with stress constraints (Yang and Chen 1996), a dynamic problem (Torii and de Faria 2017), and a buckling problem (Ferrari and Sigmund 2019). However, they solve an approximated problem with a fixed smoothing parameter, and thus, the obtained solution is not the optimal solution of the original nonsmooth problem, but rather a solution of an approximated problem. In this paper, we adopt a smoothing method based on Zhang and Chen (2009), Chen (2012) which updates a smoothing parameter at each iteration so that the convergence to the optimal solution of the original problem is guaranteed.

Recently, simple first-order optimization methods, which only require the first-order derivative and the value of the objective function and do not require solutions of complicated subproblems, have been attracting much attention, especially in the machine learning literature. They have low computational cost per iteration, and thus are suitable for large-scale problems. There are many researches on accelerating the convergence of simple first-order methods (d’Aspremont et al. 2021; Ghadimi and Lan 2016; Li and Lin 2015; Nesterov 1983; Ochs et al. 2014). Recently, these optimization algorithms have been applied to some topology optimization problems (Li and Zhang 2021; Nishioka and Kanno 2021, 2023). Beck (Beck 2017) gives various examples of simple first-order methods.

In this paper, we propose a smoothing method for solving a worst-case topology optimization problem. It is a simple first-order method and suited for large-scale problems. It has a convergence guarantee to a solution satisfying the first-order optimality condition of the original problem and suppresses the oscillation caused by nonsmoothness at the optimal solution. We propose an inertial technique based on Ochs et al. (2014) to accelerate the smoothing method and discuss the parameter setting (the smoothing parameter and the stepsize parameters) of the proposed method for better convergence in our problems. Compared to existing methods, MMA and NSDP, the proposed method consists of simple and comprehensible update formula without solutions of subproblems nor linear equations at each iteration. Therefore, the proposed method is easy to implement. In the numerical experiments, we compare the proposed method with two existing methods, MMA and NSDP approaches. We show that the proposed method converges faster and stably without oscillation. Moreover, we show that the globally convergent version of MMA (GCMMA) (Svanberg 2002), which has the convergence guarantee only for a smooth problem, possibly fails in this kind of nonsmooth problems.

This paper is organized as follows. Sect. 2 provides the fundamentals of worst-case topology optimization. Sect. 3 provides the fundamentals of smoothing methods. In Sect. 4, we propose a smoothing method for worst-case topology optimization. We discuss the implementation details, the inertial technique and parameter settings. In Sect. 5, we show the results of numerical experiments. Finally, some concluding remarks are provided in Sect. 6.

We use the following notations. The norm $\Vert \cdot \Vert$ and the inner product $\langle \cdot ,\cdot \rangle$ denote the Euclidean norm and inner product of vectors, respectively, throughout the paper. The vectors $\varvec{0}$ and $\varvec{1}$ have all components equal to 0 and 1, respectively.

2 Worst-case topology optimization

Consider a worst-case compliance minimization problem (Takezawa et al. 2011) shown in Fig. 1. It is an extension of the conventional compliance minimization problem in topology optimization. The basic problem setting is the same as the ones in Andreassen et al. (2011), Bendsøe and Sigmund (2004), i.e., we use the SIMP (solid isotropic material with penalization) based density method, the density filtering and the conventional finite element discretization. The design variable of an optimization problem is the density vector denoted by $\varvec{x}\in {\mathbb {R}}^{n}$. The feasible set of optimization problems is written by

$$\begin{aligned} S=\{\varvec{x}\in {\mathbb {R}}^{n}\mid \varvec{v}^{\textrm{T}}\varvec{x}=V_0,\ \varvec{0}\le \varvec{x}\le \varvec{1}\}, \end{aligned}$$

(1)

where $\varvec{v}$ is a constant vector with positive components and $V_0>0$ is the designated upper limit of the structural volume.

The conventional compliance minimization, which aims to maximize the stiffness of a structure, can be written as follows (see, e.g., Andreassen et al. (2011)):

$$\begin{aligned} \underset{\varvec{x}\in S}{\textrm{Minimize}}\ \ {\bar{\varvec{f}}}^{\textrm{T}}K(H\varvec{x})^{-1}{\bar{\varvec{f}}}, \end{aligned}$$

(2)

where ${\bar{\varvec{f}}}\in {\mathbb {R}}^m$ is the constant external load vector, m is the number of degrees of freedom of the nodal displacements, $K(\cdot )\in {\mathbb {R}}^{m\times m}$ is the global stiffness matrix, $K(H\varvec{x})^{-1}$ denotes the inverse of the global stiffness matrix with argument $H\varvec{x}$, and $H\in {\mathbb {R}}^{n\times n}$ is the filtering matrix to prevent mesh dependency (Bourdin 2001. Using the filtering matrix, $\varvec{v}=H^{\textrm{T}}\varvec{1}$ in (1) when each finite element has the same volume. The global stiffness matrix is defined by

$$\begin{aligned} K(\varvec{x})=\sum _{e=1}^n (E_{\textrm{min}}+(E_0-E_{\textrm{min}})x_e^pK_e), \end{aligned}$$

where $E_0\gg E_{\textrm{min}}>0$ are constants, $x_e$ is the e-th component of $\varvec{x}$, $p>1$ is the SIMP penalty parameter, and $K_e\ (e=1,\ldots ,n)$ is the local stiffness matrix with unit Young’s modulus which is a constant symmetric matrix. Problem (2) has a nonconvex objective function and linear constraints.

To consider the worst-case compliance, we replace the objective function of (2) with the maximum compliance over the uncertainty set of the external load. To control the uncertainty of load, we set ${\bar{\varvec{f}}}=A{\varvec{f}}$, where ${\varvec{f}}\in {\mathbb {R}}^d$ is the uncertain load vector and $A\in {\mathbb {R}}^{m\times d}$ is a constant matrix. Note that the dimension of uncertainty d is normally much less than the dimension of the global stiffness matrix m. We consider the uncertainty $\Vert {\varvec{f}}\Vert =1$, which is equivalent to $\Vert {\varvec{f}}\Vert \le 1$ because the compliance is a convex function with respect to ${\varvec{f}}$, and a convex function attains its maximum at an extreme point of a bounded closed convex feasible set (Rockafellar 1970). If we want to consider an ellipsoidal uncertainty set, we only need to change the components of A, not $\Vert {\varvec{f}}\Vert =1$. The worst-case compliance minimization problem is defined as follows (Takezawa et al. 2011):

$$\begin{aligned} \underset{\varvec{x}\in S}{\textrm{Minimize}}\ \ \underset{\Vert {\varvec{f}}\Vert = 1}{\max }{\varvec{f}}^{\textrm{T}}C(\varvec{x}){\varvec{f}}, \end{aligned}$$

(3)

where $C(\varvec{x})=A^{\textrm{T}}K(H\varvec{x})^{-1}A$ is a symmetric positive semidefinite matrix. As $C(\varvec{x})\in {\mathbb {R}}^{d\times d}$ is a symmetric matrix, Takezawa et al. (2011) show that the problem (3) is equivalent to the following minimization problem of the maximum eigenvalue of $C(\varvec{x})$:

$$\begin{aligned} \underset{\varvec{x}\in S}{\textrm{Minimize}}\ \ \lambda _{\textrm{max}}\left( C(\varvec{x})\right) . \end{aligned}$$

(4)

The maximum eigenvalue is nonsmooth (nondifferentiable) with respect to $\varvec{x}$ where the multiplicity of eigenvalues occurs, and thus, problem (4) is classified as a nonsmooth optimization problem.

The derivative of the maximum eigenvalue without multiplicity (derivative at a differentiable point) is calculated as follows (Takezawa et al. 2011):

$$\begin{aligned} \frac{\partial }{\partial x_e}\lambda _{\textrm{max}}\left( C(\varvec{x})\right) ={-}\sum _{i=1}^n h_{ei}(Z\varvec{\phi }_{\textrm{max}})^{\textrm{T}}\frac{\partial }{\partial x_e}K(\varvec{x})(Z\varvec{\phi }_{\textrm{max}}), \end{aligned}$$

(5)

where $\varvec{\phi }_{\textrm{max}}$ is the eigenvector corresponding to the maximum eigenvalue (Although $\varvec{\phi }_{\textrm{max}}$ is dependent on $\varvec{x}$, we omit the argument $(\varvec{x})$ for simplicity), $Z\in {\mathbb {R}}^{m\times d}$ is the solution of the linear equation

$$\begin{aligned} K(H\varvec{x})Z=A \end{aligned}$$

(6)

and $h_{ei}$ is the (e, i)th component of the filtering matrix H. The solution of the linear equations (6) corresponds to the finite element analysis (FEA) and the objective function can also be calculated using the solution Z with the formula $C(\varvec{x})=A^{\textrm{T}}Z$. Note that in the case of the multiple maximum eigenvalues, only the subgradient and the directional derivative are available. By (5), we see that the derivative of the objective function with respect to the design variable is always nonpositive as $\frac{\partial }{\partial x_e}K(\varvec{x})\succeq 0$ and $h_{ei}\ge 0$ for all e and i. Therefore, we set the volume constraint as an equality constraint as in (1), because the volume constraint is always active at an optimal solution even if we set it as an inequality constraint.

Although the maximum eigenvalue is nonsmooth, it is a locally Lipschitz function and hence almost everywhere differentiable by Rademacher’s theorem (Rockafellar and Wets 1998). This means that, normally, the sequence generated by an optimization algorithm does not hit a nondifferentiable point. Therefore, we can apply a normal (smooth) optimization algorithm to our problem, because the gradient exists almost everywhere. However, the sequence generated by a smooth optimization algorithm can oscillate or converge to a non-optimal point because of nonsmoothness.^{Footnote 1} Indeed, Takezawa et al. (2011) apply MMA (Svanberg 1987) to the problem considering the directional derivative, and although it converges fine in many cases, it oscillates if the multiplicity of eigenvalues occurs at an optimal solution. Therefore, it is important to consider the convergence guarantee, especially in a nonsmooth optimization problem.

One way to avoid nonsmoothness is a nonlinear semidefinite programming (NSDP) approach. Holmberg et al. (2015) show that the minimization problem of the maximum eigenvalue (4) is equivalent to the following NSDP problem:

$$\begin{aligned}&\underset{\varvec{x}\in S,\ z\in {\mathbb {R}}}{\textrm{Minimize}}\ \ \ \ z\\&\textrm{subject}\ \textrm{to}\ \ \ zI-C(\varvec{x})\succeq 0, \end{aligned}$$

where $z\in {\mathbb {R}}$ is an auxiliary variable and $W\succeq 0$ denotes that W is a positive semidefinite matrix. Furthermore, Holmberg et al. (2015) put $zI-C(\varvec{x})=LL^{\textrm{T}}$, add each component of the triangular matrix L to the design variables, and solve the problem by the interior-point method of nonlinear programming (IPOPT). However, the interior-point method is not suited for a very large-scale problem such as a topology optimization problem because of its large computational cost per iteration.

To tackle the above issues, we propose a smoothing method for worst-case topology optimization which (i) has the convergence guarantee and suppresses the oscillation caused by nonsmoothness and (ii) has low computational cost per iteration even in a large-scale problem.

3 Fundamentals of smoothing methods

3.1 Subgradient and optimality condition

In a nonsmooth optimization problem, the objective (and/or the constraint) function may be nondifferentiable at an optimal solution. Therefore, to consider the optimality condition, we use the Clarke subdifferential (Clarke 1990; Rockafellar and Wets 1998), which is an extension of the subdifferential to a nonconvex function. For a locally Lipschitz function $f:{\mathbb {R}}^{n}\rightarrow {\mathbb {R}}$ (not to be confused with the load vector $\varvec{f}$ which only appears in the precious section), the Clarke subdifferential at $\varvec{x}\in {\mathbb {R}}^{n}$ is defined by

$$\begin{aligned} \partial f(\varvec{x})=\textrm{con}\{\underset{i\rightarrow \infty }{\lim }\nabla f(\varvec{x}^i) \mid \underset{i\rightarrow \infty }{\lim } \varvec{x}^i\rightarrow \varvec{x},\ \varvec{x}^i\in D_f \}, \end{aligned}$$

where $\textrm{con}{\mathcal {A}}$ denotes a convex hull of a set ${\mathcal {A}}$, $\{\varvec{x}^i\}\ (i=0,1,\ldots )$ is an infinite sequence in ${\mathbb {R}}^{n}$ , and $D_f$ is a dense subset of ${\mathbb {R}}^{n}$ at which f is differentiable. Note that a Clarke subdifferential is a set and becomes $\partial f(\varvec{x})=\{\nabla f(\varvec{x})\}$ if f is differentiable at $\varvec{x}$.

In this paper, we aim to find a point satisfying the first-order optimality condition called a Clarke stationary point. For a nonsmooth optimization problem

$$\begin{aligned} \underset{\varvec{x}\in D}{\textrm{Minimize}}\ \ f(\varvec{x}) \end{aligned}$$

(7)

with a locally Lipschitz continuous (nonsmooth) objective function $f:{\mathbb {R}}^{n}\rightarrow {\mathbb {R}}$ and a nonempty closed convex feasible set $D\subset {\mathbb {R}}^{n}$, the Clarke stationary point is a point $\varvec{x}^*\in D$ satisfying

$$\begin{aligned} \langle \varvec{g},\varvec{x}^*-\varvec{x}\rangle \le 0\ \ \ \ (\forall \varvec{x}\in D) \end{aligned}$$

(8)

for some $\varvec{g}\in \partial f(\varvec{x}^*)$. The above condition becomes $\varvec{0}\in \partial f(\varvec{x}^*)$ when $D={\mathbb {R}}^{n}$ and $\nabla f(\varvec{x}^*)=\varvec{0}$ when additionally f is differentiable. Thus, a Clarke stationary point is a natural extension of the stationary point to a nonsmooth constrained problem. Note that the condition (8) is valid only when D is convex. We need to consider a more complicated optimality condition based on the KKT (Karush–Kuhn–Tucker) condition when the constraints are nonconvex (Xu et al. 2015).

3.2 Smoothing function

In a smoothing method, we use a parameterized smooth approximation called a smoothing function $\tilde{f}(\cdot {;}\mu ):{\mathbb {R}}^{n}\rightarrow {\mathbb {R}}$ of a nonsmooth function $f:{\mathbb {R}}^{n}\rightarrow {\mathbb {R}}$. The parameter $\mu >0$ controls the degree of the approximation; a smoothing function $\tilde{f}(\varvec{x}{;}\mu )$ converges to the original nonsmooth function $f(\varvec{x})$ when $\mu \downarrow 0$ as shown in Fig. 2. More precisely, a function $\tilde{f}(\cdot {;}\mu )$ is called a smoothing function of f if it is continuously differentiable for all $\mu >0$, and it satisfies

$$\begin{aligned} \underset{\varvec{y}\rightarrow \varvec{x},\ \mu \downarrow 0}{\lim }\tilde{f}(\varvec{y}{;}\mu )=f(\varvec{x}). \end{aligned}$$

In addition, the gradient consistency of a smoothing function

$$\begin{aligned} \underset{\varvec{y}\rightarrow \varvec{x},\ \mu \downarrow 0}{\lim }\nabla \tilde{f}(\varvec{y}{;}\mu )\in \partial f(\varvec{x}) \end{aligned}$$

(9)

is required in most cases to guarantee the convergence of a smoothing method. Note that $\nabla \tilde{f}(\varvec{x}{;}\mu )$ denotes the gradient of $\tilde{f}(\cdot {;}\mu ):{\mathbb {R}}^{n}\rightarrow {\mathbb {R}}$ with respect to $\varvec{x}$ for each $\mu >0$. Construction of (the Chen–Mangasarian) smoothing functions satisfying the conditions above for various nonsmooth functions (max function, absolute value function, composite functions of those, etc.) can be found in Chen (2012). There are other types of smooth approximations; however, it may not be straightforward to check whether a smooth approximation which is not the Chen–Mangasarian type satisfies the gradient consistency (9).

3.3 Smoothing method

The smoothing method is a nonsmooth optimization framework which utilizes a smoothing function. By using a smoothing function, we can utilize well-developed smooth optimization methods. There are many variants of smoothing methods based on the underlying smooth optimization methods, e.g., the steepest descent method (Chen 2012), the projected gradient method (Zhang and Chen 2009) and the augmented Lagrangian method (Xu et al. 2015). Therefore, smoothing methods can be applied to a wide variety of problem settings if a smoothing function of the objective (and constraint) function is available. See Chen (2012) for construction of smoothing functions of various nonsmooth functions.

There are two approaches with respect to a smoothing parameter $\mu$ in smoothing methods, i.e., a fixed smoothing parameter and an adaptive smoothing parameter. Consider a nonsmooth optimization problem (7). Nesterov’s smoothing method (Nesterov 2005, 2007) for a convex optimization problem uses a fixed smoothing parameter, i.e., it solves a smoothly approximated optimization problem with fixed $\mu >0$

$$\begin{aligned} \underset{\varvec{x}\in D}{\textrm{Minimize}}\ \ \tilde{f}(\varvec{x}{;}\mu ) \end{aligned}$$

(10)

instead of the original problem (7). A smoothing parameter $\mu >0$ is determined beforehand depending on the desired accuracy. As it solves an approximated problem, the algorithm does not converge to a solution of the original nonsmooth problem (7). However, it converges to a point close to the solution for sufficiently small $\mu$. The same kind of technique as a smoothing method with a fixed smoothing parameter is often used in structural optimization (Yang and Chen 1996; Torii and de Faria 2017; Ferrari and Sigmund 2019).

In contrast, Zhang and Chen (2009) and Chen (2012) introduce a smoothing method with an adaptive smoothing parameter. It solves the original nonsmooth problem (7) utilizing $\tilde{f}(\varvec{x}{;}\mu _k)$ where $\mu _k$ may change at each iteration. Two advantages of adjusting a smoothing parameter at each iteration are as follows: (i) The generated sequence converges to an optimal solution of the original problem, not of an approximated problem. (ii) It may converge faster by utilizing a smoothing function with a better property at first. If a smoothing parameter is too large, the smoothing function is far from the original function. However, if we want a solution with better accuracy, we need to set a smoothing parameter very small, which makes the smoothing function ill-conditioned (it has large changes in its gradient and is close to a nonsmooth function). Therefore, it may be more efficient to start with a sufficiently large smoothing parameter and gradually decrease it. Considering the above advantages, we adopt a smoothing method with an adaptive smoothing parameter.

3.4 Smoothing projected gradient method

To treat convex constraints, the smoothing projected gradient method is introduced by Zhang and Chen (2009). For a nonsmooth constrained optimization problem (7), the smoothing projected gradient method uses the projected gradient method (Goldstein 1964) as a smooth optimization subroutine. The design variable is updated by

$$\begin{aligned} \varvec{x}^{k+1}=\Pi _D\left( \varvec{x}^{k}-\alpha _k\nabla \tilde{f}(\varvec{x}^{k}{;}\mu _k)\right) \end{aligned}$$

(11)

for $k=0,1,2,\ldots$, where $\Pi _D:{\mathbb {R}}^{n}\rightarrow D$ is the projection operator onto D, i.e., $\Pi _D(\varvec{x}):=\textrm{arg}\,\textrm{min}_{\varvec{y}\in D}\Vert \varvec{x}-\varvec{y}\Vert$ and $\alpha _k>0$ is an appropriate step size chosen by, e.g., the Armijo backtracking. Then the smoothing parameter is updated by

$$\begin{aligned} \mu _{k+1}={\left\{ \begin{array}{ll} \sigma \mu _k\ \ \ \ &{}\text {if }\Vert \frac{1}{\alpha _k} (\varvec{x}^{k+1}-\varvec{x}^k)\Vert <\gamma \mu _k,\\ \mu _k &{}\text {otherwise}, \end{array}\right. } \end{aligned}$$

(12)

where $\sigma \in (0,1)$, $\gamma >0$ and $\Vert \frac{1}{\alpha _k}(\varvec{x}^{k+1}-\varvec{x}^k)\Vert$ is an optimality measure, which coincides with the norm of the gradient $\Vert \nabla \tilde{f}(\varvec{x}^k{;}\mu )\Vert$ in the unconstrained case $D={\mathbb {R}}^{n}$.

We can derive $\mu _k\rightarrow 0$ from (12). Assume $\mu _k=\bar{\mu }>0$ after a finite number of iterations, then the projected gradient update for a fixed objective function $\tilde{f}(\cdot {;}\bar{\mu })$ with an appropriate stepsize leads to $\Vert \frac{1}{\alpha _k}(\varvec{x}^{k+1}-\varvec{x}^k)\Vert \rightarrow 0$. This contradicts the fact that $\mu _k$ is fixed after finite iterations. Therefore, the update condition (12) is satisfied infinitely many times, which leads to $\mu _k\rightarrow 0$. This property is essential to guarantee the convergence to an optimal solution of the original problem. Technically, the smoothing projected gradient method has the following convergence guarantee: any accumulation point of $\{\varvec{x}^k\mid k\in {\mathbb {N}}\cup \{0\},\mu _{k+1}=\sigma \mu _k\}$ generated by the smoothing projected gradient method becomes a Clarke stationary point. See Zhang and Chen (2009) for the proof.

4 Smoothing method for worst-case topology optimization

In this section, we explain the details of the implementation of the smoothing method for problem (4). Although our algorithm is based on the smoothing projected gradient method (Zhang and Chen 2009), we propose some techniques to make it more efficient.

4.1 Smoothing function and projection

For simplicity, we omit $C(\cdot )$ in the argument of the eigenvalue functions and write, for example, $\lambda _{\textrm{max}}(\varvec{x})$ instead of $\lambda _{\textrm{max}}\left( C(\varvec{x})\right)$. Moreover, we denote the ith largest eigenvalue by $\lambda _i(\varvec{x})$ ($\lambda _{\textrm{max}}(\varvec{x})$ coincides with $\lambda _1(\varvec{x})$). A smoothing function of $\lambda _{\textrm{max}}$ can be written as follows:^{Footnote 2}

$$\begin{aligned} \tilde{\lambda }_{\textrm{max}}(\varvec{x}{;}\mu )&:=\mu \ln \left( \sum _{i=1}^d \exp \left( \frac{\lambda _i(\varvec{x})}{\mu }\right) \right) \nonumber \\&=\lambda _{\textrm{max}}(\varvec{x})+\mu \ln \left( \sum _{i=1}^d \exp \left( \frac{\lambda _i(\varvec{x})-\lambda _{\textrm{max}} (\varvec{x})}{\mu }\right) \right) . \end{aligned}$$

(13)

We use the formulation in the second equality to avoid numerical instability caused by large exponents. This smoothing function $\tilde{\lambda }_{\textrm{max}}(\varvec{x}{;}\mu )$ satisfies the gradient consistency (9), which is an important property to guarantee the convergence of smoothing methods.

The gradient of the smoothing function of the maximum eigenvalue is

$$\begin{aligned}&\nabla \tilde{\lambda }_{\textrm{max}}(\varvec{x}{;}\mu )\nonumber \\&\quad =\nabla \lambda _{\textrm{max}}(\varvec{x})\nonumber \\&\qquad +\frac{\sum _{i=1}^d \left( \nabla \lambda _i(\varvec{x})-\nabla \lambda _{\textrm{max}} (\varvec{x})\right) \exp \left( \frac{\lambda _i(\varvec{x}) -\lambda _{\textrm{max}}(\varvec{x})}{\mu }\right) }{\sum _{i=1}^d \exp \left( \frac{\lambda _i(\varvec{x})-\lambda _{\textrm{max}} (\varvec{x})}{\mu }\right) }. \end{aligned}$$

(14)

The gradient of the i-th eigenvalue $\nabla \lambda _i(\varvec{x})$ can be calculated based on (5) with the eigenvector $\varvec{\phi }_i$ corresponding to $\lambda _i$ instead of $\varvec{\phi }_{\textrm{max}}$. If there exists a multiplicity of the maximum eigenvalue, we can choose any maximum eigenvector and calculate $\nabla \lambda _{\textrm{max}}(\varvec{x})$ by (5). Note that in this case, the gradient of $\lambda _{\textrm{max}}(\varvec{x})$ does not exist, but $\nabla \tilde{\lambda }_{\textrm{max}}(\varvec{x}{;}\mu )$ can be computed by the formulae in (5) and (14).

Remark 1

The p-norm $\Vert \varvec{a}\Vert _p\ (p>1)$ is a smooth approximation of $\max \{\vert a_1\vert ,\ldots ,\vert a_d\vert \}$. As all the eigenvalues are nonnegative in our problem, we may use $\Vert \varvec{\lambda }\Vert _p$ as a smooth approximation of $\lambda _{\textrm{max}}$ (except for $\varvec{\lambda }=\varvec{0}$) where $\varvec{\lambda }$ is a vector with ith component equal to $\lambda _i$. However, we use the smoothing function (13) in this paper, because the theoretical studies such as the gradient consistency are more readily available in the literature (Chen 2012; Nesterov 2007).

The projection $\Pi _S(\varvec{x})$ onto the feasible set (1) can be calculated by

$$\begin{aligned} \Pi _S(\varvec{x})=\max \{0,\min \{1,\varvec{x}-\nu \varvec{v}\}\}, \end{aligned}$$

(15)

where $\max \{0,\cdot \}$ and $\min \{1,\cdot \}$ are operated to a vector componentwise and $\nu \in {\mathbb {R}}$ is the solution of a piecewise linear equation

$$\begin{aligned} \varvec{v}^{\textrm{T}}\max \{0,\min \{1,\varvec{x}-\nu \varvec{v}\}\}=V_0. \end{aligned}$$

We can compute $\nu$ efficiently by, e.g., the bisection method. See Nishioka and Kanno (2021, 2023) for more details about the projection. Therefore, we can apply the smoothing projected gradient method to problem (4). In the following sections, we propose some techniques to make the method more efficient.

4.2 Inertial technique

The smoothing projected gradient method is a very simple method. We can use the inertial (acceleration) technique proposed in Ochs et al. (2014) to accelerate the convergence and we call the proposed method the smoothing inertial projected method. For the general problem setting (7), the update scheme becomes

$$\begin{aligned} \varvec{x}^{k+1}=\Pi _D\left( \varvec{x}^k-\alpha _k\nabla \tilde{f}(\varvec{x}^{k}{;}\mu _k)+\beta _k(\varvec{x}^k-\varvec{x}^{k-1})\right) , \end{aligned}$$

(16)

where $\alpha _k,\beta _k>0$ are stepsize parameters and $\beta _k(\varvec{x}^k-\varvec{x}^{k-1})$ is called an inertial term. The computational cost per iteration is almost the same as the smoothing projected gradient method. Although there is no theoretical improvement in the convergence rate, we show a faster convergence than the smoothing projected gradient method in numerical experiments.

There are many variants of acceleration techniques (Li and Lin 2015; Ghadimi and Lan 2016; Carmon et al. 2018) known as accelerated gradient methods. However, some algorithms require evaluations of the objective value (and gradient) several times at each iteration. This means several FEAs are required and the computational cost per iteration becomes large. Thus, we use a simple scheme by Ochs et al. (2014). An accelerated version of smoothing methods have been studied for some specified problems in Bian and Wu (2021), Wang and Chen (2022).

4.3 Stepsize and smoothing parameters

To guarantee the convergence, we need to choose stepsize and smoothing parameters appropriately. The smoothing parameter $\mu _k$ for the smoothing inertial projected gradient method is updated by

$$\begin{aligned} \mu _{k+1}={\left\{ \begin{array}{ll} \sigma \mu _k\ \ \ \ &{}\text {if }\Vert \varvec{r} (\varvec{x}^k{;}\mu _k)\Vert <\gamma \mu _k,\\ \mu _k &{}\text {otherwise}, \end{array}\right. } \end{aligned}$$

(17)

where $\sigma \in (0,1)$, $\gamma >0$ , and

$$\begin{aligned} \varvec{r}(\varvec{x}^k{;}\mu _k)=\Pi _D(\varvec{x}^k-\nabla \tilde{f}(\varvec{x}^k{;}\mu _k))-\varvec{x}^k \end{aligned}$$

is the proximal residual (Ochs et al. 2014), which is the optimality measure for smoothly approximated problem (10), i.e., $\varvec{r}(\bar{\varvec{x}}{;}\mu )=\varvec{0}$ if and only if $\bar{\varvec{x}}$ is a stationary point of (10). The update rule of the smoothing parameter (17) is modified from (12) to guarantee the convergence when the inertial term is added. Similarly in various smoothing methods (Zhang and Chen 2009; Chen 2012; Xu et al. 2015), to guarantee the convergence, the smoothing parameter is updated when the optimality measure for a smoothly approximated problem becomes small enough. By using this strategy, we may combine smoothing methods with GCMMA or other convergent optimization algorithms, although the convergence proof may not be straightforward.

Stepsize parameters must be chosen such that $\Vert \varvec{r}(\varvec{x}^k{;}\mu _k)\Vert \rightarrow 0$ when $\mu _k$ is unchanged. Thereby, (17) is satisfied infinitely many times and $\mu _k\rightarrow 0$ is achieved. We can follow the same stepsize rule as Ochs et al. (2014) to achieve $\Vert \varvec{r}(\varvec{x}^k{;}\mu _k)\Vert \rightarrow 0$ when $\mu _k$ is unchanged. The parameter $L_k>0$ satisfying the following descent condition plays an important role for the convergence^{Footnote 3}:

$$\begin{aligned}&\tilde{f}(\varvec{x}^{k+1}{;}\mu _k)\nonumber \\&\quad \le \tilde{f}(\varvec{x}^{k}{;}\mu _k)+\langle \nabla \tilde{f}(\varvec{x}^{k}{;}\mu _k),\varvec{x}^{k+1}-\varvec{x}^{k}\rangle +\frac{L_k}{2}\Vert \varvec{x}^{k+1}-\varvec{x}^k\Vert ^2. \end{aligned}$$

(18)

By using such $L_k$, stepsize parameters for (16) can be determined as follows Ochs et al. (2014):

$$\begin{aligned}&\alpha _k=2(1-\beta _k)/(2a_2+L_k), \end{aligned}$$

(19a)

$$\begin{aligned}&\beta _k=\left. (b-1)\big / \left( b-\frac{1}{2}\right) \right. , \end{aligned}$$

(19b)

where $b=\left( a_1+\frac{L_k}{2}\right) /\left( a_2+\frac{L_k}{2}\right)$ and $a_1\ge a_2>0$ are constant parameters. Note that when $\alpha _k=1/L_k$ and $\beta _k\equiv 0$, the algorithm becomes the smoothing projected gradient method.

The Lipschitz constant L of the gradient of $\tilde{f}(\cdot {;}\mu _k)$ defined by

$$\begin{aligned} \Vert \nabla \tilde{f}(\varvec{x}{;}\mu _k)-\nabla \tilde{f}(\varvec{y}{;}\mu _k)\Vert \le L\Vert \varvec{x}-\varvec{y}\Vert \quad (\forall \varvec{x},\varvec{y}\in S) \end{aligned}$$

(20)

always satisfies the descent condition (18) (see, e.g., Beck (2017) for a proof). If L can be easily calculated for each $\mu _k$, we can use stepsizes (19) with $L_k=L$. However, L is hard to estimate beforehand in our problem. In addition, a constant stepsize is often inefficient compared with choosing smaller $L_k$, which means a larger stepsize, satisfying the descent condition (18) at some iterations.

We use the backtracking to find $L_k$ satisfying (18). We start with an initial value $s_k$ for $L_k$ and gradually increase it by multiplying some constant $\eta >1$ until (18) is satisfied. The check of the descent condition (18) requires the evaluation of the objective value ($\varvec{x}^{k+1}$ in the left-hand side of (18) depends on $L_k$). To reduce the number of evaluations of the objective value (the number of FEAs), we estimate the initial value for the backtracking procedure $s_k$ for $k=1,2,\ldots$ by

$$\begin{aligned} \max \left\{ L_{\textrm{min}},\frac{\Vert \nabla \tilde{f}(\varvec{x}^k{;}\mu _k)-\nabla \tilde{f}(\varvec{x}^{k-1}{;}\mu _{{k-1}})\Vert }{\Vert \varvec{x}^k -\varvec{x}^{k-1}\Vert }\right\} , \end{aligned}$$

(21)

where $L_{\textrm{min}}$ is a small positive constant to avoid the numerical instability and ensure the boundedness of $L_k$ (Nishioka and Kanno 2021, 2023). For $k=0$, we cannot use (21), thus choose sufficiently large $s_0=L_0>0$. Although the initial estimate (21) is a very simple estimate based on (20), it works well in our numerical experiments. Indeed, it is often the case that the initial estimate (21) itself satisfies the descent condition (18), and no additional evaluation of the objective value (FEA) is needed in numerical experiments.

Note that, in practice, when the change $\Vert \varvec{x}^{k+1}-\varvec{x}^k\Vert$ or $\mu _k$ becomes too small after a large number of iterations, the descent condition (18) will not be satisfied because of numerical error. In that case, we need to just terminate the iteration.

Based on all of the above discussions, the algorithm of the smoothing inertial projected gradient method for the worst-case topology optimization is described in Algorithm 1. A MATLAB code for Algorithm 1 is provided on Appendix B. The condition $\mu _k<\epsilon$ for small $\epsilon >0$ can be used for the stopping criterion. However, our preliminary numerical experiments demonstrate that the number of iterations required to satisfy $\mu _k<\epsilon$ heavily depends on $\sigma$, $\gamma$ , and $\mu _0>0$. Development of a practical stopping criterion is our future work. Each iteration of Algorithm 1 consists of vector additions, scalar multiplications, projections, and eigenvalue calculation of a small-size matrix other than FEA, and thus, the computational cost per iteration is small.

Now we can guarantee the convergence of the proposed method. We state the convergence theorem in a more general setting than the worst-case topology optimization. We consider a nonsmooth optimization problem (7) and assume the following: (i) The objective function $f:{\mathbb {R}}^{n}\rightarrow {\mathbb {R}}$ is locally Lipschitz continuous. (ii) There exists a smoothing function $\tilde{f}(\cdot {;}\mu )$ of f such that $\nabla \tilde{f}(\cdot {;}\mu )$ is Lipschitz continuous for all $\mu >0$ and satisfy the gradient consistency (9). (iii) The feasible set D is nonempty, closed, and convex, and the projection onto D is computationally tractable. In our problem (4), where $f=\lambda _{\textrm{max}}(C(\cdot ))$ and $D=S$ defined in (1), the above assumptions hold. Under the above assumptions, the following theorem holds. The proof is shown in Appendix A.

Theorem 1

(Convergence of the smoothing inertial projected gradient method) Let $\hat{K}=\{k\in {\mathbb {N}}\cup \{0\}\mid \mu _{k+1} =\sigma \mu _k\}$. Every accumulation point of the subsequence $\{\varvec{x}^k\}_{k\in \hat{K}}$ generated by the smoothing inertial projected gradient method (16) with the stepsize parameters (19) and $\varvec{x}^0\in D$ is a Clarke stationary point of problem (7).

Remark 2

The convergence guarantee in Theorem 1 is weaker than saying “$\{\varvec{x}^k\}$ converges to a Clarke stationary point.” Indeed, there is a possibility of $\{\varvec{x}^k\}$ to oscillate between multiple (infinitely many) Clarke stationary points (this also applies to other smoothing gradient methods (Zhang and Chen 2009; Chen 2012; Xu et al. 2015)). To rule out the possibility of oscillations in nonconvex optimization, we need stronger assumptions and more complicated arguments (see, e.g., Bolte et al. (2014)), which is beyond the scope of this paper. In our numerical experiments, no oscillation is observed.

5 Numerical experiments

All the experiments have been conducted on MacBook Pro (2019, 1.4 GHz Quad-Core Intel Core i5, 8 GB memory) and MATLAB R2022b. The MATLAB code for topology optimization is based on Andreassen et al. (2011) and Ferrari and Sigmund (2020). The following values are common in all the experiments: $E_0=1$, $E_{\textrm{min}}=10^{-3}$ , and the SIMP penalty parameter $p=3$. The construction of the local stiffness matrix is the same as Ferrari and Sigmund (2020). The Poisson ratio in the stiffness matrix is 0.3. The filter radius used for the density filter is 0.02 times the number of finite elements in the horizontal direction. The initial point of each algorithm is $\varvec{x}^0=(V_0/n)\varvec{1}$. The parameters of the proposed method are as follows: $L_0=1000$, $L_{\textrm{min}}=10^{-3}$, $\eta =1.5$, $a_1=0.1$, $a_2=10^{-6}$, $\mu _0=10$, $\sigma =0.99$ and $\gamma =0.1$.

5.1 Study of smoothing method

Before comparing the proposed method with the existing methods, we show the effectiveness of the smoothing method and proposed inertial technique. In this subsection, we consider a 2D L-shaped beam in Fig. 1 with the number of design variables $n=7500$ and the volume fraction $V_0/n=0.4$. For discretization of 2D L-shaped beam, we mesh a square domain and fix the elements in the upper right corner to void.

5.1.1 Effectiveness of adaptive smoothing parameter

To show the effectiveness of adjusting the smoothing parameter at each iteration by (17), we compare the smoothing inertial projected gradient method with adaptive, heuristic, and fixed smoothing parameters. “Adaptive” corresponds to the proposed update scheme (17). “Heuristic” updates the smoothing parameter every 40 iterations as $\mu _k=10,1,10^{-1},10^{-2},10^{-3}$. This kind of continuation scheme is often used in structural optimization. Two fixed smoothing parameters are a small value and a large value: $\mu _k\equiv 10^{-2}$ and $\mu _k\equiv 10^2$. The objective values at each iteration and the obtained designs after 200 iterations are shown in Figs. 3 and 4, respectively.

Fig. 3 shows that too large $\mu$ leads to convergence to a solution with a large objective value because the approximation of the objective function is inaccurate. Too small $\mu$ leads to slow convergence because only a small stepsize is acceptable to satisfy the descent condition (the smoothing function becomes ill-conditioned with small $\mu$). Indeed, Fig. 4 shows that the methods with small $\mu$ have not converged in 200 iterations. Also, the heuristic continuation scheme exhibits slow convergence because of a sudden change of the smoothing parameter. In contrast, adapting the smoothing parameter by (17) leads to fast and stable convergence to a better solution as shown in Figs. 3 and 4. With this adaptive update scheme, we do not need to set the update scheme manually like the conventional continuation scheme.

5.1.2 Effectiveness of inertial technique

We compare the smoothing method with the inertial technique, SmoothingIPG in (16), and the smoothing method without the inertial technique, SmoothingPG in (11). We also add the conventional projected gradient method with a fixed stepsize,^{Footnote 4} PG, which does not have the convergence guarantee in a nonsmooth optimization problem. The objective values at each iteration are shown in Fig. 5.

Fig. 5 shows the acceleration of the convergence by the inertial technique. Also, the projected gradient with a fixed stepsize oscillates and converges slowly because of the nonsmoothness of the objective function. This supports the effectiveness of the smoothing method.

5.2 Comparison with existing methods

We compare the proposed method with two existing methods, the MMA approach by Takezawa et al. (2011) which applies MMA to the original nonsmooth problem (no convergence guarantee), and the NSDP approach by Holmberg et al. (2015). The implementation of MMA is based on Svanberg (1987, 2022). The implementation of NSDP is based on the open-source MATLAB code fminsdp (Thore 2013) and the L-BFGS formula with 6 correction pairs is used for Hessian approximation in the interior-point method. We also show that GCMMA (Svanberg 2002) does not work in this nonsmooth problem, because GCMMA does not have the convergence guarantee in the problem which lacks the assumption of twice continuous differentiability of the objective and constraint functions.

5.2.1 2D L-shaped beam

We consider the same problem setting as in Sect. 5.1 where the multiplicity of eigenvalues occurs at an optimal solution. It occurs when the compliance value is the same for all directions of the applied force in 2D. The objective values at each iteration and the obtained designs after 200 iterations for two different initial designs are shown in Figs. 6, 7, 8, 9. The initial design 1 has uniform density. The initial design 2 is obtained by sampling each density from the uniform distribution over [0, 1] and projecting it onto the feasible set. The figures hereafter (Figs. 10, 11, 12, 13, 14, and 15) are results with the initial design 1. The two eigenvalues at each iteration of the three algorithms are shown in Figs. 10, 11, 12, and13. To see the change of order of the ith eigenvalues, $\lambda$ and $\lambda '$ are labeled by the directions of the eigenvectors. The average of computational time per iteration and the number of FEAs (6) for 200 iterations with different problem size n are shown in Figs. 14 and 15. Note that a practical computational time of an optimization algorithm heavily depends on a stopping criterion. Two existing methods (Takezawa et al. 2011; Holmberg et al. 2015) do not provide stopping criteria for their algorithms, and thus, it is difficult to compare the practical computational time and the number of FEAs required for each algorithm to converge. Therefore, we compare the performance for fixed number of iterations.

Experiments with two different initial designs show similar convergence results in Figs. 6, 7, 8 and 9. Different designs suggest that the problem has multiple local optimal solutions. As shown in Figs. 6, 7, 8, and 9 and 12, MMA oscillates and does not converge to an appropriate solution because of nonsmoothness. Figs. 6–7 and 13 show that GCMMA does not oscillate because it uses a line-search-like algorithm to ensure a sufficient decrease of the objective value at each iteration (hence globally convergent for a sufficiently smooth problem). However, its convergence is very slow, and its computational cost per iteration and numbers of FEAs are very large as shown in Figs. 14 and 15 because the line-search-like algorithm works poorly with nonsmoothness (discontinuous change of gradient). In contrast, the proposed method and NSDP converge fine and obtain clear solutions because they have the convergence guarantees. Figs. 10, 11, 12 and 13 show that the maximum eigenvalue switch and the objective value oscillate especially in MMA. Although Fig. 10 suggests that the proposed method is still in the middle of convergence for 200 iterations as the eigenvalues do not multiply, Figs. 6, 7, 8 and 9 show it converges to a reasonable design faster than the others. Moreover, the computational time per iteration of the proposed method is shorter than the others as shown in Fig. 14. The computational time per iteration of NSDP increases faster than the proposed method because of more complicated algorithm with solutions of linear equations. Figs. 14 and 15 suggest that the computational cost of the optimization procedure of NSDP excluding FEA is not negligible as it conducts single FEA at each iteration in most cases and still has higher computational cost per iteration. Less rapid increase of the computational cost per iteration of the proposed method suggests it works well even for larger problems. The proposed method solves a nonsmooth problem directly and utilizes the problem structure; hence, it is expected to be more efficient than other methods. Note that Fig. 15 also suggests that the estimation (21) is good enough and no additional evaluation of the objective value (FEA) to adjust the stepsize is needed in many cases.

5.2.2 3D cantilever

We consider a 3D setting where the number of columns of $C(\varvec{x})$ is $d=3$ and the triple multiplicity of eigenvalues occurs near an optimal solution as shown in Fig. 16. We construct such a problem setting by utilizing symmetry. The compliance caused by a load in X-direction is exactly the same as the one caused by a load in Y-direction because of the symmetry. This suggests two of three eigenvalues are always the same at a point corresponding to a symmetric structure. Moreover, by adjusting the magnitude of uncertainty in the Z-direction, we can construct the case where the triple multiplicity occurs (compliance becomes constant for all the directions of the load in the uncertainty set). Fig. 17 shows that the triple multiplicity of eigenvalues occurs in iterations of MMA. To see the change of order of the ith eigenvalues, $\lambda$, $\lambda '$ , and $\lambda ''$ are labeled by the directions of the eigenvectors. The number of design variables is $n=27000$ and the volume fraction is $V_0/n=0.1$.

The objective values at each iteration and the obtained designs after 300 iterations are shown in Figs. 18 and 19, respectively.

Although NSDP and MMA are less oscillatory, Figs. 18 and 19 show the similar results as in 2D L-shaped beam. Drastic oscillations of MMA for first few iterations lead to slow convergence.

6 Conclusion

We proposed a smoothing method for the worst-case topology optimization under load uncertainty. It consists of simple update scheme and is easy to implement. It has a low computational cost per iteration even in a large-scale problem, and has the convergence guarantee to a solution satisfying the first-order optimality and converges fast suppressing oscillation. In a nonsmooth optimization problem, the convergence guarantee is especially important to obtain an optimal solution properly and efficiently; otherwise, the sequence becomes oscillatory or converges to a non-optimal point. Moreover, the proposed method exploits the problem structure, e.g., use of projection and smooth approximation to specific types of objective and constraints, and therefore, it is often more efficient than the general-purpose nonlinear optimization algorithm such as the interior-point method.

There may be room for developing more efficient optimization algorithms specifically designed for eigenvalue optimization problems. In contrast, one of the advantages of the smoothing method is that it is simple and can be applied to various nonsmooth optimization problems. It can be combined with many optimization algorithms and treated nonlinear constraints.

In future work, we extend the idea and apply the smoothing method to other types of nonsmooth structural (topology) optimization problems such as vibration and buckling problems. In these examples, the dimension d of the matrix of which the eigenvalue is optimized is very large. Therefore, an additional strategy may be necessary to reduce the computational cost of the smooth approximation. Moreover, we may need other smoothing methods such as smoothing augmented Lagrangian method (Xu et al. 2015) to treat nonlinear constraints. Moreover, development of a practical optimality measure and a stopping criterion for this kind of nonsmooth nonconvex optimization problems is required for a practical use of smoothing methods.

Notes

Wolfe’s example is such an example where the steepest descent method converges to a non-optimal point in a nonsmooth problem. See Beck (2017), for example.
A smoothing function of the max function often called the log-sum-exp function or, in structural optimization, the Kreisselmeier–Steinhauser (KS) function is used here.
GCMMA (Svanberg 2002) also uses a different kind of descent condition which ensures a sufficient decrease of the objective value at each iteration. This kind of condition is essential in many optimization algorithms to guarantee the convergence.
The update rule of the variable is as follows: $\varvec{x}^{k+1}=\Pi _S\left( \varvec{x}^k-\alpha \nabla f(\varvec{x}^k)\right)$ with $\alpha =0.1$.
$\mu _N'$ in Theorem 4.14 in Ochs et al. (2014) corresponds to $\underset{0\le i\le N}{\min }\Vert \varvec{r}(\varvec{x}^i;\bar{\mu })\Vert$ in this paper, and it decreases in inverse proportion to the iteration number as the theorem claims.

References

Andreassen E, Clausen A, Schevenels M, Lazarov BS, Sigmund O (2011) Efficient topology optimization in MATLAB using 88 lines of code. Struct Multidisc Optim 43(1):1–16
MATH Google Scholar
Apkarian P, Noll D, Prot O (2008) A trust region spectral bundle method for nonconvex eigenvalue optimization. SIAM J Optim 19(1):281–306
MathSciNet MATH Google Scholar
Beck A (2017) First-order methods in optimization. SIAM, Philadelphia
MATH Google Scholar
Bendsøe MP, Sigmund O (2004) Topology optimization: theory, methods and application, 2nd edn. Springer, Berlin
MATH Google Scholar
Bertsekas DP (1975) Nondifferentiable optimization via approximation. In: Balinski ML, Wolfe P (eds) Nondifferentiable optimization. Springer, Berlin, pp 1–25
MATH Google Scholar
Bertsekas DP (1999) Nonlinear programming, 2nd edn. Athena Scientific, MA
MATH Google Scholar
Bian W, Wu F (2021) Accelerated forward-backward method with fast convergence rate for nonsmooth convex optimization beyond differentiability. arXiv preprint arXiv:2110.01454
Bolte J, Sabach S, Teboulle M (2014) Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math Program 146(1):459–494
MathSciNet MATH Google Scholar
Bourdin B (2001) Filters in topology optimization. Int J Numer Methods Eng 50(9):2143–2158
MathSciNet MATH Google Scholar
Carmon Y, Duchi JC, Hinder O, Sidford A (2018) Accelerated methods for nonconvex optimization. SIAM J Optim 28(2):1751–1772
MathSciNet MATH Google Scholar
Chen X (2012) Smoothing methods for nonsmooth, nonconvex minimization. Math Program 134(1):71–99
MathSciNet MATH Google Scholar
Cherkaev E, Cherkaev A (2004) Principal compliance and robust optimal design. In: Man C-S, Fosdick RL (eds) The rational spirit in modern continuum mechanics. Springer, Dordrecht, pp 169–196
Google Scholar
Cherkaev E, Cherkaev A (2008) Minimax optimization problem of structural design. Comput Struct 86(13–14):1426–1435
MATH Google Scholar
Clarke FH (1990) Optimization and nonsmooth analysis. SIAM, Philadelphia
MATH Google Scholar
d’Aspremont A, Scieur D, Taylor A (2021) Acceleration methods. Foundations Trends Optim 5(1–2):1–245
Google Scholar
Díaaz AR, Kikuchi N (1992) Solutions to shape and topology eigenvalue optimization problems using a homogenization method. Int J Numer Methods Eng 35(7):1487–1502
MathSciNet MATH Google Scholar
Dunning PD, Kim HA (2013) Robust topology optimization: minimization of expected and variance of compliance. AIAA J 51(11):2656–2664
Google Scholar
Ferrari F, Sigmund O (2019) Revisiting topology optimization with buckling constraints. Struct Multidisc Optim 59(5):1401–1415
MathSciNet Google Scholar
Ferrari F, Sigmund O (2020) A new generation 99 line MATLAB code for compliance topology optimization and its extension to 3D. Struct Multidisc Optim 62(4):2211–2228
MathSciNet Google Scholar
Ghadimi S, Lan G (2016) Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math Program 156:59–99
MathSciNet MATH Google Scholar
Goldstein AA (1964) Convex programming in Hilbert space. Bull Am Math Soc 70(5):709–710
MathSciNet MATH Google Scholar
Helmberg C, Rendl F (2000) A spectral bundle method for semi-definite programming. SIAM J Optim 10(3):673–696
MathSciNet MATH Google Scholar
Holmberg E, Thore C-J, Klarbring A (2015) Worst-case topology optimization of self-weight loaded structures using semi-definite programming. Struct Multidisc Optim 52(5):915–928
MathSciNet Google Scholar
IPOPT (2022) Ipopt Documentation. https://coin-or.github.io/Ipopt/. Accessed Sept 30
Kanno Y (2020) On three concepts in robust design optimization: absolute robustness, relative robustness, and less variance. Struct Multidisc Optim 62(2):979–1000
MathSciNet Google Scholar
Kočvara M (2002) On the modelling and solving of the truss design problem with global stability constraints. Struct Multidisc Optim 23(3):189–203
Google Scholar
Lewis AS, Overton ML (1996) Eigenvalue optimization. Acta Numerica 5:149–190
MathSciNet MATH Google Scholar
Li H, Lin Z (2015) Accelerated proximal gradient methods for nonconvex programming. Adv Neural Inf Process Syst 28:379–387
Google Scholar
Li W, Zhang XS (2021) Momentum-based accelerated mirror descent stochastic approximation for robust topology optimization under stochastic loads. Int J Numer Methods Eng 122(17):4431–4457
MathSciNet Google Scholar
Nesterov YE (1983) A method of solving a convex programming problem with convergence rate ${O}(1/k^2)$. Soviet Math Doklady 269:543–547
Google Scholar
Nesterov Y (2005) Smooth minimization of non-smooth functions. Math Program 103(1):127–152
MathSciNet MATH Google Scholar
Nesterov Y (2007) Smoothing technique and its applications in semidefinite optimization. Math Program 110(2):245–259
MathSciNet MATH Google Scholar
Nishioka A, Kanno Y (2021) Accelerated projected gradient method with adaptive step size for compliance minimization problem. JSIAM Lett 13:33–36
MathSciNet MATH Google Scholar
Nishioka A, Kanno Y (2023) Inertial projected gradient method for large-scale topology optimization. Jpn J Ind Appl Math. https://doi.org/10.1007/s13160-023-00563-0
Ochs P, Chen Y, Brox T, Pock T (2014) iPiano: inertial proximal algorithm for nonconvex optimization. SIAM J Imaging Sci 7(2):1388–1419
MathSciNet MATH Google Scholar
Oden JT, Demkowicz LF (1996) Applied functional analysis. CRC Press, Boca Raton
MATH Google Scholar
Ohsaki M, Fujisawa K, Katoh N, Kanno Y (1999) Semi-definite programming for topology optimization of trusses under multiple eigenvalue constraints. Comput Methods Appl Mech Eng 180(1–2):203–217
MATH Google Scholar
Rockafellar RT (1970) Convex analysis. Princeton University Press, Princeton
MATH Google Scholar
Rockafellar RT, Wets RJ-B (1998) Variational analysis. Springer, Heidelberg
MATH Google Scholar
Seyranian AP, Lund E, Olhoff N (1994) Multiple eigenvalues in structural optimization problems. Struct Optim 8(4):207–227
Google Scholar
Svanberg K (2022) Svanberg matematisk optimering och IT AB. http://www.smoptit.se. Accessed Jan 21
Svanberg K (1987) The method of moving asymptotes—a new method for structural optimization. Int J Numer Methods Eng 24(2):359–373
MathSciNet MATH Google Scholar
Svanberg K (2002) A class of globally convergent optimization methods based on conservative convex separable approximations. SIAM J Optim 12(2):555–573
MathSciNet MATH Google Scholar
Takezawa A, Nii S, Kitamura M, Kogiso N (2011) Topology optimization for worst load conditions based on the eigenvalue analysis of an aggregated linear system. Comput Methods Appl Mech Eng 200:2268–2281
MathSciNet MATH Google Scholar
Thore C-J (2013) fminsdp—a code for solving optimization problems with matrix inequality constraints. Freely available at http://www.mathworks.com/matlabcentral/fileexchange/43643-fminsdp
Thore C-J (2022) A worst-case approach to topology optimization for maximum stiffness under uncertain boundary displacement. Comput Struct 259:106696
Google Scholar
Torii AJ, de Faria JR (2017) Structural optimization considering smallest magnitude eigenvalues: a smooth approximation. J Braz Soc Mech Sci Eng 39:1745–1754
Google Scholar
Wächter A, Biegler LT (2006) On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math Program 106(1):25–57
MathSciNet MATH Google Scholar
Wang W, Chen Y (2022) An accelerated smoothing gradient method for nonconvex nonsmooth minimization in image processing. J Sci Comput 90(1):1–28
MathSciNet MATH Google Scholar
Xu M, Ye JJ, Zhang L (2015) Smoothing augmented Lagrangian method for nonsmooth constrained optimization problems. J Global Optim 62(4):675–694
MathSciNet MATH Google Scholar
Yamada S, Kanno Y (2016) Relaxation approach to topology optimization of frame structure under frequency constraint. Struct Multidisc Optim 53(4):731–744
MathSciNet Google Scholar
Yang R, Chen C (1996) Stress-based topology optimization. Struct Optim 12(2):98–105
Google Scholar
Zang I (1980) A smoothing-out technique for min-max optimization. Math Program 19(1):61–77
MathSciNet MATH Google Scholar
Zhang C, Chen X (2009) Smoothing projected gradient method and its application to stochastic linear complementarity problems. SIAM J Optim 20(2):627–649
MathSciNet MATH Google Scholar

Download references

Acknowledgements

This research is part of the results of Value Exchange Engineering, a joint research project between R4D, Mercari, Inc., and the RIISE. The work of the second author is partially supported by JST CREST Grant No. JPMJCR1911, Japan, the Kajima Foundation’s Research Grant, and JSPS KAKENHI 21K04351.

Funding

Open access funding provided by The University of Tokyo.

Author information

Authors and Affiliations

Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, 1138656, Tokyo, Japan
Akatsuki Nishioka & Yoshihiro Kanno
Mathematics and Informatics Center, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, 1138656, Tokyo, Japan
Yoshihiro Kanno

Authors

Akatsuki Nishioka
View author publications
You can also search for this author in PubMed Google Scholar
Yoshihiro Kanno
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Akatsuki Nishioka.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Replication of results

A MATLAB code of the main part of Algorithm 1 is provided in Appendix B. One can run this code together with an implementation of the finite element method of the MATLAB code in Ferrari and Sigmund (2020) with slight modification. Please contact the corresponding author for the complete code.

Additional information

Responsible Editor: Jun Wu

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Theoretical convergence guarantee

The proof of Theorem 1 is based on Zhang and Chen (2009), Chen (2012), and Ochs et al. (2014).

Lemma 2

When $\mu _k=\bar{\mu }\ (\forall k\ge \bar{k})$ for some $\bar{\mu }>0$, the update rule (16) with the stepsize parameters (19) satisfies

$$\begin{aligned} \Vert \varvec{r}(\varvec{x}^k;\mu _k)\Vert \rightarrow 0. \end{aligned}$$

Proof

We can directly apply Theorem 4.14^{Footnote 5} in Ochs et al. (2014) as $\tilde{f}(\cdot ,\bar{\mu })$ is unchanged and smooth. $\square$

Lemma 3

Algorithm 1 satisfies

$$\begin{aligned} \mu _k\rightarrow 0. \end{aligned}$$

Proof

Suppose there exists $\bar{\mu }>0$ such that $\mu _k=\bar{\mu }$ for all $k\ge \bar{k}$. By the update rule of the smoothing parameter (17), we have $\Vert \varvec{r}(\varvec{x}^k;\mu _k)\Vert \ge \gamma \bar{\mu }$ for all $k\ge \bar{k}$. This contradicts to Lemma 2. $\square$

Lemma 4

(A property of projection) Let $\varvec{y}\in {\mathbb {R}}^{n}$. The projection of $\varvec{y}$ onto a nonempty closed convex set $D\subset {\mathbb {R}}^{n}$ satisfies

$$\begin{aligned} \langle \varvec{y}-\Pi _D(\varvec{y}),\varvec{x}-\Pi _D(\varvec{y}) \rangle \le 0 \ \ \ \ (\forall \varvec{x}\in D). \end{aligned}$$

Proof

See, e.g., Proposition 2.1.3 in Bertsekas (1999) $\square$

Proof of Theorem 1

Let $\varvec{x}^*$ be any accumulation point of the sequence $\{\varvec{x}^k\}_{k\in \hat{K}}$ generated by the smoothing inertial projected gradient method. Take a convergent subsequence $\{\varvec{x}^k\}_{k\in \bar{K}}$ where $\bar{K}\subset \hat{K}$ such that $\underset{k\rightarrow \infty ,\ k\in \bar{K}}{\lim }\varvec{x}^{k}=\varvec{x}^*$. By Lemma 3 and the gradient consistency (9), there exists $\varvec{g}\in \partial f(\varvec{x}^*)$ such that

$$\begin{aligned} \underset{k\rightarrow \infty ,\ k\in \bar{K}}{\lim }\nabla \tilde{f}(\varvec{x}^{k};\mu _{k})=\varvec{g}. \end{aligned}$$

(A1)

By the update rule of the smoothing parameter (17), we obtain

$$\begin{aligned} \Vert \varvec{r}(\varvec{x}^k;\mu _k)\Vert <\gamma \mu _k\ \ \ \ (\forall k\in \hat{K}). \end{aligned}$$

Therefore, by Lemma 3, we have

$$\begin{aligned} \Vert \varvec{r}(\varvec{x}^k;\mu _k)\Vert \rightarrow 0\ \ \ \ (k\rightarrow \infty ,\ k\in \hat{K}). \end{aligned}$$

(A2)

By applying Lemma 4 for $\varvec{y}=\varvec{x}^{k}-\nabla \tilde{f}(\varvec{x}^{k};\mu _{k})$ and defining $\hat{\varvec{x}}^{k}=\Pi _D(\varvec{y})$, we obtain

$$\begin{aligned}&\langle \varvec{x}^{k}-\nabla \tilde{f}(\varvec{x}^{k};\mu _{k}) -\hat{\varvec{x}}^{k},\varvec{x}-\hat{\varvec{x}}^{k} \rangle \nonumber \\&\quad = \langle \varvec{r}(\varvec{x}^k;\mu _k)-\nabla \tilde{f} (\varvec{x}^{k};\mu _{k}),\varvec{x}-\hat{\varvec{x}}^{k} \rangle \nonumber \\&\quad \le 0\ \ \ \ (\forall \varvec{x}\in D). \end{aligned}$$

(A3)

By taking the limit $k\rightarrow \infty \ (k\in \bar{K})$ in (A3) and using (A1) and (A2), we obtain

$$\begin{aligned} \langle \varvec{g},\varvec{x}^*-\varvec{x}\rangle \le 0\ \ \ \ (\forall \varvec{x}\in D) \end{aligned}$$

for $\varvec{g}\in \partial f(\varvec{x}^*)$; hence, $\varvec{x}^*$ is a Clarke stationary point. Note that $\hat{\varvec{x}}^{k}\rightarrow \varvec{x}^*$ because $\Vert \varvec{r}(\varvec{x}^k;\mu _k)\Vert =\Vert \hat{\varvec{x}}^{k}-\varvec{x}^{k}\Vert \rightarrow 0$ and $\varvec{x}^{k}\rightarrow \varvec{x}^*$. $\square$

In our problem (4), the feasible set (1) is compact, and thus, the existence of an accumulation point and a convergent subsequence of $\{\varvec{x}^k\}_{k\in \hat{K}}$ is guaranteed by the Bolzano–Weierstrass theorem (see, e.g., Theorem 4.9.1 in Oden and Demkowicz (1996)).

Appendix B: MATLAB code for Algorithm 1

We provide an implementation of a main part of Algorithm 1. Due to copyright issue, we only include a MATLAB code for optimization part. To use the following code, function eigenvalue_analysis is required. Its input is $n\times 1$ vector x (optimization variables) and structure fem containing parameters required for FEA (6). Its output is $d\times 1$ vector lambda containing eigenvalues of $C(\varvec{x})$ in descending order and $n\times d$ vector lambda_grad (e, i)-th component of which corresponds to the derivative of $\lambda _i$ with respect to $x_e$ calculated by (5). The function eigenvalue_analysis made by the authors contains a part of the MATLAB code in Ferrari and Sigmund (2020). Please contact the corresponding author for the complete code.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Nishioka, A., Kanno, Y. Smoothing inertial method for worst-case robust topology optimization under load uncertainty. Struct Multidisc Optim 66, 82 (2023). https://doi.org/10.1007/s00158-023-03543-7

Download citation

Received: 28 October 2022
Revised: 06 March 2023
Accepted: 07 March 2023
Published: 24 March 2023
DOI: https://doi.org/10.1007/s00158-023-03543-7

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Smoothing inertial method for worst-case robust topology optimization under load uncertainty

Abstract

Similar content being viewed by others

Stability constraints for geometrically nonlinear topology optimization

Stochastic Sensitivity Analysis for Robust Topology Optimization

Non-gradient Robust Topology Optimization Method Considering Loading Uncertainty

1 Introduction

2 Worst-case topology optimization

3 Fundamentals of smoothing methods

3.1 Subgradient and optimality condition

3.2 Smoothing function

3.3 Smoothing method

3.4 Smoothing projected gradient method

4 Smoothing method for worst-case topology optimization

4.1 Smoothing function and projection

Remark 1

4.2 Inertial technique

4.3 Stepsize and smoothing parameters

Theorem 1

Remark 2

5 Numerical experiments

5.1 Study of smoothing method

5.1.1 Effectiveness of adaptive smoothing parameter

5.1.2 Effectiveness of inertial technique

5.2 Comparison with existing methods

5.2.1 2D L-shaped beam

5.2.2 3D cantilever

6 Conclusion

Notes

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Replication of results

Additional information

Publisher's Note

Appendices

Appendix A: Theoretical convergence guarantee

Lemma 2

Proof

Lemma 3

Proof

Lemma 4

Proof

Proof of Theorem 1

Appendix B: MATLAB code for Algorithm 1

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation