1 Introduction

In almost all design procedures for electrical devices, numerical optimization is employed as one of the last design steps in order to optimize the device’s performance and efficiency, to minimize its weight and size, and to save on material and manufacturing costs. Often, the quality of this optimization step indirectly determines the success of the product and, hence, the market position of the company. The reliability, accuracy and computational cost of the numerical optimization procedure thus themselves become a subject of competition. This paper illustrates that shape optimization can be improved substantially when finite-element (FE) analysis procedures are equipped with piecewise affine parametrizations or design elements, such that well-performing deterministic optimization methods become applicable.

Impressive technical improvements have been achieved by numerical optimization on the basis of magnetic equivalent circuits or 2D and 3D FE models. All have led to highly optimized designs, for example, for permanent-magnet synchronous machines (PMSMs) in automotive applications. For three decades, FE-based optimization has been addressed in several text books (see, e.g., [12]) and hundreds of journal articles (see, e.g., [14] and the references therein). Although gradient-based methods were originally preferred (see, e.g., [48, 56, 58]), stochastic algorithms have been more popular for more than two decades (see, e.g., [19, 33]). The majority of the proposed procedures opt for stochastic or population-based optimization methods, such as genetic algorithms and particle swarm optimization (see, e.g., [34]), because they allow using FE solvers as black boxes, they can easily handle geometric parameters, their parallelization is straightforward and they are more likely to find the global optimum. Stochastic algorithms have been used for robust optimization, have been applied together with surrogate modeling and have been extended to multi-objective optimization problems [3, 23]. In particular for PMSMs, optimization with stochastic methods became the method of choice [2, 9, 51].

The trend toward stochastic optimization combined with FE analysis continues without restraint, as is illustrated by the number of such contributions at recent conferences. This paper partially counteracts this tendency by turning back to deterministic optimization algorithms. Deterministic optimization methods are known to converge faster than stochastic optimization methods, albeit possibly to a local optimum. Moreover, the analysis of gradient-based methods is more mature, allowing for a rigorous control of mesh discretization errors, for instance. The main drawback of many deterministic methods is, however, the necessity to provide derivatives, which is particularly cumbersome when optimizing with respect to geometric parameters. This drawback is addressed explicitly here and is alleviated by piecewise affine parametrizations of the geometry or by the design element approach. The overall deterministic optimization routine is shown to outperform the most popular stochastic algorithms by a substantial factor. Moreover, the optimization method is robustified to account for uncertainties in the design parameters.

The paper is structured as follows: Sect. 2 recalls the basics of mathematical optimization. It clearly distinguishes between deterministic methods (Sect. 2.3) and particle swarm optimization as a relevant representative of stochastic methods (Sect. 2.4). Furthermore, an extension to robust optimization is discussed in Sect. 2.5. Section 3 deals with FE analysis of magnetodynamic fields. The core parts of the paper are Sect. 3.3.1 about affine parametrization and Sect. 3.3.2 about design elements, both facilitating and improving the calculation of derivatives with respect to geometric parameters. The superior performance of gradient-type deterministic optimization is illustrated for a benchmark example in Sect. 4 and for a PMSM in Sect. 5. Conclusions are formulated in Sect. 6.

2 Constrained optimization

2.1 Constrained optimization problem

The optimization is carried out with respect to I design parameters \(\mathbf{p}=(p_1,p_2,\ldots ,p_I)\) belonging to the admissible set \({\mathcal {P}}_{\mathrm {ad}} =\{\mathbf{p}\in {\mathbb {R}}^I|G_m(\mathbf{p})\le 0,m=1,\ldots ,M\}\), where \(G_m(\mathbf{p})\) denote the constraints. The design parameters can be any continuous variables, for example material constants, excitation parameters and geometric sizes or positions. The constraints limit the admissible range of these parameters, for example to preserve the topology of the geometry or to set physical and operational constraints. Discrete design parameters are not considered in this work, although many methods apply, for example as part of a branch-and-bound technique, to mixed-integer optimization problems as well [21].

The optimization goal is represented by the objective function \(J(\mathbf{p})\) returning a scalar value for every set of design parameters. Relevant quantities are, for example force, torque, current, efficiency, weight, temperature or a combination thereof. When N objective functions \(J_n(\mathbf{p})\), \(n=1,\ldots ,N\) are relevant, a possible approach is to combine them with user-defined weight factors \(\alpha _n\) into a single cost function \(J(\mathbf{p})=\sum _{n=1}^N \alpha _n J_n(\mathbf{p})\). The optimization problem then reads

$$\begin{aligned} \underset{\mathbf{p}\in {\mathbb {R}}^I}{\text {minimize}}&\quad J(\mathbf{p}) \text {,}\end{aligned}$$
(1a)
$$\begin{aligned} \text {subject to}&\quad G_m(\mathbf{p}) \le 0,\quad m=1,\ldots ,M \text {.}\end{aligned}$$
(1b)

In this work, the evaluation of \(G_m(\mathbf{p})\) and/or \(J(\mathbf{p})\) involves an FE analysis of the device. Hence, the computational performance of the overall approach is largely determined by the number of FE-solver calls.
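To make this cost model concrete, the following sketch (with a hypothetical, purely illustrative objective in place of a real FE solve) shows how a weighted-sum objective \(J=\sum _n \alpha _n J_n\) and constraints \(G_m\) can be wrapped so that FE-solver calls are counted:

```python
# Sketch of problem (1): a weighted-sum objective and constraints, with a
# counter standing in for the (expensive) FE-solver calls. The quantities
# returned by fe_analysis are mock values, not a real field solution.
fe_calls = 0

def fe_analysis(p):
    """Placeholder for one FE solve; returns mock quantities of interest."""
    global fe_calls
    fe_calls += 1
    torque = -(p[0] - 1.0) ** 2 - (p[1] - 2.0) ** 2  # mock torque surface
    mass = p[0] + 2.0 * p[1]                         # mock mass
    return torque, mass

def J(p, alpha=(1.0, 0.1)):
    torque, mass = fe_analysis(p)
    # weighted sum J = sum_n alpha_n J_n (minimize -torque and mass)
    return alpha[0] * (-torque) + alpha[1] * mass

def G(p):
    # admissible set: G_m(p) <= 0, here simple bound-type constraints
    return [p[0] - 5.0, -p[0], p[1] - 5.0, -p[1]]

value = J([1.0, 2.0])  # one objective evaluation costs one FE call
```

Every optimization method discussed below is then judged by how many such calls it needs.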

2.2 Optimization methods

The selection of a particular optimization method involves several essentially independent choices (see also Table 3 in [18]).

  • Problem (1) considers a single optimization goal. For a multi-objective optimization problem, a Pareto front is calculated such that the relative importance of the optimization goals can be fixed later on [7, 12]. This paper does not further consider multi-objective optimization. Nonetheless, the developed techniques are applicable to multi-objective optimization as well.

  • A distinction is made between global optimization and local optimization, where the former strives for a global optimum, whereas the latter may run into a local one. This paper is limited to methods for local optimization. In practice, if a global optimum is required, the methods may be repeated for different start values, or could be embedded as part of a global optimization scheme [24].

  • Especially when the evaluation of the objective function is computationally expensive, it is recommended to carry out the optimization method on the basis of a surrogate model (indirect optimization methods). Such a simplified model can be obtained by expert knowledge on the application [59], by design space reduction [17], by a response surface methodology [17] or by space mapping [29] or manifold mapping [15]. Here, a direct optimization procedure is used. All ideas presented here can, however, be used in combination with indirect optimization approaches as well [30].

  • The result from a nominal optimization is a set of optimized design parameters leading to an optimum of the objective function. The optimum may, however, become irrelevant when it is highly sensitive to uncertainties in the design parameters. One speaks about robust optimization when the optimization is carried out taking such uncertainties into account. In this paper, both nominal and robust optimization methods are considered. An approach for robustification is discussed in Sect. 2.5.

  • Two families of basic optimization methods exist: deterministic and stochastic methods. Among the stochastic methods, genetic algorithms [35], differential evolution [38] and particle swarm optimization (PSO) [27] are well known.

This paper motivates the use of a gradient-based deterministic method for nominal and robust optimization and compares it with a standard particle swarm technique.

2.3 Gradient-based deterministic method

This work proposes to solve (1) by standard sequential quadratic programming (SQP) with damped Broyden–Fletcher–Goldfarb–Shanno (BFGS) updates for the Hessian approximation [22, 40]. Locally, this method achieves second-order convergence, which means that

$$\begin{aligned} |J(\mathbf{p}_{k+1}) - J(\mathbf{p}_\text {opt})| \le C|J(\mathbf{p}_k) - J(\mathbf{p}_\text {opt})|^2 \end{aligned}$$
(2)

for some \(C>0\), where k denotes the iteration step and is assumed to be sufficiently large. The method, however, requires knowledge of the sensitivities of the objective function with respect to the design parameters, in particular, \(\nabla _\mathbf{p}J(\mathbf{p})\) or, alternatively, a locally quadratic approximation of the objective function [44]. Many FE solution and post-processing routines do not provide this information, especially when geometric design parameters are involved. Therefore, one is tempted to approximate the sensitivities by finite differences as in, e.g., [48]. This is, however, known to be particularly cumbersome because of the limited accuracy of the finite differences [58]. Even when relying on gradient-free deterministic methods (e.g., [44, 45]), artifacts caused by FE analysis may hamper the convergence of the optimization routines. Eventually, as apparently the only option, deterministic optimization algorithms are abandoned in favor of stochastic approaches. This paper, however, sticks to gradient-based deterministic methods by complementing the FE simulation procedure with sensitivity information. The problems caused by the presence of geometric parameters are alleviated by introducing piecewise affine parametrization (see Sect. 3.3.1) or, alternatively, design elements (see Sect. 3.3.2) into the FE procedure.
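The following sketch illustrates such a gradient-based SQP-type solve; it is not the paper's implementation: SciPy's SLSQP (an SQP variant) stands in for SQP with damped BFGS, and the toy objective and constraint are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

# Toy instance of (1) solved by an SQP-type method with an analytic gradient,
# mimicking the sensitivity information that Sect. 3.4 provides for FE models.
def J(p):
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2

def grad_J(p):
    # analytic nabla_p J, supplied instead of noisy finite differences
    return np.array([2.0 * (p[0] - 3.0), 2.0 * (p[1] + 1.0)])

# one constraint G(p) = p_1 + p_2 - 1 <= 0, passed to SciPy as g(p) >= 0
cons = [{"type": "ineq",
         "fun": lambda p: 1.0 - p[0] - p[1],
         "jac": lambda p: np.array([-1.0, -1.0])}]

res = minimize(J, x0=[0.0, 0.0], jac=grad_J, constraints=cons, method="SLSQP")
# the unconstrained optimum (3, -1) violates the constraint, so the solver
# finds the constrained optimum (2.5, -1.5) with the constraint active
```

With exact gradients, the solver needs only a handful of objective evaluations, which is the behavior the FE-solver-call counts in Sects. 4 and 5 exploit.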

2.4 Particle swarm optimization

Particle swarm optimization (PSO) [27] belongs to the broad class of stochastic algorithms and is particularly popular for optimizing electric machines (see, e.g., [2, 3, 9, 23, 34]). In PSO, a set of Q particles indicated by \(q=1,\ldots ,Q\) moves through the admissible set in the design space in search of an optimum. At each iteration step k, the algorithm evaluates the objective function \(J(\mathbf{p})\) at every particle position \(\mathbf{p}_{k,q}\). The newly obtained values are compared to the previous best values in the individual particle histories and to the best value of the entire swarm. The corresponding best sets are denoted by \({\hat{\mathbf{p}}}_q\) and \({\hat{\mathbf{p}}}_\text {swarm}\), respectively. The velocities of the particles are updated according to

$$\begin{aligned} \mathbf {v}_q \leftarrow \underbrace{\omega _0\mathbf {v}_q}_\text {1)} + \underbrace{\omega _1 {\mathbf {N}}_1 ({\hat{\mathbf{p}}}_q-\mathbf{p}_{k,q})}_\text {2)} + \underbrace{\omega _2 {\mathbf {N}}_2 ({\hat{\mathbf{p}}}_\text {swarm}-\mathbf{p}_{k,q})}_\text {3)} \text {,}\end{aligned}$$
(3)

where \(\omega _0\), \(\omega _1\) and \(\omega _2\) are swarm characteristic constants and \({\mathbf {N}}_1\) and \({\mathbf {N}}_2\) are two random diagonal matrices with elements in [0, 1] generated independently and uniformly for each particle at every step, representing the free will of the swarm. The components of the velocity update are:

  1. Maintain a part of the current velocity;

  2. Head toward the particle’s best found point (\({\hat{\mathbf{p}}}_q\));

  3. Head toward the swarm’s best found point (\({\hat{\mathbf{p}}}_\text {swarm}\)).

If a particle leaves the admissible set at some iteration, its position is projected onto the boundary of the admissible set. Initially, all particles are randomly and uniformly distributed in the admissible set and the initial velocities are set to 0. Particle swarm optimization is a gradient-free method and works for non-smooth functions as well. The iteration ends when a maximum number of iterations is reached, or when the majority of the particles are close enough to the best point \({\hat{\mathbf{p}}}_\text {swarm}\):

$$\begin{aligned} \frac{1}{Q}\sum \limits _{q=1}^{Q}\Vert {\hat{\mathbf{p}}}_\text {swarm}-\mathbf{p}_{k,q}\Vert _2 < \epsilon \text {,}\end{aligned}$$
(4)

with a user-defined tolerance \(\epsilon \), or if there is no further change in the global best point \({\hat{\mathbf{p}}}_\text {swarm}\) over \(N_\text {stall}\) consecutive iterations.
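A minimal PSO sketch following the velocity update (3), the projection onto the admissible set and the stopping criterion (4); the toy objective, box bounds and swarm constants are illustrative choices, not values used in the paper.

```python
import numpy as np

# Minimal PSO per (3)-(4) on a toy 2D objective with a box admissible set.
rng = np.random.default_rng(0)

def J(p):  # toy objective with optimum at (1, 2)
    return (p[0] - 1.0) ** 2 + (p[1] - 2.0) ** 2

lo, hi = np.array([-5.0, -5.0]), np.array([5.0, 5.0])
Q, w0, w1, w2, eps = 30, 0.7, 1.5, 1.5, 1e-6

P = rng.uniform(lo, hi, size=(Q, 2))        # uniform start in admissible set
V = np.zeros_like(P)                        # initial velocities are 0
best_p = P.copy()                           # individual bests  \hat p_q
best_J = np.array([J(p) for p in P])
swarm_p = best_p[np.argmin(best_J)].copy()  # swarm best \hat p_swarm

for k in range(500):
    N1, N2 = rng.uniform(size=(Q, 2)), rng.uniform(size=(Q, 2))
    V = w0 * V + w1 * N1 * (best_p - P) + w2 * N2 * (swarm_p - P)  # rule (3)
    P = np.clip(P + V, lo, hi)              # projection onto the admissible set
    vals = np.array([J(p) for p in P])
    improved = vals < best_J
    best_p[improved], best_J[improved] = P[improved], vals[improved]
    swarm_p = best_p[np.argmin(best_J)].copy()
    if np.mean(np.linalg.norm(swarm_p - P, axis=1)) < eps:  # criterion (4)
        break
```

Note that every iteration costs Q objective evaluations, i.e., Q FE-solver calls in the setting of this paper.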

2.5 Robust optimization

In a nominal optimization procedure, one looks for the minimum value of an objective function. During manufacturing, however, small deviations of the parameters can occur. As a consequence, the optimal solution may become suboptimal in reality. Robust optimization searches for an optimum that is not overly sensitive to the expected parameter deviations [41, 60].

One possibility is to optimize such that the worst case within an uncertainty set around the design parameters is as good as possible. The robust counterpart of (1) adopting a worst-case scenario is

$$\begin{aligned}&\underset{\mathbf{p}\in {\mathbb {R}}^I}{\text {minimize}}&\quad&\max _{\varvec{\delta }\in U} J(\mathbf{p}+\varvec{\delta }) \text {,}\end{aligned}$$
(5a)
$$\begin{aligned}&\text {subject to}&\quad&\max _{\varvec{\delta }\in U}G_m(\mathbf{p}+\varvec{\delta }) \le 0, \quad m=1,\ldots ,M \text {.}\end{aligned}$$
(5b)

Here, the uncertainty set for the deviations \(\varvec{\delta }\) is defined by

$$\begin{aligned} U:= & {} \left\{ \varvec{\delta }\in {\mathbb {R}}^I\,|\,\delta ^{\mathrm {l}}_i\le \delta _i \le \delta ^{\mathrm {u}}_i,\,i=1,\ldots ,I\right\} \nonumber \\= & {} \left\{ \varvec{\delta }\in {\mathbb {R}}^I\,|\,\Vert {\mathbf {D}}^{-1}\varvec{\delta }\Vert _\infty \le 1\right\} \text {,} \end{aligned}$$
(6)

where \({\mathbf {D}}=\mathrm {diag}(\delta _1^{\mathrm {u}},\ldots ,\delta _I^{\mathrm {u}})\) is a scaling matrix and where \(\delta _i^{\mathrm {l}}=-\delta _i^{\mathrm {u}}\).

The nested optimization problem formulated by (5) is hard to solve. A numerically feasible optimization problem is obtained by approximating the \(\max \) problem, i.e., by applying a first-order Taylor approximation of the objective function and the constraints with respect to \(\mathbf{p}\) [13]:

$$\begin{aligned} J(\mathbf{p}+\varvec{\delta })&\approx J(\mathbf{p}) +\nabla _{\mathbf{p}}J(\mathbf{p})\cdot \varvec{\delta }\text {;}\end{aligned}$$
(7)
$$\begin{aligned} G_m(\mathbf{p}+\varvec{\delta })&\approx G_m(\mathbf{p}) +\nabla _{\mathbf{p}} G_m(\mathbf{p})\cdot \varvec{\delta }\text {,}\end{aligned}$$
(8)

for \(m = 1,\ldots ,M\). Inserting this approximation into (5), one obtains the linear approximation of the robust optimization problem:

$$\begin{aligned} \underset{\mathbf{p}\in {\mathbb {R}}^I}{\text {minimize}}&\quad J(\mathbf{p}) +\Vert {\mathbf {D}}\nabla _{\mathbf{p}}J(\mathbf{p})\Vert _1 \text {,}\end{aligned}$$
(9a)
$$\begin{aligned} \text {subject to}&\quad G_m(\mathbf{p}) +\Vert {\mathbf {D}}\nabla _{\mathbf{p}} G_m(\mathbf{p})\Vert _1 \le 0 \text {,}\end{aligned}$$
(9b)

for \(m = 1,\ldots ,M\). Here, the dual norm \(\Vert \cdot \Vert _*\) of a norm \(\Vert \cdot \Vert \) is defined by

$$\begin{aligned} \Vert \cdot \Vert _*:&\qquad {\mathbb {R}}^I \rightarrow {\mathbb {R}} \nonumber \\&\qquad \mathbf {g} \mapsto \Vert \mathbf {g}\Vert _* := \displaystyle \max _{\Vert \varvec{\delta }\Vert \le 1} \mathbf {g}^\top \varvec{\delta }\text {.}\end{aligned}$$
(10)

In this particular case, one can use the property that the dual of \(\Vert {\mathbf {D}}^{-1}\cdot \Vert _\infty \) is given by \(\Vert {\mathbf {D}}\cdot \Vert _1\).
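This property follows directly from the definition (10) by the substitution \(\varvec{\eta }={\mathbf {D}}^{-1}\varvec{\delta }\); the short derivation below is a standard argument included for completeness (it uses that \({\mathbf {D}}\) is diagonal, hence symmetric):

```latex
\max_{\Vert \mathbf{D}^{-1}\boldsymbol{\delta}\Vert_\infty \le 1}
    \mathbf{g}^\top \boldsymbol{\delta}
  = \max_{\Vert \boldsymbol{\eta}\Vert_\infty \le 1}
    \mathbf{g}^\top \mathbf{D}\,\boldsymbol{\eta}
  = \max_{\Vert \boldsymbol{\eta}\Vert_\infty \le 1}
    (\mathbf{D}\mathbf{g})^\top \boldsymbol{\eta}
  = \Vert \mathbf{D}\mathbf{g}\Vert_1 \text{,}
```

which is exactly the penalty term appearing in (9a) and (9b).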

A further difficulty is that the norms are not differentiable, which leads to a non-smooth optimization problem. A differentiable problem is obtained by introducing \(M+1\) slack variables \({\xi }_0,\ldots ,{\xi }_M\) and reformulating (9) as

$$\begin{aligned} \underset{\mathbf{p}\in {\mathbb {R}}^I, {\xi }_0,\ldots ,{\xi }_M \in {\mathbb {R}}^I}{\text {minimize}}&\qquad J(\mathbf{p}) +{\mathbb {V}}^\top {\xi }_0 \text {,}\end{aligned}$$
(11a)
$$\begin{aligned} \text {subject to}&\qquad G_m(\mathbf{p}) +{\mathbb {V}}^\top {\xi }_m \le 0 \text {,}\end{aligned}$$
(11b)
$$\begin{aligned}&\qquad -{\xi }_0 \le {\mathbf {D}}\nabla _{\mathbf{p}} J(\mathbf{p})\le {\xi }_0 \text {,}\end{aligned}$$
(11c)
$$\begin{aligned}&\qquad -{\xi }_m \le {\mathbf {D}}\nabla _{\mathbf{p}} G_m(\mathbf{p})\le {\xi }_m \text {,}\end{aligned}$$
(11d)

where \(m=1,\ldots ,M\) and \({\mathbb {V}} =[1,\ldots ,1]^\top \in {\mathbb {R}}^I\). This optimization problem can now be solved efficiently by numerical means. In addition to the quantities introduced in the previous section, second-order sensitivities with respect to the design parameters are now required as well, because the gradients themselves appear in the constraints (11c) and (11d). This approach can be generalized to a quadratic approximation with respect to \(\mathbf{p}\) as worked out in [30].
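The step from the worst case (5a) to the closed form (9a) can be checked numerically: for the linearized objective, the worst case over the box U is attained at a corner and equals \(J(\mathbf{p})+\Vert {\mathbf {D}}\nabla _\mathbf{p}J(\mathbf{p})\Vert _1\). The toy objective and scaling matrix below are hypothetical.

```python
import numpy as np

# Numerical check of (9a): the worst case of the first-order model over the
# box U = {||D^-1 delta||_inf <= 1} equals J(p) + ||D grad J(p)||_1.
def J(p):
    return (p[0] - 1.0) ** 2 + 3.0 * p[0] * p[1] + (p[1] + 2.0) ** 2

def grad_J(p):
    return np.array([2.0 * (p[0] - 1.0) + 3.0 * p[1],
                     3.0 * p[0] + 2.0 * (p[1] + 2.0)])

D = np.diag([0.1, 0.3])        # admissible deviation per design parameter
p = np.array([0.5, -0.5])
g = grad_J(p)

# worst case of the linearized objective, scanned over all corners of U
corners = [D @ np.array([sx, sy]) for sx in (-1, 1) for sy in (-1, 1)]
worst_linear = max(J(p) + g @ d for d in corners)

robust_obj = J(p) + np.linalg.norm(D @ g, 1)   # closed form used in (9a)
# both values agree, confirming the dual-norm step
```

The penalty term is always nonnegative, so the robust objective lies on or above the nominal one.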

3 Finite-element model

The behavior of the devices under consideration is determined by magnetic field phenomena and is simulated using a FE model.

3.1 Magnetoquasistatic formulation

The magnetoquasistatic (MQS) subset of Maxwell’s equations is considered. The design parameters \(\mathbf{p}\) influence the material distribution represented by the reluctivity \(\nu (\mathbf{p})\) and the conductivity \(\sigma (\mathbf{p})\), as well as the excitations, represented by the applied current density \(\mathbf {J}_{\mathrm{src}}(\mathbf{p})\) in current carrying conductors and the magnetizing field strength \(\mathbf {H}_\mathrm{m}(\mathbf{p})\) of the present permanent magnets. The MQS formulation in terms of the magnetic vector potential \(\mathbf {A}(\mathbf{p})\) reads

$$\begin{aligned}&\mathbf {\nabla }\times \left( \nu (\mathbf{p})\mathbf {\nabla }\times \mathbf {A}(\mathbf{p})\right) +\sigma (\mathbf{p})\frac{\partial {\mathbf {A}(\mathbf{p})}}{\partial {{t}}} \nonumber \\&\quad =\mathbf {J}_\mathrm{src}(\mathbf{p})-\mathbf {\nabla }\times \mathbf {H}_{\mathrm{m}}(\mathbf{p}) \text {,}\end{aligned}$$
(12)

and is complemented with adequate boundary conditions. Equation (12) encompasses the cases of linear, nonlinear and remanent magnetic materials, expressed by

$$\begin{aligned} \mathbf {H}(\mathbf{p})&= \nu (\mathbf{p})\mathbf {B}(\mathbf{p}) \text {,}\end{aligned}$$
(13)
$$\begin{aligned} \mathbf {H}(\mathbf{p})&= \nu (\mathbf{p},\mathbf {B}(\mathbf{p}))\mathbf {B}(\mathbf{p}) \text {,}\end{aligned}$$
(14)
$$\begin{aligned} \mathbf {H}(\mathbf{p})&= \mathbf {H}_{\mathrm{m}}(\mathbf{p})+\nu (\mathbf{p})\mathbf {B}(\mathbf{p}) \end{aligned}$$
(15)

respectively. \(\mathbf {H}(\mathbf{p})\) and \(\mathbf {B}(\mathbf{p})=\nabla \times \mathbf {A}(\mathbf{p})\) are the magnetic field strength and magnetic flux density. In the nonlinear setting, the formulation is treated by the Newton method, which is equivalent to using a linearized material relation \(\mathbf {H}(\mathbf{p})=\mathbf {H}_\mathrm{m}^{(k)}(\mathbf{p})+\overline{{\overline{\nu }}}^{(k)}(\mathbf{p})\mathbf {B}(\mathbf{p})\) and updating the tensorial differential permeability \(\overline{{\overline{\nu }}}^{(k)}(\mathbf{p})\) and the magnetizing field strength \(\mathbf {H}_{\mathrm{m}}^{(k)}(\mathbf{p})\) between the successive Newton steps k [28].
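For an isotropic nonlinear material \(\mathbf {H}=\nu (B)\mathbf {B}\), the Newton quantities can be written explicitly; the expressions below are a standard derivation stated under that isotropy assumption (they are not spelled out in [28] in this form):

```latex
\overline{\overline{\nu}}^{(k)}
  = \left.\frac{\partial \mathbf{H}}{\partial \mathbf{B}}
    \right|_{\mathbf{B}^{(k)}}
  = \nu\bigl(B^{(k)}\bigr)\,\mathbf{I}
    + 2\left.\frac{\mathrm{d}\nu}{\mathrm{d}(B^2)}\right|_{B^{(k)}}
      \mathbf{B}^{(k)}\otimes\mathbf{B}^{(k)} \text{,}\qquad
\mathbf{H}_{\mathrm{m}}^{(k)}
  = \mathbf{H}\bigl(\mathbf{B}^{(k)}\bigr)
    - \overline{\overline{\nu}}^{(k)}\,\mathbf{B}^{(k)} \text{,}
```

so that the linearized relation reproduces both the value \(\mathbf {H}(\mathbf {B}^{(k)})\) and its slope at the previous iterate.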

3.2 Finite-element discretization

The magnetic vector potential is discretized by lowest-order Nédélec edge shape functions \(\mathbf {w}_j(x,y,z)\),

$$\begin{aligned} \mathbf {A}(\mathbf{p})\approx \sum _{j=1}^{N_{\mathrm{dof}}} a_j(\mathbf{p}) \mathbf {w}_j\text {,}\end{aligned}$$
(16)

where \(a_j(\mathbf{p})\) are the degrees of freedom and \(N_{\mathrm{dof}}\) is the number of degrees of freedom. In the 3D case, the shape functions are associated with the edges of a tetrahedral mesh. In the 2D Cartesian case, the edge shape functions are aligned with the z-axis and are constructed from the nodal shape functions \(N_j(x,y)\) associated with the nodes of a 2D mesh:

$$\begin{aligned} \mathbf {w}_j(x,y) =\frac{N_j(x,y)}{l_z}\mathbf {e}_z \text {,}\end{aligned}$$
(17)

where \(l_z\) is the length of the device in z-direction. In both cases, the discretization procedure leads to the system of equations

$$\begin{aligned} {\mathbf {K}}_\nu (\mathbf{p}){\mathbf {a}}(\mathbf{p}) +{\mathbf {M}}_\sigma (\mathbf{p})\frac{\mathrm {d}{{\mathbf {a}}(\mathbf{p})}}{\mathrm {d}t}={\mathbf {j}}_\mathrm{src}(\mathbf{p})+{\mathbf {j}}_{\mathrm{m}}(\mathbf{p}) \text {,}\end{aligned}$$
(18)

where

$$\begin{aligned} K_{\nu ,i,j}(\mathbf{p})&= \int _{V_D} \nu (\mathbf{p}) \mathbf {\nabla }\times \mathbf {w}_j\cdot \mathbf {\nabla }\times \mathbf {w}_i \,\text{ d }{{V}}\text {;}\end{aligned}$$
(19)
$$\begin{aligned} M_{\sigma ,i,j}(\mathbf{p})&= \int _{V_D} \sigma (\mathbf{p}) \mathbf {w}_j\cdot \mathbf {w}_i \,\text{ d }{{V}}\text {;}\end{aligned}$$
(20)
$$\begin{aligned} j_{\text {src},i}(\mathbf{p})&= \int _{V_D}\mathbf {J}_{\mathrm{src}}(\mathbf{p})\cdot \mathbf {w}_i \,\text{ d }{{V}}\text {;}\end{aligned}$$
(21)
$$\begin{aligned} j_{\text {m},i}(\mathbf{p})&= -\int _{V_D}\mathbf {H}_{\mathrm{m}}(\mathbf{p}) \cdot \mathbf {\nabla }\times \mathbf {w}_i \,\text{ d }{{V}}\text {,}\end{aligned}$$
(22)

and where \(V_D\) is the computational domain [36]. In the 2D case, \(V_D=S_D \times [0,l_z]\) where \(S_D\) is the cross section of the device. Equation 18 is further discretized in time by, for example, an implicit Runge–Kutta method, linearized by the Newton–Raphson method and solved by a solution method for large sparse systems of equations [10, 28].
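In the 2D Cartesian case, inserting (17) into (19) reduces \({\mathbf {K}}_\nu \) to the nodal P1 stiffness matrix scaled by \(1/l_z\). The following assembly sketch on a toy two-triangle mesh illustrates this; the mesh, reluctivity and length values are illustrative, and a dense matrix stands in for the sparse storage a real solver would use.

```python
import numpy as np

# Assemble K_nu from (19) for the 2D Cartesian case: with the edge functions
# (17), the entries reduce to integrals of nu * grad(N_j).grad(N_i) / l_z.
def assemble_K(nodes, triangles, nu, l_z):
    n = len(nodes)
    K = np.zeros((n, n))
    for tri in triangles:
        x, y = nodes[tri, 0], nodes[tri, 1]
        # gradients of the P1 shape functions on this triangle
        twoA = (x[1]-x[0])*(y[2]-y[0]) - (x[2]-x[0])*(y[1]-y[0])
        b = np.array([y[1]-y[2], y[2]-y[0], y[0]-y[1]]) / twoA
        c = np.array([x[2]-x[1], x[0]-x[2], x[1]-x[0]]) / twoA
        grads = np.stack([b, c], axis=1)                  # 3 x 2
        Ke = nu * (twoA / 2.0) * (grads @ grads.T) / l_z  # element matrix
        K[np.ix_(tri, tri)] += Ke
    return K

nodes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
triangles = np.array([[0, 1, 2], [1, 3, 2]])
K = assemble_K(nodes, triangles, nu=1.0, l_z=1.0)
# K is symmetric and constant potentials lie in its kernel (zero row sums)
```

The boundary conditions mentioned after (12) would be imposed on this matrix before solving.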

3.3 Geometry parametrization

In the following, designs will be optimized with respect to geometric parameters. At first sight, the changing geometry necessitates the reconstruction of the computational mesh. This would, however, lead to unacceptably high computation times. Moreover, the unavoidable changes in mesh topology would introduce numerical noise which could mask the true sensitivity of the quantities of interest on the geometric parameters. Two different types of parametrizations are presented in the following. Affine parametrization or affine decomposition (see, e.g., [47]) is particularly appealing in the context of model order reduction and well suited for parallelization. However, complex transformations cannot be represented exactly by this approach. One may turn to (empirical) interpolation methods, but they introduce additional approximation errors. This is not the case for the second parametrization, which is based on the well-established concept of design elements [6] in combination with non-uniform rational B-splines (NURBS). Here, the mapping will not be affine and more effort is needed for the update of the FE matrices and vectors. In either case, good results can be obtained for many shape optimization problems by one of the two methods, with moderate implementation effort. It should also be mentioned that nonparametric approaches to shape optimization [11] present a viable alternative and have already been applied for electric machines [16]. There, however, advanced techniques for both derivation and implementation are needed.

Fig. 1 Affine maps between a deformed element and a reference element

The geometry is decomposed into a domain \(V_D^0\) that is unaffected by the geometric parameters and domains \(V_D^\ell (\mathbf{p})\), \(\ell =1,\ldots ,L\) subject to geometry changes according to the geometric parameters \(\mathbf{p}\). The FE matrices \({\mathbf {K}}_\nu (\mathbf{p})\) and \({\mathbf {M}}_\sigma (\mathbf{p})\) and vectors \({\mathbf {j}}_{\mathrm{src}}(\mathbf{p})\) and \({\mathbf {j}}_{\mathrm{m}}(\mathbf{p})\) can be partitioned accordingly:

$$\begin{aligned} {\mathbf {K}}_\nu (\mathbf{p}) ={\mathbf {K}}_\nu ^{0}+\sum _{\ell =1}^L {\mathbf {K}}_\nu ^{\ell }(\mathbf{p}) \text {,}\end{aligned}$$
(23)

and similarly for \({\mathbf {M}}_\sigma (\mathbf{p})\), \({\mathbf {j}}_\text {src}(\mathbf{p})\) and \({\mathbf {j}}_\text {m}(\mathbf{p})\). Reference geometries \({\hat{V}}_D^\ell , \ell =1,\ldots ,L\) and \({\hat{V}}_D^0 = V_D^0\) are defined, as well as maps \(f^\ell _\mathbf{p}:{\hat{V}}_D^\ell \rightarrow V_D^\ell (\mathbf{p}), \hat{\mathbf {r}}\mapsto f^\ell _\mathbf{p}(\hat{\mathbf {r}})= \mathbf {r}\), which depend on \(\mathbf{p}\).

3.3.1 Affine parametrization

This section discusses the idea of decomposing (a part of) the mesh such that geometrical changes can be represented by piecewise affine maps [30, 47]. The affine maps will be referred to by \(f^\ell _{\mathbf{p},\text {aff}}\).

As an example, an affine map for the 2D case is developed (Fig. 1). Let \({\mathbf {r}}_j=\left[ x_j \ y_j \right] ^\top \), \(j=1,2,3\) denote the nodes of a deformed triangle in the domain \(V_D^\ell (\mathbf{p})\) and \(\hat{{\mathbf {r}}}_j=\left[ {\hat{x}}_j \ {\hat{y}}_j \right] ^\top \), \(j=1,2,3\) the nodes of the corresponding triangle in the reference domain \({\hat{V}}_D^\ell \). The transformation from the reference triangle to the deformed triangle \(\hat{\mathbf {r}}\mapsto f^\ell _{\mathbf{p},\text {aff}}(\hat{\mathbf {r}}) = \mathbf {r}\) is given by the affine map

$$\begin{aligned} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \end{bmatrix} \frac{1}{{\hat{S}}} \begin{bmatrix} {\hat{x}}_2{\hat{y}}_3-{\hat{x}}_3{\hat{y}}_2 & {\hat{y}}_2-{\hat{y}}_3 & {\hat{x}}_3-{\hat{x}}_2 \\ {\hat{x}}_3{\hat{y}}_1-{\hat{x}}_1{\hat{y}}_3 & {\hat{y}}_3-{\hat{y}}_1 & {\hat{x}}_1-{\hat{x}}_3 \\ {\hat{x}}_1{\hat{y}}_2-{\hat{x}}_2{\hat{y}}_1 & {\hat{y}}_1-{\hat{y}}_2 & {\hat{x}}_2-{\hat{x}}_1 \end{bmatrix} \begin{bmatrix} 1 \\ {\hat{x}} \\ {\hat{y}} \end{bmatrix} \text {,}\end{aligned}$$
(24)

with \({\hat{S}}=({\hat{x}}_2-{\hat{x}}_1)({\hat{y}}_3-{\hat{y}}_1)-({\hat{x}}_3-{\hat{x}}_1)({\hat{y}}_2-{\hat{y}}_1)\) twice the cross-sectional area of the reference triangle. In short, this becomes an affine transformation of the form

$$\begin{aligned} {\mathbf {r}} ={\mathbf {N}}_1^\ell (\mathbf{p})+{\mathbf {T}}^\ell (\mathbf{p})\hat{{\mathbf {r}}}\qquad \text {where}\quad \hat{\mathbf {r}}\in {\hat{V}}_D^\ell \text {.}\end{aligned}$$
(25)
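The map (25) can equivalently be constructed from the edge vectors of the two triangles, which avoids writing out (24) explicitly; the sketch below uses toy node coordinates.

```python
import numpy as np

# Build the affine map (25) for one triangle: r = N + T rhat.
def affine_map(ref, defo):
    """N, T such that N + T @ rhat maps the reference triangle to the
    deformed one; T follows from matching the two edge vectors."""
    E_ref = np.column_stack([ref[1] - ref[0], ref[2] - ref[0]])
    E_def = np.column_stack([defo[1] - defo[0], defo[2] - defo[0]])
    T = E_def @ np.linalg.inv(E_ref)
    N = defo[0] - T @ ref[0]
    return N, T

ref = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])     # reference nodes
defo = np.array([[0.5, 0.2], [2.0, 0.3], [0.4, 1.5]])    # deformed nodes
N, T = affine_map(ref, defo)
# det(T) is the constant Jacobian determinant vartheta_0 of Sect. 3.3.1,
# equal to the ratio of the (signed, doubled) triangle areas
```

Because \({\mathbf {T}}^\ell \) is constant per triangle, its determinant can be tabulated once per subdomain, which is precisely what makes the decomposition (23) cheap to re-evaluate.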

In 3D, affine parametrization is organized analogously using a tetrahedral decomposition of the computational domain. The key property of affine parametrization is that the Jacobian of the map,

$$\begin{aligned} J_{\mathrm{aff}}^\ell (\mathbf{p})=\begin{bmatrix} \frac{\partial {x}}{\partial {{\hat{x}}}} & \frac{\partial {x}}{\partial {{\hat{y}}}} & \frac{\partial {x}}{\partial {{\hat{z}}}} \\ \frac{\partial {y}}{\partial {{\hat{x}}}} & \frac{\partial {y}}{\partial {{\hat{y}}}} & \frac{\partial {y}}{\partial {{\hat{z}}}} \\ \frac{\partial {z}}{\partial {{\hat{x}}}} & \frac{\partial {z}}{\partial {{\hat{y}}}} & \frac{\partial {z}}{\partial {{\hat{z}}}} \end{bmatrix} \text {,}\end{aligned}$$
(26)

is constant on each subdomain \(V_D^\ell (\mathbf{p})\). In the integrations in (19)–(22), the volume integrations now have to be carried out according to \(\,\text{ d }{{V}}=\vartheta _0^\ell (\mathbf{p})\,\text{ d }{{\hat{V}}}\), where \(\vartheta _0^\ell (\mathbf{p})=|J_{\mathrm{aff}}^\ell (\mathbf{p})|\) denotes the determinant of the Jacobian. Hence,

$$\begin{aligned} {\mathbf {M}}_\sigma ^\ell (\mathbf{p})&=\vartheta _0^\ell (\mathbf{p})\hat{{\mathbf {M}}}_\sigma ^\ell \text {;}\end{aligned}$$
(27)
$$\begin{aligned} {\mathbf {j}}_\text {src}^\ell (\mathbf{p})&=\vartheta _0^\ell (\mathbf{p})\hat{{\mathbf {j}}}_\text {src}^\ell \text {,}\end{aligned}$$
(28)

where \(\hat{{\mathbf {M}}}_\sigma ^\ell \) and \(\hat{{\mathbf {j}}}_\text {src}^\ell \) are assembled on the reference geometry only once. Additionally, the affine maps affect the differential operators in (19) and (22). Some calculation is needed to work out the transformed differential operators and the scalar products component-wise. For the 2D Cartesian case, the results are

$$\begin{aligned} {\mathbf {K}}_\nu ^{\ell }(\mathbf{p})&= \vartheta ^{\ell }_1(\mathbf{p}) \hat{{\mathbf {K}}}^{\ell }_{\nu ,xx}+\vartheta ^{\ell }_2(\mathbf{p})\hat{{\mathbf {K}}}^{\ell }_{\nu ,yy}\nonumber \\&\quad +\vartheta ^{\ell }_3(\mathbf{p}) \hat{{\mathbf {K}}}^{\ell }_{\nu ,xy}+\vartheta ^{\ell }_4(\mathbf{p})\hat{{\mathbf {K}}}^{\ell }_{\nu ,yx} \text {;}\end{aligned}$$
(29)
$$\begin{aligned} {\mathbf {j}}_\text {m}^{\ell }(\mathbf{p})&= \vartheta ^\ell _5(\mathbf{p})\hat{{\mathbf {j}}}_{\text {m},x}^\ell +\vartheta ^\ell _6(\mathbf{p})\hat{{\mathbf {j}}}_{\text {m},y}^\ell \text {,}\end{aligned}$$
(30)

where the matrix factors \(\hat{{\mathbf {K}}}^{\ell }_{\nu ,xx}\), \(\hat{{\mathbf {K}}}^{\ell }_{\nu ,yy}\), \(\hat{{\mathbf {K}}}^{\ell }_{\nu ,xy}\), \(\hat{{\mathbf {K}}}^{\ell }_{\nu ,yx}\), and the vector factors \(\hat{{\mathbf {j}}}_{\text {m},x}^\ell \) and \(\hat{{\mathbf {j}}}_{\text {m},y}^\ell \) are assembled for the reference geometry in advance. Hence, the assembly of new FE matrices and vectors can be avoided during the optimization procedure. The functions \(\vartheta _q^\ell (\mathbf{p})\) are simple scalar functions of the design parameters and are evaluated for each model instantiation.
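The computational pattern behind (23) and (29)–(30) can be sketched as follows; the factor matrices and the scalar functions \(\vartheta _q\) below are random/hypothetical stand-ins for the quantities assembled on the reference mesh.

```python
import numpy as np

# Precompute-once / combine-cheaply pattern of the affine parametrization:
# each new design evaluation is a few scalar-times-matrix operations, with
# no FE assembly in the loop.
rng = np.random.default_rng(1)
K_hat = {q: rng.standard_normal((4, 4)) for q in range(1, 5)}  # assembled once

def vartheta(q, p):
    # hypothetical scalar geometry functions of the design parameters
    return p[0] ** q / (1.0 + p[1])

def K_of_p(p):
    # instantiate K(p) as in (29): a weighted sum of precomputed factors
    return sum(vartheta(q, p) * K_hat[q] for q in range(1, 5))

K1 = K_of_p([1.2, 0.4])   # one design evaluation: four scalings and sums
K2 = K_of_p([1.3, 0.4])   # the next design is equally cheap
```

In a real solver the factors would be sparse FE matrices, but the instantiation cost per design point remains a handful of scalar evaluations.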

Fig. 2 Reference domain \([0,1]^2\) and design element patch

3.3.2 Design element approach

For many geometry optimization tasks, the domains subject to geometry changes cannot be decomposed into triangles or tetrahedra with straight edges and faces, which excludes the use of affine parametrization. NURBS are a more general way to represent geometries and are widely used in CAD systems. Therefore, it seems natural to use the control points (and weights) of NURBS curves as design parameters [6, 49]. This approach has received considerable attention in recent years as new approaches incorporating NURBS geometries into FE analysis have emerged. Isogeometric analysis [25] and the NURBS-enhanced FE method [50] are important examples. Here, NURBS are only used for the geometry parametrization. A triangular (tetrahedral) mesh is generated once and deformed using the concept of design elements [6, 26].

In the following, for simplicity, only the two-dimensional case is considered. A generic NURBS curve of degree p is given as

$$\begin{aligned} \mathbf {C}({\hat{x}}) = \sum _{i} R_i^p({\hat{x}}) \mathbf {P}_{i} \text {,}\end{aligned}$$
(31)

where \(\mathbf {P}_{i}\) refers to a control point and the rational spline \(R_i^p\) is defined in terms of B-splines \(N_i^p\) and weights \(w_i\) as

$$\begin{aligned} R_i^p({\hat{x}}) = \frac{N_i^p({\hat{x}}) w_i}{\sum _j N_j^p({\hat{x}}) w_j} \text {.}\end{aligned}$$
(32)

In total, L design elements are considered, each of which is represented by two NURBS curves \(\mathbf {C}_{\mathbf{p},1}^\ell \) and \(\mathbf {C}_{\mathbf{p},2}^\ell \), each depending on the geometric parameters \(\mathbf{p}\). More precisely, a design element is defined by a map \(f_{\mathbf{p},\text {de}}^\ell :{\hat{V}}_D^\ell = [0,1]^2\rightarrow V_D^\ell (\mathbf{p})\) given as

$$\begin{aligned} f_{\mathbf{p},\text {de}}^\ell ({\hat{x}},{\hat{y}}) = \mathbf {C}_{\mathbf{p},1}^\ell ({\hat{x}}) {\hat{y}} + \mathbf {C}_{\mathbf{p},2}^\ell ({\hat{x}}) (1 - {\hat{y}}) \text {.}\end{aligned}$$
(33)

Hence, design elements are obtained by successively connecting points of each NURBS curve by a straight line, as depicted in Fig. 2. The affine parametrization may result in unstructured representations where such a patch structure is missing. For each node \((x_i,y_i)\) in \(V_D^\ell (\mathbf{p})\), its position in the reference domain \([0,1]^2\) is computed in advance by solving

$$\begin{aligned} ({\hat{x}}_i,{\hat{y}}_i) \in \mathrm{argmin}_{({\hat{x}},{\hat{y}})} \left| f_{\mathbf{p},\text {de}}^\ell ({\hat{x}},{\hat{y}}) - (x_i,y_i)\right| \text {,}\end{aligned}$$
(34)

for example with the Newton–Raphson method. Then, the mesh can be easily deformed by applying the parameter-dependent map \(f_{\mathbf{p},\text {de}}^\ell \) to all nodes \(({\hat{x}}_i,{\hat{y}}_i)\).
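A sketch of (31)–(33) with toy data: a Cox–de Boor evaluation of the B-spline basis, the rational basis (32), and the blended design element map (33). The degree, knot vector, control points and weights are illustrative choices.

```python
import numpy as np

# Design element map (33) built from two NURBS curves (31)-(32).
def bspline_basis(i, p, knots, x):
    """Cox-de Boor recursion for the B-spline N_i^p (0/0 terms dropped)."""
    if p == 0:
        # half-open spans; include x at the right end of the curve
        if knots[i] <= x < knots[i + 1] or \
           (x == knots[-1] and knots[i + 1] == knots[-1]):
            return 1.0
        return 0.0
    left = right = 0.0
    if knots[i + p] > knots[i]:
        left = (x - knots[i]) / (knots[i + p] - knots[i]) \
               * bspline_basis(i, p - 1, knots, x)
    if knots[i + p + 1] > knots[i + 1]:
        right = (knots[i + p + 1] - x) / (knots[i + p + 1] - knots[i + 1]) \
                * bspline_basis(i + 1, p - 1, knots, x)
    return left + right

def nurbs_curve(xhat, ctrl, w, knots, p):
    N = np.array([bspline_basis(i, p, knots, xhat) for i in range(len(ctrl))])
    R = N * w / np.sum(N * w)            # rational basis (32)
    return R @ ctrl                      # curve point (31)

# quadratic curves on a clamped knot vector, three control points each
knots = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
C1 = np.array([[0.0, 1.0], [0.5, 1.4], [1.0, 1.0]])   # upper boundary curve
C2 = np.array([[0.0, 0.0], [0.5, 0.1], [1.0, 0.0]])   # lower boundary curve
w = np.array([1.0, 1.0, 1.0])

def f_de(xhat, yhat):                    # design element map (33)
    return (nurbs_curve(xhat, C1, w, knots, 2) * yhat
            + nurbs_curve(xhat, C2, w, knots, 2) * (1.0 - yhat))
```

For \({\hat{y}}=1\) the map returns points on the upper curve and for \({\hat{y}}=0\) points on the lower one; moving a control point of \(\mathbf {C}_1\) or \(\mathbf {C}_2\) deforms all mapped mesh nodes smoothly, which is the mechanism the geometric design parameters act through.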

The transformation of the FE matrices and vectors is more involved compared to the affine parametrization described in Sect. 3.3.1. Each entry of the mass matrix is transformed as

$$\begin{aligned} {\mathbf {M}}_{\sigma ,i,j}(\mathbf{p}) = \int _{{\hat{V}}_D} {\hat{\sigma }} \hat{\mathbf {w}}_j\cdot \hat{\mathbf {w}}_i |J_{\mathrm{de}}^\ell (\mathbf{p})| \,\text{ d }{{\hat{V}}}, \end{aligned}$$
(35)

where it is important to emphasize that \(|J_{\mathrm{de}}^\ell (\mathbf{p})|\) is not constant on each design element. A similar expression is obtained for \({\mathbf {j}}_\text {src}(\mathbf{p})\), whereas the conforming transformation of the curl operator yields

$$\begin{aligned} {\mathbf {K}}_{\nu ,i,j}(\mathbf{p})&= \int _{{\hat{V}}_D} \frac{{\hat{\nu }}}{|J_{\mathrm{de}}^{\ell }(\mathbf{p})|} J_{\mathrm{de}}^{\ell }(\mathbf{p}) \mathbf {\nabla } \times \hat{\mathbf {w}}_j\cdot J_{\mathrm{de}}^\ell (\mathbf{p}) \mathbf {\nabla } \times \hat{\mathbf {w}}_i \,\text{ d }{{\hat{V}}}, \end{aligned}$$
(36)
$$\begin{aligned} {\mathbf {j}}_{\text {m},i}^{\ell }(\mathbf{p})&= \int _{{\hat{V}}_D}\hat{\mathbf {H}}_{\mathrm{m}}(\mathbf{p}) \cdot J_{\mathrm{de}}^{\ell }(\mathbf{p}) \mathbf {\nabla }\times \hat{\mathbf {w}}_i \,\text{ d }{{\hat{V}}}. \end{aligned}$$
(37)

In (36) and (37), the dependence of the integration domain on the geometry changes was eliminated. Because the Jacobian \(J_{\mathrm{de}}^{\ell }(\mathbf{p})\) can be expressed as a function of the geometric parameters \(p_i\), the analytical derivative of the system matrix and of the right-hand side with respect to the geometric parameters can be determined.
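The remark that \(|J_{\mathrm{de}}^\ell (\mathbf{p})|\) is not constant can be illustrated with the ruled map (33): for a curved boundary, the Jacobian determinant varies over the reference square, so the integrals (35)–(37) must be evaluated by numerical quadrature. A minimal sketch with hypothetical boundary curves (straight bottom edge, sinusoidally curved top edge):

```python
import numpy as np

def jac_det(xh, yh):
    """Jacobian determinant of the ruled map (33) for the stand-in curves
    C1(x) = (x, 1 + 0.2 sin(pi x)) (top) and C2(x) = (x, 0) (bottom)."""
    dfdx = np.array([1.0, 0.2 * np.pi * np.cos(np.pi * xh) * yh])  # df/dx-hat
    dfdy = np.array([0.0, 1.0 + 0.2 * np.sin(np.pi * xh)])         # df/dy-hat
    # varies over the element (here with x-hat), unlike the affine case
    return dfdx[0] * dfdy[1] - dfdx[1] * dfdy[0]

# tensor-product 3-point Gauss-Legendre quadrature, mapped from [-1,1] to [0,1]
nodes, wts = np.polynomial.legendre.leggauss(3)
nodes = 0.5 * (nodes + 1.0)
wts = 0.5 * wts

# integrating |J| over the reference square recovers the physical element area
area = sum(wi * wj * jac_det(xi, yj)
           for xi, wi in zip(nodes, wts) for yj, wj in zip(nodes, wts))
# exact value: integral of 1 + 0.2 sin(pi x) over [0,1] = 1 + 0.4/pi
```

In the affine case the determinant would factor out of each element integral; here it must stay under the integral sign, which is exactly why (35)–(37) are more involved.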

3.4 Sensitivities

After differentiating the FE system, a new linear system for the derivatives of the degrees of freedom with respect to the geometric parameters is obtained:

$$\begin{aligned} {\mathbf {K}}_\nu {\mathbf {s}}_i = \frac{\partial {}}{\partial {p_i}}\left( {\mathbf {j}}_\text {src}+{\mathbf {j}}_\text {m}\right) -\frac{\partial {{\mathbf {K}}_\nu }}{\partial {p_i}}{\mathbf {a}} \text {,}\quad \text {for}\quad i = 1,\ldots ,I \text {,}\end{aligned}$$
(38)

where \({\mathbf {s}}_i(\mathbf{p})=\frac{\partial {\mathbf{a}({\mathbf{p}})}}{\partial {p_i}}\) are the sensitivities of the FE solution. To calculate \({\mathbf {s}}_i\), I equations of the form (38) have to be solved. In the case of the affine parametrization, the derivatives of \({\mathbf {K}}_\nu \) are easily calculated from (23) and (29), since the derivatives \(\frac{\partial {\vartheta ^\ell (\mathbf{p})}}{\partial {p_i}}\) of the functions \(\vartheta ^{\ell }(\mathbf{p})\) are known analytically. The expressions become more intricate when NURBS are used, yet closed-form formulas exist in this case as well.
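A sketch of the direct sensitivity approach (38), with small random matrices standing in for the assembled FE system. The key point is that the system matrix is factorized once and the factorization is reused for all I right-hand sides:

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

# toy stand-ins for the FE system (the real K_nu and rhs come from the assembly)
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
K = csc_matrix(A @ A.T + 50.0 * np.eye(50))   # SPD, like a stiffness matrix
j = rng.standard_normal(50)
dK = [csc_matrix(0.01 * rng.standard_normal((50, 50))) for _ in range(3)]  # dK/dp_i
dj = [rng.standard_normal(50) for _ in range(3)]                           # d(rhs)/dp_i

lu = splu(K)        # factorize once
a = lu.solve(j)     # FE solution

# one cheap triangular solve per parameter gives the sensitivities s_i of (38)
s = [lu.solve(dj_i - dK_i @ a) for dK_i, dj_i in zip(dK, dj)]
```

Each additional parameter thus costs only a forward/backward substitution, not a new factorization.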

Fig. 3

TEAM Problem 25: Cross section of the inner part of the die press showing the SMP ring, the inner yoke and the outer yoke (all measures in mm). A horizontal magnetic flux is exerted on the configuration by an outer magnetic circuit (not shown)

The optimization algorithm requires the derivatives of the objective function with respect to each of the design parameters. Often, the objective function does not explicitly depend on the design parameters, i.e., \(J(\mathbf{p})={\tilde{J}}(\mathbf a(\mathbf{p}))\). In this case, the derivatives are given as

$$\begin{aligned} \frac{\partial {J(\mathbf{p})}}{\partial {p_i}} = \nabla _\mathbf{a}{\tilde{J}}(\mathbf{a(\mathbf{p})}) \cdot \mathbf{s}_i(\mathbf{p}),\quad \hbox {for}\quad i = 1,\ldots ,I. \end{aligned}$$
(39)

For a large number of parameters, an adjoint method should be used instead [57].
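The difference between the direct approach (38)–(39) and the adjoint method can be sketched as follows; the matrices are again random stand-ins for the FE system. The direct route needs I forward solves, the adjoint route a single solve with the transposed matrix, independent of I:

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

rng = np.random.default_rng(1)
n, I = 40, 6
K = csc_matrix(n * np.eye(n) + rng.standard_normal((n, n)))  # nonsymmetric stand-in
j = rng.standard_normal(n)
dK = [csc_matrix(0.01 * rng.standard_normal((n, n))) for _ in range(I)]
dj = [rng.standard_normal(n) for _ in range(I)]

lu = splu(K)
a = lu.solve(j)
gJ = rng.standard_normal(n)   # gradient of J-tilde w.r.t. a (problem dependent)

# direct: I forward solves for the sensitivities, then (39)
grad_direct = [gJ @ lu.solve(dj[i] - dK[i] @ a) for i in range(I)]

# adjoint: a single solve with K^T, then I cheap inner products
lam = lu.solve(gJ, trans='T')
grad_adjoint = [lam @ (dj[i] - dK[i] @ a) for i in range(I)]
```

Both routes yield the same gradient; the adjoint variant only becomes preferable when the number of parameters exceeds the number of objective functions.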

4 Example 1: Die press mold

As a first example, a die press mold for radially magnetizing a segment of sintered magnetic powder (SMP) is considered [54]. This problem has been proposed as Testing Electromagnetic Analysis Methods (TEAM) benchmark problem 25 [53] and has been used in numerous papers for comparing optimization algorithms. The vast majority of these publications apply and compare stochastic optimization methods [32, 52], possibly combined with surrogate models [8], uncertainty quantification [39], multi-objective optimization or a combination thereof [31]. Only a few papers (see, e.g., [1, 4]) choose deterministic methods, again possibly combined with surrogate models [20], uncertainty quantification [55] or multi-objective optimization. This paper addresses one of the main drawbacks of deterministic methods, namely the cumbersome treatment of geometric parameters. For this example, the design element approach is used.

The SMP segment is arranged between a cylindrical inner pole and a more generally shaped outer pole (Fig. 3). The original TEAM-25 problem considers an outer pole with an elliptical inner surface. Here, the inner surface is described by a spline instead. This is motivated by the fact that splines are currently the basic building block for mechanical processing. The design parameters are chosen as

$$\begin{aligned} p_1{:}&~{\text {radius of the inner yoke}}\text {;}\\ p_2,p_3{:}&~{\text {semiaxis of ellipse between points}}~i~{\text {and}}~j\text {;}\\ p_4{:}&~x{\text {-coordinate of points}}~m~{\text {and}}~k\text {.}\end{aligned}$$

Both the circle and the ellipse are exactly represented by NURBS curves. The relation between the geometric parameters and the NURBS control points is given in the Appendix.

Table 1 Results from the optimization of the die press mold with particle swarm optimization (PSO), trust region (TR) (with MATLAB\(^{\circledR }\)’s fmincon) and an in-house implementation of sequential quadratic programming (SQP) combined with the design element approach

The optimization aims at a homogeneous, radially oriented magnetic flux density of \(B_{\mathrm{goal}}=0.35\) T inside the SMP segment. The objective function \(J(\mathbf{p})\) is defined as the mean-squared error between the simulated magnetic field and the goal at 9 sample points equidistantly distributed along the arc with radius \(r_\text {smp}\) between points \(e=(r_\text {smp},0)\) and \(f=(r_\text {smp}\cos \varphi _f,r_\text {smp}\sin \varphi _f)\):

$$\begin{aligned} J(\mathbf{p}) = \sum _{k=1}^9 \Vert \mathbf {B}(r_{\mathrm{smp}}\cos \varphi _k,r_\mathrm{smp}\sin \varphi _k;\mathbf{p})-B_{\mathrm{goal}}\mathbf {e}_k\Vert _2^2 \text {,}\end{aligned}$$
(40)

where \(\varphi _k=\varphi _f \frac{k-1}{8}\) and \(\mathbf {e}_k=(\cos \varphi _k,\sin \varphi _k)\). The optimization problem reads:

$$\begin{aligned} \underset{\mathbf{p}}{\text {minimize}}&\quad J(\mathbf{p}) \text {,}\end{aligned}$$
(41a)
$$\begin{aligned} \text {subject to}&\quad \mathbf{p}\in {\mathcal {F}}\text {,}\end{aligned}$$
(41b)

where the admissible set is defined as:

$$\begin{aligned} {\mathcal {F}}=[5.1,9] \times [16,18] \times [14.5,16] \times [9.5,13]~\text {mm}. \end{aligned}$$

For the gradient, the derivatives of J with respect to the geometry parameters \(p_i\) are needed. Before applying the chain rule on (40), the derivatives of the degrees of freedom with respect to the geometry parameters \(\partial _{p_i} {\mathbf {a}}\) are calculated as described in Sect. 3.3.2.
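The chain rule applied to (40) can be sketched as follows. The arc angle \(\varphi _f\) and the linear model for \(\mathbf {B}(\mathbf{p})\) are placeholders (not TEAM-25 data), used only to verify the analytical gradient against finite differences:

```python
import numpy as np

phi_f = np.pi / 4                     # assumed arc opening angle (placeholder)
phi = phi_f * np.arange(9) / 8.0      # the 9 sample angles phi_k in (40)
e = np.stack([np.cos(phi), np.sin(phi)], axis=1)   # radial unit vectors e_k
B_goal = 0.35

def objective_and_grad(B, dB_dp):
    """J from (40) and dJ/dp_i by the chain rule.
    B: (9, 2) sampled flux density; dB_dp: (I, 9, 2) its sensitivities."""
    r = B - B_goal * e                         # residual at each sample point
    J = np.sum(r ** 2)
    grad = np.array([2.0 * np.sum(r * dB) for dB in dB_dp])
    return J, grad

# synthetic linear model B(p) = B0 + sum_i p_i M_i, so dB/dp_i = M_i
rng = np.random.default_rng(2)
M = rng.standard_normal((4, 9, 2))
B0 = rng.standard_normal((9, 2))
p = rng.standard_normal(4)
J, g = objective_and_grad(B0 + np.tensordot(p, M, axes=1), M)
```

In the actual workflow, the sensitivities of \(\mathbf {B}\) at the sample points would come from the solves of Sect. 3.3.2 rather than from a synthetic model.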

The performance of a standard algorithm for particle swarm optimization (PSO), of the sequential quadratic programming (SQP) method implemented in MATLAB\(^{\circledR }\)’s fmincon function [43] and of an in-house SQP implementation is compared in Table 1. Both SQP implementations use the analytical gradients, the BFGS formula for updating the Hessian and a sufficient decrease condition on a merit function. For the PSO, a set of 40 particles is considered and the implementation is multi-threaded, while the gradient-based methods are single-threaded implementations. The termination criterion for the PSO algorithm is the number of stall iterations, which was set to 5. The PSO actually finds the optimum after 2 iterations, because the optimum is located at a vertex of the box-shaped domain and all particles leaving the admissible region are projected onto the boundary. All three methods converge to the same optimum. The deterministic algorithms are substantially faster than PSO, even though PSO exploits parallelization. On the same machine, an evaluation of the objective function \(J(\mathbf{p})\) takes 1.65 s, an analytical evaluation of the gradient \(\nabla J(\mathbf{p})\) 4.69 s and a numerical evaluation of the gradient \(\nabla _{\text {num}} J(\mathbf{p})\) using a forward difference quotient 7.48 s. All tests were done on a 64 GB RAM Intel\(^{\circledR }\) Xeon\(^{\circledR }\) E5-2630 v4 machine.

5 Example 2: Permanent-magnet synchronous machine (PMSM)

5.1 Design parameters

The second example is a 3-phase 6-pole permanent-magnet (PM) synchronous machine (PMSM) borrowed from [42] (Fig. 4) and already studied as an optimization example in [5]. The stator features two slots per pole and per phase with a conventional distributed double-layer winding. The rotor contains a buried rare-earth magnet. The yoke parts are laminated. The design parameters are

$$\begin{aligned} p_1{:}&~{\text {width of the PM}} \text {;}\\ p_2{:}&~{\text {thickness of the PM}} \text {;}\\ p_3{:}&~{\text {distance from the PM to the rotor surface}}\text {.}\end{aligned}$$
Fig. 4

Cross section of one pole of the machine with the permanent magnet depicted in gray and the region of the affine parametrization indicated by the dashed box. On the right-hand side, the triangulation into L subdomains is shown by the dashed-dotted lines. The figure is adapted from [5]

5.2 Objective function

The optimization goal is to minimize the size \(S_{\mathrm{pm}}=p_1p_2\) of PM material while preserving a prescribed electromotive force \(E_0\). The electromotive force (EMF) \(E_0(\mathbf{p})\) is post-processed from a magnetostatic solution of a 2D FE model of the PMSM using the loading method proposed in [46]. For that purpose, the FE solution of the z-component of the magnetic vector potential is sampled along a circle (or in the case of a partial machine model, an arc) in the PMSM’s air gap, yielding \(A_z(r_\mathrm{ag},\varphi )\approx {\hat{A}}_{z,\mathrm{eff}}\sqrt{2}\sin (N_p\varphi -\varphi _{\mathrm{d}})\), where \(N_{p}=3\) is the pole-pair number, \({\hat{A}}_{z,\mathrm{eff}}\) is the rms magnitude of the fundamental harmonic component and \(\varphi _{\mathrm{d}}\) is the angle of the PMSM’s direct axis. The EMF is then found from

$$\begin{aligned} E_0 = 2{\hat{A}}_{z,\mathrm{eff}}\omega _{\mathrm{syn}}N_{\mathrm{w}}k_{\mathrm{w},1} \text {,}\end{aligned}$$
(42)

where \(\omega _{\mathrm{syn}}\) is the synchronous speed and \(N_{\mathrm{w}}\) is the number of windings per phase. The winding factor is

$$\begin{aligned} k_{\mathrm{w},\nu } =\frac{\sin \left( q\nu \frac{\alpha _\mathrm{el}}{2}\right) }{q\sin \left( \nu \frac{\alpha _{\mathrm{el}}}{2}\right) } \cdot \sin \left( \nu \frac{\pi }{2}\frac{\tau _{\mathrm{c}}}{\tau _\mathrm{p}}\right) \cdot \frac{\sin \left( \nu \frac{\varepsilon }{2}\right) }{\nu \frac{\varepsilon }{2}} \text {,}\end{aligned}$$
(43)

where q is the number of coil sides per phase belt, \(\alpha _\mathrm{el}\) is the electric angle between two slots, \(\tau _{\mathrm{c}}\) is the coil pitch, \(\tau _{\mathrm{p}}\) is the pole pitch and \(\varepsilon \) is the electric skew angle [37, 46].
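A sketch of evaluating (42) and (43). The numerical values below (a winding with \(q=2\) coil sides per phase belt, an electric slot angle of \(30^\circ \), and the rms potential, speed and turn number) are hypothetical, not the machine data from [42]:

```python
import numpy as np

def winding_factor(nu, q, alpha_el, tau_c, tau_p, eps):
    """Winding factor (43): distribution, pitch and skew factor for harmonic nu."""
    k_d = np.sin(q * nu * alpha_el / 2) / (q * np.sin(nu * alpha_el / 2))
    k_p = np.sin(nu * np.pi / 2 * tau_c / tau_p)
    k_s = np.sin(nu * eps / 2) / (nu * eps / 2) if eps != 0 else 1.0  # no skew
    return k_d * k_p * k_s

# fundamental winding factor for q=2, alpha_el=30 deg, coil pitch 5/6, no skew
k_w1 = winding_factor(1, 2, np.pi / 6, 5.0, 6.0, 0.0)

# EMF from (42), with assumed rms potential, synchronous speed and turn number
A_eff, omega_syn, N_w = 0.05, 2 * np.pi * 50, 100
E0 = 2 * A_eff * omega_syn * N_w * k_w1
```

For these assumed values, the familiar short-pitched two-slot result \(k_{\mathrm{w},1}\approx 0.933\) is recovered.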

5.3 Optimization problem

The optimization problem reads

$$\begin{aligned} \underset{\mathbf{p}\in {\mathbb {R}}^3}{\text {minimize}}&\qquad J(\mathbf{p})=p_1 p_2 \text {,}\end{aligned}$$
(44a)
$$\begin{aligned} \text {subject to}&\qquad G(\mathbf{p}) =\left[ \begin{array}{c} p^{\mathrm {l}}_1 - p_1\\ p^{\mathrm {l}}_2 - p_2\\ p^{\mathrm {l}}_3 - p_3\\ p_3 - p^{\mathrm {u}}_3\\ p_2 + p_3 - 15\;{\text {mm}}\\ 3p_1 - 2p_3 - 50\;{\text {mm}}\\ E_\mathrm {d} - E_0(\mathbf{p},\mathbf{a}(\mathbf{p})) \end{array}\right] \le 0\text {.}\end{aligned}$$
(44b)

The first four constraints are related to the lower (\(p^{\mathrm {l}}\)) and upper (\(p^{\mathrm {u}}\)) bounds of \(\mathbf{p}\): \(\left( p^{\mathrm {l}}_1,p^{\mathrm {l}}_2,p^{\mathrm {l}}_3\right) = (1,1,5)\) mm and \(\left( p^{\mathrm {u}}_1, p^{\mathrm {u}}_2, p^{\mathrm {u}}_3\right) =(\infty , \infty , 14)\) mm. To ensure the validity of the affine parametrization (intersections are not allowed), the fifth constraint is added. The sixth constraint is a design constraint enforcing that each PM keeps a sufficient distance from the rotor surface, which is especially relevant for wide PMs. The last constraint expresses the requirement to fulfill the prescribed EMF. Since the EMF is post-processed from the FE solution, the optimization problem actually has a PDE constraint.

Table 2 Numerical results obtained for \(\varvec{\delta }=0.2~\hbox {mm}\) [5]
Fig. 5

Initial and optimized geometries together with the magnetic flux distribution at no-load. The figures are adapted from [5]. a Initial geometry, b nominal optimum, c robust optimum

5.4 Results

The results for 5 different optimization methods are collected in Table 2.

  1.

    The first optimization run is carried out with the genetic algorithm implemented in MATLAB\(^{\circledR }\).

  2.

    The second optimization run is carried out with MATLAB\(^{\circledR }\)’s PSO implementation. To circumvent the restriction to box-shaped parameter domains, the admissible set is enforced by a penalty term. The new objective function reads

    $$\begin{aligned} J_\text {pen}(\mathbf{p})&=J(\mathbf{p}) +2J(\mathbf{p}) \big ( f\left( \max (p_2+p_3-15,0)\right) \nonumber \\&\quad +f\left( \max (3p_1-2p_3-50,0)\right) \nonumber \\&\quad +f\left( \max (g(x),0)\right) \big ) \text {,}\end{aligned}$$
    (45)

    where \(f(t)=e^{(4t^{0.1})}-1\) was chosen heuristically such that \(J_\text {pen}\) grows exponentially if one of the constraints is violated. The function \(J_\text {pen}\) was called 4740 times, but was organized so as to evaluate the nonlinear constraint only if all other constraints were satisfied. The number of particles was set to 30, the maximum number of stall iterations to \(N_\text {stall}=15\) and the function change tolerance to \(10^{-6}\). The PSO characteristic constants are chosen as \(\omega _0=0.5\) and \(\omega _1=\omega _2=1.49\). The algorithm took 157 iterations before termination.

  3.

    The third optimization is carried out with an in-house PSO implementation, using the original objective function \(J(\mathbf{p})\) and applying the nonlinear constraints directly. Here, it is assumed that the admissible set is convex, such that points inside the convex hull formed by all previous points do not need to be checked. Fifty particles were used. Termination was enforced after at most \(N_\text {it,max}=100\) steps or when \(N_\text {stall,max}=15\) stall iterations were observed.

  4.

    The fourth run was done with the deterministic method described in Sect. 2.3, relying upon FE simulations equipped with an affine parametrization of the geometry as described in Sect. 3.3.1.

  5.

    The fifth run was done with the deterministic method for robust optimization expressed by (11) in Sect. 2.5, again with affine parametrization of the geometry.
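The penalty construction (45) used in the second run can be sketched as follows. The nonlinear (EMF) constraint is passed in as a callable, and, as described above, it is only evaluated when the cheap geometric constraints already hold; the sample objective and constraint in the usage below are hypothetical:

```python
from math import exp

def f_pen(t):
    """Heuristic penalty kernel from (45): zero for t = 0, explodes for t > 0."""
    return exp(4.0 * t ** 0.1) - 1.0

def J_pen(p, J, g_nl):
    """Penalized objective (45). J is the original objective, g_nl the expensive
    nonlinear constraint (violated when positive)."""
    v1 = max(p[1] + p[2] - 15.0, 0.0)                # p_2 + p_3 <= 15 mm
    v2 = max(3.0 * p[0] - 2.0 * p[2] - 50.0, 0.0)    # 3 p_1 - 2 p_3 <= 50 mm
    pen = f_pen(v1) + f_pen(v2)
    if v1 == 0.0 and v2 == 0.0:                      # defer the expensive FE call
        pen += f_pen(max(g_nl(p), 0.0))
    return J(p) + 2.0 * J(p) * pen
```

Since \(f(0)=0\), feasible points are left untouched, while any violation inflates the objective to several times the penalty-free value.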

The three stochastic algorithms were run on a 64 GB RAM Intel\(^{\circledR }\) Xeon\(^{\circledR }\) E5-2630 v4 machine. Both deterministic algorithms were run on a 16 GB RAM machine with an Intel\(^{\circledR }\) Core\(^{\mathrm{TM}}\) i7-5820K processor (3.30 GHz).

The results of all optimization procedures are compared with the values of the initial design (Fig. 5). All routines achieve a substantial decrease in the PM size, from \(133~\hbox {mm}^2\) down to about \(63~\hbox {mm}^2\). The price for robustness is a slightly larger size of about \(77~\hbox {mm}^2\). The deterministic methods outperform the stochastic ones by two orders of magnitude. This impressively illustrates the main message of this paper: deterministic optimization methods, accompanied by FE analysis providing gradients with respect to geometric parameters, should be favored over stochastic methods, at least for the class of problems considered here.

6 Conclusion

Affine parametrization and design element approaches are capable of parametrizing the geometry of finite-element models such that accurate derivatives with respect to geometric parameters become available. This alleviates one of the major drawbacks of gradient-type deterministic optimization methods. For the example of a die press mold, standard sequential quadratic programming combined with the design element approach outperforms particle swarm optimization by more than a factor of ten. The second example illustrates the applicability of gradient-type robust optimization combined with an affine parametrization of the geometry for a permanent-magnet synchronous machine. Supported by the substantial improvement in computational efficiency, this paper advocates a revival of deterministic methods for numerical optimization in electrotechnical design procedures.