Parallel-in-Time Simulation of an Electrical Machine using MGRIT

We apply the multigrid-reduction-in-time (MGRIT) algorithm to an eddy current simulation of a two-dimensional induction machine supplied by a pulse-width-modulation signal. To resolve the fast-switching excitations, small time steps are needed, such that parallelization in time becomes highly relevant for reducing the simulation time. The MGRIT algorithm is well suited for introducing time parallelism in the simulation of electrical machines using existing application codes, as MGRIT is a non-intrusive approach that essentially uses the same time integrator as a traditional time-stepping algorithm. We investigate effects of spatial coarsening on MGRIT convergence when applied to two numerical models of an induction machine, one with linear material laws and a full nonlinear model. Parallel results demonstrate significant speedup in the simulation time compared to sequential time stepping, even for moderate numbers of processors.


Introduction
Induction motors are electrical machines in which the magnetic field in the rotor is obtained by an asynchronous motion with respect to the field in the stator, e. g. [38]. In the design process of such machines, motion and electromagnetic fields are numerically determined by space and time discretization of Maxwell's equations. The spatial discretization, typically by finite elements, may lead to large models, e. g., due to resolving the so-called skin effect [29], but this is often mitigated by considering only two-dimensional models. More computational burden comes from the fact that large time intervals must be considered for simulating the start-up phase, i. e., until the machine reaches its steady state. Moreover, machines are commonly excited by pulsewidth modulated (PWM) excitations. PWM uses fastswitching pulses whose width is controlled such that the time-average corresponds to a specific waveform, typically a sine wave such that very small time steps are needed. As a consequence, the computational complexity of transient simulations is extremely high.
Carrying out the simulations on modern parallel compute systems using existing application codes may still result in runtimes that are beyond what is feasible in practice, since parallelism in existing codes is limited to the spatial aspect of the problem. Due to the stagnation of processor clock speeds in recent years, modern compute systems are characterized by growing numbers of processors. Existing application codes cannot take full advantage of these systems, since spatial parallelism limits the amount of processors that can be exploited. A promising approach for reducing simulation times are parallel-in-time methods that introduce parallelism in the temporal dimension. Especially for simulations over long time intervals with small time steps as considered for electrical machines, parallelization in time can become even more relevant than in space.
To this end, various approaches have been proposed for speeding up the transient simulation of electrical machines, e. g., the time-periodic explicit error correction method [47,48] or the extraction of a circuit model [6]. Also, two of the most well-known parallel-in-time methods, Parareal [33] and MGRIT [12], have been applied to eddy current problems. In particular, Parareal has been considered for RL-circuit models and an induction machine [5,20,42], and MGRIT has been applied to a coaxial cable model [18,19]. This paper focuses on MGRIT applied to two numerical models of an induction machine, providing both an extension of ideas of the work on Parareal for problems involving discontinuous or multirate excitations to a multilevel setting as well as broadening the application areas of MGRIT to include more complex eddy current problems.
The idea of MGRIT is to use a time-grid hierarchy, in which each grid level is characterized by a time integrator, with computational costs decreasing from fine to coarse levels. Expensive time-integration routines are applied in parallel on temporal subdomains, and on the coarsest level, time integration is performed over the whole temporal domain. The key difficulty for achieving a significant reduction in simulation time over timestepping algorithms is the choice of the time integrators such that the cost of the time integrators on coarse time grids is less expensive than on the fine grid. A common approach for defining a time-grid hierarchy is to use different temporal resolutions [12,13,27]. Other techniques such as considering time-integration routines of different orders of accuracy have also been applied, e. g., in power-grid simulations [32,44]. When using timestepping routines of existing application codes, the combination of various temporal and spatial resolutions is an attractive approach for defining a time-grid hierarchy, since this allows adding time parallelism nonintrusively and reducing costs on coarse levels at the same time.
The use of coarser spatial meshes has been explored for various parallel-in-time methods. For Parareal, this approach has been considered in [34,40]. Furthermore, the combination of temporal coarsening with coarsening in space is natural in space-time multigrid methods, including the methods in [17,21,25,49]. The composition of the "parallel full approximation scheme in space and time" (PFASST) [11] with spatial multigrid, including spatial coarsening, has been considered in [36], scaling results can be found in [41,45]. The use of spatial coarsening in MGRIT has been explored for the p-Laplacian [16] and for Burgers equation [28]. In this paper, we demonstrate that spatial coarsening can significantly reduce the runtime of numerical simulations of an induction machine, but it may also degrade MGRIT convergence, as observed for the p-Laplacian in [16]. Our tests show that convergence can be improved by using a stronger relaxation scheme.
The remainder of this paper is structured as follows. In Section 2, the MGRIT algorithm is introduced for ordinary differential equations, Section 3 discusses the induction machine model and its spatial discretization. Numerical results of the parallel-in-time simulation are presented in Sections 4 and 5. Finally, conclusions are given in 6.

Multigrid-Reduction-in-Time
Consider a system of ordinary differential equations (ODEs) of the form arising, for example, after the spatial discretization of a space-time partial differential equation (PDE). Discretize the time interval on a uniform distributed temporal mesh t i = i∆t, i = 0, 1, . . . , N t , with constant time step ∆t = (t f − t 0 )/N t , and let u i ≈ u(t i ) for i = 0, . . . , N t with u 0 = u(0). Consider a one-step time integration method with time integrator Φ i , propagating the solution u i−1 from time point t i−1 to time point t i , including forcing from the right-hand-side of the PDE. Abusing notation for the nonlinear case, Equation (2) can be written in matrix form as and solved by a sequential forward block solve. This method is optimal, i. e., O (N t ), and gives the discrete solution after N t applications of the time integrator. Fig. 1 Uniformly spaced fine and coarse temporal meshes. C-points are present on both grids while F -points are only on the fine grid.

The MGRIT algorithm
The multigrid-reduction-in-time (MGRIT) algorithm [12] is an iterative method that applies multigrid reduction (MGR) techniques [39] for solving time-dependent problems like (1). The key practical aspect of MGRIT is its non-intrusiveness, i. e., the algorithm allows reusing existing time propagators and their integration into a parallel framework. For the construction of a multilevel algorithm, reusing the time integrators Φ i , we require a definition of a coarse-grid system, a restriction and a prolongation operator between temporal grids and a relaxation scheme. Therefore, given a positive integer coarsening factor m > 1, we define a splitting of the fine-grid points into F -and C-points, whereby every m-th point is a C-point and the others are Fpoints, resulting in a coarse time grid T ic = i c ∆T, i = 0, 1, . . . , N T , with N T = N t /m and time step ∆T = m∆t as depicted in Figure 1. For the coarse grid, we define a coarse-grid time integrator Φ ic resulting from a re-discretization with time step ∆T of the problem; other choices, such as coarsening in the order of the discretization [15,37] are also possible, but are not considered in this paper. We define two types of block relaxation schemes. The so-called F -relaxation performs a relaxation on F -points and propagates the solution from one C-point to all F -points up to the next C-point.
Similarly, the C-relaxation propagates the solution to a C-point. Both relaxation schemes, depicted in Figure 2, are highly parallel and can be combined to new relaxation schemes. For example, the F CF -relaxation, an Frelaxation followed by a C-and another F -relaxation, is typically used in the MGRIT algorithm. We define two grid transfer operators, injection as restriction and the "ideal" prolongation, for the transfer between the fine and coarse temporal grid. The ideal interpolation is defined as injection at C-points followed by an Frelaxation.
Using these components and the full approximation storage (FAS) framework [8], the two-level MGRIT-FAS algorithm [14], as extended by spatial coarsening in [16], can be written as Algorithm 1.
In Algorithm 1, A l (u (l) ) = g (l) denotes the nonlinear space-time system of equations on levels l = 1, 2 1: Apply F -relaxation to A 1 (u (1) ) = g (1) 2: For 0 to γ: 3: Apply CF -relaxation to A 1 (u (1) ) = g (1) 4: Inject the approximation and its residual to the coarse grid: u (2) = R I (u (1) ), g (2) = R I (g (1) − A 1 u (1) ) 5: If spatial coarsening: u (2) = R s (u (2) ) g (2) = R s (g (2) ) 6: Solve A 2 (v (2) ) = A 2 (u (2) ) + g (2) 7: Compute the error approximation: e = v (2) − u (2) 8: If spatial coarsening: e = P s (e) 9: Correct using ideal interpolation: u (1) = u (1) + P(e) with each block row corresponding to one time step. Note that, in the linear case, operator A is a matrix. The relaxation scheme of the algorithm can be controlled by the parameter γ, and is, in general, for the MGRIT algorithm chosen as γ = 1, i. e., an F CFrelaxation scheme. But other choices are also possible. Note that the two-level variant with F -relaxation is equivalent to the parareal method [33]. We denote the temporal grid transfer operators by R I and P, whereby P is the ideal prolongation and R I the restriction operator based on injection. Additionally, we define R s as spatial restriction and P s as spatial prolongation (see Section 2.2).
The two-level MGRIT-FAS algorithm can be extended to a multilevel setting by applying the two-level method recursively to the system in step 6. Depending on the number of recursive calls, different multilevel schemes, e. g., V -, F -or W -cycles as depicted in Figure  3, can be defined. Note that, in exact arithmetic the MGRIT algorithm with F -or F CF -relaxation solves for the discrete fine-grid solution in N t /m or N t / (2m) iterations [12], respectively.
In Algorithm 1, an initial guess u is needed. In general, if prior knowledge of the solution u exists, this knowledge should be used as an initial guess. If no such prior knowledge exists, an improved initial guess can be computed by using the nested iterations [30,31] strategy. The idea of nested iterations is to compute an initial approximation on the coarsest grid and interpolate the approximation to the finer grids, applying one V -cycle per level. This procedure results in the same structure as the F -cycle, with skipping the down-cycle at the beginning.

MGRIT with spatial coarsening
Spatial coarsening is added to the MGRIT-FAS algorithm by the spatial restriction operator R s in line 5 and the spatial prolongation operator P s in line 8. Note, that spatial coarsening is an extra option that can be applied after semi-coarsening in time and, therefore, does not touch the non-intrusive character of the algorithm. So both operators have to be specified manually, but are independent of the time-integration scheme. Obviously, the spatial grid transfer operators directly influence the convergence behavior of the algorithm and should be chosen wisely.
In a multilevel setting, multiple coarsening strategies can be chosen depending on the number of available grids in time and space. Both coarsening dimensions are independent of each other and, therefore, can be applied in different ways. Assuming that more temporal grids than spatial grids are available, the most cost efficient approach is to use a "direct" spatial coarsening strategy, starting with spatial coarsening as early as possible. This strategy would offer the best results regarding the runtime of the MGRIT algorithm, but studies [16] have shown that this "direct" strategy can degrade the convergence behavior of the MGRIT algorithm for linear and nonlinear problems. To overcome the degradation in MGRIT convergence, in [16] a "delayed" spatial coarsening strategy is proposed. This strategy applies spatial coarsening as late as possible, i. e., only on the coarsest time grids. Figure 4 illustrates the different coarsening strategies for a five-level V -cycle for the case of three spatial grids, characterized by spatial grid spacings of ∆x, 2∆x, and 4∆x, i. e., assuming factortwo coarsening in space. The temporal coarsening factor is chosen to be m = 4, i. e., the time step on each temporal grid is given by 4 l−1 ∆t, l > 0. The left V -cycle represents the MGRIT algorithm with coarsening only in the time dimension. In the middle, the direct spatial coarsening strategy is shown, where spatial coarsening is applied on the first and second time levels. Finally, the right V -cycle illustrates the delayed strategy, with spatial coarsening starting on the third level to make use of the coarsest space grid on the coarsest level.

Simulation of induction machines
The simulation of electromagnetic fields is based on the solution of Maxwell's equations [29], but, for a variety of applications, it is sufficient to solve a (quasistatic) approximation. In this section, we introduce one of these simplifications, the so-called eddy current problem, that is typically used in the simulation of energy converters. Further, we introduce a numerical model of an induction machine that is used for computations in Sections 4 and 5.

Induction machine model
The standard approach in the simulation of electrical machines is to neglect the displacement current density with respect to the source current density. The resulting magnetoquasistatic approximation of Maxwell's equations, the so-called eddy current problem, can be conveniently defined in terms of the unknown magnetic vector potential A : with initial value A (x, t 0 ) = A 0 (x), spatial domain Ω, consisting of the rotor, stator, and the air gap in between, and where (t 0 , t f ] is the time interval. The magnetic flux B = ∇×A vanishes at the boundaries ∂Ω of the spatial domain (Dirichlet boundary condition). The scalar-valued electrical conductivity σ(x) ≥ 0 and the (nonlinear) magnetic reluctivity ν(x, |∇ × A|) > 0 encode the geometry. In three dimensions a gauging condition has to be specified to regularize the problem, e. g. ∇ · A where σ = 0 [3]. The source current density with winding function χ s : Ω → R 3 and currents i s : (t 0 , t f ] → R 3 , models the n s = 3 homogeneously distributed stranded conductors [43]. An attached electrical network establishes a relationship between the socalled flux linkage, i. e., the spatially integrated time derivative of the magnetic vector potential, and the voltage with s = 1, 2, 3. A common quantity of interest in the design of electrical machines are the Joule losses To include the rotation of the rotor, the problem above is extended by an additional motion equation and the moving band approach [35] is used to model the movement in the mesh. Movements are determined by the mechanical equation where ω is the angular velocity of the rotor, I denotes the moment of inertia, and C and κ are the fricition and torsion coefficients, respectively. The torque T mag defines the mechanical excitation of system (5) given by the magnetic field. In the following, we consider a cross-section of the electrical machine to reduce the three-dimensional (3D) domain Ω to a two-dimensional (2D) domain Ω 2D ⊂ R 2 in the x, y-plane. Discretizing (3) in space using finite elements with n a degrees of freedom yields a system of equations of the form with initial condition u 0 ∈ R n and unknown u = [a , i , θ, ω] : (t 0 , t f ] → R n . The solution u (t) ∈ R n for a time point t ∈ (t 0 , t f ] consists of the (line-integrated) magnetic vector potential a (t) ∈ R na , the currents of the three phases i (t) ∈ R 3 , the rotor angle θ (t) ∈ R, and the rotor's angular velocity ω (t) ∈ R. The field/circuit coupled problem consists of differentialalgebraic equations (DAEs) of index-1 [4,9] due to the presence of non-conducting materials, i. e., σ = 0. While the initialization and solution of higher index problems requires the use of more advanced techniques [26], index-1 DAEs can be treated by standard techniques.
Integrating the semi-discrete system (6) using the backward Euler method results in a system of the form A discussion of the index-1 DAE aspects in a parallelin-time setting can be found in [42].

Pulse-width-modulation
Electrical machines are commonly driven by PWM excitations since they have technical advantages for the power electronics that is implemented to control the machine. While multiple forms of PWM exists [7], in this paper, we consider a PWM signal that is produced by comparing a reference wave with a triangular wave.
Idealizing the circuitry to generate the PWM, i. e. diodes are replaced by switches, results in the following excitation with reference signal r (t) and where c (t) denotes the carrier signal. Furthermore, the modulation factor is defined as the ratio of the amplitude of the reference signal and the amplitude of the carrier signal. Figure 5 shows an example of a PWM signal. The red line illustrates a sine reference signal of 50Hz with an amplitude of 0.8 and the carrier signal, modeled by a sawtooth carrier [46] of 500Hz and with an amplitude of one, is shown as a blue line. The resulting PWM signal with switching frequency of 500Hz is presented in the lower subplot of Figure 5.

Numerical model
We model the semi-discrete eddy-current problem (6), supplied by a three-phase PWM voltage source, using the multi-slice FE model "im 3 kw" of an electrical machine first presented in [24] and modified in [20] for the  usage of a PWM voltage source. In more detail, the twodimensional model "im 3 kw", depicted in Figure 6, of a four-pole 3kW squirrel cage induction machine with no-load operation condition is used for numerical computations. Further, as typical for this kind of symmetric models, we consider only a quarter of the machine with periodic boundaries to reduce simulation costs. The machine is supplied by a three-phase PWM voltage source with reference signals r s (t), s = 1, 2, 3, and carrier signal c p (t) with p pulses. The reference signal for the three phases is given by with ϕ 1 = 0, ϕ 2 = 2/3π, ϕ = −4/3π and electric period T . The carrier signal, given by a bipolar trailing-edge modulation using a sawtooth carrier [46], is defined by For our simulation we consider a frequency of 50 Hz, corresponding to an electric period of T = 0.02 s, and choose p = 400 pulses for one period, resulting in a PWM voltage source of 20 kHz. The model provides a linear, i. e., ν does not depend on A, and a nonlinear version of the induction machine. In the nonlinear setting, the voltage source v p s , defined in (9), is multiplied with the time-dependent factor in the first two periods, i. e., for t ∈ [0, 2T ], which reduces the transient behavior of the motor and allows to obtain the steady-state more quickly [24]. The model computes the value of Joule losses P loss for each time point, i. e., the loss in the transfer of electrical energy. The size of Joule losses indicates a significant quality in the prototyping of designs, and, therefore, it is one possible stopping criterion for iterative methods.

Spatial discretization
Gmsh [2,23] is used to generate mesh representations of the model. We consider a hierarchy of three nested spatial grids, illustrated in Figure 7. The coarsest model, defined on the mesh Ω 3 , has n a = 4449 degrees of freedom. Two more accurate discretizations are defined on grids Ω 2 and Ω 1 that are constructed by refinements of the coarsest mesh, resulting in grids with n a = 17,496 and n a = 69,384 degrees of freedom, respectively.
We consider the standard finite element interpolation or nodal interpolation as the grid transfer operator between the spatial meshes. To avoid numerical instabilities at the boundaries between the stator and the rotor, we apply the standard finite element interpolation separately for both regions. Recall from Section 3.1 that in our model problem, only the magnetic vector potential is a function that depends on space, and, therefore, only the magnetic vector potential is discretized on the different spatial meshes, whereas currents, rotor angle, and the rotor's angular velocity are functions that are independent of space. Taking into account that System (6) consists of grid-dependent and grid-independent unknowns, we define the spatial interpolation and restriction operators as block operators with two blocks as follows: For the interpolation of grid-dependent unknowns, we use the standard finite element interpolation, whereas the grid-independent unknowns are injected.

Implementation details
The MGRIT algorithm is implemented in a Python package using the Message Passing Interface (MPI). The implementation conforms to the non-intrusive character of the MGRIT method and requires an implementation of the time stepper routine by the user. For the machine model, the time stepper routine calls the GetDP [1,10,22] library for performing the simulation of the electrical machine which implements the implicit Euler method. As we aim at demonstrating the benefits of parallelization in time, for the implementation on a distributed memory computer, we consider a domaindecomposition approach only in time, i. e., only the time interval is distributed across processors. All tests were performed on an Intel Xeon Phi Cluster consisting of 272 1.4 GHz Intel Xeon Phi processors.

Storage requirements
The MGRIT algorithm requires storing solution values at all C-points. On coarse levels, two additional vectors are stored: one for a restricted copy of the fine-grid solution and one for the FAS right-hand-side. Denoting the number of MGRIT levels by L and the storage cost of a vector on level l by s l , l = 0, . . . , L − 1, the total storage requirement of MGRIT per temporal processor is given by [16] where p is the number of processors in time. The first term corresponds to the storage requirements of solution values, and the second term is that of the additional vectors on the coarse levels. Note that spatial coarsening reduces the size of all vectors on coarse grids. In comparison, the sequential time-stepping algorithm requires storing only one solution value, corresponding to a total storage requirement of s 0 . Thus, the total storage cost of the MGRIT algorithm is larger than that of sequential time-stepping by a factor of S MGRIT /s 0 . However, note that this factor decreases with increasing the number of processors and in the case of spatial coarsening.

Linear material model
In this section, we investigate the behavior of the MGRIT algorithm for the eddy current problem of a two-dimensional induction machine with linear material characteristics. Especially, we are interested in the behavior of adding spatial coarsening to the MGRIT algorithm for system (6), containing scalar-valued as well as grid-dependent unknowns. For this study, we choose a linear model of the induction machine, also available in the model "im 3 kw", to reduce computational costs. In Section 5, we transfer our results to the nonlinear model of the machine.

Algorithmic parameters
The MGRIT-FAS algorithm has a large number of parameters and variations, considering V -, F -and Wcycles, the number of temporal grid levels, coarsening strategies in space and time, number and type of relaxation schemes, and so forth. Based on results presented in the literature as well as on personal experiences, we consider a two-and a five-level MGRIT variant in this study. The two-level algorithm offers a simple structure, while the five-level method allows for more parallelism. Further, we choose V -and F -cycles. V -cycles offer better parallel scaling than F -cycles, but F -cycles have better convergence behavior due to additional work on coarse temporal grids. We consider F CF -relaxation for all variants, and, additionally, F -relaxation is used in the two-level setting as well as in MGRIT F -cycles.
The main goal of the MGRIT algorithm is to reduce the runtime of the simulation by making use of parallel computer architectures. To achieve the best runtime results for our model problem, the temporal coarsening strategy of the algorithm has to be adapted to the number of available processors. Therefore, we choose a non-uniform temporal coarsening strategy directly connected to the number of points on the finest time grid and the number of processors. On the one hand, a large coarsening factor reduces the number of points for all following time grids and, thereby, the costs. On the other hand, to make use of all available processors, the coarsening factor cannot be chosen too large. As a compromise and based on experience with MGRIT for eddy current problems in a parallel setting [18,19], we choose a large coarsening factor on the first level, such that the number of points on the second level corresponds to the number of processors available. On coarse levels, a small coarsening factor is used. For example, consider a fine temporal grid with 129 points, 16 processors available and a five-level MGRIT method. The first temporal coarsening factor is chosen to be eight, such that the work of the first grid is distributed evenly among all processors; here, eight time points per processor. For the coarser levels, a coarsening factor of two is used to reduce the number of time points per grid, such that the coarsest grid has only three points. With this strategy, the first level is perfectly parallelized and the amount of work on coarser levels that cannot be parallelized is minimized. Note that this is a heuristic choice purely based on distributing computational work evenly among a given number of processors. Due to the high computational cost of the external solver and to the use of moderate numbers of processors, it is reasonable to neglect communication costs for this problem. Optimization of the temporal coarsening strategy that takes additional parameters that affect parallel performance, such as communication costs and MGRIT convergence behavior into account, is a topic of future work.
In MGRIT multilevel variants, we apply the direct and delayed spatial coarsening strategies described in Section 2.2, with the interpolation and restriction operators P s and R s defined in Section 3.4. Note that these operators have to be computed only once and can be used for all MGRIT variants. Furthermore, nestediterations, described in Section 2.1, is used to generate an improved initial guess. For this preprocessing step, we use the reference signal (10) as a voltage source based on the idea in [20]. For the iterations of the MGRIT algorithm, the discontinuous signal (9) is used on every grid level. Since we are interested in the algorithmic behavior of different MGRIT variants, in the following, we report on tests with MGRIT convergence measured based on the absolute space-time residual norm instead of the Joule losses (4). More precisely, the convergence tolerance is chosen to be 10 −4 , measured in the discrete L 2 -norm. We have checked that for all algorithms, this fixed space-time residual norm is sufficient to achieve discretization error for our linear model problem if the algorithms converge. That is, algorithms either converge to the time-stepping solution of the linear model of the four-pole squirrel cage induction machine with a PWM voltage source or convergence is only observed after N t /m or N t /(2m) iterations for For F CF -relaxation, respectively.

Effects of spatial coarsening
We apply several MGRIT variants to the linear machine model problem on the space-time domain Ω × [0, 0.03125] s. The time interval, corresponding to about one and a half periods, consists of 2 15 + 1 time steps of size ∆t = 2 −20 that are distributed evenly among 256 processors. Note that this time-step size resolves physical processes as well as the pulses of the PWM voltage source. With this choice of number of processors and time points, the temporal coarsening factor is chosen to be m = 256 on the first level, resulting in a second-level time grid with 256 points. The five-level algorithms use a coarsening factor of four on all coarse levels. Table 1 shows the number of iterations, setup, solve, and total time in seconds for the different MGRIT variants without spatial coarsening. The setup time consists of the setup of the algorithm and the generation of an improved initial guess using nested iterations. Note that for nested iterations, the same relaxation scheme as the algorithm itself is used, e. g., the setup for the fivelevel F -cycle with F -relaxation is faster than that for the five-level F -cycle with F CF -relaxation. The twolevel algorithm with F -relaxation reaches the desired tolerance after 12 MGRIT iterations. While the use of F CF -relaxation is beneficial in the two-level setting, reducing both the number of iterations and the runtime, for F -cycles, F -relaxation performs better than F CFrelaxation. Comparing the total runtimes, the multilevel variants are faster than the two-level methods, with the five-level V -cycle being the fastest. The problem on the second level is still very expensive, and, therefore, using the two-level method recursively for this problem allows for better performance. Now, we consider the same MGRIT variants with spatial coarsening. More precisely, we add spatial coarsening by using the grids Ω 2 and Ω 3 , following either the direct or delayed coarsening strategy. Note that in the five-level MGRIT variants, direct spatial coarsening refers to using Ω 1 on the finest grid, Ω 2 on the second level, and Ω 3 on all remaining coarse levels. Delayed spatial coarsening, in contrast, uses Ω 3 only on the coarsest time grid, Ω 2 on the second coarsest level, and Ω 1 on the three finest levels. In the two-level methods, Ω 2 or Ω 3 is used on the coarse grid. Table 2 demonstrates the effects of adding spatial coarsening to the MGRIT variants considered in Table 1. All variants with F CF -relaxation converge to the solution in the same number of iterations as the variants without spatial coarsening. Especially for this problem, the direct spatial coarsening strategy does not degrade the convergence behavior. The situation is different for all variants with F -relaxation. Neither the two-level method with spatial coarsening converges to the desired tolerance in significantly fewer than N t /m iterations, nor does the five-level F -cycle.  Table 2 Results similar to those of Table 1 but with spatial coarsening; indicates no convergence to the desired tolerance in less than N t /m = 256 iterations and speedup is measured relative to the same MGRIT variant without spatial coarsening.
To better understand the degradation in convergence when using F -relaxation, Figure 8 details the convergence behavior for all F -cycle variants considered in Table 2. Shown are the space-time residual norms over the first ten MGRIT iterations for all F -cycle variants. Dashed lines are results for using direct spatial coarsening and solid lines represent applying the delayed approach. The variants with F CF -relaxation show linear convergence behavior, and there is effectively no difference between both spatial coarsening strategies. In contrast, convergence of MGRIT with F -relaxation stagnates after five iterations. We assume this behavior comes from the scalar-valued unknowns of our system, which are injected between the spatial grids. The additional CF -relaxation in the F CF -relaxation scheme improves the spatial interpolation and fixes this problem.
Comparing the runtimes of the different MGRIT variants with and without spatial coarsening in Tables 1 and 2, the two-level algorithms show the best speedup, decreasing the total runtime by up to a factor of about 2.72, followed by the five-level F-cycle with F CF -relaxation and direct spatial coarsening, which is faster than the same variant without spatial coarsening by a factor of about 1.5. Note that, for this problem, the use of direct spatial coarsening does not degrade  MGRIT convergence and, thus, choosing this more aggressive coarsening strategy leads to higher speedups than applying delayed spatial coarsening. Applying a more aggressive spatial coarsening strategy also in the two-level method with F CF -relaxation similarly allows for a larger speedup over the same algorithm without spatial coarsening. Using the spatial mesh Ω 3 instead of Ω 2 on the coarse time grid, reduces the runtime from 76,157 seconds to 49,551 seconds and, thus, increases the speedup from a factor of about 1.77 to a factor of about 2.72. However, the scalability of the two-level method is limited by the size of the coarse-grid problem, and spatial coarsening can lead to divergence in the nonlinear setting, as will be seen in Section 5.2.

Discussion
Summarizing, the MGRIT algorithm is applied successfully to the linear material model of an induction machine with a pulsed voltage source. With our choice of spatial grid transfer operators, spatial coarsening can be added for all variants with F CF -relaxation, while the methods with F -relaxation methods do not convergence to the time-stepping solution in less than N t /m iterations. A better choice could resolve this behavior. For our problem, we see no difference in the direct and delayed strategy, and, therefore, the direct approach should be used for reducing the runtime. The speedup by adding spatial coarsening corresponds directly to the amount of work on the coarser grids, i. e., the two-level algorithm has the best speedup, followed by the F -cycle.

Full nonlinear model
In this section, we use the results from Section 4.2 of different MGRIT variants compared with one another for studying the application of MGRIT to the full nonlinear model of the induction machine "im 3 kw", i. e., with a state dependent reluctivity ν which models magnetic saturation. In particular, we are interested in effects of spatial coarsening in this setting as well as in the parallel performance of MGRIT. Furthermore, we compare MGRIT with sequential time stepping.

Numerical parameters
To limit computational costs in the full nonlinear setting, we perform experiments on a smaller space-time domain. More precisly, we consider Ω 2 as the spatial grid and choose ∆t = 2 −20 and N t = 10,753 time steps, resulting in a final time t f ≈ 0.01 s. Convergence of MGRIT is measured based on the relative change in values of joule losses at C-points of two consecutive iterations. MGRIT iterations are stopped when the maximum norm of the relative difference of two successive iterates at C-points is below 10 −2 , i. e., when the maximum relative change in joule losses is smaller than 1% for all C-points. Note that this stopping criterion is much looser compared to that in Section 4, but it corresponds to a choice made for prototyping of induction machines in a realistic setting. However, in contrast to using a convergence tolerance based on the space-time residual, it is not guaranteed that MGRIT converges to the discrete time-stepping solution. For the MGRIT variants considered in this section, we have checked that, if convergence is observed, iterates indeed converge to the discrete time-stepping solution.
In the full nonlinear setting, each application of the time propagator Φ involves an iterative nonlinear solve. GetDP, called by the time stepper routine of our MGRIT implementation, uses Newton's method with damping for the solution of the nonlinear problems. As convergence of the Newton solver can depend on the initial guess, in Section 5.2, we consider not only the effect of spatial coarsening on MGRIT convergence, but also on the convergence of the Newton solver. We then use these results in Section 5.3 to choose a set of MGRIT variants for a strong scaling study and comparison to sequential time stepping.

Effects of spatial coarsening
The results in Section 4.2 show that adding spatial coarsening in MGRIT is beneficial in the case of F CFrelaxation, whereas MGRIT with F -relaxation does not converge when spatial coarsening is added. We now consider two-and five-level MGRIT variants with F CFrelaxation and compare the performance of these methods without and with direct spatial coarsening using the spatial mesh Ω 3 on coarse temporal grids. Additionally, we look at the performance of two-level MGRIT with F -relaxation and without spatial coarsening. All tests are performed using 256 processors. Choosing again the non-uniform temporal coarsening strategy described in Section 4.1, a coarsening factor of 42 is used on the first level, and factor-four coarsening is applied on all coarse levels. Table 3 shows iteration counts and runtimes of the different MGRIT variants, with timings split up again into setup and solve times. No results are given for the two-level method with F CF -relaxation and with spatial coarsening, since during the F -relaxation step of the first MGRIT iteration, the Newton solver does not converge for at least one time step and, thus, the algorithm cannot be applied in this setting. In contrast to the setting of linear material laws, here, an initial guess that is in the region of convergence of the Newton solver is crucial. Applying a nested iterations strategy in the two-level setting, the initial guess at the C-points of the fine grid are given by the interpolated solutions of the coarse grid. As no values are given at F -points, Frelaxation is then carried out using the solution from the previous time point as the initial guess, i. e., for the first F -point in each F -interval, the solution at the preceding C-point is used as the initial guess. After solving the corresponding nonlinear problem at the first Fpoint in each interval, the solution is taken as the initial guess for the following F -point, and so on. However, this strategy appears to result in a poor initial guess for the Newton solver at some time points. There are several difficulties for obtaining a good initial guess, including the use of spatial coarsening, the discontinuities of the right-hand side of the problem, as well as the different time-step sizes on the temporal grids in the multigrid hierarchy. Improving the initial guess requires further investigation of strategies for tackling these challenges. This is a topic of future work.
Comparing the results for all convergent MGRIT solvers with those given in Tables 1 and 2, the number of iterations needed for convergence is smaller due to the much looser stopping criterion. Furthermore, in contrast to the results in Tables 1 and 2, all methods converge in roughly the same number of iterations, with the two-level algorithm with F -relaxation taking only one more iteration than all multilevel variants.
Looking at the total runtimes of the different variants without spatial coarsening, the five-level V -cycle algorithm is fastest, followed by the five-level F -cycle variant which is about a factor of 1.5 times slower than considering V -cycles. Furthermore, five-level F -cycles are already about twice as fast as the two-level method. Adding spatial coarsening in the multilevel schemes, we can benefit over the two-level algorithm even more. Considering five-level V -and F -cycles with spatial coarsening, the factor in comparison with the runtime of the two-level method can be increased from 1.9 or 2.8 when applying F -or V -cycles without spatial coarsening, respectively, to a factor of about 3.8 or 4.4 when spatial coarsening is added. Note that, compared to the results in Tables 1 and 2, the benefit of adding spatial coarsening is even higher in the full nonlinear setting than in that of linear material laws, although only two spatial coarse grids instead of three are considered.

Parallel results
Now we present strong scaling results of the five convergent MGRIT variants considered in Table 3. We perform parallel tests using between eight and 256 processors for parallelization in time. The temporal coarsening strategy in all runs is fixed at applying a coarsening factor of 42 on the finest grid and factor-four coarsening on all coarse grids. Additionally, we are interested in the benefits over sequential time stepping. Figure 9 shows total runtimes of the different MGRIT variants as a function of the number of processors, as well as the time-to-solution of the sequential block forward solve for reference purposes. For the multilevel variants, runtimes are shown for using spatial coarsening (dashed lines) and without spatial coarsening (solid lines). The runtime of all multilevel variants using eight processors as well as the time-to-solution of sequential time stepping is about five to six days, while using eightway parallelism in the two-level method results in a runtime of only about four days. However, the multilevel variants show better strong parallel scaling and, thus, lead to faster time-to-solution at large processor counts than the two-level method. Increasing the number of processors to 256, the total runtime can be reduced to about 8.72 hours or 5.57 hours, when considering MGRIT V -cycles without or with spatial coarsening, respectively. While F -cycles with spatial coarsening allow for a similar reduction, F -cycles without spatial coarsening using 256-way parallelism already take about half a the two-level variant reduces the runtime to about one day. Table 4 details speedups and parallel efficiencies for the MGRIT variants. The speedup is computed relative to sequential time stepping, representing the algorithm with the minimum runtime on one processor, i. e., the speedup using p processors is given by S p = T TS /T MGRIT (p). The parallel efficiency is measured as S p /p for p processors. Using multilevel methods, we can benefit more over sequential time stepping at larger processor counts than with the two-level algorithm. For example, while on 16 processors the speedup is similar for all methods, using 32 processors results in a speedup of up to a factor of 3.6 for a five-level variant, whereas the two-level method yields only a speedup of a factor of 2.8. Increasing the number of processors to 256, with the two-level method we achieve a speedup of a factor of about 5. In contrast, multilevel variants at least double the speedup factor. Considering spatial coarsening, leads to the best speedup with the Vcycle algorithm being nearly 22 times faster than the sequential time stepping method. Moreover, the use of spatial coarsening in MGRIT F -cycle is crucial for improving the parallel scalability for more than 64 processors, since this allows reducing computations on coarse levels that dominate over communication costs. While speedups and efficiencies degrade for F -cycles without spatial coarsening, the degredation is only modest when adding spatial coarsening. Note that we bind the temporal coarsening strategy to the maximum number of available processors and, thus, faster runtimes on eight to 128 processors may be possible by adjusting the temporal coarsening strategy to the number of processors.

Runtime [s]
Time stepping 2-level, V-Cycle, F 5-level, V-Cycle, FCF 5-level, F-Cycle, FCF Fig. 9 Total time-to-solution for the nonlinear induction machine "im 3 kw" using different MGRIT variants and sequential time stepping. Solid lines are runtimes without spatial coarsening, and dashed lines represent results with spatial coarsening. The dotted line shows the runtime of time stepping on one processor for reference purposes.  Table 4 Speedup and efficiency of different MGRIT variants using various number of processors. The speedup is given relative to the time-to-solution for the sequential time-stepping on one processor. Parallel efficiency is measured as S p /p, where S p is the speedup for p processors.

Conclusions
In this paper, the MGRIT algorithm is applied to a model of a realistic two-dimensional electrical machine with a pulsed excitation. Using the non-intrusive character of MGRIT for an existing model of an induction machine, performance of MGRIT is dominated by the cost of the time-stepping routine which carries out a nonlinear spatial solve for a given time step. To reduce the cost of spatial solves on coarse grids, the use of spatial coarsening is investigated. We demonstrate that spatial coarsening is a powerful option that can significantly reduce the runtime of the algorithm, but it also comes along with new challenges and problems, e. g., degradation in MGRIT convergence, as also observed in [16] for the p-Laplacian, and divergence of the Newton solver due to a poor initial guess. These are topics of future work. Strong scaling results show significant speedups compared to sequential time stepping. For a full nonlinear model of an induction machine and using moderate numbers of processors, we demonstrate that MGRIT allows reducing the simulation time from several days to only a few hours. While MGRIT is a non-intrusive approach, there is still a very large parameter space for the algorithm, such as the choice of the level structure, temporal and spatial coarsening strategies, and so on. Especially if computational resources are limited to small or moderate numbers of processors, good parameter choices are crucial for obtaining the best possible speedup over sequential time stepping. Future work will focus on developing a model for optimizing MGRIT performance for given numbers of processors and time points.