1 Introduction

Magnetic resonance imaging (MRI) [1] has become an essential tool in modern medical diagnosis due to the scanners’ high in-built resolution when imaging fractures [2], joints [3], and soft tissues, such as damaged cartilage [2] or tumors [4].

Time varying magnetic fields generated by an MRI scanner as part of the imaging process interact with the conducting shields and generate unwanted eddy currents. The purpose of these shields is to isolate or block the strong static magnetic field generated by the main MRI magnet from the region outside the scanner. However, the gradient coils, which generate weaker time varying magnetic fields, induce eddy currents in the shields and these give rise to electromagnetic stresses causing deformations and vibrations within the MRI scanner [5,6,7]. This, in turn, can lead to patient discomfort and ghosting. Still further, eddy currents and vibrations release energy, which can cause boil-off of the helium, an expensive coolant used to keep the magnets superconducting, and this may, ultimately, lead to a magnet quench. Vibrations are also caused by external machinery (e.g. lifts, cleaning equipment and building maintenance), which can lead to eddy currents being generated in conducting components of the scanner, may disturb the uniformity of the strong magnetic field generated by the main magnet and can also lead to ghosting effects. Manufacturers of MRI scanners want to predict these vibrations in order to better understand the coupling phenomena and to design new and better scanners that exploit alternative materials. To do this, they need to carefully optimise the model parameters to ensure that the new magnets meet their design specification.

To help design scanners, in collaboration with our industrial partner Siemens Healthineers, this paper addresses three pressing industrial problems: firstly, the accurate and rapid computation of the resonant frequencies associated with this coupled magneto-mechanical system, secondly, the effects of magnet motion on the Ohmic power and kinetic energy curves, and, thirdly, the prediction of the uncertainty in Ohmic power and kinetic energy curves as a function of exciting frequency for uncertain material parameters. Accurate full order model simulations require fine finite element discretisations using dense meshes and/or high order elements, and can take significant time to compute [8,9,10,11,12]. To reduce the cost of solutions for new parameters, proper orthogonal decomposition (POD) reduced order model (ROM) schemes [13,14,15,16] have been explored that use either Galerkin projection or interpolation [17, 18]. However, the projected POD (PODP) approach has an invasive computational implementation, while the interpolated POD (PODI) approach presents challenges when the dimension of the parameter space to be investigated becomes large. As an alternative, we investigate a POD technique based on neural network regression [19, 20], which is not invasive and has superior approximation properties compared to PODI. Multi-layer perceptron (MLP) neural networks have been employed in combination with POD by Hesthaven and Ubbiali [21], and this technique is called PODNN (or POD-NN) in the literature. The use of neural networks in the context of POD ranges from the prediction of wind pressure and wind induced responses for high-rise buildings [22], with an emphasis on back-propagating networks, to the use of recurrent neural networks (RNN) for the closure of parametric POD-Galerkin ROMs [23]. In the latter, it is shown that a long short-term memory (LSTM) type of RNN can significantly improve accuracy and efficiency for non-linear problems, even beyond the time interval of training data. More recently, the use of other deep learning based reduced order models (DL-ROMs) in POD has been considered in [24, 25], with a more extensive literature review given in [26] covering artificial neural networks (ANN), physics informed neural networks (PINNs) and feed forward neural networks and the differences between them. The application of this technology to the three aforementioned problems will demonstrate the advantages of using the neural network based POD over our previous POD approaches.

The structure of the paper is as follows: Sect. 2 briefly outlines the mathematical model of the coupled magneto-mechanical problem, in the form of a transmission problem, that we wish to tackle. In this section, we also describe the finite element discretisation and the structure of the algebraic system to be solved. Then, in Sect. 3, we describe the reduced order modelling approach based on POD and describe the offline and online stages. For the online stage, we focus on PODNN and show comparisons with other POD approaches. We also emphasise how the neural networks should be applied to POD in order to obtain accurate results efficiently. Sect. 4 presents a series of numerical examples where the PODNN technique is applied to the problems of interest in this work. These include the prediction of resonant modes in the Ohmic power and kinetic energy curves using different parameter spaces, the prediction of the response in the same outputs of interest for different external vibrational loading conditions and the prediction of the response in outputs of interest when model parameters are uncertain. Finally, in Sect. 5, we summarise the work carried out in this paper and present our conclusions.

2 Magneto-mechanical model

To computationally simulate the coupled electro-mechanical interactions in an MRI scanner we choose to follow [12] and formulate the equations in a Lagrangian setting. We recall that this mathematical model assumes that both the displacements and the velocities of the conducting components are small. The problem of interest is described by Fig. 1, where \(\Omega _\mathrm{{C}}\) denotes a conducting region, \(\Omega ^\mathrm{{c}}_\mathrm{{C}}:=\mathbb {R}^3 {\setminus } \overline{\Omega }_\mathrm{{C}}\) denotes the (unbounded) non-conducting region and \(\partial \Omega _\mathrm{{C}}\) represents the boundary of the conducting region. The unknown fields for this problem are the magnetic vector potential \(\varvec{A}\) and the mechanical displacement \(\varvec{u}\), while \(\varvec{J}\) is a known external current source.

Fig. 1 General representation of the magneto-mechanical problem, illustrating a conducting region \(\Omega _\mathrm{{C}}\) with magnetic permeability \(\mu = \mu _{*}\) and electrical conductivity \(\gamma = \gamma _{*}\) that is within an unbounded non-conducting region \(\Omega ^\mathrm{{c}}_\mathrm{{C}}\) with \(\mu = \mu _0\) and \(\gamma = 0\). An excitation is applied by a current source \(\varvec{J}^s\) that is prescribed in the coils

Continuing to follow [12], by linearising about the static (DC) solution, the transient (AC) problem becomes linear in time (t) dependent fields and the latter can be represented in a time-harmonic fashion so that

(1a)
(1b)

with a similar decomposition for \(\varvec{J}\), where \(\omega = 2 \pi f\) is the angular frequency of the driving excitation, f is the frequency in Hz and \(\mathrm {{i}}:= \sqrt{-1}\). A linearised transmission problem has already been derived in [12, 27] for this problem leading to the strong form

(2a)
(2b)
(2c)
(2d)
(2e)
(2f)
(2g)
(2h)

where \(\gamma \), \(\rho \) and \(\mu \) are the electric conductivity, mass density and magnetic permeability, respectively; \(\varvec{n}\) is a unit outward normal (pointing from the conducting to the non-conducting side); is defined as the mechanical contribution to the Cauchy stress tensor; is the linearised Maxwell stress tensor; \(\partial \Omega _\mathrm{{C}}^D\) denotes the (mechanical) Dirichlet part of \(\partial \Omega _\mathrm{{C}}\) and \([\cdot ]_{\partial \Omega _\mathrm{{C}}}:= (\cdot )|^+_{\partial \Omega _\mathrm{{C}}} - (\cdot )|^-_{\partial \Omega _\mathrm{{C}}}\) denotes the jump, with \((\cdot )|^+_{\partial \Omega _\mathrm{{C}}}\) and \((\cdot )|^-_{\partial \Omega _\mathrm{{C}}}\) representing the non-conducting and conducting side of \({\partial \Omega _\mathrm{{C}}}\), respectively. Once (2) has been solved, the complex amplitudes of the electric and magnetic AC fields in the Eulerian setting can be recovered as

(3a)
(3b)

where and \(\varvec{B}_0^\mathrm{{DC}}={{\,\textrm{curl}\,}}\varvec{A}^{DC}\) and the complete physical fields for both stages are

(4a)
(4b)

In this paper, we assume that the DC fields are known. Our goal is to provide a new reduced order model procedure to allow the rapid prediction of approximations to and for new material parameters and loading conditions. In turn, this allows us to obtain associated outputs of interest as well as to understand the uncertainty in outputs of interest for uncertain material parameters. To this end, we truncate \(\Omega _\mathrm{{C}}^\mathrm{{c}}\) a finite distance from \(\Omega _\mathrm{{C}}\), and create the truncated domain \(\Omega \). On \(\partial \Omega \), we impose as an approximation to the decay condition (2d). To circumvent the Coulomb type gauge in (2b), we add a small perturbation to the left hand side of (2a) in \(\Omega \setminus \Omega _\mathrm{{C}}\) and replace with the regularised solution . We then use the finite element method (FEM) to approximate full order model solutions to and over \(\Omega \). For further details we refer to [12]. In the next section, we briefly recall the finite element (FE) discretisation employed.

2.1 FEM discretisation

FEM discretisation involves partitioning the domain into non-overlapping regions called elements and then approximating the solution over each of these elements by appropriate piecewise polynomials. We use an unstructured grid of tetrahedral elements, due to the automatic procedures that are available for generating these meshes around complex configurations, and adopt a higher order (hp-version) FE framework, which allows refinement of the mesh spacing (h) to be combined with an increase in the order (p) of these polynomials in order to improve the accuracy of the solution [8, 10, 11], as was done in our previous work [12]. The use of such technology allows accurate solutions to be obtained and is known to lead to exponential convergence of the solution [8], using p-refinement for smooth solutions and combinations of h- and p-refinements for solutions that are non-smooth due to sharp edges and corners. The mathematics and continuity requirements of the fields dictate that the mechanical displacement and magnetic vector potential should be approximated differently; for details of the weak form of (2) and its derivation we refer to [12]. This means that \(\varvec{H}({{\,\textrm{curl}\,}})\) conforming basis functions should be used for the magnetic vector potential, while \(H^1\) conforming basis functions should be used for each component of the mechanical displacement. The particular choice of basis functions employed corresponds to those proposed by Schöberl and Zaglmayr [28, 29]. This means that the two fields are approximated by

(5a)
(5b)

where the subscripts hq and hp are used to indicate that these are discrete approximations on a mesh of spacing h with different orders q and p. Additionally, \({P^\mathrm{{AC}}_\mathrm{{global}}}\) and \(Q^\mathrm{{AC}}_\mathrm{{global}}\) refer to the numbers of degrees of freedom (DOF) in the AC system in the same respective manner. Still further, \(\textbf{N}^g\) and \(\textbf{L}^s\) are typical \(\varvec{H}({{\,\textrm{curl}\,}})\) and \(H^1\) basis functions, respectively [17].

2.2 Full order model

Using the FEM recalled above, the application of the linearised strategy in [12] leads to the need to solve parametrised linear systems of the form: Find \(\varvec{\mathcal {A}}^{\textrm{AC}}_\varepsilon \) and \(\varvec{\mathcal {U}}^{\textrm{AC}}\) such that

$$\begin{aligned} \begin{bmatrix} \textbf{K}^\mathrm{{AC}}_{\mathcal {A}\mathcal {A}} + \textbf{C}^{\textrm{AC}}_{\mathcal {A}\mathcal {A}} &{} \textbf{0} \\ \textbf{K}^{\textrm{AC}}_{\mathcal {U}\mathcal {A}} &{} \textbf{K}^{\textrm{AC}}_{\mathcal {U}\mathcal {U}} - \omega ^2 \textbf{M}^{\textrm{AC}}_{\mathcal {U}\mathcal {U}} \end{bmatrix}\begin{pmatrix} \varvec{\mathcal {A}}^{\textrm{AC}}_\varepsilon \\ \varvec{\mathcal {U}}^{\textrm{AC}} \end{pmatrix} = \begin{pmatrix} \textbf{R}^{\textrm{AC}}_\mathcal {A}\\ 0 \end{pmatrix}, \end{aligned}$$
(6)

where \(\textbf{K}^\mathrm{{AC}}_{\mathcal {A}\mathcal {A}}\), \(\textbf{C}^{\textrm{AC}}_{\mathcal {A}\mathcal {A}}\), \(\textbf{K}^{\textrm{AC}}_{\mathcal {U}\mathcal {A}}\), \(\textbf{K}^{\textrm{AC}}_{\mathcal {U}\mathcal {U}}\) and \(\textbf{M}^{\textrm{AC}}_{\mathcal {U}\mathcal {U}}\) refer to the system matrices that are obtained by discretisation of the weak forms. The explicit definitions of these matrix blocks can be found in [18], with \(\textbf{K}\), \(\textbf{M}\) and \(\textbf{C}\) referring to stiffness, mass and damping type matrices, respectively. The \(\textbf{R}^{\textrm{AC}}_\mathcal {A}\) term drives the solution to the problem. The block lower-triangular structure of (6) means that a computationally efficient solution strategy is to first solve

$$\begin{aligned} \begin{bmatrix} \textbf{A}(\varvec{w})\end{bmatrix} \begin{pmatrix} \textbf{q}(\varvec{w}) \end{pmatrix} = \begin{pmatrix} \textbf{r}(\mathbf {\varvec{w}}) \end{pmatrix}, \end{aligned}$$
(7)

for \(\textbf{q}(\varvec{w}):= \varvec{\mathcal {A}}^{\textrm{AC}}_\varepsilon (\varvec{w})\), where \(\textbf{A}(\varvec{w}):= \textbf{K}^\mathrm{{AC}}_{\mathcal {A}\mathcal {A}}(\varvec{w}) + \textbf{C}^{\textrm{AC}}_{\mathcal {A}\mathcal {A}}(\varvec{w})\) and \(\textbf{r}(\varvec{w}):= \textbf{R}^{\textrm{AC}}_\mathcal {A}(\varvec{w})\). Here, we have emphasised how the solution may depend on parameters of interest \(\varvec{w}\), and have introduced the simplified notation to help with later sections. Once (7) has been solved, \(\varvec{\mathcal {U}}^{\textrm{AC}}\) can be obtained from solving

$$\begin{aligned} \begin{bmatrix} \textbf{K}^{\textrm{AC}}_{\mathcal {U}\mathcal {U}} - \omega ^2 \textbf{M}^{\textrm{AC}}_{\mathcal {U}\mathcal {U}} \end{bmatrix} \begin{pmatrix} \varvec{\mathcal {U}}^{\textrm{AC}} \end{pmatrix} = - \begin{bmatrix} \textbf{K}^{\textrm{AC}}_{\mathcal {U}\mathcal {A}} \end{bmatrix} \begin{pmatrix} \varvec{\mathcal {A}}^{\textrm{AC}}_\varepsilon (\varvec{w}) \end{pmatrix}, \end{aligned}$$
(8)

and, in the case of a non-zero Dirichlet boundary condition in (2c), an additional contribution should be subtracted from the right hand side of (8). This contribution corresponds to the columns of \(\begin{bmatrix} \textbf{K}^{\textrm{AC}}_{\mathcal {U}\mathcal {U}} - \omega ^2 \textbf{M}^{\textrm{AC}}_{\mathcal {U}\mathcal {U}} \end{bmatrix}\) that have been moved to the right hand side, multiplied by the corresponding solution coefficients found from the Dirichlet condition [9]. Of the systems (7) and (8), in practice, the former is larger and more computationally expensive to solve since, typically, \( P^\mathrm{{AC}}_\mathrm{{global}} \gg Q^\mathrm{{AC}}_\mathrm{{global}} \) as \(\Omega _\mathrm{{C}}\) is just a small part of \(\Omega \) and the latter only needs to be solved for those mechanical degrees of freedom in \(\Omega _\mathrm{{C}}\). Hence, following the justification in [17], the system (7) will be our focus in the following section.
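The staged solution of (6) through (7) and then (8) can be illustrated with a minimal NumPy sketch. The function name `solve_staged`, the small dense matrices and all numerical values are hypothetical stand-ins chosen for illustration (a homogeneous Dirichlet condition is assumed); they are not taken from the simulation tool used in this work.

```python
import numpy as np

def solve_staged(K_AA, C_AA, K_UA, K_UU, M_UU, R_A, omega):
    """Two-stage solve of the block lower-triangular system (6)."""
    # Stage 1, cf. (7): the electromagnetic block does not involve U.
    A_sol = np.linalg.solve(K_AA + C_AA, R_A)
    # Stage 2, cf. (8): mechanical block, driven by the stage 1 solution.
    U_sol = np.linalg.solve(K_UU - omega**2 * M_UU, -K_UA @ A_sol)
    return A_sol, U_sol

# Small synthetic blocks (illustrative values only, not scanner data).
rng = np.random.default_rng(0)
nA, nU, omega = 6, 3, 0.3
K_AA = rng.standard_normal((nA, nA)) + 10.0 * np.eye(nA)
C_AA = 0.5j * np.eye(nA)            # damping-type block, purely imaginary here
K_UA = rng.standard_normal((nU, nA))
S = rng.standard_normal((nU, nU))
K_UU = S @ S.T + np.eye(nU)         # symmetric positive definite stiffness
M_UU = np.eye(nU)
R_A = rng.standard_normal(nA)

A_sol, U_sol = solve_staged(K_AA, C_AA, K_UA, K_UU, M_UU, R_A, omega)

# Reference: assemble and solve the full system (6) in one go.
full = np.block([[K_AA + C_AA, np.zeros((nA, nU))],
                 [K_UA, K_UU - omega**2 * M_UU]])
rhs = np.concatenate([R_A, np.zeros(nU)])
ref = np.linalg.solve(full, rhs)
```

In this sketch the staged and monolithic solutions agree to rounding error, while the staged version only ever factorises the two diagonal blocks.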

3 Reduced order modelling

To aid with the development and design of a new MRI scanner, linear systems of the form (6) must be solved repeatedly for a large number of different parameters (such as frequency or conductivity of radiation shields), which can lead to exponential growth in computational cost as the number of parameters increases. To address this, we consider a reduced order model (ROM) to optimise the workflow and reduce the overall computational cost. In particular, we consider a form of proper orthogonal decomposition (POD) approach, e.g. [14, 30, 31], which seeks to extract modal information from a small number of solution snapshots corresponding to sets of snapshot parameters and use the most important of these to construct a problem of reduced size. These ROMs have been successfully applied to a range of engineering problems and, in the context of the linearised magneto-mechanical problem [17], interpolated (PODI) and projected (PODP) versions of the POD for the system (7) have been considered. In this work, we investigate a POD technique based on neural network regression (PODNN). Our implementation is equivalent to that proposed by Hesthaven and Ubbiali [21], is less invasive than PODP and is amenable to a large number of potential parameters \(\varvec{w}\). We briefly outline the offline stage (shared by all POD approaches) and the differing online stages below. We also provide some practical insights into how PODNN should be applied to achieve accurate results.

3.1 Offline stage

Computed solutions to (7) for different sets of (snapshot) parameters \(\varvec{w} = \textbf{w}_j\in {{\mathbb {R}}}^P\) are arranged in \(\textbf{D}\) as:

$$\begin{aligned} \textbf{D} = (\textbf{q}(\textbf{w}_1), \textbf{q}(\textbf{w}_2), \textbf{q}(\textbf{w}_3), \ldots , \textbf{q}(\textbf{w}_{N_\mathrm{{s}}})) \in \mathbb {C}^{\eta \times N_\mathrm{{s}}}, \end{aligned}$$
(9)

with \(N_\mathrm{{s}}\) representing the number of computed snapshots and \(\eta = P^\mathrm{{AC}}_\mathrm{{global}}\). To extract the modal information, a singular value decomposition (SVD) is applied so that \(\textbf{D} ={{\textbf{H}}} {\mathbf \Sigma } {{\textbf{G}}}^H\), where \({{\textbf{H}}}\in {\mathbb C}^{\eta \times \eta }\) and \({{\textbf{G}}}\in {{\mathbb {C}}}^{N_\mathrm{{s}} \times N_\mathrm{{s}}}\) are unitary matrices, \({\mathbf \Sigma }\in {{\mathbb {R}}}^{\eta \times N_\mathrm{{s}}}\) is a diagonal matrix padded out with zeros, whose diagonal entries \( \Sigma _{ii}\) are the singular values, arranged in descending order and decaying towards zero, and the superscript H denotes the Hermitian (conjugate) transpose. Then, by dropping those singular values for which \(\Sigma _{ii}/ \Sigma _{11} \le \) TOL, \(\textbf{D}\) can be well approximated by a truncated singular value decomposition (TSVD)

$$\begin{aligned} \textbf{D} \approx \textbf{D}^m = \textbf{H}^m \mathbf {\Sigma }^m (\textbf{G}^m)^H, \end{aligned}$$
(10)

where m is the number of singular values that are retained. With \(m< N_\mathrm{{s}} \ll \eta \), the truncated matrices of left and right singular vectors take the form \(\textbf{H}^m = \left[ \textbf{h}_1, \textbf{h}_2, \ldots ,\textbf{h}_m \right] \in \mathbb {C}^{\eta \times m}\) and \((\textbf{G}^m)^H \in \mathbb {C}^{m \times N_\mathrm{{s}}}\), whilst the diagonal matrix satisfies \(\mathbf {\Sigma }^m \in \mathbb {R}^{m \times m}\). We would like m to be small whilst also maintaining a good approximation to \(\textbf{D}\).
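As an illustration of the offline stage, the truncation in (10) might be sketched as follows. This is a minimal NumPy example on a synthetic snapshot matrix; the helper name `truncated_svd` and the tolerance value are assumptions made for the example. Note that the economy-size SVD used here returns only the first \(N_\mathrm{{s}}\) columns of \(\textbf{H}\), which is sufficient since \(m \le N_\mathrm{{s}}\).

```python
import numpy as np

def truncated_svd(D, tol):
    """Return H^m, the retained singular values and (G^m)^H, cf. (10)."""
    # Economy-size SVD: D = H @ diag(sigma) @ Gh, with Gh playing the role of G^H.
    H, sigma, Gh = np.linalg.svd(D, full_matrices=False)
    # Keep the singular values whose ratio to the largest exceeds tol.
    m = int(np.count_nonzero(sigma / sigma[0] > tol))
    return H[:, :m], sigma[:m], Gh[:m, :]

# Synthetic snapshot matrix with three dominant modes (illustrative only).
rng = np.random.default_rng(1)
eta, n_s = 200, 20
U, _ = np.linalg.qr(rng.standard_normal((eta, 3)) + 1j * rng.standard_normal((eta, 3)))
V, _ = np.linalg.qr(rng.standard_normal((n_s, 3)) + 1j * rng.standard_normal((n_s, 3)))
D = U @ np.diag([1.0, 1e-1, 1e-2]) @ V.conj().T

Hm, sigma_m, Ghm = truncated_svd(D, tol=1e-6)
D_m = Hm @ np.diag(sigma_m) @ Ghm   # truncated approximation D^m
```

Here three modes are retained out of twenty snapshots, and the truncated product reproduces the synthetic snapshot matrix to rounding error.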

3.2 Online stage

The online stage involves approximating the solution of (7) for new sets of parameters at reduced computational cost compared to the offline stage. The aim is to calculate an approximation to \(\textbf{q}(\varvec{w})\) for new \(\varvec{w}\) that are different to the snapshot parameters \(\textbf{w}_j\).

3.2.1 Interpolated POD (PODI)

In the case of interpolated POD (PODI), the approximation is typically chosen as, e.g. [17, 32],

$$\begin{aligned} \textbf{q}^{\text {PODI}}(\varvec{w}) = \sum _{k=1}^m \textbf{h}_k \Sigma _{kk} \ I_k(\varvec{w},(\textbf{G}^m)^H), \end{aligned}$$
(11)

where \(I_k(\varvec{w},(\textbf{G}^m)^H)\) is an interpolation through the kth row of \((\textbf{G}^m)^H\) or equivalently the kth column of \(\overline{\textbf{G}^m}\), with the overbar denoting the complex conjugate, for the parameter set of interest \(\varvec{w}\). For further details of the implementation see [17].
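A minimal sketch of (11) for a single scalar parameter is given below. Piecewise linear interpolation via `np.interp` is an assumption made purely for illustration (the interpolation actually used is described in [17]), and the synthetic snapshots are constructed to depend linearly on the parameter so that this choice is exact.

```python
import numpy as np

def podi_predict(w_new, w_snap, Hm, sigma_m, Ghm):
    """Evaluate (11) for a scalar parameter using piecewise linear interpolation."""
    # Interpolate real and imaginary parts of each row of (G^m)^H separately.
    coeffs = np.array([
        np.interp(w_new, w_snap, row.real) + 1j * np.interp(w_new, w_snap, row.imag)
        for row in Ghm
    ])
    # Sum over k of h_k * Sigma_kk * I_k(w).
    return Hm @ (sigma_m * coeffs)

# Synthetic snapshots q(w) = a + w * b, so linear interpolation is exact.
rng = np.random.default_rng(2)
eta = 50
a = rng.standard_normal(eta) + 1j * rng.standard_normal(eta)
b = rng.standard_normal(eta) + 1j * rng.standard_normal(eta)
w_snap = np.linspace(1.0, 2.0, 5)
D = np.column_stack([a + w * b for w in w_snap])

H, sigma, Gh = np.linalg.svd(D, full_matrices=False)
m = 2  # the synthetic snapshots span a two-dimensional space
q_podi = podi_predict(1.37, w_snap, H[:, :m], sigma[:m], Gh[:m, :])
q_exact = a + 1.37 * b
```

For solutions that vary rapidly with the parameter (for example near a resonance), a low order interpolant of this kind would require many more snapshots.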

3.2.2 Projected POD (PODP)

In the case of projected POD (PODP), the form of the approximation is chosen as, e.g. [15, 17, 33],

$$\begin{aligned} \textbf{q}^{\text {PODP}}(\varvec{w}) = \textbf{H}^m \textbf{p}(\varvec{w}), \end{aligned}$$
(12)

where \(\textbf{p}(\varvec{w}) \in \mathbb {C}^m\) is obtained from solving a small parameter dependent linear system obtained by Galerkin projection of (7)

$$\begin{aligned} \textbf{A}^m(\varvec{w})\textbf{p}^m(\varvec{w}) = \textbf{r}^m(\varvec{w}), \end{aligned}$$
(13)

where \(\textbf{A}^m = (\textbf{H}^m)^H \textbf{A}(\varvec{w})\textbf{H}^m\) and \(\textbf{r}^m = (\textbf{H}^m)^H \textbf{r}\) for each parameter set of interest. For further details of the implementation see [17].
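The Galerkin projection in (12) and (13) can be sketched as below. The parametrised matrix \(\textbf{K} + \mathrm{i} w \textbf{I}\) is a hypothetical stand-in for \(\textbf{A}(\varvec{w})\), chosen only so that the example is self-contained; the invasive aspect of PODP is visible in the fact that the full matrix must be available to form the reduced system.

```python
import numpy as np

def podp_predict(A, r, Hm):
    """Galerkin-projected solve, cf. (12) and (13)."""
    Am = Hm.conj().T @ A @ Hm      # reduced m x m system matrix
    rm = Hm.conj().T @ r           # reduced right hand side
    p = np.linalg.solve(Am, rm)    # small dense solve
    return Hm @ p                  # lift back to the full space, cf. (12)

# Hypothetical parametrised system A(w) = K + i w I (illustrative only).
rng = np.random.default_rng(3)
eta = 40
S = rng.standard_normal((eta, eta))
K = S @ S.T / eta + np.eye(eta)    # symmetric positive definite
r = rng.standard_normal(eta)

def full_solve(w):
    return np.linalg.solve(K + 1j * w * np.eye(eta), r)

w_snap = np.linspace(0.5, 5.0, 8)
D = np.column_stack([full_solve(w) for w in w_snap])
Hm, _, _ = np.linalg.svd(D, full_matrices=False)   # keep all modes for simplicity

w_test = w_snap[3]
q_podp = podp_predict(K + 1j * w_test * np.eye(eta), r, Hm)
q_full = full_solve(w_test)
```

At a snapshot parameter the full solution lies in the span of the retained modes, so the projected solve reproduces it; at other parameters the accuracy depends on how well the modes span the solution.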

3.2.3 Neural network POD (PODNN)

In the case of a neural network POD (PODNN), one possible choice is analogous to (11) i.e.

$$\begin{aligned} \textbf{q}^{\text {PODNN}}(\varvec{w}) = \sum _{k=1}^m \textbf{h}_k \Sigma _{kk} R_k\left( \varvec{w},\varvec{p}\left( (\textbf{G}^m)^H\right) \right) , \end{aligned}$$
(14)

where a neural network is constructed to provide a regression (instead of interpolation) through the kth row of \((\textbf{G}^m)^H\) or equivalently the kth column of \(\overline{\textbf{G}^m}\). An alternative is to use

$$\begin{aligned} \textbf{q}^{\text {PODNN}}(\varvec{w}) = \sum _{k=1}^m \textbf{h}_k R_k\left( \varvec{w},\varvec{p}\left( \mathbf {\Sigma }^m (\textbf{G}^m)^H\right) \right) . \end{aligned}$$
(15)

The version in (15) can be seen to be equivalent to the strategy proposed by Hesthaven and Ubbiali [21], which, in the above notation, is \(R_k\left( \varvec{w},\varvec{p}\left( (\textbf{H}^m)^H \textbf{D}\right) \right) \approx R_k\left( \varvec{w},\varvec{p}\left( \mathbf {\Sigma }^m (\textbf{G}^m)^H\right) \right) \) by (10). In both the case of (14) and (15), the neural network regression involves creating a function defined by parameters \(\varvec{p}((\textbf{G}^m)^H)\) or \(\varvec{p}(\mathbf {\Sigma }^m (\textbf{G}^m)^H)\), respectively, which enables the prediction of \(\textbf{q}^{\text {PODNN}}\) at low cost for new parameters \(\varvec{w}\). To establish the advantages that (15) offers over (14), we consider the ability of neural networks to capture oscillatory functions in the next section.

3.2.4 Capture of oscillatory functions by neural networks

As a simple example, we wish to consider the ability of a feed forward neural network to capture a vector \(\textbf{y}(x) \in \mathbb {R}^{100}\) of oscillatory functions with components

$$\begin{aligned} y_k(x)&= w_k \sin (\kappa _k x), \end{aligned}$$
(16a)
$$\begin{aligned} \kappa _k&= \frac{2\pi }{k}, \end{aligned}$$
(16b)

where \(w_k\) are amplitudes of the functions and \(x \in [0, 1]\). To do this, we set up a dataset with N samples in the form of ordered pairs

$$\begin{aligned} \mathcal {D}:= \left\{ (\textbf{x}^{(1)}, \textbf{y}^{(1)}), (\textbf{x}^{(2)}, \textbf{y}^{(2)}), \ldots , (\textbf{x}^{(N)}, \textbf{y}^{(N)})\right\} , \end{aligned}$$
(17)

where \(\textbf{x}^{(j)} = \mathrm{{x}}^{(j)}\) is a scalar input feature in this case and \(\textbf{y}^{(j)}\) is the corresponding vector of output labels. Using a mean squared error (MSE) as a loss function, the training of a neural network for this problem corresponds to finding parameters \(\varvec{p}\) such that

$$\begin{aligned} \mathcal {L}(\varvec{p}):= \min _{\varvec{p}} \frac{1}{N} \sum _{j=1}^N \left| \textbf{y}^{(j)} - \varvec{R}(\textrm{x}^{(j)},\varvec{p})\right| ^2 \end{aligned}$$
(18)

where \({\varvec{R}}(\mathrm{{x}}^{(j)}, \varvec{p})\) describes the regression function provided by a particular neural network architecture. We start with a neural network with \(L=1\) layer and \(n=10\) neurons in each layer and a dataset with \(N=101\) entries. In Fig. 2, we compare the results obtained for three cases: a single network trained to approximate \(\varvec{y}(x)\) with \(w_k=1\); separate networks trained to approximate each \(y_k(x)\) with \(w_k=1\) individually; and a single network trained to capture \(\varvec{y}(x)\) with \(w_k = 1/k^2\) (where the oscillatory functions have reducing amplitude). The MSE loss for the single network with \(w_k=1\) is \(\mathcal {L}(\varvec{p}) = 3.21 \times 10^{-1}\). Using multiple networks with \(w_k=1\), the loss is \(\mathcal {L}(\varvec{p}) = 4.76 \times 10^{-13}\) for the first oscillatory function and \(\mathcal {L}(\varvec{p}) = 4.99 \times 10^{-1}\) for the last oscillatory function. Finally, the MSE loss for a single network with \(w_k=1/k^2\) is \(\mathcal {L}(\varvec{p}) = 5.13 \times 10^{-8}\). In this figure, we observe that a single network is unable to capture the oscillatory functions for \(w_k=1\), and there are clearly errors present even in the prediction of \(y_1(x)\). On the other hand, training separate networks captures the behaviour of \(\varvec{y}(x)\) well with \(w_k=1\). However, if the oscillatory functions are such that \(w_k=1/k^2\), a single network is able to capture the behaviour of the lowest modes accurately.
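For concreteness, the dataset (17) and the loss (18) for this test might be assembled as in the following NumPy sketch. The function names are hypothetical, the wavenumbers follow (16b) as written, and the reading of (18) as the mean over samples of the squared Euclidean error is an assumption. No network is trained here; the loss of a trivial zero predictor is evaluated only to indicate how much of the signal sits in each case.

```python
import numpy as np

def oscillatory_dataset(n_samples, n_modes, decaying):
    """Samples of (16): y_k(x) = w_k sin(kappa_k x), kappa_k = 2 pi / k."""
    x = np.linspace(0.0, 1.0, n_samples)          # scalar input features
    k = np.arange(1, n_modes + 1)
    w = 1.0 / k**2 if decaying else np.ones(n_modes)
    kappa = 2.0 * np.pi / k                       # as written in (16b)
    y = w * np.sin(np.outer(x, kappa))            # shape (n_samples, n_modes)
    return x, y

def mse_loss(y_true, y_pred):
    """Mean over samples of the squared Euclidean error, cf. (18)."""
    return np.mean(np.sum(np.abs(y_true - y_pred) ** 2, axis=1))

x, y_unit = oscillatory_dataset(101, 100, decaying=False)
_, y_decay = oscillatory_dataset(101, 100, decaying=True)

# Loss of a trivial zero predictor: a crude measure of how much "signal"
# a single network must explain in each case.
loss_unit = mse_loss(y_unit, np.zeros_like(y_unit))
loss_decay = mse_loss(y_decay, np.zeros_like(y_decay))
```

With \(w_k = 1/k^2\), the higher components contribute very little to the loss, which is consistent with a single network of modest size being able to achieve a small overall MSE.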

Fig. 2 Oscillatory functions; feed forward network fit \(y_k(x), k=1,\ldots ,10\) with \(L=1, n=10\) for (a) single network with \(w_k=1\), (b) multiple networks with \(w_k=1\) and (c) single network with \(w_k=1/k^2\)

As \(L=1\) and \(n=10\) may not represent the best choice of hyper-parameters, we now apply a Bayes optimisation parameter search [34] to determine the best choices of \(L_\mathrm{{opt}}\) and \(n_\mathrm{{opt}}\) in each case so that \(\mathcal {L}(\varvec{p})\) is minimised. Alternative optimisation strategies could be used, but we expect them to lead to similar results. The resulting loss surfaces are shown in Fig. 3 and, in Fig. 4, we repeat the investigation shown in Fig. 2 for the new hyper-parameters obtained. The MSE loss for the optimised single network with \(w_k=1\) (\(L_\mathrm{{opt}}=3, n_\mathrm{{opt}}=16\)) is \(\mathcal {L}(\varvec{p}) = 2.19 \times 10^{-1}\). Using multiple networks with \(w_k=1\), the optimised network trained to the first oscillatory function (\(L_\mathrm{{opt}} = 2, n_\mathrm{{opt}}=5\)) gives \(\mathcal {L}(\varvec{p}) = 2.09 \times 10^{-10}\) and the optimised network trained to the last oscillatory function (\(L_\mathrm{{opt}} = 2, n_\mathrm{{opt}}=14\)) gives \(\mathcal {L}(\varvec{p}) = 4.22 \times 10^{-1}\). Lastly, using an optimised single network with \(w_k=1/k^2\) (\(L_\mathrm{{opt}}=1, n_\mathrm{{opt}}=12\)) gives \(\mathcal {L}(\varvec{p}) = 3.12 \times 10^{-8}\). Similar to Fig. 2, Fig. 4 shows that a single network with optimised hyper-parameters is unable to capture \(\varvec{y}(x)\) well when \(w_k=1\), but is able to capture the behaviour well when \(w_k=1/k^2\). When networks are trained to predict \(y_k(x)\) individually, the prediction is accurate for the first few oscillatory functions but, as the functions become more oscillatory, the loss increases, as seen in Fig. 3 for the last oscillatory function, and so these networks are unable to capture \(y_k(x)\) for higher k.

Fig. 3 Oscillatory functions; the MSE loss function (18) as a function of \(L = 1, \ldots , 5\) and \(n= 1, \ldots , 16\) used in the Bayes optimisation for (a) single network with \(w_k=1\), (b) multiple networks with \(w_k=1\) for the first oscillatory function, (c) multiple networks with \(w_k=1\) for the last oscillatory function and (d) single neural network with \(w_k=1/k^2\)

Fig. 4 Oscillatory functions; feed forward network fit \(y_k(x), k=1,\ldots ,10\) with optimum hyper-parameters for (a) single network with \(w_k=1\), (b) multiple networks with \(w_k=1\) and (c) single network with \(w_k=1/k^2\)

The computational costs of using a single neural network with \(w_k=1\), multiple networks with \(w_k=1\) and a single network with \(w_k=1/k^2\) are compared in Fig. 5 for both a fixed choice of L and n and also those obtained from the Bayes optimisation \(L_\mathrm{{opt}}\) and \(n_\mathrm{{opt}}\). These, together with the aforementioned plots, illustrate that a single network can perform well and is efficient provided that the oscillatory functions have reducing amplitudes.

Fig. 5 Oscillatory functions; timing comparison between a single network and multiple networks based on the simulations in (a) Fig. 2 and (b) Fig. 4

3.2.5 Implications for PODNN

Returning again to (14) and (15), we see that the latter involves \(R_k\left( \varvec{w},\varvec{p}\left( \mathbf {\Sigma }^m(\textbf{G}^m)^H\right) \right) \), where the rows of \(\mathbf {\Sigma }^m(\textbf{G}^m)^H\) represent functions that are both oscillatory and reducing in amplitude, whilst the former involves \(R_k\left( \varvec{w},\varvec{p}\left( (\textbf{G}^m)^H\right) \right) \), with the rows of \((\textbf{G}^m)^H\) being oscillatory but not reducing in amplitude. Thus, by Sect. 3.2.4, the form (15) is preferred as it requires less computational effort and only the training of a single network. In practice, the datasets we will use for PODNN can be set up in a similar way to \(\mathcal {D}\) described above and, for later use, it is useful to introduce

$$\begin{aligned} \textbf{x}^{(j)} = {\left\{ \begin{array}{ll} \textbf{w}_j &{} \text { for linear spaced snapshots,} \\ \log _{10}(\textbf{w}_j) &{} \text { for logarithmically spaced snapshots,} \end{array}\right. } \end{aligned}$$
(19)

where the \(\log _{10}(\textbf{w}_j)\) should be interpreted as \(\big (\log _{10}(\textbf{w}_{1j}), \log _{10}(\textbf{w}_{2j}), \ldots , \log _{10}(\textbf{w}_{Pj})\big )\) for P parameters and use a similar approach for describing the continuous variable \(\varvec{x}(\varvec{w})\). Analogous to (18), we also state the corresponding MSE loss function for this case as

$$\begin{aligned} \mathcal {L}^{\text {PODNN}}(\varvec{p}):= \min _{\varvec{p}}\frac{1}{N} \sum _{j=1}^N \left| \left( \mathbf {\Sigma }^m({\textbf{G}}^m)^H\right) _{:,j} - \varvec{R}\left( \textbf{x}^{(j)},\varvec{p}\left( \mathbf {\Sigma }^m({\textbf{G}}^m)^H\right) \right) \right| ^2, \end{aligned}$$
(20)

where the subscript \(:,j\) denotes all rows of the jth column.
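A possible way of assembling the PODNN training data implied by (19) and (20), and of evaluating (15) once the coefficients have been predicted, is sketched below in NumPy. The function names are hypothetical, and splitting the complex targets into real and imaginary parts is an assumption made here because standard regression networks are real valued. The regression step itself is omitted; the exact targets are used in place of network predictions.

```python
import numpy as np

def podnn_dataset(W, sigma_m, Ghm, log_spaced=False):
    """Build features (19) and targets from the columns of Sigma^m (G^m)^H."""
    W = np.asarray(W, dtype=float)                  # shape (N_s, P)
    X = np.log10(W) if log_spaced else W
    C = sigma_m[:, None] * Ghm                      # Sigma^m (G^m)^H, shape (m, N_s)
    # Assumption: real-valued regression, so stack real and imaginary parts.
    Y = np.hstack([C.real.T, C.imag.T])             # shape (N_s, 2m)
    return X, Y

def podnn_reconstruct(Hm, y_row):
    """Evaluate (15) given one (predicted) row of targets."""
    m = Hm.shape[1]
    return Hm @ (y_row[:m] + 1j * y_row[m:])

# Synthetic snapshots for one logarithmically spaced parameter (illustrative).
rng = np.random.default_rng(4)
W = np.logspace(6, 8, 8)[:, None]
D = rng.standard_normal((50, 8)) + 1j * rng.standard_normal((50, 8))
Hm, sigma_m, Ghm = np.linalg.svd(D, full_matrices=False)

X, Y = podnn_dataset(W, sigma_m, Ghm, log_spaced=True)
q3 = podnn_reconstruct(Hm, Y[3])   # exact targets recover snapshot 3
```

In an actual PODNN workflow, a trained regression would supply the row passed to `podnn_reconstruct` for a new parameter set, so the online cost reduces to one network evaluation and one matrix-vector product.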

4 Numerical examples

Numerical examples that demonstrate the advantages of the PODP scheme over a PODI scheme for the magneto-mechanical problem described in Sect. 2 have been presented in [17] and so we focus on PODNN and the additional benefits this offers. We begin by describing the test magnet geometry in Sect. 4.1, which will be the focus of the comparisons presented in this work. Then, in Sect. 4.2, we describe the software used to generate our numerical results, followed by a description of the discretisation in Sect. 4.3. We present predictions of resonant modes in the Ohmic power and kinetic energy curves using different parameter spaces for the test magnet geometry in Sect. 4.4, followed by the predictions of the response in the same outputs of interest for different external vibrational loading conditions in Sect. 4.5. Then, in Sect. 4.6, the PODNN technique is used to predict the response in outputs of interest when model parameters are uncertain.

4.1 Test magnet geometry

Our focus will be a simplified MRI configuration called the test magnet problem, which is rotationally symmetric, and was previously considered in [12]. For the simplified model, only the rotationally symmetric z-gradient coil is considered and the non-rotationally symmetric x and y gradient coils are removed. If desired, this allows the problem to be reduced to an axisymmetric problem, although we will model it as one quarter of the full three-dimensional geometry (see Footnote 1). An illustration of this MRI configuration can be seen in Fig. 6, which shows a typical 2D cross-section and 3D view of the general setting. In this figure, the conducting shields (4 K and 77 K) and the outer vacuum chamber (OVC) are shown in different shades of blue and these will deform under the electromagnetic stresses generated by the eddy currents in the conducting components. The main coils and gradient coils are shown in shades of red. The dimensions, exciting currents and materials of this problem are commercially sensitive, but indicative values are provided in [35]. Unless otherwise stated, we assume a zero prescribed displacement in the boundary condition on \(\partial \Omega _\mathrm{{C}}^D\) in (2e) to reflect the fact that the magnet geometry is fixed in position. Furthermore, the transmission problem described by (2) must be supplemented by the symmetry boundary conditions [17]

(21a)
(21b)
(21c)

that are to be imposed on the \(x=0\) and \(y=0\) planes.

Fig. 6

Test magnet problem with z (longitudinal) gradient coil; illustration of the components of the problem a in the axisymmetric meridian plane and b 3D view

The outputs of interest for the MRI configurations are the output (dissipated) power and the kinetic energy, which are defined as

(22a)
(22b)

as a function of \(\omega \). In practice, our approximation to these will be evaluated for a particular conducting shield \(\Omega _\mathrm{{C}}\). At an intermediate stage, we will also consider plots, as a function of \(\omega \), of approximations to the quantity defined as (see Footnote 2)

(23)

4.2 Software

The magneto-mechanical model is simulated using an extension of the higher order finite element and reduced order model tool previously developed by our group to simulate the problems described in [12, 17, 18]. The latest additions include the PODI and PODNN reduced order models for large parameter spaces \({\varvec{w}}\), uncertainty quantification features and non-zero Dirichlet displacements of the shields. The neural networks were implemented with MATLAB’s Deep Learning Toolbox [36], since our simulation tool was also written in MATLAB [40]; the toolbox offers similar functionality to popular Python packages such as TensorFlow [37], scikit-learn [38] or PyTorch [39]. In particular, the feedforwardnet tool was used to build multilayer perceptron (MLP) neural networks. In the following, we focus on numerical results for the new reduced order models, as the accuracy of the linearised strategy in (2) and of the hp-finite element discretisation has already been established in [12].

For PODNN, the split between \(\mathcal {D}_\mathrm{{train}}\) and \(\mathcal {D}_\mathrm{{test}}\) is \(85\%\) and \(15\%\), respectively, and is applied throughout. Prior to training using \(\mathcal {D}_\mathrm{{train}}\), values were normalised to the standard deviation and mean of \((\mathbf {\Sigma }^m(\textbf{G}^m)^H)_{1,:}\). Additionally, a \(\tan \)-sigmoid activation function was used. To solve the minimisation problems (20), the Bayesian regularisation solver with the option to use the Jacobian for calculations was employed, with the maximum number of epochs set to 5000 and min grad\(=10^{-10}\). Bayes optimisation, implemented with MATLAB’s Statistics and Machine Learning Toolbox [41], is applied to determine the optimum number of layers \(L_\mathrm{{opt}}\) and neurons per layer \(n_\mathrm{{opt}}\), in a similar way to Sect. 3.2.4, with other hyper-parameters set as per [36]. All computations were performed on a workstation comprising a 12-core Intel Xeon W-2265 Processor with 128GB (8 \(\times \) 16GB) RAM and an NVIDIA Quadro RTX 4000 8GB graphical processing unit.
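As a rough illustration of the data preparation described above, the following Python sketch (which is not the MATLAB implementation used for the results) performs an \(85\%\)/\(15\%\) split and a standardisation by the mean and standard deviation of a reference row. The function names, the random shuffling with a fixed seed and the sample values are assumptions made purely for illustration; in particular, `first_row` is a hypothetical real-valued stand-in for \((\mathbf {\Sigma }^m(\textbf{G}^m)^H)_{1,:}\).

```python
import random
import statistics


def split_train_test(samples, train_fraction=0.85, seed=0):
    """Shuffle the samples and split them into training and test subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # shuffling is an assumption here
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def standardise(values, reference):
    """Shift and scale values by the mean and standard deviation of reference."""
    mean = statistics.fmean(reference)
    std = statistics.pstdev(reference)
    return [(v - mean) / std for v in values]


# Hypothetical stand-in for the first row of Sigma^m (G^m)^H (illustrative only).
first_row = [10.0, 8.0, 6.5, 5.0, 4.2, 3.9, 3.1, 2.5, 2.0, 1.8,
             1.5, 1.2, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]

train, test = split_train_test(first_row)
train_scaled = standardise(train, first_row)
```

Scaling by statistics of a fixed reference row, rather than of each subset separately, keeps the training and test data on the same scale.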

4.3 Discretisation

For this problem, \(\Omega _\mathrm{{C}}^\mathrm{{c}}\) is truncated to form a finite computational domain \(\Omega \) by excluding the region outside of a suitably sized cylinder. We discretise \(\Omega \) by an unstructured mesh of \(33\,805\) tetrahedral elements and employ a discretisation with \(q = 3\) and \(p = 3\) order elements for and , respectively. To dampen the resonant peaks in the model, we employ a fixed damping ratio of \(\xi = 2 \times 10^{-3}\) and to circumvent the Coulomb gauge we use a regularisation parameter of \(\varepsilon = 10^{-4}\). This choice of discretisation has been shown to provide accurate full order model solutions to the test magnet geometry over the parameters of interest [12].

4.4 Prediction of resonant modes

4.4.1 Offline stage

We first consider the effects of different choices of snapshots and the behaviour it has on the decay of \(\Sigma _{ii}\). Considering the case of \(P=1\) and \({\varvec{w}} = (\omega ) =2\pi (f)\) with \(5 \le f \le 5000\) Hz, we will explore five different selections of snapshot frequencies \({{\textbf{w}}}_i =2\pi ({f}_i)\) and decide which will be most appropriate for our problem. From previous analysis in [17], we know that the test-magnet geometry is sensitive to lower frequencies and, hence, may require more frequencies within that lower range.

Secondly, we consider the choice of snapshot parameters for the case of \(P=2\) with \({\varvec{w}} = (\omega , \gamma _0^\mathrm{{OVC}})\) where \(0.1 \le \gamma ^\mathrm{{OVC}}_0\le 10\) is a non-dimensional factor used to scale the conductivity in the OVC such that

$$\begin{aligned} \gamma ^\mathrm{{OVC}} = \gamma ^\mathrm{{OVC}}_0 \gamma ^\mathrm{{OVC}}_\mathrm{{ref}}, \end{aligned}$$
(24)

where \(\gamma ^\mathrm{{OVC}}_\mathrm{{ref}}\) is a reference value for conductivity of the shield.

In [17] a piece-wise linear choice for the snapshot frequencies was made, but it is not clear that this is the best choice. Therefore, we compare the performance of selecting the snapshots \({f}_i\) in a linear, piece-wise linear and logarithmic fashion in the following.

Linear The linear snapshot frequencies \({f}_i\) in Hz are chosen such that:

$$\begin{aligned} {f}_i = 5 + \frac{5000 - 5}{N_\mathrm{{fs}} - 1} (i-1), \end{aligned}$$
(25)

for \(i = 1, \ldots , N_\mathrm{{fs}}\), where \(N_\mathrm{{fs}}\) is the total number of snapshot frequencies and, in the case of \(P=1\), \(N_\mathrm{{s}} = N_\mathrm{{fs}}\).

Piece-wise linear The piece-wise linear snapshot frequencies \({f}_i\) are chosen by first introducing the splitting \(N_\mathrm{{fs}} = N_\mathrm{{fs}}^{(1)} + N_\mathrm{{fs}}^{(2)}\), which are then used to calculate the spacings

$$\begin{aligned} \begin{aligned} \Delta {f}^{(1)}&= \frac{1000 - 5}{N_\mathrm{{fs}}^{(1)} - 1}, \\ \Delta {f}^{(2)}&= \frac{5000 - 1000}{N_\mathrm{{fs}}^{(2)} - 1}, \end{aligned} \end{aligned}$$
(26)

where the numerators are determined by the ranges of frequencies into which we split the interval. The frequencies \({f} _i\) are then constructed as

$$\begin{aligned} {f}_i = {\left\{ \begin{array}{ll} 5 + (i-1)\Delta {f}^{(1)} &{} \text {for } 1 \le i \le N_\mathrm{{fs}}^{(1)},\\ 1000 + (i-N_\mathrm{{fs}}^{(1)}) \Delta {f}^{(2)} &{} \text {for } N_\mathrm{{fs}}^{(1)} < i \le N_\mathrm{{fs}},\\ \end{array}\right. } \end{aligned}$$
(27)

with \(N_\mathrm{{fs}}^{(1)}\) and \(N_\mathrm{{fs}}^{(2)}\) chosen as in [17].

Logarithmically spaced For logarithmically-spaced frequency snapshots, \({f}_i\) is chosen such that

$$\begin{aligned} \log _{10}({f}_i) = \log _{10}(5) + \frac{\big (\log _{10}(5000) - \log _{10}(5)\big )}{N_\mathrm{{fs}} - 1}(i - 1), \end{aligned}$$
(28)

with \(i=1,\ldots ,N_\mathrm{{fs}}\).
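The linear spacing (25) and the logarithmic spacing (28) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the loop index runs from 0 (so it plays the role of \(i-1\)), and the count of snapshots below 1000 Hz is included simply to make the clustering at low frequencies visible.

```python
import math


def linear_frequencies(n_fs, f_min=5.0, f_max=5000.0):
    """Equally spaced snapshot frequencies in Hz, following (25)."""
    step = (f_max - f_min) / (n_fs - 1)
    return [f_min + step * i for i in range(n_fs)]


def log_frequencies(n_fs, f_min=5.0, f_max=5000.0):
    """Logarithmically spaced snapshot frequencies in Hz, following (28)."""
    lo, hi = math.log10(f_min), math.log10(f_max)
    step = (hi - lo) / (n_fs - 1)
    return [10.0 ** (lo + step * i) for i in range(n_fs)]


n_fs = 180
f_lin = linear_frequencies(n_fs)
f_log = log_frequencies(n_fs)

# Count how many snapshots fall below 1000 Hz for each spacing.
below_lin = sum(f < 1000.0 for f in f_lin)
below_log = sum(f < 1000.0 for f in f_log)
print(below_lin, below_log)
```

For \(N_\mathrm{{fs}}=180\), the logarithmic spacing places far more snapshots below 1000 Hz than the linear spacing does, without requiring a split point to be chosen.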

Fig. 7

Test magnet problem with z (longitudinal) gradient coil; Offline stage for \(N_\mathrm{{fs}} = 180\). A comparison of linear, piece-wise linear and log-spaced snapshots for \( 5\le { f}_i \le 5000\) Hz

For \(N_\mathrm{{fs}}=180\), we compare the distribution of \({f}_i\) according to (25), (27) and (28) in Fig. 7, with the first being equally spaced and the second and third showing a clustering of samples at smaller frequencies, as expected. Making these different choices of snapshots to generate full order model solutions \({\textbf{q}}({{\textbf{w}}}_j)\), creating \({{\textbf{D}}}\) in (9), and then applying the TSVD in (10) leads to the decay of \(\Sigma _{ii}\) shown in Fig. 8. Comparing the different choices, the log-spaced snapshots in (28) lead to the fastest decay of the singular values. We set TOL \(=10^{-6.5}\), since Fig. 8 indicates that the decay is no longer exponential after this point. Then, by choosing m as the smallest value for which \(\Sigma _{mm} / \Sigma _{11}<\) TOL, we obtain \(m=34\) for (25), \(m=28\) for (27) and \(m=20\) for (28), so that the log-spaced frequencies lead to the smallest reduced order model. An additional benefit of (28) over (27) is that we do not need to define a priori where to split the piece-wise linear spacing of frequencies.
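The selection of the reduced model size from the relative singular values can be illustrated as follows. This is a minimal sketch that assumes the criterion is read as the first index at which \(\Sigma _{mm}/\Sigma _{11}\) drops below TOL; the function name is hypothetical and the singular values are synthetic, with a geometric decay chosen only for illustration (they are not the values in Fig. 8).

```python
def reduced_size(singular_values, tol):
    """Return the smallest m (1-based) with sigma_m / sigma_1 < tol.

    If no relative singular value drops below tol, all modes are kept.
    """
    sigma_1 = singular_values[0]
    for m, sigma in enumerate(singular_values, start=1):
        if sigma / sigma_1 < tol:
            return m
    return len(singular_values)


# Synthetic singular values with a geometric decay (illustrative only).
sigmas = [10.0 ** (-0.4 * k) for k in range(40)]
tol = 10.0 ** (-6.5)
m = reduced_size(sigmas, tol)
print(m)
```

A faster decay of the singular values gives a smaller m for the same TOL, which is why the choice of snapshot spacing affects the size of the reduced order model.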

Fig. 8

Test magnet problem with z (longitudinal) gradient coil; Offline stage for \(P=1\), \(N_\mathrm{{fs}} = 180\). Relative singular values \(\Sigma _{mm}/\Sigma _{11}\) obtained using linear, piecewise-linear and log-spacing for \({f}_i\)

Fig. 9

Test magnet problem with z (longitudinal) gradient coil; offline stage for \(P=1\), \(N_\mathrm{{fs}} = 180\). Complex amplitudes \(|(\mathbf {\Sigma }^m(\textbf{G}^{m})^{H})_{ij} |\) as lines for a fixed \(i=1,\ldots ,10\) against \({f}_j\), \(j=1,\ldots , N_\mathrm{{fs}},\) for a linearly spaced \({f}_j\), b piece-wise linearly spaced \({f}_j\), c log-spaced \({f}_j\) and d against \(\log { f}_j\) for log-spaced \({f}_j\)

The complex amplitudes \(|(\mathbf {\Sigma }^m(\textbf{G}^m)^H)_{ij}|\), shown as lines for a fixed \(i=1,\ldots ,10\) against \({f}_j\), \(j=1,\ldots , N_\mathrm{{fs}}\), are included in Fig. 9 for the different choices of snapshots. The general trend for linear, piece-wise linear and logarithmically spaced snapshots is that the lower modes (e.g. \(i=1,2\)) are simple smooth functions that are not very oscillatory, but, as i increases, the modes become increasingly oscillatory. For higher modes, the oscillatory nature is very pronounced for small \({f}_j\). Plotting \(|(\mathbf {\Sigma }^m(\textbf{G}^m)^H)_{ij}|\) against \({f}_j\), \(j=1,\ldots , N_\mathrm{{fs}}\), on a log-log graph in Fig. 9d indicates that there are additional benefits in choosing \(\textrm{x}^{(j)} =\log _{10} \omega _j= \log _{10} (2\pi {f}_j)\) in this case. Henceforth, we choose the frequency snapshots according to (28) and use \(\textrm{x}^{(j)} =\log _{10} \omega _j\). In the following, we will consider not only \(N_\mathrm{{fs}}=180\) but also \(N_\mathrm{{fs}}=13, 23, 45, 90\), in order to reduce the computational effort needed for the offline stage. The singular values for \(N_\mathrm{{fs}}=45,90\) follow a similar decay and lead to models of size \(m=20\), whereas those for \(N_\mathrm{{fs}}=13,23\) decay faster and hence lead to a reduced model of size \(m=18\). However, as seen in Sect. 4.4.2, \(N_\mathrm{{fs}}=13\) does not provide a sufficient number of snapshots to accurately capture the solution of the problem.

Having found that a logarithmic spacing of \({f}_i\) is preferable, we now consider how best to choose the conductivity factor snapshots \(\gamma ^\mathrm{{OVC}}_{0,i}\). Given that \(\gamma \) and \(\omega \) appear as a product in (2a), we argue that the best choice for the spacing of the snapshots \(\gamma ^\mathrm{{OVC}}_{0,i}\) is also logarithmic. Hence, we propose

$$\begin{aligned} \log _{10}(\gamma ^\mathrm{{OVC}}_{0,i}) = \log _{10}(0.1) + \frac{\big (\log _{10}(10) - \log _{10}(0.1)\big )}{N_\mathrm{{cs}} - 1}(i-1), \end{aligned}$$
(29)

with \(i=1,\ldots , N_\mathrm{{cs}} \), where \(N_\mathrm{{cs}} \) is the chosen number of snapshots for the conductivity factor. This means that, for the case of \(P=2\), we have \(N_\mathrm{{s}}= N_\mathrm{{cs}} N_\mathrm{{fs}}\) snapshots in total. The decay of the singular values, shown in Fig. 8 for \(P = 1\), shows a similar behaviour for \(P=2\). In this case, however, setting TOL \(= 10^{-6.5}\) corresponds to a reduced model of size \(m = 44\) if \(N_\mathrm{{s}} = 23 \times 23\) and, hence, still offers a considerable saving.
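To make the two-parameter snapshot set concrete, the following Python sketch builds the \(N_\mathrm{{cs}} \times N_\mathrm{{fs}}\) grid of snapshot parameters. It assumes that both end points of each range are included (so that the conductivity factors cover \(0.1 \le \gamma ^\mathrm{{OVC}}_0\le 10\)) and that \(\log _{10}\) of both parameters is used as input; the latter is an assumption for the conductivity factor, and the helper name `log_spaced` is hypothetical.

```python
import math
from itertools import product


def log_spaced(n, lower, upper):
    """n logarithmically spaced values covering [lower, upper]."""
    lo, hi = math.log10(lower), math.log10(upper)
    return [10.0 ** (lo + (hi - lo) * i / (n - 1)) for i in range(n)]


n_fs, n_cs = 23, 23
frequencies = log_spaced(n_fs, 5.0, 5000.0)  # Hz
factors = log_spaced(n_cs, 0.1, 10.0)        # non-dimensional

# Each snapshot parameter set is w = (omega, gamma_0) with omega = 2*pi*f.
snapshots = [(2.0 * math.pi * f, g) for g, f in product(factors, frequencies)]

# Assumed inputs: log10 of both entries (an assumption for the second entry).
inputs = [(math.log10(w), math.log10(g)) for w, g in snapshots]
print(len(snapshots))
```

With \(N_\mathrm{{fs}}=N_\mathrm{{cs}}=23\), the grid contains \(N_\mathrm{{s}}=529\) parameter sets, and the middle conductivity factor coincides with the reference value \(\gamma ^\mathrm{{OVC}}_0=1\).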

4.4.2 Online stage

Predictions for \({\varvec{w}}= (\omega ) = 2\pi ( f)\) with \(P=1\) In the online stage of either PODI or PODNN we can easily evaluate (23) and (22) for any frequency of interest. However, for the purpose of visualisation, we will show results for the case where the ROMs are evaluated for the following output frequencies in Hz

$$\begin{aligned} f_i = 15 + \frac{5000-15}{N_\mathrm{{fo}}-1} (i-1), \end{aligned}$$
(30)

for \(i = 1, \ldots , N_\mathrm{{fo}}\), with \(N_\mathrm{{fo}} = 500\), unless otherwise stated.

We use Bayes optimisation as in Sect. 3.2.4 to find \(L_\mathrm{{opt}}\) and \(n_\mathrm{{opt}}\) using the search space \(L=1,\ldots ,5\) and \(n=1,\ldots ,16\). The resulting MSE loss \(\mathcal {L}^{\text {PODNN}}(\varvec{p})\) is shown in Fig. 10a for \(N_\mathrm{{fs}}=180\). The trend in \(\mathcal {L}^{\text {PODNN}}(\varvec{p})\) is similar to that previously observed in Fig. 3, with \(\mathcal {L}^{\text {PODNN}}(\varvec{p}) < 10^{-6}\) for \(n>5\), independent of L in this case. This investigation will be repeated for \(N_\mathrm{{fs}}=13, 23, 45, 90\), as each will give rise to a slightly different \(|(\mathbf {\Sigma }^m(\textbf{G}^m)^H)_{ij}|\). We additionally show in Fig. 10b the ability of the optimised network to fit \(|(\mathbf {\Sigma }^m(\textbf{G}^m)^H)_{ij} |\) for a fixed \(i=1,9,19\) against \(f_j\), \(j=1,\ldots , N_\mathrm{{fs}}\), for \(N_\mathrm{{fs}}=180\). As expected, the higher modes are more oscillatory and, hence, the fitting becomes comparatively worse for these modes. For example, \(i=9\) is shown to have a reasonable fit at higher frequencies, whereas, for \(i=19\), the network is only partially able to capture \(|(\mathbf {\Sigma }^m(\textbf{G}^m)^H)_{ij} |\) against \(f_j\), \(j=1,\ldots , N_\mathrm{{fs}}\). Since \(\Sigma ^m_{19,19}/\Sigma ^m_{1,1} \approx 5.9 \times 10^{-7}\), which is close to the TOL chosen, the partial fitting of this mode is good enough for a reasonable prediction, as seen in Fig. 11.

Fig. 10

Test magnet problem with z (longitudinal) gradient coil; online stage for \(P=1\). Showing a MSE loss as a function of \(L=1, \ldots ,5\) and \(n = 1, \ldots , 16\) and b comparisons of \(|(\mathbf {\Sigma }^m(\textbf{G}^{m})^{H})_{ij}|\), \(i=1,9,19\) for \(N_\mathrm{{fs}}=180\)

To illustrate the interdependence between \(N_\mathrm{{fs}}\) and some of the hyper-parameters of the MLP, we show in Fig. 11 the behaviour of the quantity in (23) as a function of \(\omega =2\pi f \), obtained by applying (15) and then (5a), for different \(N_\mathrm{{fs}}\) and the corresponding hyper-parameters \(L=L_\mathrm{{opt}}\) and \(n=n_\mathrm{{opt}}\) found by the Bayes optimisation, together with their respective \(\mathcal {L}^{\text {PODNN}}(\varvec{p})\). The different lines in the figure correspond to the cases where \(\Omega _\mathrm{{C}}\) is considered to be the 4K shield, the 77K shield and the OVC in turn; also shown are the corresponding snapshot solutions. For \(N_\mathrm{{fs}}=23,45,90,180\) we get good agreement for all shields, but not in the case of the 4K shield with \(N_\mathrm{{fs}}=13\).

Fig. 11

Test magnet problem with z (longitudinal) gradient coil; online stage for \(P=1\), comparison of . Results for a \(N_\mathrm{{fs}} = 13\), b \(N_\mathrm{{fs}} = 23\), c \(N_\mathrm{{fs}} = 45\), d \(N_\mathrm{{fs}} = 90\) and e \(N_\mathrm{{fs}} = 180\)

Fig. 12

Test magnet problem with z (longitudinal) gradient coil; online stage for \(P=1\), comparisons of a , b online stage computational expense and c PODNN training and optimisation expense for \(N_\mathrm{{fs}}=[23, 45, 90, 180]\)

Having now established that the performance of PODNN is similar for \(N_\mathrm{{fs}} \ge 23\) snapshots, we show, in Fig. 12, a comparison between the accuracy and computational efficiencies of PODI, PODP and PODNN. As an approximate measure of accuracy, we introduce

(31)

where and are the evaluations of (23) with the reduced techniques and the full order model, respectively. A more precise calculation would involve the

which does not average out possible spatial errors. However, we have high confidence in the PODNN solutions given the results in Fig. 11 and those presented later in this section. To compare computational efficiencies, we consider the wall clock time of the online stages of the calculation of \(\varvec{q}^{\text {POD}}\) for the POD techniques. We see that PODP is the most accurate, independent of the \(N_\mathrm{{fs}}\) considered. PODI performs worse than PODP by a few orders of magnitude, especially for lower \(N_\mathrm{{fs}}\), but improves for larger \(N_\mathrm{{fs}}\). PODNN does not perform as well as PODP but, for small \(N_\mathrm{{fs}}\), it performs better than PODI. The lower accuracy relative to PODP may be due to the tolerances chosen within the network, but the accuracy achieved by PODNN is sufficient for this practical application. Additionally, we show the online computational expense, with PODNN showing the best performance. In Figure 10 and Figure 15 of [17], the total online and offline costs of PODI and a full order solver as well as PODP with a full order solver were compared for the same problem (but using a different machine). In both cases, PODP and PODI offer similar computational costs, but, just as in Fig. 12, the accuracy of PODP was substantially better than that of PODI. Discounting training and optimisation time, the total costs for PODNN would be similar to those of PODP and PODI shown in [17] if the same machine were used. However, the cost of optimisation and training can be significant, as Fig. 12c shows. Whilst the training and optimisation form part of the offline stage in PODNN, this stage is significantly more expensive than the offline stages of PODP and PODI once these costs are included.

Using the most efficient and accurate PODNN strategy with \(N_\mathrm{{fs}}=23\), we show, in Fig. 13, the dissipated power \(P^0_{\Omega _\mathrm{{C}}}(\omega )\) and kinetic energy \(E^k_{\Omega _\mathrm{{C}}}(\omega )\) obtained by additionally solving (8) for each output frequency \(f_i\) of interest using the reduced order model (15), field representations (5) and applying (22). The results are in good agreement with the full order model and, importantly, accurately predict the resonant modes.

Fig. 13

Test magnet problem with z (longitudinal) gradient coil; Online stage for \(P=1\), comparisons of Full order and PODNN using \(N_\mathrm{{fs}}=23\). Results for a dissipated power \(P^0_{\Omega _\mathrm{{C}}}\) and b kinetic energy \(E^k_{\Omega _\mathrm{{C}}}\)

To emphasise that this is a 3D magneto-mechanical problem, we visualise the displacement and magnetic flux density fields within a conducting domain, obtained from the PODNN solution using the aforementioned network. Figure 14 shows the variation in the magnitudes of the real displacement field and the real magnetic flux density field in the outer 4 K shield at output frequencies of \(f=1000\) Hz and \(f=5000\) Hz. Bands of higher and lower magnitudes can be seen in the displacement field at 5000 Hz, but at 1000 Hz there is a higher displacement near the outer edges of the domain. A similar effect can be seen in the real magnetic flux density at 1000 Hz, but at 5000 Hz the conducting domain shows an almost uniform field of low magnitude.

Fig. 14

Test magnet problem with z (longitudinal) gradient coil; online stage for \(P=1\) and PODNN using \(N_\mathrm{{fs}}=23\). Results for a at 1000 Hz, b at 1000 Hz, c at 5000 Hz and d at 5000 Hz

Predictions for \({\varvec{w}}= (\omega ,\gamma _0^\mathrm{{OVC}} ) = ( 2\pi f, \gamma _0^\mathrm{{OVC}})\) with \(P=2\)

In the online stage of either PODI or PODNN we can easily evaluate (23) and (22) for any frequencies and conductivity factors of interest. But, for the purpose of visualisation, we evaluate the ROMs at the frequencies stated in (30) and at the following non-dimensional conductivity factors

$$\begin{aligned} \gamma _{0,i}^\mathrm{{OVC}} = \left[ 0.35,0.75, 3.5, 7.5\right] , \end{aligned}$$
(32)

for \(i = 1,\ldots , N_\mathrm{{co}}\), with \(N_\mathrm{{co}} = 4\), unless otherwise stated. This means that we evaluate the ROM for a total of \(N_\mathrm{{fo}}N_\mathrm{{co}}\) sets of output parameters.

Fig. 15

Test magnet problem with z (longitudinal) gradient coil; online stage for \(P=2\) and \(N_\mathrm{{fs}}=23, N_\mathrm{{cs}}=23\) comparing for each of the three conducting shields. Results for a 4 K shield at \(\gamma _0^\mathrm{{OVC}}=0.35,0.75\), b 4 K shield at \(\gamma _0^\mathrm{{OVC}}=3.5,7.5\), c 77 K shield at \(\gamma _0^\mathrm{{OVC}}=0.35,0.75\), d 77 K shield at \(\gamma _0^\mathrm{{OVC}}=3.5,7.5\), e OVC shield at \(\gamma _0^\mathrm{{OVC}}=0.35,0.75\) and f OVC shield at \(\gamma _0^\mathrm{{OVC}}=3.5,7.5\)

A similar investigation to that shown in Fig. 11 was repeated for the two-parameter case, again using the Bayes optimisation but with a search space \(L=1,\ldots ,10\) and \(n=1,\ldots , 32\). It should be noted that the optimisation for this case takes considerably longer than for \(P=1\) and is more costly than the PODP method, with a time of \(5.29 \times 10^5\)s, which is roughly six days. It was found that, in this case, using an \(L_\mathrm{{opt}}=3\), \(n_\mathrm{{opt}}=32\) network for \(N_\mathrm{{fs}}= 23\) and \(N_\mathrm{{cs}}=23\) produces accurate results provided that min grad=\(10^{-10}\) and the number of singular values that is retained is \(m=44\), using the same TOL \( = 10^{-6.5}\) as for \(P=1\). As an example, we show in Fig. 15 the behaviour of the quantity in (23) as a function of \(\omega =2\pi f \), with lines corresponding to different conductivity factors \(\gamma ^\mathrm{{OVC}}_0 = 0.35, 0.75, 3.5, 7.5\), which is obtained by applying (15) and then (5a), with an achieved MSE loss of \(\mathcal {L}^{\text {PODNN}}(\varvec{p}) = 3.56 \times 10^{-13}\). Results are shown for the 4K and 77K shields and the OVC, with a comparison against the snapshot solutions. The graphs in the left hand column correspond to cases where the conductivity factors are lower than unity, while those in the right hand column show the predictive capability of the ROM where the conductivity factors are greater than unity. The main reason for this separation is the log-spaced nature of the conductivity snapshots, which are more densely clustered below unity than above; the idea here is to see how well the PODNN model copes with both cases. In both conductivity factor regions, the PODNN model is seen to match reasonably well with the snapshot solutions at these specified conductivities.

Henceforth, we fix a PODNN using an \(L= 3\), \(n=32\) network and snapshot data corresponding to \(N_\mathrm{{fs}}= 23\) and \(N_\mathrm{{cs}}=23\). Using this choice, we show, in Figs. 16 and 17, the dissipated power \(P^0_{\Omega _\mathrm{{C}}}(\omega )\) and kinetic energy \(E^k_{\Omega _\mathrm{{C}}}(\omega )\), respectively, obtained by additionally solving (8) for each output frequency \(f_i\) and conductivity factor \( \gamma _{0,i}^\mathrm{{OVC}}\) of interest using the reduced order model (15), field representations (5) and applying (22). A comparison with the snapshot solutions is also shown and can be seen to match well with the PODNN solutions. We observe that the higher conductivity factors generally lead to lower values of \(P^0_{\Omega _\mathrm{{C}}}(\omega )\) and \(E^k_{\Omega _\mathrm{{C}}}(\omega )\) for each shield and frequency of interest, although the position of the resonant peaks is not substantially affected.

Fig. 16

Test magnet problem with z (longitudinal) gradient coil; online stage for \(P=2\) and \(N_\mathrm{{fs}}=23, N_\mathrm{{cs}}=23\) comparing the dissipated power \(P^0_{\Omega _\mathrm{{C}}}\) for each of the three conducting shields. Results for a 4 K shield at \(\gamma _0^\mathrm{{OVC}}=0.35,0.75\), b 4 K shield at \(\gamma _0^\mathrm{{OVC}}=3.5,7.5\), c 77 K shield at \(\gamma _0^\mathrm{{OVC}}=0.35,0.75\), d 77 K shield at \(\gamma _0^\mathrm{{OVC}}=3.5,7.5\), e OVC shield at \(\gamma _0^\mathrm{{OVC}}=0.35,0.75\) and f OVC shield at \(\gamma _0^\mathrm{{OVC}}=3.5,7.5\)

Fig. 17

Test magnet problem with z (longitudinal) gradient coil; online stage for \(P=2\) and \(N_\mathrm{{fs}}=23, N_\mathrm{{cs}}=23\) comparing the kinetic energy \(E^k_{\Omega _\mathrm{{C}}}\) for each of the three conducting shields. Results for a 4 K shield at \(\gamma _0^\mathrm{{OVC}}=0.35,0.75\), b 4 K shield at \(\gamma _0^\mathrm{{OVC}}=3.5,7.5\), c 77 K shield at \(\gamma _0^\mathrm{{OVC}}=0.35,0.75\), d 77 K shield at \(\gamma _0^\mathrm{{OVC}}=3.5,7.5\), e OVC shield at \(\gamma _0^\mathrm{{OVC}}=0.35,0.75\) and f OVC shield at \(\gamma _0^\mathrm{{OVC}}=3.5,7.5\)

4.5 Prediction of the response to external vibrational loading

The consideration of external vibrational loading for the mathematical model described in Sect. 2 introduces a non-zero boundary condition on \(\partial \Omega _\mathrm{{C}}^D\) in (2e), which reflects the amplitude of the external vibrational loading. Given the rotational symmetry of the problem considered in Sect. 4.1, we consider only the application of non-zero amplitudes in the z direction, since specifying other non-zero components would break the rotational symmetry of the problem and necessitate a full three-dimensional computation rather than simulating a quarter of the geometry and using the symmetry boundary conditions in (21) on the \(x=0\) and \(y=0\) planes.

4.5.1 Offline stage

The offline stage follows the approach described in Sect. 4.4.1 where a PODNN network with \(L=3\), \(n=32\) and \(N_\mathrm{{fs}} = 23\) snapshots for \({\varvec{w}}= (\omega ) = 2\pi ( f)\) with \(P=1\) is employed as described previously.

4.5.2 Online stage

Once (15) has been evaluated for a set of parameters of interest, the smaller dimensional mechanical problem (8) is solved. The magnitudes of the external vibrations are usually very small and so we present results obtained by imposing m, in turn, simultaneously on all the conducting shields. To obtain, we solve (8) for these frequencies and each of interest. Following this, we obtain the dissipated power and kinetic energy using (5) and (22), respectively.

In Fig. 18, we show the resulting dissipated power \(P^0_{\Omega _\mathrm{{C}}}(\omega )\) and kinetic energy \(E^k_{\Omega _\mathrm{{C}}}(\omega )\) curves, where the different lines correspond to different amplitudes of displacement. In these figures, we observe that the dissipated power in the 77K and OVC shields is not substantially affected by introducing non-zero , however, there is a noticeable increase in the magnitude of the kinetic energy curves for these shields when using m, although the frequencies of the resonant peaks remain unchanged. In the case of the 4K shield, we see the most noticeable changes, with the resonant peaks all but disappearing for the case of m and the kinetic energy curve.

Fig. 18

Test magnet problem with z (longitudinal) gradient coil; online stage for \(P=1\) and \(N_\mathrm{{fs}}=23\); comparing the distribution of kinetic energy \(E^k_{\Omega _\mathrm{{C}}}\) and dissipated power \(P^0_{\Omega _\mathrm{{C}}}\) for Dirichlet displacements in z such that m, in turn Result for a \(P^0_{\Omega _\mathrm {{C}}}\) 4 K shield, b \(E^k_{\Omega _\mathrm {{C}}}\) 4 K shield, c \(P^0_{\Omega _\mathrm {{C}}}\) 77 K shield, d \(E^k_{\Omega _\mathrm {{C}}}\) 77 K shield, e \(P^0_{\Omega _\mathrm {{C}}}\) OVC shield and f \(E^k_{\Omega _\mathrm {{C}}}\) OVC shield

4.6 Response when model parameters are uncertain

To understand the influence of an uncertain material parameter on the Ohmic power and kinetic energy curves, we may use the previously developed ROM. We focus on the situation where the conductivity \(\gamma ^\mathrm{{OVC}}= \gamma ^\mathrm{{OVC}}_0 \gamma ^\mathrm{{OVC}}_\mathrm{{ref}}\) in the OVC shield is uncertain. We consider a non-dimensional \(\gamma ^\mathrm{{OVC}}_0\sim \mathcal {N}(m, s)\) with population mean \(m=1\) and population standard deviation \(s=\log _{e}(2)\), where \(\gamma ^\mathrm{{OVC}}_\mathrm{{ref}}\) is a reference value with units of S/m and of order \(10^6\).

Fig. 19

Test magnet problem with z (longitudinal) gradient coil; online stage for \(P=2\) and \(N_\mathrm{{fs}}=23, N_\mathrm{{cs}}=23, N_\mathrm{{co}} = 50\); comparing the distribution of kinetic energy \(E^k_{\Omega _\mathrm{{C}}}\) and dissipated power \(P^0_{\Omega _\mathrm{{C}}}\) for each of the three conducting shields. Results for a \(P^0_{\Omega _\mathrm{{C}}}\) 4 K shield, b \(E^k_{\Omega _\mathrm{{C}}}\) 4 K shield, c \(P^0_{\Omega _\mathrm{{C}}}\) 77 K shield, d \(E^k_{\Omega _\mathrm{{C}}}\) 77 K shield, e \(P^0_{\Omega _\mathrm{{C}}}\) OVC shield and f \(E^k_{\Omega _\mathrm{{C}}}\) OVC shield

4.6.1 Offline stage

The offline stage follows the approach described in Sect. 4.4.1, where a PODNN network with \(L=3\), \(n=32\) for \(N_\mathrm{{fs}}= 23\) and \(N_\mathrm{{cs}}=23\) snapshots is employed for \({\varvec{w}}= (\omega ,\gamma _0^\mathrm{{OVC}} ) = ( 2\pi f, \gamma _0^\mathrm{{OVC}})\) with \(P=2\), as described previously.

4.6.2 Online stage

In this case, \(N_\mathrm{{co}} = 50\) samples of \(\gamma ^\mathrm{{OVC}}_0\sim \mathcal {N}(m, s)\) are drawn and, for each sample, \(N_\mathrm{{fo}}=500\) output frequencies are considered. For each combination of \(f_i\) and conductivity factor \( \gamma _{0,i}^\mathrm{{OVC}}\), the dissipated power \(P^0_{\Omega _\mathrm{{C}}}(\omega )\) and kinetic energy \(E^k_{\Omega _\mathrm{{C}}}(\omega )\) are obtained by solving (8) using the reduced order model (15), field representations (5) and applying (22), respectively. Using this data, the mean values and 95% confidence intervals of \(P^0_{\Omega _\mathrm{{C}}}(\omega )\) and \(E^k_{\Omega _\mathrm{{C}}}(\omega )\), at each output frequency of interest, are evaluated and the results are shown in Fig. 19. We see that, in general, the positions of the peaks are not significantly changed, but there is a noticeable difference in the amplitudes, particularly for frequencies \(1000\le f \le 3000\) Hz for the 4 K and 77 K shields. The OVC shield does not seem to show as much variation, with the mean and 95% confidence intervals for \(P^0_{\Omega _\mathrm{{C}}}\) and \(E^k_{\Omega _\mathrm{{C}}}\) being almost indistinguishable on this scale.
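The sampling procedure above can be sketched in Python as follows. Several ingredients are assumptions made only for illustration: `surrogate_power` is a hypothetical placeholder for the ROM output (it is not the model of Sect. 2), draws outside the snapshot range \(0.1 \le \gamma ^\mathrm{{OVC}}_0\le 10\) are rejected, the 95% interval is taken to be a normal-approximation interval for the mean, and the output frequencies are chosen so that the last one is 5000 Hz.

```python
import math
import random
import statistics


def draw_factors(n, mean=1.0, std=math.log(2.0), lower=0.1, upper=10.0, seed=0):
    """Draw n conductivity factors from N(mean, std), rejecting draws outside
    [lower, upper] (the rejection step is an assumption of this sketch)."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        g = rng.gauss(mean, std)
        if lower <= g <= upper:
            samples.append(g)
    return samples


def surrogate_power(frequency_hz, gamma_0):
    """Hypothetical placeholder for the ROM output; not the model of Sect. 2."""
    return 1.0 / (gamma_0 * (1.0 + ((frequency_hz - 1500.0) / 200.0) ** 2))


def mean_and_interval(values, z=1.96):
    """Sample mean and a normal-approximation 95% interval for that mean."""
    mean = statistics.fmean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half_width, mean + half_width


n_co, n_fo = 50, 500
factors = draw_factors(n_co)
frequencies = [15.0 + (5000.0 - 15.0) * i / (n_fo - 1) for i in range(n_fo)]

# One (mean, lower, upper) triple per output frequency.
bands = [mean_and_interval([surrogate_power(f, g) for g in factors])
         for f in frequencies]
```

Because each ROM evaluation is cheap, the \(N_\mathrm{{co}} N_\mathrm{{fo}}\) evaluations needed for the band at every output frequency are affordable in the online stage.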

5 Conclusion

This paper has presented a practical use of a POD technique based on a neural network (PODNN) to aid in the design phase of new MRI scanner configurations. We have compared the performance of PODNN to different POD techniques, including projection (PODP) and interpolation (PODI), for this challenging example. The PODNN methodology employed is equivalent to that proposed by Hesthaven and Ubbiali [21]. The PODP and PODI techniques build on our group's earlier work for the coupled magneto-mechanical problem of interest [12, 17]. The PODP approach produced relatively accurate results with a small number of snapshots (\(N_\mathrm{{fs}}\) and \(N_\mathrm{{cs}}\)) but was heavily intrusive on the software and required recalculation of FEM matrices. The PODI method allowed the use of simple interpolants (such as Lagrange interpolants) and was non-intrusive, but had limitations when considering small numbers of snapshots. In contrast, the PODNN method does not have an invasive implementation and produces accurate results, but the optimisation time (using Bayes optimisation to obtain hyper-parameters) in the offline stage increases significantly when the parameter space has dimension \(P = 2\), making the PODP technique more computationally efficient if overall time is considered.

Results have been presented for the prediction of resonant modes in the Ohmic power and kinetic energy curves using different parameter spaces and for the prediction of the response in the same outputs of interest for different external vibrational loading conditions. The PODNN technique has also been applied to predict the response in outputs of interest when model parameters are uncertain. These results have shown that an appropriate network architecture can provide high-fidelity solutions to the magneto-mechanical problem for both the \(P=1\) (frequency) and \(P=2\) (frequency and conductivity) parameter cases and provide rapid online prediction of the outputs of interest for the design of MRI scanners. In further work, we also plan to make experimental comparisons with the fields induced in moving magnets and make improvements to the mathematical model presented in Sect. 2 that no longer assumes that the velocities of the conducting components are small.