1 Introduction

Superconducting magnets are the most commercialized application of superconductors and are widely used in a vast range of fields, such as magnetic resonance imaging (MRI), nuclear magnetic resonance, fusion, and electrical machines [1,2,3]. They face multiple challenges during the design, manufacturing, and test stages, such as manufacturing tolerances, shimming coil design, inhomogeneity of the magnetic field, quench-related issues, and the extremely time-consuming magnetic field computation procedure. Artificial intelligence (AI)-based techniques offer shortcuts towards solutions for these challenges [4, 5].

Recently, AI methods have been used in the design stage of superconducting MRI magnets to obtain a highly homogeneous magnetic field while minimizing the size and the manufacturing costs [6,7,8]. In the fusion industry, AI methods can be used to monitor superconducting magnets and to protect them against quenches [9, 10]. Although many efforts have been made to take advantage of AI techniques for superconducting magnets, there is a lack of surrogate models that compute the magnetic field or other design parameters in an accurate, fast, and reliable manner. Surrogate models, or meta-models, are a type of data-driven model that estimates a highly complex and nonlinear characteristic when it is very difficult to compute or measure by conventional methods [11]. Such a surrogate model is needed to shorten the long computation time by reducing the computational burden of the finite element method (FEM). FEM-based field calculation requires an extremely long computation time (up to days) and substantial computational resources to deliver the final results. Although Legendre series-based methods are another type of field modeling technique and are faster than FEM-based methods, they also require hours to days to reach the final solution for a highly accurate field calculation [12].

In this paper, a surrogate model based on multi-layer artificial neural networks (ANNs) is introduced to characterize the magnetic field distribution of a 9.4 T MRI magnet consisting of four solenoid coils made of niobium-titanium (NbTi) wires. The proposed surrogate model estimates the minimum magnetic field in the diameter of spherical volume (DSV) and the maximum and minimum total magnetic fields (i.e., MinBDSV, MaxBtot, and MinBtot, respectively), all simulated in the MATLAB software package. In magnets used for MRI systems, MinBDSV is used to obtain the lowest field inhomogeneity at the DSV border, which is needed for a high-quality image; MaxBtot is used to ensure that the maximum magnetic field does not exceed the critical field of the NbTi wires; and MinBtot is used to identify the safety zone around the magnet. To establish the surrogate model, the analytical field distribution model of a superconducting magnet is first derived using the Legendre polynomials method. Then, 5000 different coil geometries were generated randomly, constrained to lie inside the Gauss lines and outside the DSV border, and a simulation was conducted for each of them to obtain MinBDSV, MaxBtot, and MinBtot. These 5000 coil geometries are chosen randomly so that the range of magnetic fields obtainable with the 4-coil magnet under study is covered. Finally, based on the acquired data, an ANN-based surrogate model was developed that accurately estimates the magnetic field distribution of the magnet in the area of interest. This ANN model can be used in the design stages of superconducting MRI magnets: it can be packaged as “plug and play” code and integrated into other software packages to estimate the magnetic field distribution around the proposed magnet, which significantly accelerates the design process.
Our results show an approximately 1850× faster computation for calculating the magnetic field of 750 different coil geometries. This is an important feature because the magnetic field needs to be calculated frequently during the design of MRI magnets. If conventional methods such as finite element methods or Legendre series methods (LSM) are used, the design procedure could take more than half a day, whereas with the ANN-based model it could be completed in less than 15 min.

2 The Analytical Analysis of the MRI Magnet

2.1 The Specifications of the Magnet

To model the magnetic field distribution, a superconducting magnet with 4 main coils and 5 shielding coils is considered, which is capable of generating a 9.4 T magnetic field with 10 ppm homogeneity in a 400 mm DSV [8]. Figure 1 illustrates the structure of the main coils, while the specifications and current excitations are tabulated in Table 1. It should be mentioned that the coil dimensions are already optimized values [13].

Fig. 1 The structure of the main coils in a 9.4 T superconducting MRI magnet

Table 1 Geometric and critical current density specifications of the main coils in a 9.4 T magnet, adapted from [13]

It should be mentioned that the whole magnet system is modeled in a cylindrical coordinate system to avoid the more complicated calculations of a Cartesian coordinate system. The DSV border, on the other hand, is analyzed in a spherical coordinate system to ensure that the magnet under study has a high level of homogeneity. Also, the excitation current must be kept constant for all structures/configurations of the magnet; only the current densities change with the cross-sections of the coils. To calculate these changes, the cross-section of each coil geometry is calculated, along with its ratio to the cross-section reported in [13]. The current density of the main magnet reported in [13] and Table 1 is then multiplied by this ratio.
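The rescaling step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a rectangular coil cross-section and reads "constant excitation current" as keeping the product of current density and cross-section fixed, so the ratio is taken as reference area over new area. The text does not state the direction of the ratio explicitly, and the function name and the numbers below are hypothetical, not values from Table 1.

```python
def scaled_current_density(j_ref, width_ref, height_ref, width_new, height_new):
    """Rescale current density so that J * cross-section stays constant.

    Assumptions (not stated in the paper): rectangular cross-section
    (width * height) and ratio = reference area / new area.
    """
    area_ref = width_ref * height_ref
    area_new = width_new * height_new
    if area_ref <= 0 or area_new <= 0:
        raise ValueError("coil cross-sections must be positive")
    return j_ref * area_ref / area_new


# Hypothetical numbers for illustration only (A/m^2 and m).
j_new = scaled_current_density(j_ref=100e6, width_ref=0.05, height_ref=0.40,
                               width_new=0.10, height_new=0.40)
# Doubling the width at fixed height halves the current density.
```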

2.2 Analytical Calculation of Magnetic Field in an MRI Magnet

There are many methods to calculate the magnetic field distribution in a superconducting solenoid MRI magnet. These methods can be categorized into three major groups: finite element methods (FEMs), integral-based methods, and methods using series based on Legendre functions (SBLF). In this paper, the SBLF is used to calculate the magnetic field inside the Gauss lines. Figure 2 shows the geometry of a single superconducting coil with inner radius \({a}_{1}\), outer radius \({a}_{2}\), and total height \({b}_{2}-{b}_{1}\).

Fig. 2 Schematic of a single solenoid coil for a superconducting MRI magnet

By considering \(\alpha =\frac{{a}_{2}}{{a}_{1}}\) and \(\beta =\frac{{b}_{2}-{b}_{1}}{{a}_{1}}\), Eq. (1) can be used to calculate the distribution of the magnetic field in spherical coordinates at the field calculation point \(\left(r,\theta , \varphi \right)\), where \(r=\sqrt{{x}^{2}+{y}^{2}+{z}^{2}}, \theta =\mathrm{arctan}\frac{y}{x}\), and \(\varphi =\mathrm{arctan}\frac{\sqrt{{x}^{2}+{y}^{2}}}{z}\) [12].

$${B}_{z}\left(r,\theta \right)={a}_{1} J\sum_{k=0}^{\infty }F{E}_{2k}\left(\alpha ,\beta \right){\left(\frac{r}{{a}_{1}}\right)}^{2k}{P}_{2k}\left(\mathrm{cos}\theta \right)$$
(1)

where \(F{E}_{2k}\) is the field error coefficient, \(J\) is the current density in the coil, and \({P}_{2k}\) is the Legendre polynomial of order \(2k\). To calculate \(F{E}_{2k}\), the terms expressed in Eqs. (2) to (5) must first be calculated [12]:

$${C}_{1}=\frac{1}{1+{\beta }^{2}}$$
(2)
$${C}_{2}=\frac{{\beta }^{2}}{1+{\beta }^{2}}$$
(3)
$${C}_{3}=\frac{{\alpha }^{2}}{{\alpha }^{2}+{\beta }^{2}}$$
(4)
$${C}_{4}=\frac{{\beta }^{2}}{{\alpha }^{2}+{\beta }^{2}}$$
(5)

After defining the \({C}_{i}\) values, \(F{E}_{2k}\) can be calculated using Eq. (6) [12], where \(m=2k\):

$$F{E}_{m}\left(\alpha ,\beta \right)=\frac{{\mu }_{0}}{m\left(m-1\right){\beta }^{m-1}}\left[{C}_{1}^{\frac{3}{2}}\times {f}_{m}\left({C}_{2}\right)-{C}_{3}^{\frac{3}{2}}\times {f}_{m}\left({C}_{4}\right)\right]$$
(6)

where \({f}_{m}\) is the polynomial function for each order. \(F{E}_{m}\) is calculated up to the 12th degree, and the related polynomials [12] are tabulated in Table 2.

Table 2 The polynomial functions used for the calculation of the field error coefficient
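As a hedged illustration of how Eqs. (1) to (5) fit together, the sketch below computes \(\alpha\), \(\beta\), and the \({C}_{i}\) terms, and evaluates the truncated series of Eq. (1) for field error coefficients that are supplied by the caller. The polynomials \({f}_{m}\) of Table 2 are not reproduced here, so Eq. (6) itself is not implemented; the function names and the sample coefficients are hypothetical. The Legendre sum uses NumPy's `legval`.

```python
import numpy as np
from numpy.polynomial import legendre


def shape_terms(a1, a2, b1, b2):
    """Return alpha, beta and the C_1..C_4 terms of Eqs. (2) to (5)."""
    alpha = a2 / a1
    beta = (b2 - b1) / a1
    c1 = 1.0 / (1.0 + beta**2)
    c2 = beta**2 / (1.0 + beta**2)
    c3 = alpha**2 / (alpha**2 + beta**2)
    c4 = beta**2 / (alpha**2 + beta**2)
    return alpha, beta, (c1, c2, c3, c4)


def bz_series(r, theta, a1, j, fe_even):
    """Truncated Eq. (1); fe_even[k] must hold FE_{2k} (caller-supplied)."""
    fe_even = np.asarray(fe_even, dtype=float)
    if fe_even.size == 0:
        raise ValueError("at least one coefficient FE_0 is required")
    k = np.arange(fe_even.size)
    # legval expects coefficients for P_0, P_1, P_2, ...; odd orders stay zero.
    coeffs = np.zeros(2 * fe_even.size - 1)
    coeffs[::2] = fe_even * (r / a1) ** (2 * k)
    return a1 * j * legendre.legval(np.cos(theta), coeffs)


# Hypothetical geometry and coefficients, for illustration only.
alpha, beta, c = shape_terms(a1=1.0, a2=2.0, b1=0.0, b2=1.0)
bz = bz_series(r=0.5, theta=0.0, a1=1.0, j=2.0, fe_even=[1.0, 1.0])
```

By construction \({C}_{1}+{C}_{2}=1\) and \({C}_{3}+{C}_{4}=1\), which gives a quick sanity check on any implementation of Eqs. (2) to (5).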

3 ANN for Magnetic Field Modeling

Usually, the aim of the design process for MRI magnets is to minimize the cost and size of the superconducting coils while maximizing the homogeneity of the magnetic field distribution at the DSV border. Obtaining a highly accurate field distribution makes the design process extremely time-consuming (usually hours to days). To address this speed issue, surrogate models based on ANNs can be used in the MATLAB software package. An ANN-based surrogate model uses input and output data to characterize the highly nonlinear behavior of a material or device. In this paper, the inputs are the widths of the coils \({w}_{coil}={a}_{2}-{a}_{1}\), the heights of the coils \({h}_{coil}={b}_{2}-{b}_{1}\), and the air gaps between the coils. The magnet under study consists of 4 coils, so the number of inputs is 11, i.e., 4 widths and 4 heights (one of each per coil) and 3 gaps between coils. Different geometric design scenarios for these 11 inputs are considered and simulations are conducted based on the SBLF. The results of these simulations, as shown in Fig. 3, are fed to the ANN model as data. The inputs are then passed to hidden layers consisting of multiple neurons; each neuron has weight factors and a bias factor that establish a mathematical correlation between the inputs and outputs, as shown in Fig. 3. In a simple single-layer ANN, the relation between the input layer and the output layer is given by Eqs. (7) and (8) [14]:

$$net=\left[{w}_{1} {w}_{2}\dots {w}_{n}\right]\left[\begin{array}{c}{x}_{1}\\ {x}_{2}\\ \vdots \\ {x}_{n}\end{array}\right]+b={W}^{T}x+b$$
(7)
$$output=f({W}^{T}x+b)$$
(8)

where \({w}_{n}\) is the weight factor, \({x}_{n}\) is the input, \(b\) is the bias factor, and \(f\) is the activation function.
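Eqs. (7) and (8) amount to a weighted sum followed by an activation function. A minimal sketch is given below; the function name, the default choice of a hyperbolic tangent activation, and the sample numbers are assumptions made for illustration.

```python
import numpy as np


def neuron_output(w, x, b, f=np.tanh):
    """Single neuron: net = W^T x + b (Eq. 7), output = f(net) (Eq. 8)."""
    net = np.dot(w, x) + b
    return f(net)


# Hypothetical weights and inputs; a linear activation returns net itself.
w = np.array([0.5, 0.25])
x = np.array([2.0, 4.0])
linear_out = neuron_output(w, x, b=1.0, f=lambda v: v)
```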

Fig. 3 Proposed surrogate model establishing procedure: simulations, data acquisition process, and ANN model of the MRI magnet

The proposed ANN model is a data-driven model that can be updated by re-training when new data are fed into the system. This means that when new main and shimming coils are added, the proposed model can be adapted by supplying the new results and re-training it. Thus, the ANN model can be used as a fast modeling method for the electromagnetic and geometrical design of superconducting MRI magnets.

The data processing using the ANN model consists of three phases: training, validation, and test. In the training phase, 70% of the data were used to train the ANN model, aiming for a highly accurate and precise model while keeping the computation as fast as possible. The validation and test phases are used to show that the proposed model is accurate not only for data in the training set but also for new data outside that set. The validation phase checks the quality of the training phase and confirms that the ANN model reproduces the outputs with high accuracy for the data already available in the database. The test phase checks the performance of the ANN against data outside the training phase, i.e., data that the ANN has never seen before.
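The three-way split can be sketched as a random partition of sample indices. The 70% training share is taken from the text; the 15%/15% division of the remainder is an assumption, chosen because it reproduces the 750 test points out of 5000 geometries mentioned in the paper. The function name and seed are hypothetical.

```python
import numpy as np


def split_indices(n_samples, train_ratio=0.70, val_ratio=0.15, seed=0):
    """Randomly partition sample indices into train, validation and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_ratio * n_samples))
    n_val = int(round(val_ratio * n_samples))
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]  # whatever remains after train and validation
    return train, val, test


train_idx, val_idx, test_idx = split_indices(5000)
```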

There are several controlling parameters, or hyper-parameters, that can change the accuracy and computation speed of the proposed surrogate model. The maximum allowable number of epochs and epsilon form the first group of controlling parameters; they govern how far the difference between the estimated and real values is minimized. The number of epochs is the number of iterations allowed for solving the minimization problem, while epsilon is the maximum allowable difference between the estimated and real values. More epochs and a lower epsilon yield a more accurate model with lower error; however, they also drastically increase the training time and make the ANN model impractical for future real-time applications. Apart from that, the numbers of neurons and hidden layers can change the accuracy and training time of the ANN. However, there is no guarantee that monotonically increasing the numbers of neurons and hidden layers results in higher accuracy, so a further study, called sensitivity analysis, is needed. Another controlling parameter is the ratio of training data to total data, which is analyzed further in this paper. Finally, there is the activation function, which falls into two subclasses: (i) the activation function of the hidden layers, which establishes a mathematical correlation between different hidden layers, and (ii) the output function, which connects the hidden layers to the outputs. Figure 4 shows the different activation functions used in the sensitivity analysis of this paper. These functions correlate the inputs to the outputs by means of hidden layers and neurons; both pure linear and non-linear functions are used to obtain the highest modeling accuracy.

Fig. 4 Activation functions used in this paper for estimating magnetic field indices
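Three of the activation functions considered in this paper appear by their MATLAB names in the sensitivity analysis: satlin, tansig, and purelin. The sketch below writes them out in NumPy using the standard MATLAB definitions; it is an illustration only, and whether Fig. 4 contains further functions cannot be recovered from the text.

```python
import numpy as np


def purelin(x):
    """Linear transfer function: output equals input."""
    return np.asarray(x, dtype=float)


def satlin(x):
    """Saturating linear transfer function: clips the input to [0, 1]."""
    return np.clip(np.asarray(x, dtype=float), 0.0, 1.0)


def tansig(x):
    """Hyperbolic tangent sigmoid: 2 / (1 + exp(-2x)) - 1, equal to tanh(x)."""
    x = np.asarray(x, dtype=float)
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0


samples = np.array([-1.0, 0.5, 2.0])
satlin_out = satlin(samples)
```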

4 Results and Discussions

4.1 Data Preparation Procedure

To train the ANN-based surrogate model, data covering various ranges must be fed as inputs to the developed ANN; these inputs are shown in Fig. 5. In this figure, the upper and lower bounds of the inner radii, outer radii, and heights of the four coils are shown, while the horizontal red line indicates the mean value over all geometries for each coil. By applying these inputs to an SBLF-based method, 3 magnetic field indices can be calculated, i.e., MinBDSV, MaxBtot, and MinBtot, as shown in Fig. 6. The resulting magnetic field indices used in the ANN model are shown in Fig. 7 in terms of their mean and maximum/minimum values.

Fig. 5 Distribution of different coil geometries as inputs of ANN: a inner radius, b outer radius, c height. Different data are considered for each coil

Fig. 6 Magnetic field distribution of the magnet under study

Fig. 7 The output ranges for 5000 different coil geometries

4.2 Error Index and Accuracy Metric to Evaluate the Proposed Model

Before presenting and analyzing the ANN estimation results, it should be mentioned that two performance indices are used in this paper to compare the values estimated by the ANN with the output of the analytical model. These two indices are the root mean squared error (RMSE) and the Pearson correlation coefficient (R), as defined in Eqs. (9) and (10) [15]:

$$RMSE=\sqrt{\sum_{k=1}^{N}\frac{{\left({t}_{k}-{e}_{k}\right)}^{2}}{N}}$$
(9)
$$R=\frac{\sum_{k=1}^{N}\left({t}_{k}-\overline{t}\right)\left({e}_{k}-\overline{e}\right)}{\sqrt{\sum_{k=1}^{N}{\left({t}_{k}-\overline{t}\right)}^{2}\sum_{k=1}^{N}{\left({e}_{k}-\overline{e}\right)}^{2}}}$$
(10)

where \({t}_{k}\) is the real value, \({e}_{k}\) is the estimated value by ANN, \(\overline{t }\) is the mean of the output of the analytical model, \(\overline{e }\) is the mean value of estimated values, and \(N\) is the number of total data [15].
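Eqs. (9) and (10) translate directly into a few lines of NumPy. The sketch below is illustrative; the function names and the sample arrays are hypothetical and are not data from the paper.

```python
import numpy as np


def rmse(t, e):
    """Root mean squared error between targets t and estimates e (Eq. 9)."""
    t = np.asarray(t, dtype=float)
    e = np.asarray(e, dtype=float)
    return float(np.sqrt(np.mean((t - e) ** 2)))


def pearson_r(t, e):
    """Pearson correlation coefficient between t and e (Eq. 10)."""
    t = np.asarray(t, dtype=float)
    e = np.asarray(e, dtype=float)
    dt = t - t.mean()
    de = e - e.mean()
    return float(np.sum(dt * de) / np.sqrt(np.sum(dt**2) * np.sum(de**2)))


# Hypothetical target and estimate vectors.
targets = [1.0, 2.0, 3.0]
estimates = [1.0, 2.0, 5.0]
err = rmse(targets, estimates)
corr = pearson_r(targets, [2.0, 4.0, 6.0])
```

Note that R measures linear association only: in the example, estimates that are exactly twice the targets still give R = 1, which is why the paper reports RMSE alongside R.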

4.3 Sensitivity Analysis

The first step is to perform a sensitivity analysis to select the best structure and controlling parameters of the ANN-based surrogate model. Figure 8 shows the RMSE and computation time for estimating the different magnetic field indices as the epsilon value and the maximum number of epochs change. Here, epsilon represents the maximum allowable difference between target values and estimated values. Reducing it leads to a longer training time and a more accurate model. So, to select the optimum values for these parameters, two competing criteria must be balanced: training time and accuracy, the latter expressed as RMSE. Figure 8a, b show the training time and RMSE of the MinBDSV estimation. This index has the lowest RMSE among the field indices; compare Fig. 8d, f for MinBtot and MaxBtot, respectively. As shown in the RMSE-related panels, increasing the number of epochs does not reduce the RMSE significantly, and thus around 100 epochs are selected to keep the training time as low as possible. On the other hand, increasing epsilon worsens the RMSE and reduces the accuracy of the model. Therefore, to avoid any loss of accuracy, epsilon is selected to be \({10}^{-8}\). It should be noted that for this stage of the sensitivity analysis, 15 neurons and 3 hidden layers are considered, while the ratio of training data to total data is 70%.

Fig. 8 Sensitivity analysis on the ANN parameters, i.e., number of epochs and value of epsilon, to find the best performance of the ANN model: a training time of MinBDSV, b RMSE of MinBDSV, c time of MinBtot, d RMSE of MinBtot, e time of MaxBtot, f RMSE of MaxBtot

The next step is to select the optimal numbers of neurons and hidden layers in the ANN, aiming for the highest possible estimation accuracy. The number of epochs is kept at the previously selected value of 100, epsilon at \({10}^{-8}\), and the ratio of training data to total data at 70%.

Figure 9a, b illustrate the training time and RMSE for the MinBtot estimation when the number of neurons is 1, 5, 10, or 15 and 1 to 5 hidden layers (nH) are considered for each neuron count. A similar trend in training time and accuracy can be observed for MaxBtot and MinBDSV in Fig. 9c to f, respectively. After considering the results of the sensitivity analysis shown in Fig. 9, the numbers of neurons and hidden layers are selected to be 15 and 3, respectively. With this structure, 15 neurons give the highest accuracy in all cases, while using more than 3 hidden layers raises the training time to around a minute and leaves the RMSE almost unchanged. Thus, increasing the number of hidden layers beyond 3 is not worthwhile.

Fig. 9 Sensitivity analysis on the structure of ANN to find the optimal number of hidden layers (nH) and neurons of the ANN model: a training time of MinBtot, b RMSE of MinBtot, c time of MaxBtot, d RMSE of MaxBtot, e time of MinBDSV, f RMSE of MinBDSV

In addition, the impact of the ratio of training data to total data is studied, and the results are listed in Table 3 for an ANN with 15 neurons and 3 hidden layers, trained with 100 epochs and an epsilon of \({10}^{-8}\). As shown in Table 3, the best performance of the ANN model is achieved when the training ratio is 70%; other training ratios reduce the accuracy. Therefore, in this paper, 70% of the total data are used in the training phase of the ANN model for all field indices.

Table 3 The impact of ratio of training data to all data on the accuracy of ANN model in train, test, and validation phases

The last step of the sensitivity analysis is to study the impact of the type of activation function on the performance of the ANN-based model, as presented in Table 4. Different combinations of activation functions are considered in search of the highest Pearson correlation coefficient and the lowest RMSE. For MinBDSV, the best performance is obtained with the satlin/tansig combination; for MinBtot, with satlin/purelin; and for MaxBtot, with purelin/tansig. The satlin/tansig combination is selected because it performs acceptably for all the magnetic field indices.

Table 4 Sensitivity analysis on type of activation function in ANN model

After analyzing and evaluating all the controlling parameters involved in the ANN-based estimation of the magnetic field, an optimal ANN structure is selected with the properties tabulated in Table 5.

Table 5 The final parameters and optimal structure of the proposed ANN model
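To make the selected structure concrete, the following sketch builds a forward pass with 11 inputs, 3 hidden layers of 15 neurons, and a single output. It is not the trained model of this paper: the weights are random and untrained, one output per network is an assumption, and reading "satlin/tansig" as satlin in the hidden layers and tansig at the output is also an assumption. Because tansig is bounded to (-1, 1), such a network would need normalized targets.

```python
import numpy as np


def satlin(x):
    """Saturating linear transfer function: clips to [0, 1]."""
    return np.clip(x, 0.0, 1.0)


def init_network(n_inputs=11, n_hidden_layers=3, n_neurons=15, n_outputs=1, seed=0):
    """Create untrained (weights, bias) pairs for each layer; values are random."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs] + [n_neurons] * n_hidden_layers + [n_outputs]
    return [(rng.normal(scale=0.1, size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(layers, x):
    """Propagate one input vector: satlin in hidden layers, tanh (tansig) at output."""
    a = np.asarray(x, dtype=float)
    for weights, bias in layers[:-1]:
        a = satlin(weights @ a + bias)
    weights, bias = layers[-1]
    return np.tanh(weights @ a + bias)


layers = init_network()
# Hypothetical normalized geometry vector: 4 widths, 4 heights, 3 gaps.
y = forward(layers, np.full(11, 0.5))
```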

4.4 Magnetic Field Estimation by ANN

After the sensitivity analysis, MinBtot, MaxBtot, and MinBDSV are estimated using the proposed optimal ANN structure. Figure 10 illustrates the Pearson correlation coefficient of the estimations, separately for each of the magnetic field values. As can be seen in this figure, most of the data points are located on the y = x regression line. This line indicates how well the estimation has been accomplished, i.e., the more data points lie on this line, the lower the error. There are some deviations from the y = x regression line in Fig. 10a, b, which are due to the higher degree of non-linearity of the magnetic field in regions outside the DSV.

Fig. 10 The Pearson correlation coefficient for the magnetic field estimation by ANN in test phase: a MinBtot, b MaxBtot, c MinBDSV

The values of RMSE, R, and training time for the estimation of the three magnetic field indices are reported in Table 6 for the training, validation, and test phases of the ANN model. As can be seen, it takes around 16 to 17 s to calculate the magnetic field for the different coil geometries using the proposed ANN model. Calculating the magnetic fields for the same number of geometries with the Legendre series method (LSM) requires around 97,850 s, i.e., more than 27 h. If only the 750 data points used in the test phase of the ANN model are considered, the LSM requires 14,805 s, or more than 4 h. The ANN model is therefore much faster than the LSM, which is itself known to be faster than FEMs; calculating the magnetic fields for the same number of geometries using FEM would take up to weeks. The speed of the proposed ANN model is very useful when a superconducting MRI magnet is designed using a design optimizer program. Under such circumstances, the magnetic field needs to be calculated many times under different scenarios, usually through iterative simulation in an FEM-based software package such as COMSOL Multiphysics. With LSM or FEMs, this process could take hours to days, whereas with a trained ANN-based surrogate model it could be done in less than 15 min.

Table 6 Magnetic field estimation for the MRI magnet under study using the proposed ANN model
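The LSM timings quoted above can be checked with simple arithmetic. The sketch below converts them to hours and to a per-geometry cost, assuming that the 97,850 s figure refers to all 5000 geometries (the text does not say so explicitly); the variable names are hypothetical.

```python
# Timings reported in the text for the Legendre series method (LSM).
lsm_all_seconds = 97_850    # assumed to cover all 5000 geometries
lsm_test_seconds = 14_805   # the 750 test-phase geometries
n_all = 5000
n_test = 750

hours_all = lsm_all_seconds / 3600            # about 27.2 h
hours_test = lsm_test_seconds / 3600          # about 4.1 h
per_geometry_all = lsm_all_seconds / n_all    # about 19.6 s per geometry
per_geometry_test = lsm_test_seconds / n_test # about 19.7 s per geometry
```

Under that assumption, both figures give a cost of roughly 20 s per geometry for the LSM, so the two reported timings are mutually consistent.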

4.5 Stability Analysis of the Estimation Results of the Proposed ANN Model

The next step is to analyze the stability, reproducibility, and repeatability of the estimated results. For this purpose, the ANN model for the estimation of MinBDSV was run 100 times, and in each run the RMSE, R, and training time were stored and analyzed. Table 7 lists the mean and standard deviation of R, RMSE, and training time for the estimation of the magnetic field indices. For MinBtot, the reported mean values are very close to the results previously reported in this paper, while the low standard deviations show the high stability of the results. The same holds for MinBDSV, which also has low standard deviations. MaxBtot, on the other hand, seems slightly less stable than the other magnetic field indices. This originates from the higher level of nonlinearity of the maximum values of the magnetic field, as shown previously in Fig. 6.

Table 7 Results of stability analysis of the proposed surrogate model for field estimation
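The stability analysis reduces to repeating the training and summarizing each metric by its mean and standard deviation. A minimal sketch follows; `run_once` is a hypothetical callable standing in for one train-and-evaluate cycle that returns (R, RMSE, training time), and the use of the sample standard deviation (`ddof=1`) is an assumption, since the paper does not state which estimator was used.

```python
import numpy as np


def stability_summary(run_once, n_runs=100):
    """Repeat a train/evaluate cycle and return per-metric mean and std."""
    results = np.array([run_once(seed) for seed in range(n_runs)], dtype=float)
    return results.mean(axis=0), results.std(axis=0, ddof=1)


def fake_run(seed):
    """Deterministic stand-in for a real training run (illustration only)."""
    return 1.0, 0.001 * (seed % 2), 16.0


mean_vals, std_vals = stability_summary(fake_run, n_runs=100)
```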

5 Conclusion

Regardless of whether finite element-based methods or Legendre series methods (LSMs) are used, the design of magnetic resonance imaging (MRI) superconducting magnets is a time-consuming process. This paper has proposed a surrogate model based on multi-layer artificial neural networks (ANNs) to reduce the computation time of the design process for MRI magnets. The model is built by feeding 5000 different coil geometries of a 9.4 T NbTi magnet as inputs into an ANN model, while the outputs are the minimum magnetic field in the diameter of spherical volume (DSV) and the maximum and minimum magnetic fields over the whole field calculation region. The most important findings of this paper are:

  • The ANN model is capable of estimating the magnetic field values with a high accuracy and extremely low root mean squared error (RMSE) (i.e., more than 99.999% Pearson correlation coefficient and RMSE less than 0.005).

  • The training time for the estimation of magnetic fields for the data points (i.e., different geometries) is less than 17 s, while the LSM needs more than 4 h and FEM needs days or longer for the same task.

  • Estimations were repeated 100 times to show the stability of the results. More than 96% of the runs in the stability analysis have a Pearson correlation coefficient of around 99.999%, while the remaining ones have at least 99.988%.