Introduction

Submerged arc furnaces (SAF) are metal-producing units in which the energy needed for the primary reactions is delivered by electric currents through large electrodes. This type of furnace is used, for example, in the production of steel, ferroalloys, calcium carbide and silicon.[1,2] Typically, SAFs operate at the grid frequency of 50 or \(60\,{\text{Hz}}\) with three-phase alternating current (AC) and three or more electrodes.

Although SAFs are robust and reliable, cost-effective operation requires correct and stable electrical conditions.[3,4] For example, it is important to understand and control current densities, thermal conditions and mechanical stresses to prevent problems such as electrode breakages and to minimize electrode consumption.[5,6,7] Furthermore, the design of a new furnace requires a good understanding of the current paths in order to deliver energy to the required zones for optimal operation.[8,9,10,11]

Provided that some iron contamination of the product is acceptable, large furnaces utilize self-baking Søderberg electrodes with diameters up to 2 m. Normally, SAFs operate at comparatively low voltages (100 to 150 V) but with high currents that can exceed 130 kA. These conditions produce strong electromagnetic fields that induce eddy currents in the conductive materials in the furnace. The induced eddy currents can have high intensity and effectively modify the distribution of the current. One normally identifies two distinct effects: (a) the skin effect, which causes the currents to accumulate near the surface of conductors, and (b) the proximity effect, which induces currents in surrounding conductors.[2] The proximity effects typically yield an asymmetric current density in the electrodes.[12,13,14] Although these effects are significant and well known,[15,16,17,18,19] they are often neglected in electrode models, which typically concentrate on a single electrode.[7,20,21,22] To further complicate the picture, furnaces have a (magnetic) steel shell enclosing a conductive lining, so strong eddy currents may be induced in the periphery of the furnace.[23] When present, these shell currents may also modify the currents of the electrodes, effectively creating electrode-shell proximity effects.[24]

Review papers on two-dimensional (2D) analytical models of skin and proximity effects, together with comparison papers that considered three-dimensional (3D) case studies of large industrial furnaces[25] and that conducted 3D simulations of ferromanganese and ferrosilicon furnaces,[24] concluded that while 2D analytic solutions have the merit of capturing part of the effects in play, 3D simulations are recommended to gain proper insight into the current distribution within the electrodes of large industrial furnaces.[25]

The value of 3D models for SAFs has also been discussed by other authors: for example, Dhainaut applied a DC model to investigate current and power distribution at the point in time where the current enters one electrode and distributes evenly to the other two.[26] Halvorsen et al. have shown that the results from two DC computations can be combined to find the conditions for any point in time during an AC cycle, or the time-averaged power and current distribution. This approach is valid as long as electromagnetic induction can be neglected.[27]

Moreover, Bezuidenhout et al. developed a 3D computational fluid dynamics model to investigate the internal dynamics of an electrical furnace as used for the smelting of Platinum Group Metal concentrates.[28] Toh et al. used Maxwell’s equations in connection with a Finite Volume Method approach to model the steel-making process.[29] Darmana et al. developed a multiphysics model (thermodynamics, electricity, hydrodynamics, heat radiation and chemical reactions) for a SAF.[30] More recently, Tesfahunegn et al. simulated the effects of electrode shape, pitch circle diameter and frequency on the current distribution in submerged arc furnaces for silicon production.[31,32,33,34,35,36]

In short, physics-based models such as those presented in Reference 24 are capable of accurately estimating the induced currents in the steel shell, the alternating current distributions in the material layers, and the active and reactive power densities within the furnace. However, the explicit solution of the equations on which these models are based is generally very demanding in terms of time and computational resources. This makes physics-based models difficult to employ in control operations and in fast prototyping.

To obviate these inconveniences, a possible approach is to develop a multivariate metamodel of the physics-based model, i.e., a statistical model of the input–output behavioral repertoire of the physics-based model. In this context, a metamodel is an approximation of an original physics-based model obtained by applying a suitable statistical analysis to a database generated with the original model. Ideally, the metamodel should retain the same generalization capabilities as the original model, while being computationally lightweight.

A sufficiently informative database requires that the input space of the physics-based model is spanned properly: in other words, the training data must contain information about how the original model’s set of outputs changes when its main set of input parameters changes, both individually and in combination, over a relevant, sufficiently wide range of parameter values. Moreover, the database should capture these changes in sufficient detail. This means that the computations should be performed at many relevant input conditions (different combinations of parameters, initial conditions and computational controls) according to a cost-effective statistical design. The output space must also be properly spanned: all relevant simulation outputs from the physics-based model are to be stored.

In this paper we consider linear multivariate metamodels, i.e., metamodeling through sets of multivariate linear subspace equations linking input parameters to an output space or vice versa.

For this we employ the Partial Least Squares (PLS) regression approach.[37,38] PLS regression (PLSR) relates one or more regressand variables \({\varvec{Y}} = \left[ y_1, \ldots , y_M \right] \) to a (large) set of regressor variables \({\varvec{X}} = \left[ x_1, \ldots , x_K \right] \) in data from a set of N objects, conditions or time points. For real-world data, it is not known beforehand which of the many \({\varvec{X}}\)-variables are important. Moreover, both the \({\varvec{X}}\)- and the \({\varvec{Y}}\)-variables are often noisy and strongly intercorrelated.

The PLS regression approach transforms the intercorrelation patterns among the \({\varvec{X}}\)-variables from being a problem (“multi-collinearity”) to being an advantage (modeling stabilization and graphical insight). This is attained by basing the \({\varvec{X}} \mapsto {\varvec{Y}}\) regression problem on a small set of automatically identified \({\varvec{Y}}\)-relevant linear combinations of the \({\varvec{X}}\)-variables (“weighted averages” or “PLS components, PCs”) that have maximum covariance with a corresponding set of linear \({\varvec{Y}}\)-variable combinations. Cross-validation is used for finding the optimal number of such PCs in order to guard against over-fitting.

We thus consider here PLSR given its ability to discover, in multivariate regression problems, fundamental relations between input and output variables in terms of eigenstructures in the covariances between these input/output variables in the simulation-based training data. Later, the obtained PLSR models may be used for predicting, e.g., outputs from new, hitherto untested input combinations by multivariate interpolation. We note that converting a chosen physics-based model into its multivariate metamodel means converting the former mathematical description of the expected input/output behavior (formal, implicit, and thus complicated to interpret) into a simpler relation under the same set of conditions. The direction Outputs = f(Inputs) represents “classical” or “direct” multivariate metamodeling. It is normally fairly straightforward, at least for deterministic physics-based models. (Physics-based models with stochastic elements may of course give precision problems in the output predictions.) The opposite direction, Inputs = f(Outputs), represents “indirect” multivariate metamodeling. It works well when the physics-based model has a unique, one-to-one mapping between the profile of input states and the profile of output states. But if many different input profiles give the same output profile, then an overall indirect metamodel cannot be expected to give correct predictions of the inputs from the outputs; additional input variables are then required.

In summary, the purpose of this paper is to investigate: (1) how to metamodel physics-based models of SAFs through PLS regression approaches; (2) which capabilities and limits such linear PLS regression approaches have in approximating the original model; and (3) which computational advantages are brought by introducing the PLS regression schemes.

To serve this purpose, we organized the text as follows: Section II introduces a Finite Element Method (FEM) model of a SAF; Section III discusses the possibility of constructing both direct and inverse metamodels; Section IV describes the basis of the Partial Least Squares Regression technique; Section V analyzes the statistical performance of the overall PLS regression approach to metamodeling; Section VI concludes the manuscript by summarizing the main findings and by highlighting future investigation directions.

A Finite Element Method Model of a SAF

The present work focuses on a specific AC electromagnetic model of a 41 MW ferromanganese (FeMn) furnace. The main purpose of this FEM model is to analyze power density distributions in the various material layers of the furnace. The model is derived from the one presented in Reference 24. In collaboration with representatives of the ferroalloy industry, we have conducted several case studies in which the results from comparable FEM simulations were found to be in good agreement with furnace observations. This includes electrode resistance, furnace active power, the formation of hot spots on the steel shell in regions where the simulations show a high concentration of induced currents, and general trends in the magnetic fields outside the shell, as observed in a measurement campaign.[39]

The model (shown in Figure 1) represents a typical large FeMn furnace with a cylindrical shape (diameter approx. 15 m) and three large (1.9 m in diameter) electrodes arranged in an equilateral triangle. Around the tip of each electrode there are zones rich in carbon called coke beds (bell-shaped in the model). Below there are layers of slag (molten oxides) and of molten metal alloys. The core of the furnace is encased in carbon lining, oxide lining and a steel shell. To complete the physical system, the model includes a steel roof.

Fig. 1 FEM model. Left panel: 3D rendering of the furnace where the charge and other sections are hidden to reveal the coke beds (orange and green), two electrodes (dark gray), the metal pool (white), the furnace linings (light and dark purple) and the steel shell (silver). Right panel: 2D slice showing a typical power dissipation density (Color figure online)

The model equations and boundary conditions were implemented and solved numerically using COMSOL 5.5 with the Magnetic and Electric Fields interface.[40] The low-frequency Maxwell’s equations are solved using the magnetic vector potential and the scalar electric potential formulation. The equations are then discretized with the finite element method using quadratic elements. To impose proper boundary conditions on the magnetic field, a large volume of air (with a diameter that is 5 times larger than the furnace diameter) is included around the furnace.

The effects of transformers, busbars, power lines and other structural elements that can strongly impact the electrical conditions in real furnaces are not accounted for in this model.

The total current is given for each of the electrodes, and an appropriate phase shift is imposed to ensure current conservation. The induced currents (and the associated impedance) of the steel shell and roof are modeled as surface currents, applying COMSOL’s impedance boundary condition.

In the definition of the model, a set of 12 variables was selected for the parametric studies. The selected variables are geometrical and electrical (currents and conductivities), and are listed in Table I. Briefly, the parameters of the FEM model control the electrical conditions for each of the electrodes by defining the amount of current in each electrode (named \({{I}}_x\)), the distance of the electrode tip from the alloy bath (named \({{z}}_x\)), and the shape (\({{S}}_x\)) and conductivity (\(\sigma _x\)) of the corresponding coke bed.

Figure 2 gives a graphical representation of the parameters for the Electrode 1—Coke Bed 1 system. Furthermore, one additional “global” parameter, \(\sigma _{\text{Ch}}\), defines the conductivity of the charge directly above the coke beds.

Table I Input Parameters of the FEM Model
Fig. 2 Graphical representation of the parameters pertaining to Electrode 1

We note that not all parameters of Table I are independent; \({{I}}_{\text{AV}}\) (the average of the three rms currents) and \(\Phi _{21}\), \(\Phi _{32}\), \(\Phi _{13}\) (the phase angles between pairs of currents) can be computed from \({{I}}_1\), \({{I}}_2\), \({{I}}_3\). This lack of independence should not cause problems in the PLSR subspace model, except for some ambiguity in the graphical interpretation of causality. Table II lists a chosen set of observables that can be extracted after each FEM calculation. The solution of the FEM equations gives access to the total power dissipation density \(p\) (\({\text{W/m}}^3\)) and the reactive power density \(q\) (\({\text{W/m}}^3\)), computed as

$$\begin{aligned} q=(\pi f/\mu )|{\mathbf{B}}|^2 \end{aligned}$$
(1)

where f is the current frequency, \(\mu \) is the magnetic permeability and \({\mathbf{B}}\) is the magnetic field. By integrating p and q over different material domains and the entire furnace, it is possible to study power consumption in different material layers and the total power consumption and power factor for the full furnace.

To further subdivide these terms among the electrodes, the furnace was divided into three cylindrical sectors of \(120 \, {\text{deg}}\) opening, with the planes defining the sectors equally splitting the space between each electrode pair. From this definition it follows, for instance, that the total active power equals the sum of the active powers associated with Electrodes 1, 2 and 3 (Eq. [2a]). Resistances and reactances for each electrode are obtained by dividing the active and reactive powers by the square of the average current (Eqs. [2b] and [2c]). In turn, these quantities give access to the voltage of each electrode, calculated by multiplying the electrode current by the electrode impedance (Eq. [2d]).

$$ \text {AP}_{\text {Tot}} = \text {AP}_1 + \text {AP}_2 + \text {AP}_3 $$
(2a)
$$ {R}_{\text {1}} = \text {AP}_1 / {I}_{\text {AV}}^2 $$
(2b)
$$ {X}_{\text {1}} = \text {RP}_1 / {I}_{\text {AV}}^2 $$
(2c)
$$ {V}_1= {I}_1 {Z}_1 = {I}_1 \sqrt{ {R}_{\text {1}}^2 + {X}_{\text {1}}^2} $$
(2d)

Volumes are finally obtained by integration over the corresponding domains.
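As a minimal numerical illustration of Eqs. [2a] through [2d] (in Python, with hypothetical values for the per-electrode powers and currents, not furnace data), the derived electrical quantities can be computed as follows:

```python
import math

# Hypothetical per-electrode active powers (W), reactive powers (VAr) and
# rms currents (A); the values are illustrative only.
AP = {1: 12.0e6, 2: 11.5e6, 3: 12.3e6}
RP = {1: 5.1e6, 2: 4.9e6, 3: 5.2e6}
I = {1: 125.0e3, 2: 124.0e3, 3: 126.0e3}

I_AV = sum(I.values()) / 3.0          # average rms current
AP_tot = sum(AP.values())             # Eq. [2a]: total active power

for k in (1, 2, 3):
    R = AP[k] / I_AV**2               # Eq. [2b]: electrode resistance (ohm)
    X = RP[k] / I_AV**2               # Eq. [2c]: electrode reactance (ohm)
    V = I[k] * math.hypot(R, X)       # Eq. [2d]: V = I * sqrt(R^2 + X^2)
    print(f"Electrode {k}: R = {R * 1e3:.3f} mOhm, "
          f"X = {X * 1e3:.3f} mOhm, V = {V:.1f} V")
```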

Table II Observables of the FEM Model

Design of Experiments

In this context, the purpose of the FEM model is to identify how variations in the parameters listed in Table I affect the observables listed in Table II. In this way the FEM model acts as a map transforming inputs (the parameters) into outputs (the observables).

We recall that to metamodel the FEM model we shall collect a database of input–output data from the original FEM model, and create a model of the model by statistically analyzing this database. This raises the natural question of how to design the “experiments”: namely, which input variables to vary, at which levels, and in which combinations.

In the present case, 12 of the input parameters in Table I were deemed to be the most relevant; all other model parameters, initial conditions and computational controls were kept constant at values deemed “safe, realistic but uninteresting”.

As for how to explore the input space, we recall that fixing the number of samples introduces a trade-off between how many values each input variable assumes and the number of combined effects that can be explored. To make a rigorous choice we follow an optimized multi-level binary replacement (OMBR) design.[41]

For each of the 12 chosen input parameters, a reasonable upper and lower value was defined. In order to detect and describe curvature and other, more complex input/output relationships, it was decided to also include two intermediate levels of each parameter; each variable thus assumes one of four possible values from an equidistant grid (for example, the conductivity of the charge was set to {12, 19, 26, 33} S/m).

Note that even with this relatively simple setup of 12 parameters at 4 levels each, exploring all the combined effects (i.e., simulating all the potential combinations of the variables) would require \(4^{12} = 16{,}777{,}216\) FEM calculations. In practice, we limited the number of simulations to 640, considering that this number may reasonably mimic the choice that a practitioner would make in real-life settings, given the computational cost associated with each sample generation.
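The OMBR design itself is specified in Reference 41 and is not reproduced here. As a hedged sketch of the general idea, i.e., sampling a fixed budget of 640 runs from the \(4^{12}\) full factorial grid of levels, a plain random subset (a stand-in for the optimized design, with placeholder bounds for all parameters except the charge conductivity quoted in the text) could be generated as follows:

```python
import random

random.seed(42)  # reproducible illustration

# 12 parameters, each on an equidistant 4-level grid between lower and upper
# bounds; all bounds are placeholders except sigma_Ch, taken from the text.
bounds = {"sigma_Ch": (12.0, 33.0)}
bounds.update({f"p{i}": (0.0, 1.0) for i in range(1, 12)})  # hypothetical

def grid(lo, hi, n_levels=4):
    step = (hi - lo) / (n_levels - 1)
    return [lo + k * step for k in range(n_levels)]

levels = {name: grid(lo, hi) for name, (lo, hi) in bounds.items()}
assert levels["sigma_Ch"] == [12.0, 19.0, 26.0, 33.0]

# A full factorial would need 4**12 = 16,777,216 runs; draw a budget of 640
# distinct level combinations instead. The OMBR design of Reference 41 picks
# such a subset in an optimized, balanced way; random sampling is a stand-in.
names = list(levels)
combos = set()
while len(combos) < 640:
    combos.add(tuple(random.randrange(4) for _ in names))

design = [{n: levels[n][ix] for n, ix in zip(names, c)} for c in combos]
print(len(design), design[0])
```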

Extending the Database by Exploiting Symmetry Considerations

We note that, by construction, the FEM model is symmetric: it has a rotational symmetry of order 3 with respect to the central vertical axis. This means that a rotation by an angle of \(120 \, {\text{deg}}\) does not change the model.

This feature may be included in the metamodel in two ways: either by suitably constraining the parameters of the model so as to guarantee the symmetries above, or by extending the original database with augmented data that balance the information. More precisely, as an example, assume we have access to the results of a simulation where A, B and C are the inputs pertaining to Electrodes 1, 2 and 3, respectively, and D are the inputs for the entire system (e.g., A represents the values for \({{I}}_1\), \({{z}}_1\), \(\sigma _1\) and \({{S}}_1\), whereas D contains \(\sigma _{\text{Ch}}\)); similarly, \(\alpha \), \(\beta \) and \(\gamma \) are the outputs for Electrodes 1, 2 and 3, and \(\delta \) are the outputs for the entire system. In other words, assume the simulation represents the “inputs \(\rightarrow \) outputs” transformation

$$ \left( A, B, C, D \right) \mathop\rightarrow\limits^{\text {FEM}} \left( \alpha , \beta , \gamma , \delta \right) . $$
(3)

Then the symmetry considerations above imply that we may immediately extend the original dataset through adding the transformations

$$ \left( C, A, B, D \right) \mathop\rightarrow\limits^{\text {SYM}} \left( \gamma , \alpha , \beta , \delta \right) $$
(4a)
$$ \left( B, C, A, D \right) \mathop\rightarrow\limits^{\text {SYM}} \left( \beta , \gamma , \alpha , \delta \right) $$
(4b)

In practice, this means that the 640 FEM simulations considered above actually lead to a database composed of 1920 entries.
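A minimal sketch of this augmentation step (in Python, assuming the per-electrode variables are stored in equally sized column blocks; the column layout is an illustrative assumption, not the actual database schema):

```python
import numpy as np

def rotate_blocks(M, blocks, shift):
    """Cyclically permute per-electrode column blocks: shift=1 maps the block
    contents (A, B, C) -> (C, A, B), shift=2 maps (A, B, C) -> (B, C, A).
    Columns not listed in `blocks` (global quantities D, delta) are untouched."""
    R = M.copy()
    for i, dst in enumerate(blocks):
        src = blocks[(i - shift) % 3]
        R[:, dst] = M[:, src]
    return R

# Illustrative layout: 4 inputs per electrode (I, z, sigma, S) plus one global
# input column (sigma_Ch); 3 outputs per electrode plus one global output.
rng = np.random.default_rng(0)
X = rng.normal(size=(640, 13))
Y = rng.normal(size=(640, 10))
in_blocks = ([0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11])
out_blocks = ([0, 1, 2], [3, 4, 5], [6, 7, 8])

# Append the two symmetric images (Eqs. [4a] and [4b]) to the original rows.
X_aug = np.vstack([X] + [rotate_blocks(X, in_blocks, s) for s in (1, 2)])
Y_aug = np.vstack([Y] + [rotate_blocks(Y, out_blocks, s) for s in (1, 2)])
print(X_aug.shape, Y_aug.shape)  # (1920, 13) (1920, 10)
```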

We note that mirroring along the three \(\sigma _v\) vertical planes through the electrodes cannot be used in this context, because the order of the current phases would be inverted in the mirror images.

Metamodeling, in Generic Terms

In Section II, we outlined how the original finite element model was used to obtain a database where the input parameters of the FEM model are collected in a \(1920 \times 17\) matrix (\( {\text{Inputs}} \)) and the results are collected in a corresponding \(1920 \times 37\) matrix (\( {\text{Outputs}} \)). Notation-wise, this can be represented as a relation

$$ {\text{Outputs}} = {\text{FEM}}({\text{Inputs}}) $$
(5)

The term metamodeling, sometimes referred to as “surrogate modeling”, can in this context be understood as a re-modeling of the original model. In practice, this operation can be used to: (i) perform a model complexity reduction, (ii) perform a sensitivity analysis, and (iii) compare different competing models.

As mentioned, there exist two types of metamodels:

(i) direct (or classical) metamodel, where one searches for an approximate map, indicated with \({{\mathbf{D}}}{\mathbf{M}}\), that captures a relation of the kind

$$ {\text{Outputs}} \approx {\mathbf{D}}{\mathbf{M}}({\text{Inputs}}) $$
(6)

Here the term Direct highlights that the metamodel has the same causal direction as the original FEM model;

(ii) inverse metamodel, where one searches for an approximate map, indicated with \({\mathbf{I}}{\mathbf{M}}\), that reverses the causal direction above. This operation inverts the original model by searching for \({\mathbf{I}}{\mathbf{M}}\) so that

$$ {\text{Inputs}} \approx {\mathbf{I}}{\mathbf{M}}({\text{Outputs}}) $$
(7)

and is typically useful when one wants to estimate what has produced certain outputs, in other words, to construct observers of the internal state or unknown inputs of a real plant: for example, estimating the actual position of the electrodes’ tips given measurements available from the plant SCADA system. Both \({\mathbf{D}}{\mathbf{M}}\) and \({\mathbf{I}}{\mathbf{M}}\) can work well for deterministic FEM models, but \({\mathbf{I}}{\mathbf{M}}\) may give bad models when several different input combinations to the FEM model give the same outputs.

For a more thorough discussion of the merits of direct and inverse metamodeling we refer to Reference 42. We recall, however, that both maps \({\mathbf{D}}{\mathbf{M}}\) and \({\mathbf{I}}{\mathbf{M}}\) in general work as estimators. There may thus be different strategies for obtaining them, each with its own associated statistical performance.

Using Partial Least Squares Regression to Construct Direct and Inverse Metamodels

Linear regression models are likely the most widely used statistical structures in practical settings. In the simplest case, these structures are such that the outputs are computed through linear combinations of the inputs, i.e., through structures of the kind

$$ {\varvec{Y}} = {\varvec{b}}_0 + {\varvec{X}} {\varvec{B}} + {\varvec{F}} $$
(8)

where: \({\varvec{Y}}\) (\(N \times J\)) and \({\varvec{X}}\) (\(N \times K\)) are the outputs and the inputs, respectively; \({\varvec{b}}_0\) (\(1 \times J\)) and \({\varvec{B}}\) (\(K \times J\)) are the coefficients of the affine map \({\mathbf{D}}{\mathbf{M}}\); and \({\varvec{F}}\) (\(N \times J\)) is a residual matrix that captures the unmodeled residuals in \({\varvec{Y}}\). In the present case of deterministic simulations, these residuals are mainly due to the fact that the map (5), i.e., \({\varvec{Y}} = \mathbf{FEM} ( {\varvec{X}} )\), is nonlinear.

The parameters \({\varvec{b}}_0\) and \({\varvec{B}}\) in the regression model usually have to be estimated from a training set consisting of \({\varvec{x}}_k\) variables (\(k = 1, \ldots , K\)), \({\varvec{y}}_j\) variables (\(j = 1, \ldots , J\)), and a finite number of training samples, N. In other words, in the most basic statistical learning setting, the training set consists of an input matrix \({\varvec{X}}_{\text {train}}\) of dimensions \(N \times K\) and an output matrix \({\varvec{Y}}_{\text {train}}\) of dimensions \(N \times J\).

Later, for each new \({\varvec{X}}\)-vector (\(1 \times K\)), the corresponding \({\varvec{Y}}\)-vector (\(1 \times J\)) may then be predicted from these estimates of \({\varvec{b}}_0\) and \({\varvec{B}}\).

If the K \({\varvec{X}}\)-variables vary independently of each other over the N observations in the training set, or only display weak intercorrelations, then the parameters \({\varvec{b}}_0\) and \({\varvec{B}}\) may be estimated by ordinary least squares (OLS) regression, i.e., simple projection of \({\varvec{Y}}\) on \({\varvec{X}}\) over the N training samples:

$$ {[}{\varvec{b}}_0, {\varvec{B}}] = ([1,{\varvec{X}}]^T[1,{\varvec{X}}])^{-1} [1,{\varvec{X}}]^T {\varvec{Y}}. $$
(9)

But quite often, the \({\varvec{X}}\)-variables have strong intercorrelations over the N observations in the training set. This is almost always true if the \({\varvec{X}}\)-variables come from one given multichannel instrument (e.g., a spectrophotometer or a camera). But it may also be the case if the \({\varvec{X}}\)-variables come from a factorial design where some of the planned experiments or simulations had to be dropped, for whatever reason. Then this full-rank projection of \({\varvec{Y}}\) on \({\varvec{X}}\) does not give good results, due to collinearity problems: the matrix \(([1,{\varvec{X}}]^T[1,{\varvec{X}}])\) simply cannot be inverted in a stable way. This, in turn, can give erroneous and meaningless estimates \([{\varvec{b}}_0, {\varvec{B}}]\) and a needless amplification of noise in future predictions.
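A small numerical sketch of Eq. [9] and of the collinearity problem (synthetic data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50

# Well-conditioned X: three independent regressor columns.
X = rng.normal(size=(N, 3))
Y = X @ np.array([[1.0], [2.0], [-1.0]]) + 0.01 * rng.normal(size=(N, 1))

def ols(X, Y):
    """Eq. [9]: project Y on [1, X] over the N training samples."""
    X1 = np.hstack([np.ones((len(X), 1)), X])
    return np.linalg.solve(X1.T @ X1, X1.T @ Y)  # stacked [b0; B]

print(ols(X, Y).ravel())  # approximately [0, 1, 2, -1]

# Near-collinear X: the third column is almost a copy of the second, so
# (X1^T X1) is nearly singular and its inversion becomes unstable.
Xc = X.copy()
Xc[:, 2] = Xc[:, 1] + 1e-8 * rng.normal(size=N)
X1c = np.hstack([np.ones((N, 1)), Xc])
print(np.linalg.cond(X1c.T @ X1c))  # huge condition number
```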

The collinearity problem can be solved with a wide range of regression methods, ranging from variable selection methods via ridge/LASSO regression and Elastic Nets to reduced-rank subspace methods like Principal Component Regression (PCR) and Partial Least Squares Regression (PLSR).[37] The latter is used here, due to its ability to give compact subspace regression models under a wide range of collinearity conditions.

The PLSR searches for an orthogonal sequence of PCs, where each PC is the linear combination of \({\varvec{X}}\)-variables that shows maximum covariance with a linear combination of the previously unmodeled residuals in the \({\varvec{Y}}\)-variables. This gives compact statistical models that are easy to plot graphically. This covariance maximization is sensitive to the relative scaling within the set of \({\varvec{X}}\)-variables and the set of \({\varvec{Y}}\)-variables. Here we therefore employ the common practice of standardizing both \({\varvec{X}}\) and \({\varvec{Y}}\), i.e., rescaling each variable to a standard deviation of 1, in order to remove the effect of the units in which the individual variables are given. For a recent review of PLS regression we refer to Reference 43.

As for an outline of the method, we recall that the models obtained through a PLS regression may be summarized by the above-mentioned linear model \({\varvec{Y}} = {\varvec{b}}_{0} + {\varvec{X}} {\varvec{B}} + {\varvec{F}}\). However, the actual structure of the underlying model estimated through a PLS regression approach is given by

$$\begin{aligned} {\varvec{X}}&= 1*{\varvec{x}}_{\text {mean}} + {\varvec{T}} {\varvec{P}}^T + {\varvec{E}} \nonumber \\&= 1* {\varvec{x}}_{\text {mean}} + \sum _{a = 1}^{A} {\varvec{t_a}} {\varvec{p_a^T}} + {\varvec{E}} \end{aligned}$$
(10a)
$$\begin{aligned} {\varvec{Y}}&= 1*{\varvec{y}}_{\text {mean}} + {\varvec{T}} {\varvec{Q}}^T + {\varvec{F}} \nonumber \\&= 1* {\varvec{y}}_{\text {mean}} + \sum _{a = 1}^{A} {\varvec{t_a}} {\varvec{q_a^T}} + {\varvec{F}} \end{aligned}$$
(10b)

and where

$$ {\varvec{T}} = \left( {\varvec{X}} - 1 * {\varvec{x}}_{\text {mean}} \right) {\varvec{V}} . $$
(11)

More precisely,

  • \(a = 1,\ldots ,A\) is the model rank for \({\varvec{X}}\) (i.e., the number of linear combinations of the \({\varvec{X}}\)-variables to be used as regressors for both \({\varvec{Y}}\) and \({\varvec{X}}\));

  • \({\varvec{T}}\) (of dimension \(N \times A\)) is the orthogonal set of A “\({\varvec{X}}\) supervariables” (i.e., linear combinations of the \({\varvec{X}}\)-variables);

  • \({\varvec{t_a}}\) (of dimension \(N \times 1\)) is equal to \(\left( {\varvec{X}} - 1 * {\varvec{x}}_{\text {mean}} \right) {\varvec{v_{a}}}\), where \({\varvec{v_a}}\) (of dimension \(K \times 1\)) is the weight vector defining \({\varvec{t_a}}\) so that it has maximal covariance with a corresponding auxiliary linear combination of the \({\varvec{Y}}\)-variables;

  • \({\varvec{V}}\) is the \({\varvec{X}}\)-weight matrix (\(K \times A\));

  • \({\varvec{P}}\) is the \({\varvec{X}}\)-loadings matrix (\(K \times A\));

  • \({\varvec{Q}}\) is the \({\varvec{Y}}\)-loadings matrix (\(J \times A\));

  • \({\varvec{x}}_{\text {mean}}\) (\(1 \times K\)) is the mean of \({\varvec{X}}\) over the N observations;

  • and \({\varvec{y}}_{\text {mean}}\) (\(1 \times J\)) is the mean of \({\varvec{Y}}\) over the N observations.

In practice, the matrices \(\ {\varvec{x}}_{\text {mean}}\), \(\ {\varvec{y}}_{\text {mean}}\), \({\varvec{T}}\), \({\varvec{P}}\), \({\varvec{U}}\) (the \({\varvec{Y}}\)-scores) and \({\varvec{Q}}\) are estimated through one of the many equivalent learning algorithms leading to the PLS regression solution from the training set \({\varvec{X}}_{\text {train}}, {\varvec{Y}}_{\text {train}}\).

Usually, one first chooses to estimate “more than enough” model dimensions, e.g., \(A_{\text{Max}} = \min (K, N-1)\). This solution is usually over-optimistic, in the sense that it gives very good predictions of \({\varvec{Y}}\) from \({\varvec{X}}\) on the training data, but poor predictions of \({\varvec{Y}}\) from \({\varvec{X}}\) in new, “secret” samples.

Then, one searches for a simpler model within this over-fitted \(A_{\text{Max}}\)-dimensional subspace model. To find the model rank A that gives minimal prediction error in \({\varvec{Y}}\), one usually employs so-called cross-validation.[38] Thereby one avoids confusing results and bad predictive ability due to unwarranted over-optimism.

In this case, for which usually \(A \ll A_{\text{Max}}\), the resulting final model uses only the first A of the \(A_{\text{Max}}\) column vectors in \({\varvec{V}}\), \({\varvec{T}}\), \({\varvec{P}}\) and \({\varvec{Q}}\). This means that all components beyond the optimal rank, A, are ignored in the model. The corresponding “optimal rank model” may then be summarized into \({\varvec{b}}_0\) and \({\varvec{B}}\) in Eq. [8]. We note that some slightly different, but equivalent, PLSR algorithm versions exist, each with their pros and cons. For more information we refer the interested reader to Reference 42.
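As a hedged sketch of this workflow, using scikit-learn’s PLSRegression on synthetic data as a stand-in for the software actually used in this work, the rank selection by cross-validation can be written as:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
N, K, J = 200, 17, 37        # shapes mimicking the Inputs/Outputs matrices

# Synthetic stand-in for the simulation database: collinear X and mildly
# nonlinear Y, both driven by a 5-dimensional latent structure.
T_true = rng.normal(size=(N, 5))
X = T_true @ rng.normal(size=(5, K)) + 0.05 * rng.normal(size=(N, K))
Y = T_true @ rng.normal(size=(5, J)) + 0.1 * T_true[:, :1] ** 2

A_max = min(K, N - 1)        # "more than enough" components
rmse = []
for a in range(1, A_max + 1):
    model = PLSRegression(n_components=a, scale=True)  # standardized X and Y
    Y_cv = cross_val_predict(model, X, Y, cv=10)       # cross-validated predictions
    rmse.append(np.sqrt(np.mean((Y - Y_cv) ** 2)))

A_opt = int(np.argmin(rmse)) + 1
print("optimal rank A =", A_opt)

# Refit at the optimal rank; this model corresponds to the b0, B summary of Eq. [8].
final = PLSRegression(n_components=A_opt, scale=True).fit(X, Y)
```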

The linear model summary \({\varvec{b}}_0\) and \({\varvec{B}}\) at optimal rank can then be used to estimate which outputs \(\widehat{{\varvec{Y}}}_{\text {new}}\) would be obtained if we had, as inputs, the values \({\varvec{X}}_{\text {new}}\). Alternatively, the same prediction results are attained by going via the bilinear model.

Statistically speaking, the hope is that \({\varvec{b}}_0\) and \({\varvec{B}}\) will be such that the estimated \(\widehat{{\varvec{Y}}}_{\text {new}}\) is close to the actual \({\varvec{Y}}_{\text {new}}\) that would be computed through the actual FEM model of Section II.

When following the PLS regression approach, then, \({\varvec{b}}_0\) and \({\varvec{B}}\) are chosen as follows: intuitively, each column in \({\varvec{P}}\) and \({\varvec{Q}}\) can be thought of as a direction in the row spaces of \({\varvec{X}}\) and \({\varvec{Y}}\), respectively. To follow the PLSR approach then means to find \({\varvec{P}}\) and \({\varvec{Q}}\) as those directions in the row spaces of \({\varvec{X}}\) and \({\varvec{Y}}\) that maximally capture the variance of the \({\varvec{X}}\)- and \({\varvec{Y}}\)-data simultaneously (i.e., their covariance).[44]
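For concreteness, a minimal NIPALS-style extraction of the PLS components, one of the several equivalent algorithms mentioned above (with no claim to match the exact implementation used in this work), is:

```python
import numpy as np

def pls_nipals(X, Y, A, tol=1e-10, max_iter=500):
    """Extract A PLS components by NIPALS deflation. Returns x_mean, y_mean,
    V, T, P, Q such that X ~ 1*x_mean + T P^T and Y ~ 1*y_mean + T Q^T
    (Eqs. [10a], [10b]) and T = (X - 1*x_mean) V (Eq. [11])."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean                   # centered working copies
    Ws, Ts, Ps, Qs = [], [], [], []
    for _ in range(A):
        u = F[:, [int(np.argmax(F.var(axis=0)))]]   # start from a Y column
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)                  # X-weights: max covariance
            t = E @ w                               # X-scores (a column of T)
            q = F.T @ t / float(t.T @ t)            # Y-loadings
            u_new = F @ q / float(q.T @ q)          # Y-scores
            if np.linalg.norm(u_new - u) < tol:
                u = u_new
                break
            u = u_new
        p = E.T @ t / float(t.T @ t)                # X-loadings
        E, F = E - t @ p.T, F - t @ q.T             # deflate both blocks
        Ws.append(w); Ts.append(t); Ps.append(p); Qs.append(q)
    W, T, P, Q = (np.hstack(M) for M in (Ws, Ts, Ps, Qs))
    # The weights W apply to the deflated residuals; the V of Eq. [11],
    # which applies directly to the centered X, is W (P^T W)^{-1}.
    V = W @ np.linalg.inv(P.T @ W)
    return x_mean, y_mean, V, T, P, Q
```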

Note that the PLSR approach can be used for both direct and inverse metamodeling purposes. More precisely, direct metamodeling corresponds to setting \({\varvec{X}} = {\text{Inputs}} \) and \({\varvec{Y}} = {\text{Outputs}} \), whereas inverse metamodeling means setting \({\varvec{X}} = {\text{Outputs}} \) and \({\varvec{Y}} = {\text{Inputs}} \). From a logical perspective, though, the PLSR operations are the same.
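In code, the two metamodels differ only in the role assignment of the two data blocks; a minimal sketch (placeholder arrays with the database dimensions, component counts anticipated from the results below):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
inputs = rng.normal(size=(1920, 17))    # placeholder for the FEM parameters
outputs = rng.normal(size=(1920, 37))   # placeholder for the FEM observables

# Direct metamodel DM: Outputs ~ DM(Inputs).
dm = PLSRegression(n_components=9).fit(inputs, outputs)

# Inverse metamodel IM: Inputs ~ IM(Outputs); same machinery, roles swapped.
im = PLSRegression(n_components=13).fit(outputs, inputs)

y_hat = dm.predict(inputs[:1])          # predicted observables for one sample
x_hat = im.predict(outputs[:1])         # recovered parameters for one sample
```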

Numerical Results

As mentioned before, we consider a situation in which direct and inverse metamodels are built on a database constructed by symmetrizing 640 FEM simulations (thus a final database of 1920 entries). The corresponding PLS regression models were computed using Unscrambler X,[45] and the following results on the statistical performance of the metamodels were obtained using a leave-one-out cross-validation strategy.[46]

Assessing the Statistical Performance of the Direct Metamodel

The obtained results suggested an optimal number of factors for the PLS regression model of 9; at this level of model complexity, we obtained the root mean squared errors (rmse) for the different output quantities shown in Table III.

Table III Database Characteristics for the Properties of Electrode \(1^a\) and Root Mean Squared Error (rmse) for the Direct PLS Regression Model

For comparison, the PLS regression modeling results show that the rmse for the active and reactive power of Electrode 1 are 1.1 MW and 0.17 MVAr, respectively. Since the two quantities have the same order of magnitude, the rmse clearly indicates a better performance of the model for the reactive power. One possible explanation for this is the spread of the underlying database. Taking Electrode 1 as an example, Table III shows that the reactive power entries in the database are much more centered around the relevant values (cf. the standard deviation and the Min–Max spread), whereas the active power terms are widely spread. It is likely that an improved strategy for the selection of the database points would improve the performance of the metamodel.

While the above results concern general metamodel properties, we now consider the ability of the metamodel to simulate two specific study cases that, to the best of our knowledge, have important relevance for the management of SAF systems:

Case 1: the system is balanced, in the sense that all three electrodes behave equivalently;

Case 2: the three electrodes have different conditions, both in their associated coke bed shapes and properties, and in the currents flowing through them.

For completeness, we report in Table IV the input parameters associated with the FEM simulations that the direct metamodel should approximate.

Table IV Configuration of the Most Important Inputs Associated with the Cases Above

Table V then compares the results obtained with an explicit FEM simulation for case 1 and case 2 with those obtained through the direct metamodel. The comparison shows that the direct metamodel overestimates the active powers and the resistances by 3 to 4 pct, whereas the reactive powers and the reactances are very close to the reference values. The volumes of the coke beds are overestimated in all cases.

Table V Comparing the Results of the FEM Model and the Corresponding Direct Metamodel

On the other hand, reactive powers, reactances and induced powers in the steel structures are predicted with high accuracy.

Fig. 3 Active power on Electrode 1 as a function of the electrode position: FEM vs metamodel results

Finally, we investigated how the metamodel compares to the FEM target for a linear scan of one of the input parameters. In Figure 3 the active power computed for Electrode 1 is plotted as a function of the Electrode 1 tip position, with the other input parameters matching case 2 presented above. The scan shows that, as the electrode is lifted 50 cm, the power increases from 9.4 to 12.9 MW as a consequence of the increased resistance encountered by the electric current. The values computed with the direct metamodel slightly overestimate the reference values, as already seen in Table V, but it is noteworthy that the error remains constant at ca. 0.65 MW for a large section of the span before decreasing to 0.35 MW.

These results show that even if the absolute values computed with the metamodel may have significant errors, the trends are robust and reliable. The reason for the slight overestimation can be traced to the fact that there is a slight curvature in the predicted vs true power plot, as can be seen in Figure 4. Here, each point represents one input parameter combination; the red points represent the predicted and true power data of Figure 3. There are several ways to deal with this in multivariate metamodeling, if needed, e.g.: (1) nonlinear transformation of the predicted power (e.g., logistic or polynomial) to maximize its correlation with the true power; (2) nonlinear transformations of the input parameters used as X-variables; (3) polynomial extension of the set of X-variables, including, e.g., quadratic terms and cross-product terms; (4) nonlinear PLSR regression; (5) hierarchical cluster-based PLS regression/locally weighted regression, forming different, simpler regression models in different regions of the input/output space. Option (3) is sketched below.
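As a hedged illustration of option (3), the X-block can be extended with quadratic and cross-product terms before the PLSR fit (scikit-learn used as a stand-in for the software of this work; synthetic data):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(640, 12))      # 12 design parameters
# Toy response with curvature and an interaction term, as an analogue of the
# slight nonlinearity seen in the predicted vs true power plot.
y = (3 * X[:, 0] + X[:, 0] ** 2 + X[:, 1] * X[:, 2]).reshape(-1, 1)

linear = PLSRegression(n_components=5)
quadratic = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),  # adds x_i^2 and x_i*x_j
    PLSRegression(n_components=5),
)

for name, model in [("linear PLSR", linear), ("quadratic PLSR", quadratic)]:
    y_cv = cross_val_predict(model, X, y, cv=10)
    print(name, "rmse:", float(np.sqrt(np.mean((y - y_cv) ** 2))))
```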

In summary, the classical, “direct” multivariate metamodel gave quite good predictions of many of the observables, already in its simplest linear form modeling all the observable variables jointly. It can probably be improved further, by nonlinear modifications and by regression modeling of each observable separately. When deemed sufficiently successful, such a direct multivariate metamodel may be used for global sensitivity analysis, for graphical identification of unexpected intercorrelation patterns and for performance comparison of different parameter combinations.

In an analysis-by-synthesis setting, it may also be used for the identification of input parameters from observable model outputs, or from actually observed versions of these model outputs, thus optimizing the process originally modeled by, e.g., FEM. By repeatedly calling the DM with different input parameter values in a nonlinear hill-climbing process (e.g., Simplex optimization), one can find the combination(s) of the input parameters that maximize some desirable function of the predicted observables. It should be noted, however, that for mathematically sloppy physics-based models there may be several different input parameter combinations that give more or less the same “optimal” output characteristics.

Fig. 4 Calibration (blue) and test (red) sets in the predicted vs reference power plot (Color figure online)

Assessing the Statistical Performance of the Inverse Metamodel

Inverse multivariate metamodeling is a way to simplify the identification of input parameters from observable model outputs or from actually observed versions of these model outputs.

If there exists a unique one-to-one mapping between input parameter values and output observables, then it should be possible to predict the parameter values of the FEM model from a (linear) combination of the observables. If that were the case, one could predict the parameter values without the repeated analysis-by-synthesis search process.

The inverse metamodel was constructed using the same dataset as the direct one, i.e., a database of 1920 samples containing the observables as \({\varvec{X}}\) and the inputs as \({\varvec{Y}}\), and validated once again using a leave-one-out cross-validation strategy.[46]

When all the design variables were used as Y-variables, the optimal number of factors was found to be 13. At this level of complexity, the rmse of validation are shown in Table VI.

Table VI Database Characteristics for the Properties of Electrode \(1^a\) and Root Mean Squared Error (rmse) for the Inverse PLS Regression Model

In this case, we evaluate the statistical performance by comparing the inputs predicted by the inverse metamodel against those used in the physics-based FEM model to run the simulation. In other words, we test the capability of the inverse metamodel to recover the inputs associated with cases 1 and 2 above, starting both from the FEM output (i.e., comparing \( {\text{Input}} \) against \({\mathbf{I}}{\mathbf{M}}\,(\mathbf{FEM}\,({\text{Input}}))\)) and from the direct metamodel output (i.e., comparing \( {\text{Input}} \) against \({\mathbf{I}}{\mathbf{M}}\,({\mathbf{D}}{\mathbf{M}}\,({\text{Input}}))\)).

Table VII Performance of the Inverse Metamodel in Reconstructing the Inputs

Analysis of the results of Table VII shows that the inverse metamodel has a fair performance in predicting the inputs of the FEM model given the outputs. Currents and shapes of the coke beds are in very good agreement, whereas conductivities in the coke beds are generally overestimated.

Using the results of the linear scan discussed in the previous section, we investigated the performance of the inverse metamodel in determining that only the electrode position parameter was modified in the experiment.

Fig. 5 Position of Electrode 1 during the linear scan: actual FEM vs inverse metamodel results

When analyzing the data from the linear scan, the inverse metamodel correctly identified that the electrode position was being scanned. However, as shown in Figure 5, the variation in the electrode height is underestimated: the metamodel predicts a rate of 10 mm/step compared to the 25 mm/step that was actually employed. The inverse metamodel compensates for this by assuming that the conductivity of coke bed 1 decreases at each scan step, as seen in Figure 6; we note that all the remaining parameters are correctly identified as constants. In practice, the inverse metamodel interprets the scan results (increase in power/resistance for Electrode 1) as due to the simultaneous lifting of the electrode and a decrease of the coke bed conductivity.

Although it is correct that both parameters could lead to the same output, this shows that the inverse metamodel was unable to recognize the nuances in the other observables needed to identify that only the electrode position was scanned, at least with the current training set and level of model complexity.

The most probable reason for this failure is that the FEM model is ambiguous (“mathematically sloppy”) in some respects. For instance, if certain combinations of the FEM inputs, e.g., the lift of the electrode and a decrease of the coke bed conductivity, give more or less similar responses for all the FEM outputs, then the outputs cannot be expected to give unique predictions of these FEM inputs under these conditions.

Fig. 6 Conductivity of coke bed 1 during the linear scan: actual FEM vs inverse metamodel results

Summary and Concluding Remarks

In this work, we devised a procedure to construct direct and inverse metamodels linking the input parameters of Finite Element Method calculations to the resulting observables. To obtain a metamodel or inverse metamodel, one needs to implement a physics-based model that gives a suitable representation of the system at hand. With this model, a database of numerical experiments is obtained with the help of experimental design. The extent of the database depends on the dimension of the input space. When possible, inherent symmetries of the model can be exploited to formally increase the size of the database; alternatively, numerical constraints can be implemented in the procedure determining the data-based model coefficients to ensure the proper behavior. The final step consists in applying statistical analysis methods to devise a data-driven approximation of the physics-based model.

Specifically, the system investigated in this work is a large FeMn furnace for which the electrical conditions as a function of the operational parameters were investigated with a FEM model implemented in COMSOL 5.5. Here the input space includes the position and current of each electrode, the size and conductivity of each coke bed, and the conductivity of the charge, whereas the output space contains active and reactive powers, voltages, resistances, reactances, and coke bed volumes. An optimized multi-level binary replacement design was used to direct the computation of 640 FEM simulations; the size of the database was then tripled by exploiting the symmetry of the physics-based model. Partial Least Squares Regression, a linear multivariate method, was selected to perform the statistical analysis that gave the direct and inverse metamodels.

The performances of the direct and inverse metamodels were investigated on two typical cases and on a linear scan, with encouraging results. Applied to a small, independent test set of new input combinations, chosen in the most relevant parameter ranges, the direct metamodel predicted reactive powers, reactances and induced powers with high accuracy, but overestimated the active powers and the resistances by 3 to 4 pct. This was possibly due to unmodeled curvature, which can be corrected for. The inverse metamodel predicted with fair accuracy the inputs used to generate selected outputs, but could not resolve the observables generated by simulating the lifting of an electrode. This was probably due to mathematical ambiguity in the FEM model, whereby different input parameter combinations gave more or less the same output observables; this will require further investigation.

Future efforts will focus on: (i) improving the construction of the FEM database by implementing a strategy in which the simulated points are selected adaptively, i.e., the model dictates which points to simulate in order to strengthen its performance; and (ii) using squared terms and interactions to go beyond the linear model.