1 Introduction and Motivations

Numerical methods for solving parametric partial differential equations are relevant for all the engineering applications that can be framed into a multi-query or a real-time context. Recurrent instances range from the estimation of the parameters which govern physical phenomena (e.g., in biomedical engineering, where the known velocity of the blood and the Navier-Stokes equations for incompressible fluids are combined to draw conclusions about the blood viscosity inside a cardiovascular vessel) to the solution of optimal control problems (e.g., in environmental engineering, where experimental measurements of the concentration of a pollutant in the water and the Navier-Stokes equations for incompressible fluids with a parametrized forcing term are combined to compute the maximum amount of pollutant that can be released without compromising the ecosystem inside a river).

The curse of dimensionality characterizing full order models for this type of physical system has raised the need for specific numerical methods that keep the computational cost sustainable. Reduced order models have garnered interest in the scientific computing community as an effective way to compress complex partial differential equation models into a projection subspace, thus making them more computationally convenient to solve [2, 4, 6, 14, 17, 35].

Here, we focus on a class of reduced order models conceived to describe flows in pipes or, more in general, phenomena with a privileged dynamics aligned with the centerline of the pipe, which may be locally modified by secondary dynamics evolving along the transverse sections. We are referring to the Hierarchical Model (HiMod) reduction [11, 27, 29, 30, 33, 36] and, in particular, to a parametric counterpart of such an approach, known as HiPOD [3, 21]. HiPOD offers a possible remedy to the well-known bottleneck of an offline/online paradigm, i.e., the computational burden characterizing the offline phase. The idea is to replace the “truth” model in the offline phase of the Proper Orthogonal Decomposition (POD) [18,19,20, 38] with a HiMod discretization or, more generally, with a reduced-order model characterized by high accuracy and a contained computational demand. Then, the online phase recovers the HiMod approximation for a not yet sampled value of the selected parameter, after solving a problem of very small dimensionality. The computational advantages yielded by a HiPOD approximation have been numerically investigated both on scalar and on vector problems [3, 21].

In [21], two HiPOD model reduction procedures are presented, namely the basic and the directional HiPOD approaches. Here, we focus on the directional method, which combines HiMod and POD by exploiting the decoupling between leading and secondary dynamics at the basis of a HiMod formulation. Such an approach proved to outperform the basic one when dealing with the phenomena of interest in this paper, i.e., those characterized by a significant horizontal dynamics [21].

In this work, we show that the presence of noise in the data can make directional HiPOD unstable, due to the interpolation over the parametric space used to estimate the online solution. Indeed, interpolation reproduces, instead of filtering out, the noise affecting the data. To overcome this issue, we propose to replace interpolation with Machine Learning (ML) fitting models, which better discriminate the relevant physical features in the data from irrelevant unstructured noise, thus allowing the model to retain and leverage the former and discard the latter. The numerical assessment we carried out shows that the ML methodology reduces the \(L^2\)- and the \(H^1\)-norm of the HiPOD relative error by up to an order of magnitude with respect to state-of-the-art interpolation techniques, thus confirming the strong potential of the proposed approach.

The paper is structured as follows. Section 2 addresses the mathematical background exploited to upgrade the directional HiPOD procedure in [21], namely the basics of HiPOD model reduction and some machine learning regression models. Section 3 introduces the new HiPOD approach and analyzes the effect of the noise propagation onto the response matrix. In Sect. 4, we carry out a numerical assessment of the proposed methodology, by comparing the state-of-the-art HiPOD approach with the new ML version, both in terms of accuracy and of robustness to the level of noise. Finally, some conclusions and perspectives are provided in the last section.

2 Mathematical Background

HiPOD model reduction and Machine Learning (ML) models for regression analysis represent the main methodological tools supporting the new approach proposed in this paper. The next sections are devoted to introducing such tools, for the reader's convenience.

In particular, we choose as a reference problem a parametrized elliptic Partial Differential Equation (PDE), defined on a domain \(\varOmega \subset {\mathbb {R}}^2\), whose weak form is

$$\begin{aligned} \text{ find } u(\alpha )\in V\quad \text{ s.t. } \quad a(u(\alpha ), v; \alpha ) = f(v; \alpha ) \quad \forall v\in V, \end{aligned}$$
(1)

where \(\alpha \in {{\mathcal {P}}}\subset {\mathbb {R}}^p\) is the selected parameter varying in the set \({{\mathcal {P}}}\) of the admissible values; \(a(\cdot , \cdot ; \alpha ): V \times V \times {{\mathcal {P}}}\rightarrow {\mathbb {R}}\) and \(f(\cdot ; \alpha ): V \times {{\mathcal {P}}} \rightarrow {\mathbb {R}}\) denote the parametrized bilinear and linear forms characterizing the differential problem at hand, the linearity being meant with respect to the variables different from \(\alpha \); and \(V\subseteq H^1(\varOmega )\) is a suitable Hilbert space, depending on the specific PDE problem as well as on the assigned boundary conditions, standard notation being adopted for function spaces [10]. To simplify the exposition, we focus here on the case of a standard scalar linear advection-diffusion-reaction (ADR) problem, completed with fully homogeneous Dirichlet boundary conditions (see Sect. 4 for a more general setting), so that the bilinear and the linear forms in (1) are

$$\begin{aligned} a(w, z; \alpha ) = \displaystyle \int _\varOmega \mu \nabla w \cdot \nabla z \, d\varOmega + \int _\varOmega \big ( \textbf{b} \cdot \nabla w + \sigma w \big ) z \, d\varOmega , \quad f(z; \alpha ) = \displaystyle \int _\varOmega fz \, d\varOmega , \end{aligned}$$
(2)

with w, \(z\in V=H^1_0(\varOmega )\), and where the parameter \(\alpha \) coincides with some of the problem data, i.e., the viscosity \(\mu \), the advective field \(\textbf{b}=[b_1, b_2]^T\), the reaction \(\sigma \), the source term f (or a boundary value, when non-homogeneous or more general boundary conditions are assigned).

Suitable assumptions are made on the problem data in order to guarantee the well-posedness of formulation (1) for any \(\alpha \in {{\mathcal {P}}}\). Finally, we assume an affine parameter dependence [17, 35].

2.1 The HiPOD Approach

The HiPOD method provides the parametric counterpart of a Hierarchical Model (HiMod) reduction, by properly combining a HiMod discretization with the standard Proper Orthogonal Decomposition (POD) [18,19,20, 38]. In the next section we recap the main features characterizing a HiMod reduction, since they are instrumental in the setting of HiPOD procedures.

2.1.1 Hierarchical Model Reduction

HiMod reduction proved to be an ideal method for the modeling of scenarios where a dominant direction is evident in the global dynamics of the considered phenomenon. This modeling property is, in general, mirrored by the geometric design of the computational domain, which is assumed to coincide with a pipe whose centerline is parallel to the dominant dynamics. Thus, \(\varOmega \) is identified by the fiber bundle \(\bigcup _{x \in \varOmega _{1D}} \{x\} \times \varSigma _x \), where \(\varOmega _{1D}\) is the one-dimensional (1D) supporting fiber parallel to the leading dynamics, while \(\varSigma _x\) denotes the transverse section at the generic point x along \(\varOmega _{1D}\) [11, 29, 30, 33]. For simplicity, we focus on rectilinear domains, so that \(\varOmega _{1D}\equiv (a,b)\subset {\mathbb {R}}\) (we refer the interested reader to [7, 28, 31] for pipes with a bent centerline).

HiMod reduction exploits the geometric requirement on \(\varOmega \) to discretize a PDE problem, so that the leading and the transverse dynamics are approximated with different methods, according to a separation of variables criterion. In this work, we adopt the approach proposed in [1], where finite elements model the main dynamics, while a basis of customized modal functions describes the dynamics parallel to the transverse sections of the pipe. This decoupling of leading and secondary dynamics allows us to convert the full problem into a system of coupled 1D problems, independently of the original model dimensionality, with a consequent considerable benefit in terms of computational effort [1, 7, 16, 23].

Additionally, as is customary in several well-known contexts, computations are performed in a reference domain \({\widehat{\varOmega }}\) (where, e.g., constants can be explicitly computed) and successively moved to the physical domain \(\varOmega \). This is performed by means of a map \(\varPsi : \varOmega \rightarrow {\widehat{\varOmega }}\), which is assumed to be differentiable with respect to both the independent variables x and y. The domain \({\widehat{\varOmega }}\) shares the same fiber structure as \(\varOmega \), being \({{\widehat{\varOmega }}}=\varOmega _{1D} \times {\widehat{\varSigma }}\), with \({\widehat{\varSigma }}\) the reference transverse fiber, while the supporting fiber is the same as for \(\varOmega \). In particular, for any point \(\textbf{z}=(x, \textbf{y})\in \varOmega \), there exists a point \(\widehat{\textbf{z}}=({\widehat{x}}, \widehat{\textbf{y}})\in {\widehat{\varOmega }}\), such that \(\widehat{\textbf{z}}=\varPsi (\textbf{z})\), with \({\widehat{x}}\equiv x\) and \(\widehat{\textbf{y}}=\psi _x(\textbf{y})\), \(\psi _x: \varSigma _x \rightarrow {\widehat{\varSigma }}\) being the map between the generic and the reference transverse fiber. Hereafter, we assume \(\psi _x\) to be a \(C^1\)-diffeomorphism, for all \(x\in \varOmega _{1D}\) (more details about the maps \(\varPsi \) and \(\psi _x\) are available in [30]).

The separation of variables combined with the mapping to \({{\widehat{\varOmega }}}\) leads us to define the parametric HiMod reduced space

$$\begin{aligned} V_{m}(\alpha )=\left\{ v_{m}(x, \textbf{y}; \alpha ) = \displaystyle \sum _{k=1}^{m} \sum _{j=1}^{N_h} {{\tilde{v}}}_{k, j}^\alpha \vartheta _j(x) \varphi _k(\psi _x(\textbf{y})),\ (x, \textbf{y}; \alpha )\in \varOmega _{1D} \times \varSigma _x\times {{\mathcal {P}}} \right\} , \end{aligned}$$
(3)

where \(m\in {\mathbb {N}}^+\) is the modal index setting the level of detail of the HiMod approximation in the hierarchy; \({{\mathcal {B}}_1}=\{\vartheta _j\}_{j=1}^{N_h}\) is a basis for the 1D space, \(V_{1D}\subset H^1_0(\varOmega _{1D})\), of the finite element functions associated with the supporting fiber \(\varOmega _{1D}\) and vanishing at a and b, with \(\textrm{dim}(V_{1D})=N_h<+\infty \); \({{\mathcal {B}}_2}=\{ \varphi _k\}_{k\in {\mathbb {N}}^+}\) denotes the basis of modal functions defined on the reference transverse fiber \({{\widehat{\varSigma }}}\), orthonormal with respect to the \(L^2({{\widehat{\varSigma }}})\)-scalar product and including, in an essential way [1], the data assigned on the lateral boundary \(\Gamma _{L}=\cup _{x\in \varOmega _{1D}} \partial \varSigma _x\) of \(\varOmega \) (the reader interested in different possible choices for the bases \({{\mathcal {B}}_1}\) and \({{\mathcal {B}}_2}\) may refer, e.g., to [7, 11, 16, 30, 31]). In particular, the function \({{\tilde{v}}}_{k}(x; \alpha ) = \sum _{j=1}^{N_h} {{\tilde{v}}}_{k, j}^\alpha \vartheta _j(x) \in V_{1D}\) identifies the frequency coefficient associated with the k-th modal function \(\varphi _k\).

As far as the modal index m is concerned, it may be assigned thanks to a trial-and-error procedure (see, e.g., [11, 30]), or starting from some preliminary (geometric or physical) information about the problem at hand (as, e.g., in [16]), or via an automatic selection based on an a posteriori modeling error analysis (we refer, e.g., to [32, 34]). We adopt the same value for m in the whole \(\varOmega \), although the modal index may be locally varied along the domain to match possible heterogeneities of the solution (see [29, 30, 32,33,34] for further details).

Thus, the HiMod approximation to problem (1), associated with the modal index m, reads

$$\begin{aligned} \text{ find } u_{m}(\alpha )\in V_{m}(\alpha )\quad \text{ s.t. } \quad a(u_{m}(\alpha ), v_{m}; \alpha ) = f(v_{m}; \alpha ) \quad \forall v_{m}\in V_{m}(\alpha ), \end{aligned}$$
(4)

with \(u_{m}(\alpha )=u_{m}(x, \textbf{y}; \alpha )\). To ensure the well-posedness of formulation (4), we endow the space \(V_m(\alpha )\) both with a conformity and a spectral approximability assumption, while the convergence of the HiMod approximation \(u_{m}(\alpha )\) to the full solution \(u(\alpha )\) in (1) is guaranteed by introducing a standard density assumption on the discrete space \(V_{1D}\).

After applying the HiMod representation in (3) to \(u_{m}(\alpha )\) in (4), and choosing the test function \(v_{m}\) as \(\vartheta _t \varphi _q\), for \(t=1, \ldots , N_h\) and \(q=1, \ldots , m\), the HiMod formulation turns into the system of m coupled 1D problems,

$$\begin{aligned} A_m(\alpha )\textbf{u}_m(\alpha )=\textbf{f}_m(\alpha ), \end{aligned}$$
(5)

with \(A_m(\alpha )\in {\mathbb {R}}^{mN_h\times mN_h}\) denoting the HiMod stiffness matrix, \(\textbf{f}_m(\alpha )\in {\mathbb {R}}^{mN_h}\) representing the HiMod right-hand side, and where vector

$$\begin{aligned} \textbf{u}_m(\alpha )=\big [ {{\tilde{u}}}_{1, 1}^\alpha , \ldots , {{\tilde{u}}}_{1, N_h}^\alpha , {{\tilde{u}}}_{2, 1}^\alpha , \ldots , {{\tilde{u}}}_{2, N_h}^\alpha , \ldots , {{\tilde{u}}}_{m, 1}^\alpha , \ldots , {{\tilde{u}}}_{m, N_h}^\alpha \big ]^T\in {\mathbb {R}}^{mN_h} \end{aligned}$$
(6)

collects the modal coefficients \(\{ {{\tilde{u}}}_{k, j}^\alpha \}_{k=1, j=1}^{m, N_h}\), namely the actual unknowns of the HiMod discretization

$$\begin{aligned} u_{m}(x, \textbf{y}; \alpha ) = \displaystyle \sum _{k=1}^{m} \sum _{j=1}^{N_h} {{\tilde{u}}}_{k, j}^\alpha \vartheta _j(x) \varphi _k(\psi _x(\textbf{y})) \end{aligned}$$
(7)

[11, 30]. It has been numerically checked that, when the mainstream dominates the transverse dynamics (i.e., for small values of m), the HiMod approximation demands a considerably lower computational effort if compared with a standard (e.g., finite element) discretization of problem (1), without giving up the accuracy of the simulation [1, 7, 16, 23].
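To fix ideas, the following minimal Python sketch evaluates the HiMod expansion (7) at a point, under illustrative assumptions not made above: a rectangular domain with identity map \(\psi _x\), linear finite elements on a uniform grid for \({{\mathcal {B}}_1}\), and the sinusoidal modal basis \(\varphi _k({\widehat{y}})=\sqrt{2}\sin (k\pi {\widehat{y}})\) later adopted in Sect. 4 (all names are hypothetical).

import numpy as np

def himod_eval(u_tilde, x_nodes, x, y):
    """Evaluate the HiMod expansion (7) at a point (x, y).

    u_tilde : (m, N_h) array of modal coefficients u~_{k,j};
    x_nodes : (N_h,) uniform grid on Omega_1D (linear finite elements,
              so nodal interpolation reduces to np.interp);
    assumption: psi_x is the identity and the modal basis is
    phi_k(y) = sqrt(2) sin(k pi y), orthonormal in L^2(0, 1).
    """
    m = u_tilde.shape[0]
    # frequency coefficients u~_k(x), one per mode, evaluated at x
    freq = np.array([np.interp(x, x_nodes, u_tilde[k]) for k in range(m)])
    modes = np.sqrt(2.0) * np.sin(np.arange(1, m + 1) * np.pi * y)
    return freq @ modes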

2.1.2 Directional HiPOD Reduction

A HiPOD approach offers a new way to yield a HiMod approximation, as an alternative to the resolution of the HiMod system (5). According to a data-driven procedure, the idea is to replace the HiMod discretization in (7) with a surrogate solution obtained by resorting to a POD reduced basis generated by HiMod approximations. A standard offline/online paradigm drives the computation of the HiPOD approximation. In particular, during the offline phase, we compute the HiMod solution in (5) for different choices of \(\alpha \), to extract the POD basis. In the online step, such a basis is employed to approximate the HiMod solution in (5) associated with a value of the parameter not sampled during the offline phase.

As shown in [3, 21], a HiPOD procedure considerably lowers computational costs, without compromising the quality of the reduced solution.

Two HiPOD approaches have been explored so far. The basic method coincides with a straightforward application of a projection-based POD to HiMod solutions [3]. An advanced procedure, referred to as directional HiPOD, takes advantage of the separation of variables implied by a HiMod discretization [21], the SVD being used to remove the redundancy along the main stream and the transverse direction, separately. In addition, the online phase is carried out by interpolation instead of projection, thus relieving us from assembling the HiMod stiffness matrix and the right-hand side associated with the online parameter, as required by the basic HiPOD approach.

Below, we detail the directional HiPOD method, since it is instrumental to the new approach proposed in Sect. 3.

The goal of the offline phase is to identify the POD basis to be used in an online mode. To this aim, we build the response matrix by collecting the HiMod solution to problem (1) for p different values, \(\alpha _i\), of the parameter \(\alpha \), with \(i=1, \ldots , p\). In more detail, the generic HiMod solution \(u_{m}(x, \textbf{y}; \alpha _i)\) is identified with the corresponding modal coefficients, \(\{ {{\tilde{u}}}_{k, j}^{\alpha _i} \}_{k=1, j=1}^{m, N_h}\), collected by mode into the m vectors

$$\begin{aligned} \textbf{U}^k(\alpha _i)=\left[ {\tilde{u}}_{k, 1}^{\alpha _i}, {\tilde{u}}_{k, 2}^{\alpha _i}, \ldots , {\tilde{u}}_{k, N_h}^{\alpha _i}\right] ^T\in {\mathbb {R}}^{N_h} \quad k=1, \ldots , m, \end{aligned}$$
(8)

so that the response matrix is assembled as

$$\begin{aligned} U&=\left[ \textbf{U}^1(\alpha _1) \cdots \textbf{U}^m(\alpha _1) \,|\, \textbf{U}^1(\alpha _2) \cdots \textbf{U}^m(\alpha _2) \,|\, \cdots \,|\, \textbf{U}^1(\alpha _p) \cdots \textbf{U}^m(\alpha _p)\right] \\ &= \left[ \begin{array}{ccc|ccc|cc|ccc} {\tilde{u}}_{1, 1}^{\alpha _1} & \cdots & {\tilde{u}}_{m, 1}^{\alpha _1} & {\tilde{u}}_{1, 1}^{\alpha _2} & \cdots & {\tilde{u}}_{m, 1}^{\alpha _2} & \cdots & \cdots & {\tilde{u}}_{1, 1}^{\alpha _p} & \cdots & {\tilde{u}}_{m, 1}^{\alpha _p} \\ {\tilde{u}}_{1, 2}^{\alpha _1} & \cdots & {\tilde{u}}_{m, 2}^{\alpha _1} & {\tilde{u}}_{1, 2}^{\alpha _2} & \cdots & {\tilde{u}}_{m, 2}^{\alpha _2} & \cdots & \cdots & {\tilde{u}}_{1, 2}^{\alpha _p} & \cdots & {\tilde{u}}_{m, 2}^{\alpha _p} \\ \vdots & & \vdots & \vdots & & \vdots & & & \vdots & & \vdots \\ {\tilde{u}}_{1, N_h}^{\alpha _1} & \cdots & {\tilde{u}}_{m, N_h}^{\alpha _1} & {\tilde{u}}_{1, N_h}^{\alpha _2} & \cdots & {\tilde{u}}_{m, N_h}^{\alpha _2} & \cdots & \cdots & {\tilde{u}}_{1, N_h}^{\alpha _p} & \cdots & {\tilde{u}}_{m, N_h}^{\alpha _p} \end{array} \right]. \end{aligned}$$
(9)

The blocks are associated with the different parameters, while, for each block, columns run over modes, rows over the finite element nodes. In order to extract the POD basis, we apply the Singular Value Decomposition (SVD) [15] to matrix U, so that

$$\begin{aligned} U=\varXi \varLambda K^T, \end{aligned}$$
(10)

with \(\varXi \in {\mathbb {R}}^{N_h \times N_h}\) and \(K \in {\mathbb {R}}^{(mp)\times (mp)}\) unitary matrices, and \(\varLambda \in {\mathbb {R}}^{N_h \times (mp)}\) a pseudo-diagonal matrix. Each column in U can be expanded in terms of the left singular vectors \(\{ {\varvec{\xi }}_j\}_{j=1}^{N_h}\) of U, since they constitute an orthogonal basis for \({\mathbb {R}}^{N_h}\), so that

$$\begin{aligned} \textbf{U}^k(\alpha _i) = \displaystyle \sum _{j=1}^{N_h} T_j^k(\alpha _i) {\varvec{\xi }}_j \quad k=1, \ldots , m, \ i=1, \ldots , p. \end{aligned}$$
(11)

It is customary to select the first, say L with \(L\le N_h\), most meaningful singular vectors of U to identify the reduced POD space \(V_\textrm{POD, 1}^L= \text {span} \{{\varvec{\xi }}_1, \dots , {\varvec{\xi }}_{L}\}\), with \(\dim (V_\textrm{POD, 1}^L)=L\). As a consequence, the vectors in (11) are approximated as

$$\begin{aligned} \textbf{U}^k(\alpha _i) \cong \displaystyle \sum _{j=1}^{L} T_j^k(\alpha _i) {\varvec{\xi }}_j \quad k=1, \ldots , m, \ i=1, \ldots , p, \end{aligned}$$
(12)

the equality being ensured for \(L=N_h\). The directional HiPOD reduction involves a reorganization of coefficients \(\{T_j^k(\alpha _i)\}\) first into the vectors \(\textbf{T}_j(\alpha _i)=[T_j^1(\alpha _i), \ldots , T_j^m(\alpha _i)]^T\in {\mathbb {R}}^m\) with \(i=1, \ldots , p\), and successively into the matrices

$$\begin{aligned} S_j=\left[ \textbf{T}_j(\alpha _1), \ldots , \textbf{T}_j(\alpha _p)\right] =\left[ \begin{array}{ccc} T_j^1(\alpha _1) &{} \ldots &{} T_j^1(\alpha _p)\\ \vdots &{} &{} \vdots \\ T_j^m(\alpha _1) &{} \ldots &{} T_j^m(\alpha _p) \end{array} \right] \in {\mathbb {R}}^{m\times p}, \end{aligned}$$

with \(j=1, \ldots , L\), in order to associate a reduced POD basis with each index j. To this aim, we factorize the matrices \(S_j\) via SVD, so that

$$\begin{aligned} S_j=R_j D_j P_j^T, \end{aligned}$$
(13)

with \(R_j \in {\mathbb {R}}^{m\times m}\) and \(P_j\in {\mathbb {R}}^{p\times p}\) unitary matrices, and \(D_j\in {\mathbb {R}}^{m\times p}\) the pseudo-diagonal matrix collecting the singular values of \(S_j\). It follows that each column \(\textbf{T}_j(\alpha _i)\) of \(S_j\) can be approximated by resorting to the POD orthogonal basis \(\{ \textbf{r}_j^k \}_{k=1}^{\mu _j}\), with \(\mu _j\le m\), constituted by the most significant \(\mu _j\) left singular vectors of \(S_j\), namely

$$\begin{aligned} \textbf{T}_j(\alpha _i) \cong \displaystyle \sum _{k=1}^{\mu _j} Q_j^k(\alpha _i) \textbf{r}_j^k \quad j=1, \ldots , L,\ i=1, \ldots , p. \end{aligned}$$
(14)

Thus, a POD space \(V_{\textrm{POD, 2}, j}^{\mu _j}= \text {span} \{\textbf{r}_j^1, \dots , \textbf{r}_j^{\mu _j}\}\) can be defined for each j, with \(\dim (V_{\textrm{POD, 2}, j}^{\mu _j})\) \(=\mu _j\).
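The two-stage factorization just described boils down to a pair of truncated SVDs. A minimal numpy sketch is reported below; the names are illustrative and the ranks L and \(\mu _j\) are assumed to be given (see (18) below for their automatic selection).

import numpy as np

def hipod_offline(U, m, p, L, mu):
    """Two-stage directional POD of the response matrix, cf. (10)-(14).

    U  : (N_h, m*p) response matrix (9), columns blocked by parameter and,
         within each block, ordered by mode;
    mu : length-L list of the ranks mu_j retained at the second stage.
    """
    Xi, _, _ = np.linalg.svd(U, full_matrices=False)     # first SVD (10)
    Xi_L = Xi[:, :L]
    T = Xi_L.T @ U        # T[j, i*m + k] = T_j^{k+1}(alpha_{i+1}), cf. (11)
    R, Q = [], []
    for j in range(L):
        S_j = T[j].reshape(p, m).T                       # (m, p) matrix S_j
        R_j, _, _ = np.linalg.svd(S_j, full_matrices=False)  # second SVD (13)
        R.append(R_j[:, :mu[j]])
        Q.append(R[j].T @ S_j)   # Q[j][k, i] = Q_j^{k+1}(alpha_{i+1}), cf. (14)
    return Xi_L, R, Q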

The offline phase ends with the overall generation of \((L+1)\) POD reduced bases to be used in the online phase in order to predict the HiMod approximation to problem (1) for a value, \(\alpha ^*\), of the parameter, such that \(\alpha ^*\ne \alpha _i\) for \(i=1, \ldots , p\). The idea is to go backward through the directional procedure, starting from the coefficients \(Q_j^k(\alpha ^*)\) in (14), with \(j=1, \ldots , L\), \(k=1, \ldots , \mu _j\), which are approximated by interpolating the (known) values \(Q_j^k(\alpha _i)\) for \(i=1, \ldots , p\). Coefficients \(Q_j^k(\alpha ^*)\), together with the L POD bases \(\{ \textbf{r}_j^k \}_{k=1}^{\mu _j}\), allow us to compute the vectors

$$\begin{aligned} \textbf{T}_j(\alpha ^*) = \left[ T_j^1(\alpha ^*), \ldots , T_j^m(\alpha ^*)\right] ^T= \displaystyle \sum _{k=1}^{\mu _j} Q_j^k(\alpha ^*) \textbf{r}_j^k \quad j=1, \ldots , L \end{aligned}$$
(15)

in \({\mathbb {R}}^m\) and, consequently, by exploiting the POD basis, \(\{ {\varvec{\xi }}_j\}_{j=1}^{L}\), generated first, to assemble the m vectors

$$\begin{aligned} \textbf{U}_{\textrm{HiPOD}}^k(\alpha ^*) = \left[ u_{\textrm{POD}, k, 1}^{\alpha ^*}, \ldots , u_{\textrm{POD}, k, N_h}^{\alpha ^*}\right] ^T= \displaystyle \sum _{j=1}^{L} T_j^k(\alpha ^*) {\varvec{\xi }}_j \quad k=1, \ldots , m \end{aligned}$$
(16)

in \({\mathbb {R}}^{N_h}\), which represent the online counterpart of vectors in (12). The HiMod solution \(u_m(\alpha ^*)\) can thus be approximated by means of the expansion

$$\begin{aligned} u_{\textrm{HiPOD}}^{L, M_L}(\alpha ^*)=\displaystyle \sum _{k=1}^{m} \left[ \sum _{j=1}^{N_h} {u}_{\textrm{POD}, k, j}^{\alpha ^*}\, \vartheta _j(x) \right] \varphi _k(\psi _x(\textbf{y})), \end{aligned}$$
(17)

with \(M_L=\{\mu _j\}_{j=1}^L\). In particular, coefficients \(u_{\textrm{POD}, k, j}^{\alpha ^*}\) provide an approximation of the actual coefficient \({\tilde{u}}_{k, j}^{\alpha ^*}\) in (8) for \(\alpha _i=\alpha ^*\).
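In code, the online reconstruction (15)-(16) reduces to two small matrix products. A sketch consistent with the offline one above follows, where Q_star collects estimates of the coefficients \(Q_j^k(\alpha ^*)\) obtained by interpolation or, as proposed in Sect. 3, by regression.

import numpy as np

def hipod_online(Xi_L, R, Q_star):
    """Online reconstruction (15)-(16) for an unseen parameter alpha*.

    Q_star : length-L list; Q_star[j] is the (mu_j,) vector of estimated
             coefficients Q_j^k(alpha*).
    Returns the (N_h, m) matrix whose k-th column is U_HiPOD^k(alpha*).
    """
    # rows are the vectors T_j(alpha*) of (15)
    T_star = np.stack([R_j @ q_j for R_j, q_j in zip(R, Q_star)])
    # column k collects sum_j T_j^k(alpha*) xi_j, cf. (16)
    return Xi_L @ T_star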

The procedure adopted to predict the coefficients \(Q_j^k(\alpha ^*)\) plays an important role. When the offline solution data is not affected by noise, interpolation techniques are an effective tool. In [21], we assess different interpolations, namely, standard linear interpolation, a piecewise cubic Hermite (PCH) interpolant, and interpolating Radial Basis Functions (RBF), inferring that the PCH and RBF interpolants slightly outperform the linear approach.
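For instance, with a scalar parameter, the PCH estimate of the online coefficients takes a few lines of scipy (a hedged illustration with the array names of the sketches above, not the implementation of [21]):

from scipy.interpolate import PchipInterpolator

def pch_Q(alphas, Q, alpha_star):
    # one piecewise cubic Hermite interpolant per row of each Q[j]
    return [PchipInterpolator(alphas, Q_j, axis=1)(alpha_star) for Q_j in Q]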

Vice versa, when the offline data is affected by noise, interpolation is not recommended, because it does not discriminate between the relevant features of the problem and the noise, thus being unable to retain the former and discard the latter. This limitation motivated the proposal of a new tool, in order to guarantee a reliable HiMod approximation also in the presence of noisy data.

Finally, to select the dimension of the POD spaces \(V_\textrm{POD, 1}^L\) and \(V_{\textrm{POD, 2}, j}^{\mu _j}\), we resort to a control on the variance, i.e., after setting the tolerances \(\epsilon _1\) and \(\epsilon _2\), with \(0\le \epsilon _1, \epsilon _2 \le 1\), we keep the first L left singular vectors \({\varvec{\xi }}_j\) of U and the first \(\mu _j\) left singular vectors \(\textbf{r}_j^k\) of \(S_j\), such that

$$\begin{aligned} \displaystyle \frac{\displaystyle \ \ \sum _{j=1}^L \lambda _j^2\ \ }{\displaystyle \sum _{j=1}^{N_h} \lambda _j^2}\ge \epsilon _1,\quad \frac{\ \ \displaystyle \sum _{k=1}^{\mu _j} d_{j, k}^2\ \ }{\displaystyle \sum _{k=1}^{m} d_{j, k}^2}\ge \epsilon _2, \end{aligned}$$
(18)

respectively, with \(\lambda _j\) the singular value of U associated with \({\varvec{\xi }}_j\), for \(j=1, \ldots , N_h\), and \(d_{j, k}\) the singular value of \(S_j\) corresponding to the k-th singular vector \(\textbf{r}_j^k\), with \(k=1, \ldots , m\).
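Both criteria in (18) amount to thresholding the cumulative normalized energy of the singular values; a one-function sketch:

import numpy as np

def pod_rank(sv, eps):
    """Smallest rank retaining a fraction eps of the total variance, cf. (18)."""
    energy = np.cumsum(sv**2) / np.sum(sv**2)
    return int(min(np.searchsorted(energy, eps) + 1, sv.size))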

2.2 Machine Learning Models for Regression

In this section we focus on Machine Learning (ML) models to approximate data distributions.

The final goal is to replace the interpolation step in the online phase of the directional HiPOD procedure with a regression technique, in order to address situations where the data in the offline phase may be noisy. In particular, to estimate the coefficients \(Q_j^k(\alpha ^*)\) in (15), we resort to a ML fitting model since the offline solution is related to parameter \(\alpha _i\) by a highly nonlinear relation.

Now, in order to understand how the noise in the data may affect the accuracy of the predictions yielded by a ML fitting model, we have to make some preliminary assumptions on the noise properties. The most common hypothesis is an additive independent identically distributed (i.i.d.) Gaussian noise, \({\tilde{\eta }}\sim {\mathcal {N}}(0,\eta )\), in the output, with \(\eta >0\); namely, we assume the observed outputs to be

$$\begin{aligned} Q_j^k(\alpha _i) + {\tilde{\eta }}\quad k=1, \ldots , m,\ j=1, \ldots , L,\ i=1, \ldots , p. \end{aligned}$$
(19)

In the next sections, we consider two ML regression models that operate under the additive noise assumption, i.e., polynomial regression [12, 13, 22, 37] and Gaussian process regression [8, 9, 24, 25].

2.2.1 Polynomial Regression

Polynomial regression is a form of regression analysis where the relationship between the independent and the dependent variables is modeled as a polynomial of a certain degree n in the independent variable. With reference to our specific context, a polynomial regression model of degree n estimates the nonlinear dependence of \(Q_j^k(\alpha _i)\) on the parameter \(\alpha _i\) according to the formula

$$\begin{aligned} Q_j^k(\alpha _i) = \sum _{\ell = 0}^{n} \beta _{j, \ell }^k\, \alpha _i^\ell + {{\tilde{\eta }}}, \quad k=1, \ldots , m,\ j=1, \ldots , L,\ i=1, \ldots , p, \end{aligned}$$
(20)

where \(\beta _{j, \ell }^k\) are unknown parameters to be computed in order to optimize the matching between predictions and observations of the dependent variable, while \({\tilde{\eta }}\) denotes a zero-mean Gaussian noise as in (19). Although relation (20) is nonlinear in the independent variable \(\alpha _i\), this model is categorized as linear since the regression function is linear in terms of the unknown parameters \(\beta _{j, \ell }^k\).

With a view to the directional HiPOD procedure, the quantity

$$\begin{aligned} \sum _{\ell = 0}^{n} \beta _{j, \ell }^k\, [\alpha ^*]^\ell \quad k=1, \ldots , m,\ j=1, \ldots , L \end{aligned}$$
(21)

will be used as an estimate for the online coefficient \(Q_{j}^k(\alpha ^*)\).
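Since the model (20) is linear in the parameters \(\beta _{j, \ell }^k\), the fit reduces to a least-squares problem. A numpy sketch follows, with one polynomial per coefficient and the illustrative array layout used in the previous sketches:

import numpy as np
from numpy.polynomial import polynomial as P

def poly_Q(alphas, Q, alpha_star, n=3):
    """Degree-n least-squares fit of (20) and evaluation (21) at alpha*."""
    Q_star = []
    for Q_j in Q:
        beta = P.polyfit(alphas, Q_j.T, deg=n)   # (n+1, mu_j) coefficients
        Q_star.append(P.polyval(alpha_star, beta))
    return Q_star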

2.2.2 Gaussian Process Regression

The goal of a Gaussian process regression is to determine the best set of random variables that describes the relation between input features and outputs. In the specific setting of interest, we denote the random variables we are looking for by \({\mathcal {Q}}_{j}^k\), while \(\alpha _i\) and \(Q_j^k(\alpha _i)\) represent the inputs and the outputs, respectively.

A Gaussian process is fully characterized by its mean function and covariance function. Therefore, Gaussian process regression reduces to calculating the best values for the mean and the covariance functions in order to minimize the mismatch between outputs (\(Q_j^k(\alpha _i)\)) and the predictions produced by the finite set of random variables selected from the Gaussian process, evaluated at the available inputs (\(\alpha _i\)).

Bayesian statistics combines a prior Gaussian process, which encodes preliminary knowledge, with the information carried by the offline data samples (the likelihood), to construct an updated posterior Gaussian process. According to Bayes' rule [5], the prior Gaussian process is iteratively updated with the information coming from the data, until it converges to a stationary state defined by the posterior Gaussian process.

Now, we specialize such a workflow to our setting of interest. We denote the prior Gaussian process by \({\mathcal {Q}}_{j,\text {prior}}^k\), so that

$$\begin{aligned} {\mathcal {Q}}_{j,\text {prior}}^k \sim {{\mathcal {G}}}{{\mathcal {P}}} \left( q^k_{j,\text {prior}}, z^k_{j,\text {prior}} + \eta \right) \quad k=1, \ldots , m,\ j=1, \ldots , L, \end{aligned}$$
(22)

where \(q^k_{j,\text {prior}}=q^k_{j,\text {prior}}(\alpha )\) is the mean function, \(z^k_{j,\text {prior}}=z^k_{j,\text {prior}}(\alpha , {{\tilde{\alpha }}})\) is the covariance function, and \(\eta \) is the noise function. The restriction of the prior Gaussian process in (22) at the input points \(\alpha _i\), for \(i=1,\ldots ,p\), is a multivariate p-dimensional Gaussian distribution \({\mathcal {N}}({\textbf{m}}_{j, \text {prior}}^k, Z_{j, \text {prior}}^k)\), where the mean vector, \({\textbf{m}}_{j, \text {prior}}^k=[m_{j, \text {prior}, r}^k]\in {\mathbb {R}}^p\), and the covariance matrix, \(Z_{j, \text {prior}}^k=[Z_{j, \text {prior}, rs}^k]\in {\mathbb {R}}^{p\times p}\), are identified by relations

$$\begin{aligned} m_{j, \text {prior}, r}^k = q^k_{j,\text {prior}}(\alpha _r)=\frac{1}{4}\alpha _r ^ 2 \quad r=1,\ldots , p, \end{aligned}$$
(23)
$$\begin{aligned} Z_{j, \text {prior}, rs}^k = z^k_{j,\text {prior}}(\alpha _r, \alpha _s)=\exp \Big [-\frac{1}{2}(\alpha _r-\alpha _s)^2\Big ] \quad r, s=1,\ldots , p, \end{aligned}$$
(24)

respectively. The Gaussian process \({\mathcal {Q}}_{j,\text {like}}^k\) associated with the data is defined as

$$\begin{aligned} {\mathcal {Q}}_{j,\text {like}}^k \sim {{\mathcal {G}}}{{\mathcal {P}}} \left( q_{j,\text {like}}^k, z_{j,\text {like}}^k\right) \quad k=1, \ldots , m,\ j=1, \ldots , L, \end{aligned}$$
(25)

with \(q^k_{j,\text {like}}\) denoting the mean function and \(z^k_{j,\text {like}}\) the covariance function. The Gaussian process \({\mathcal {Q}}_{j,\text {like}}^k\) is not fully characterizable using the data, meaning that the mean and the covariance functions cannot be uniquely determined. However, the values attained by \(q^k_{j,\text {like}}\) and \(z^k_{j,\text {like}}\) at the input points \(\alpha _i\), for \(i=1,\ldots ,p\), are known and correspond to a p-dimensional Gaussian distribution \({\mathcal {N}}({\textbf{m}}^k_{j,\text {like}}, Z^k_{j,\text {like}})\), where the mean vector \({\textbf{m}}^k_{j,\text {like}}=[m^k_{j,\text {like}, r}]\in {\mathbb {R}}^p\) and the covariance matrix \(Z^k_{j,\text {like}}=[Z^k_{j, \text {like}, rs}]\in {\mathbb {R}}^{p\times p}\) are defined by

$$\begin{aligned} m^k_{j,\text {like}, r}=Q_j^k(\alpha _r)\quad r=1,\ldots , p, \end{aligned}$$
$$\begin{aligned} Z^k_{j, \text {like}, rs} = \exp \left[ -\frac{1}{2}\left( Q_j^k\left( \alpha _r\right) -Q_j^k\left( \alpha _s\right) \right) ^2\right] \quad r, s=1,\ldots , p. \end{aligned}$$
(26)

The posterior can then be used to make predictions for unseen values of \(\alpha \) (namely, to predict the coefficients \(Q_j^k(\alpha ^*)\), with \(\alpha ^*\) the online parameter). To construct it, we observe that the joint distribution of the two Gaussian processes evaluated at the points \(\alpha _i\), for \(i=1,\ldots ,p\), is the 2p-dimensional Gaussian distribution

$$\begin{aligned} {\mathcal {N}} \left( \begin{bmatrix} {\textbf{m}}_{j,\text {prior}}^k\\ {\textbf{m}}^k_{j,\text {like}} \end{bmatrix}, \begin{bmatrix} Z^k_{j,\text {prior}} &{} Z^k_{j,\text {pl}} \\ Z^{k}_{j,\text {pl}}&{} Z^k_{j,\text {like}} \end{bmatrix} \right) , \end{aligned}$$
(27)

where the symmetric matrix \(Z^k_{j,\text {pl}}\in {\mathbb {R}}^{p\times p}\) is the correlation matrix between prior and likelihood. Using Bayes’ Theorem, the posterior p-dimensional Gaussian distribution is

$$\begin{aligned} {\mathcal {N}} \left( {\textbf{m}}_{j,\text {prior}}^k - Z^{k}_{j,\text {pl}}\, \left[ Z^k_{j,\text {prior}}\right] ^{-1}{\textbf{m}}^k_{j,\text {like}}, Z^k_{j,\text {prior}} - Z^k_{j,\text {pl}} \left[ Z^k_{j,\text {like}}\right] ^{-1}Z^{k}_{j,\text {pl}}\right) . \end{aligned}$$
(28)

The posterior Gaussian process associated with the distribution in (28) is consequently given by

$$\begin{aligned} {\mathcal {Q}}^k_{j,\text {posterior}} \sim {\mathcal {G}}{\mathcal {P}} \left( q^k_{j,\text {posterior}}, z^k_{j,\text {posterior}}\right) . \end{aligned}$$
(29)

The evaluation of the mean function \(q^k_{j,\text {posterior}}\) at the new input parameter value \(\alpha ^*\) is

$$\begin{aligned} q^k_{j,\text {posterior}}\left( \alpha ^*\right) = q^k_{j,\text {prior}}\left( \alpha ^*\right) - \left[ Z\left( \varvec{\alpha }, \alpha ^*\right) \right] ^T\, \left[ Z_{j, \text {prior}}^k\right] ^{-1}\, \textbf{m}^k_{j,\text {like}}, \end{aligned}$$
(30)

with \(q^k_{j,\text {prior}}(\alpha ^*)\) defined as in (23), \(Z(\varvec{\alpha }, \alpha ^*)\in {\mathbb {R}}^p\) denoting the covariance vector between the offline data sampled at \(\varvec{\alpha }=[\alpha _1, \ldots , \alpha _p]^T\in {\mathbb {R}}^p\) and the new sampled parameter \(\alpha ^*\). The covariance function in (29) evaluated at \(\alpha ^*\) is

$$\begin{aligned} z^k_{j,\text {posterior}}\left( \alpha ^*\right) = z^k_{j,\text {prior}}\left( \alpha ^*, \alpha ^*\right) - \left[ Z\left( \varvec{\alpha }, \alpha ^*\right) \right] ^T \left[ Z_{j, \text {like}}^k\right] ^{-1}\, Z\left( \varvec{\alpha }, \alpha ^*\right) , \end{aligned}$$
(31)

with \(z^k_{j,\text {prior}}(\alpha ^*, \alpha ^*)=1\). The value \(q^k_{j,\text {posterior}}(\alpha ^*)\) will be used as an estimate for \(Q_{j}^k(\alpha ^*)\).
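In practice, the whole posterior machinery is available off the shelf. A hedged scikit-learn sketch follows, where the WhiteKernel accounts for the additive noise in (19); note that, as is customary in such libraries, a zero-mean prior replaces the quadratic mean (23), so this is an illustration rather than a verbatim implementation of the construction above.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def gp_Q(alphas, Q, alpha_star):
    """One Gaussian process per coefficient Q_j^k; the posterior mean at
    alpha* plays the role of q_{j,posterior}^k(alpha*) in (30)."""
    X = alphas.reshape(-1, 1)
    Q_star = []
    for Q_j in Q:
        preds = []
        for row in Q_j:          # row = [Q_j^k(alpha_1), ..., Q_j^k(alpha_p)]
            gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel())
            gp.fit(X, row)
            preds.append(gp.predict(np.array([[alpha_star]]))[0])
        Q_star.append(np.array(preds))
    return Q_star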

3 HiPOD Reduction for Data Affected by Gaussian Noise

To address situations where the offline data is noisy, we propose here a variant of the directional HiPOD reduction presented in Sect. 2.1.2.

The idea is very straightforward. In the online phase we replace the initial interpolation step used to estimate coefficients \(Q_j^k(\alpha ^*)\) with a regression technique. In particular, formulas (21) and (30) provide us the desired estimate for the coefficients \(Q_j^k(\alpha ^*)\), when resorting to a polynomial or to a Gaussian process regression, respectively. Successively, the online phase is performed exactly as in Sect. 2.1.2, going through the reconstructions (15)-(16), to obtain the final expansion in (17).

The choice of the ML regression models in Sects. 2.2.1 and 2.2.2 is motivated by the highly nonlinear dependence of the offline HiMod solution on the offline parameters.

The improvement yielded by the new HiPOD approach is numerically checked in Sect. 4. In the next section, we list some numerical quantities that can help us monitor the noise propagation throughout the HiPOD procedure.

3.1 Noise Propagation on the Response Matrix

In this section we compare the standard directional HiPOD procedure and the new variant proposed in this paper for different noise levels in the offline data. In particular, hereafter, we refer to the standard and to the new directional HiPOD approaches as the interpolation-HiPOD and the regression-HiPOD, respectively.

The data noise affects the HiPOD approximation by perturbing the \((L+1)\) SVDs involved in the offline phase. In [21], we showed that an inaccurate calculation of the first SVD (i.e., of the SVD of the response matrix U) compromises the reliability of the directional interpolation-HiPOD. For this reason, here we focus on the effect of the data noise onto the decomposition in (10).

Let us assume that a Gaussian noise affects the linear form in (1) (e.g., by perturbing the source term f or the boundary data of the PDE problem at hand). As a consequence, the HiMod system in (5) is replaced by the perturbed problem

$$\begin{aligned} A_m(\alpha ){\tilde{\textbf{u}}}_m(\alpha )=\textbf{f}_m(\alpha ) + \varvec{\eta }_m, \end{aligned}$$
(32)

where \(\varvec{\eta }_m \sim {\mathcal {N}}({\textbf{0}}_{mN_h}, \eta I_{mN_h})\) identifies the noise, with \({\textbf{0}}_{mN_h} \in {\mathbb {R}}^{mN_h}\) the null vector, \(I_{mN_h} \in {\mathbb {R}}^{mN_h\times mN_h}\) the identity matrix, and \(\eta >0\) the noise level, while \({\tilde{\textbf{u}}}_m(\alpha )\) denotes the associated noisy HiMod discretization.

The solution to (32) can be regarded as a perturbation of the HiMod solution, \(\textbf{u}_m(\alpha )\), in (5) by an additive white noise (\(A^{-1}_m(\alpha ) \varvec{\eta }_m\)), being

$$\begin{aligned} {\tilde{\textbf{u}}}_m(\alpha ) = A^{-1}_m(\alpha )\textbf{f}_m(\alpha ) + A^{-1}_m(\alpha ) \varvec{\eta }_m = \textbf{u}_m(\alpha ) + A^{-1}_m(\alpha ) \varvec{\eta }_m. \end{aligned}$$
(33)

To simplify the notation, we define the random variable \(\tilde{\varvec{\eta }}_m(\alpha ) = A^{-1}_m(\alpha ) \varvec{\eta }_m\), with

$$\begin{aligned} \tilde{\varvec{\eta }}_m(\alpha ) \sim {\mathcal {N}}\left( {\textbf{0}}, \eta \left[ A_m(\alpha ) A^T_m(\alpha )\right] ^{-1} \right) , \end{aligned}$$
(34)

so that the solution \({\tilde{\textbf{u}}}_m(\alpha )\) to the perturbed HiMod linear system (32) can be recast as

$$\begin{aligned} \tilde{\textbf{u}}_m(\alpha ) = \textbf{u}_m(\alpha ) + \tilde{\varvec{\eta }}_m(\alpha ). \end{aligned}$$
(35)

This decomposition finds a counterpart when assembling the response matrix, \({\tilde{U}}\), associated with the offline noisy data. Indeed, thanks to the additive property assumed for the noise \({\varvec{\eta }}_m\), matrix \({\tilde{U}}\), which collects the perturbed HiMod solutions \({\tilde{\textbf{u}}}_m(\alpha _i)\) in (32), for the p values, \(\alpha _1, \ldots , \alpha _p\), of the parameter \(\alpha \), can be conceived as a perturbation of matrix U in (9). Thus, after introducing the noise matrix

$$\begin{aligned} E&=\left[ \textbf{E}^1(\alpha _1) \cdots \textbf{E}^m(\alpha _1) \,|\, \textbf{E}^1(\alpha _2) \cdots \textbf{E}^m(\alpha _2) \,|\, \cdots \,|\, \textbf{E}^1(\alpha _p) \cdots \textbf{E}^m(\alpha _p)\right] \\ &= \left[ \begin{array}{ccc|ccc|cc|ccc} {{\tilde{\eta }}}_{1, 1}^{\alpha _1} & \cdots & {{\tilde{\eta }}}_{m, 1}^{\alpha _1} & {{\tilde{\eta }}}_{1, 1}^{\alpha _2} & \cdots & {{\tilde{\eta }}}_{m, 1}^{\alpha _2} & \cdots & \cdots & {{\tilde{\eta }}}_{1, 1}^{\alpha _p} & \cdots & {{\tilde{\eta }}}_{m, 1}^{\alpha _p} \\ {{\tilde{\eta }}}_{1, 2}^{\alpha _1} & \cdots & {{\tilde{\eta }}}_{m, 2}^{\alpha _1} & {{\tilde{\eta }}}_{1, 2}^{\alpha _2} & \cdots & {{\tilde{\eta }}}_{m, 2}^{\alpha _2} & \cdots & \cdots & {{\tilde{\eta }}}_{1, 2}^{\alpha _p} & \cdots & {{\tilde{\eta }}}_{m, 2}^{\alpha _p} \\ \vdots & & \vdots & \vdots & & \vdots & & & \vdots & & \vdots \\ {{\tilde{\eta }}}_{1, N_h}^{\alpha _1} & \cdots & {{\tilde{\eta }}}_{m, N_h}^{\alpha _1} & {{\tilde{\eta }}}_{1, N_h}^{\alpha _2} & \cdots & {{\tilde{\eta }}}_{m, N_h}^{\alpha _2} & \cdots & \cdots & {{\tilde{\eta }}}_{1, N_h}^{\alpha _p} & \cdots & {{\tilde{\eta }}}_{m, N_h}^{\alpha _p} \end{array} \right], \end{aligned}$$
(36)

where the vectors

$$\begin{aligned} \textbf{E}^k(\alpha _i)=\left[ {{\tilde{\eta }}}_{k, 1}^{\alpha _i}, {{\tilde{\eta }}}_{k, 2}^{\alpha _i}, \ldots , {{\tilde{\eta }}}_{k, N_h}^{\alpha _i}\right] ^T\in {\mathbb {R}}^{N_h} \quad k=1, \ldots , m,\ i=1, \ldots , p, \end{aligned}$$

gather, by mode, the noise affecting the modal coefficients \(\{ {{\tilde{u}}}_{k, j}^{\alpha _i} \}_{k=1, j=1}^{m, N_h}\) in (6) for \(\alpha =\alpha _i\), the response matrix \(\tilde{U} \in {\mathbb {R}}^{N_h\times (mp)}\) associated with the noisy HiMod solutions coincides with

$$\begin{aligned} \tilde{U} = U + E, \end{aligned}$$
(37)

with U the response matrix in (9). The SVD

$$\begin{aligned} {\tilde{U}} = {{\tilde{\varXi }}} {{\tilde{\varLambda }}} {\tilde{K}}^T \end{aligned}$$
(38)

will consequently replace the factorization of matrix U in (10), with \({{\tilde{\varXi }}}\in {\mathbb {R}}^{N_h\times N_h}\) and \({\tilde{K}}\in {\mathbb {R}}^{(mp)\times (mp)}\) the unitary matrices of the left and of the right singular vectors of \({\tilde{U}}\), and \({{\tilde{\varLambda }}}\in {\mathbb {R}}^{N_h\times (mp)}\) the pseudo-diagonal matrix of the singular values.
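Generating the noisy snapshots in (32) and, column by column, the perturbed response matrix (37) is straightforward; a minimal sketch, assuming the matrix A_m and the vector f_m are assembled by any HiMod solver, reads:

import numpy as np

rng = np.random.default_rng(0)

def noisy_snapshot(A_m, f_m, eta):
    """Solve the perturbed HiMod system (32) for one parameter value.

    eta is the noise level (variance), so the standard deviation is sqrt(eta).
    """
    eta_m = rng.normal(0.0, np.sqrt(eta), f_m.size)
    return np.linalg.solve(A_m, f_m + eta_m)

# stacking such solutions by mode and by parameter, as in (9), yields the
# noisy response matrix U_tilde = U + E of (37)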

Remark 1

The injection of noise into the problem data involved in the bilinear form in (2) would still affect the HiMod solution \(\textbf{u}_m(\alpha )\) in (5), but not in an additive fashion. This would unavoidably make the perturbation propagation analysis more complex, and it is beyond the scope of this paper.

Some results are available in the literature which relate both the singular values and the singular vectors of matrices U and \({\tilde{U}}\).

As for the singular values, we recall Weyl's theorem [40] and Mirsky's theorem [26]. The first result controls the discrepancy between the i-th singular value \(\lambda _i\) of U and the corresponding singular value \({{\tilde{\lambda }}}_i\) of \({\tilde{U}}\) in terms of the spectral norm, \(\Vert E \Vert _S\), of the noise matrix, being

$$\begin{aligned} |{{\tilde{\lambda }}}_i -\lambda _i |\le \Vert E \Vert _S \quad i=1,\ldots ,\min \{N_h, mp\}. \end{aligned}$$
(39)

Mirsky's theorem provides an upper bound on the root sum of the squared deviations of the values \({{\tilde{\lambda }}}_i\) with respect to the \(\lambda _i\)'s in terms of the Frobenius norm, \(\Vert E \Vert _F\), of the noise matrix, given by

$$\begin{aligned} \left[ \sum _{i=1}^{\min \{N_h, mp\}}\left( {{\tilde{\lambda }}}_i -\lambda _i \right) ^2 \right] ^{1/2} \le \Vert E \Vert _F. \end{aligned}$$
(40)

Of course, inequalities (39) and (40) become uninformative when the norms \(\Vert E \Vert _S\) and \(\Vert E \Vert _F\) grow large.
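Both bounds are immediate to verify numerically once U and \({\tilde{U}}\) are available; a sketch:

import numpy as np

def sv_bounds(U, U_tilde):
    """Numerical check of the Weyl (39) and Mirsky (40) inequalities."""
    lam = np.linalg.svd(U, compute_uv=False)
    lam_t = np.linalg.svd(U_tilde, compute_uv=False)
    E = U_tilde - U
    weyl = np.max(np.abs(lam_t - lam)) <= np.linalg.norm(E, 2)  # spectral norm
    mirsky = np.sqrt(np.sum((lam_t - lam)**2)) <= np.linalg.norm(E, 'fro')
    return weyl, mirsky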

The reference result on the singular vectors is represented by the generalized \(\sin \theta \) theorem [39]. To state such a result, it is instrumental to introduce an appropriate rewriting of the SVD’s in (10) and (38). By exploiting that matrices \(\varXi \), K, and \({{\tilde{\varXi }}}\), \({\tilde{K}}\) are unitary, it follows that

$$\begin{aligned} U = U_1 + U_0,\quad {\tilde{U}} = {\tilde{U}}_1 + {\tilde{U}}_0, \end{aligned}$$

where \(U_s = \varXi _s \varLambda _s K_s^T\), \({\tilde{U}}_s = {{\tilde{\varXi }}}_s {{\tilde{\varLambda }}}_s {\tilde{K}}_s^T\), for \(s=0, 1\), with \(K_1 = [{\textbf{k}}_1, \ldots , {\textbf{k}}_r]\), \(K_0 = [{\textbf{k}}_{r+1}, \ldots , {\textbf{k}}_{mp}]\), \(\varXi _1 = [\varvec{\xi }_1, \ldots , \varvec{\xi }_r]\), \(\varXi _0 = [\varvec{\xi }_{r+1},\ldots , \varvec{\xi }_{N_h}]\),

$$\begin{aligned} \varLambda _1&=\left\{ \begin{array}{ll} \begin{bmatrix}\textrm{diag}(\lambda _1, \ldots , \lambda _r, 0, \ldots ,0) \\ {\textbf{0}}_{(N_h -r)\times N_h}\end{bmatrix} \qquad &\text {if} \ N_h \ge mp \\ \begin{bmatrix}\textrm{diag}(\lambda _1, \ldots , \lambda _r, 0, \ldots ,0),\ {\textbf{0}}_{mp\times (mp -r)}\end{bmatrix} &\text {if} \ N_h< mp,\\ \end{array} \right. \\ \varLambda _0&=\left\{ \begin{array}{ll} \begin{bmatrix}\textrm{diag}(0, \ldots , 0,\lambda _{r+1}, \ldots , \lambda _{\min \{N_h, mp\}})\\ {\textbf{0}}_{(N_h -r)\times N_h}\end{bmatrix} \quad &\text {if} \ N_h \ge mp \\ \begin{bmatrix}\textrm{diag}(0, \ldots , 0,\lambda _{r+1}, \ldots , \lambda _{\min \{N_h, mp\}}),\ {\textbf{0}}_{mp\times (mp -r)}\end{bmatrix} &\text {if} \ N_h < mp, \end{array} \right. \end{aligned}$$

matrices \({\tilde{K}}_1\), \({\tilde{K}}_0\), \({{\tilde{\varXi }}}_1\), \({{\tilde{\varXi }}}_0\), \({{\tilde{\varLambda }}}_1\), \({{\tilde{\varLambda }}}_0\) being defined accordingly.

Moreover, we denote by \({{\mathcal {C}}}_1\) and \({{\mathcal {C}}}_2\) two generic Euclidean subspaces of \({\mathbb {R}}^{N_h}\), and by \(P_{{{\mathcal {C}}}_1}\), \(P_{{{\mathcal {C}}}_2}\in {\mathbb {R}}^{N_h\times N_h}\) the associated orthogonal projection operators. Thus, the angle \(\theta \) between a unit vector \({\textbf{x}}\in {\mathbb {R}}^{N_h}\) and the subspace \({{\mathcal {C}}}_1\) (which, among all the mathematically equivalent formulations, is by convention always taken acute and positive) can be defined by

$$\begin{aligned} \sin \theta ({\textbf{x}},{{\mathcal {C}}}_1)=\min _{{\textbf{y}}\in {{\mathcal {C}}}_1} \Vert {\textbf{x}}-{\textbf{y}} \Vert _2, \end{aligned}$$
(41)

with \(\Vert \cdot \Vert _2\) denoting the Euclidean norm of a vector, and with \(\Vert {\textbf{x}}\Vert _2=1\). From the projection theorem, it follows that

$$\begin{aligned} \min _{{\textbf{y}}\in {{\mathcal {C}}}_1} \Vert {\textbf{x}}-{\textbf{y}} \Vert _2 = \Vert \left( I-P_{{{\mathcal {C}}}_1}\right) {\textbf{x}} \Vert _2. \end{aligned}$$
(42)

When considering the angle between the subspaces \({{\mathcal {C}}}_1\) and \({{\mathcal {C}}}_2\), it is common to define

$$\begin{aligned} \sin \theta ({{\mathcal {C}}}_1, {{\mathcal {C}}}_2) = \Vert \left( I-P_{{{\mathcal {C}}}_2}\right) P_{{{\mathcal {C}}}_1} \Vert _S. \end{aligned}$$
(43)

Now, if there exists a pair of values \(\gamma >0\), \(\delta >0\) such that

$$\begin{aligned} \lambda _{\min }({\tilde{U}}_1) \ge \gamma + \delta , \quad \lambda _{\max }(U_0) \le \gamma , \end{aligned}$$
(44)

with \(\lambda _{\min }({\tilde{U}}_1)\) the smallest nonzero singular value of \({\tilde{U}}_1\) and \(\lambda _{\max }(U_0)\) the largest singular value of \(U_0\), the generalized \(\sin \theta \) theorem [39] ensures that the following perturbation bound on the column space of \({{\tilde{\varXi }}}\) holds

$$\begin{aligned} \sin \theta \left( R({{\tilde{\varXi }}}_1),R\left( \varXi _1\right) \right) \le \frac{\Vert E \Vert _S}{\delta }={{\mathcal {P}}}_{B1}, \end{aligned}$$
(45)

with R(W) the column space associated with the generic matrix W. The bound (45), however, is not always convenient for gaining meaningful insight into the actual perturbation triggered by the noise in the data. Indeed, computing the perturbation between the two column spaces \(R({{\tilde{\varXi }}}_1)\) and \(R( \varXi _1)\) might be impractical from a computational viewpoint, due to the infinitely many vectors to be spanned.

In order to make the control in (45) more computationally convenient, i.e., to restrict the dimensionality of the spaces to be spanned, we consider the alternative inequality

$$\begin{aligned} \sin \theta \Big (R({{\tilde{\varXi }}}_1),R( \varXi _1) \Big ) \le \sin \theta \left( \tilde{\varvec{\xi }}_1,\varvec{\xi }_1 \right) ={{\mathcal {P}}}_{B2}, \end{aligned}$$
(46)

with \(\varvec{\xi }_1\) and \(\tilde{\varvec{\xi }}_1\) denoting the first left singular vector of the response matrices U and \({\tilde{U}}\), respectively. The result in (46) can be easily proved, starting from the following chain of inequalities:

$$\begin{aligned} \sin \theta \left( \tilde{\varvec{\xi }}_1,\varvec{\xi }_1 \right)&\ge \sin \theta \left( \tilde{\varvec{\xi }}_1,\text {span}\{\varvec{\xi }_1, \varvec{\xi }_2\} \right) \ge \sin \theta \left( \tilde{\varvec{\xi }}_1,\text {span}\{\varvec{\xi }_1, \varvec{\xi }_2, \varvec{\xi }_3\} \right) \\&\ge \ldots \ge \sin \theta \left( \tilde{\varvec{\xi }}_1,R\left( \varXi \right) \right) \ge \sin \theta \left( \text {span}\{\tilde{\varvec{\xi }}_1, \tilde{\varvec{\xi }}_2\},R\left( \varXi \right) \right) \\&\ge \sin \theta \left( \text {span}\{\tilde{\varvec{\xi }}_1, \tilde{\varvec{\xi }}_2, \tilde{\varvec{\xi }}_3\},R(\varXi ) \right) \ge \ldots \\&\ge \sin \theta \left( R({{\tilde{\varXi }}}_1),R\left( \varXi _1\right) \right) . \end{aligned}$$
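Unlike \({{\mathcal {P}}}_{B1}\), the quantifier \({{\mathcal {P}}}_{B2}\) only involves the first left singular vectors of the two response matrices and is therefore inexpensive to compute; a sketch:

import numpy as np

def P_B2(U, U_tilde):
    """Sine of the angle between the first left singular vectors, cf. (46)."""
    xi1 = np.linalg.svd(U)[0][:, 0]
    xi1_t = np.linalg.svd(U_tilde)[0][:, 0]
    c = min(1.0, abs(np.dot(xi1, xi1_t)))   # clip rounding noise
    return np.sqrt(1.0 - c**2)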

Bounds (39), (40), (45), together with the new one in (46), will be used in the next section to assess the sensitivity both of the interpolation-HiPOD and of the regression-HiPOD to the noise level.

4 Numerical Results

In this section we compare the interpolation-HiPOD and the regression-HiPOD methods. The comparison is carried out on two test cases, in terms of accuracy and robustness to the noise level. For the former method, we resort to the PCH interpolant, while the latter approach is assessed both with the polynomial and with the Gaussian process regression.

4.1 Test Case 1

We choose as reference setting the ADR problem (1)-(2) solved on the rectangular domain \(\varOmega =(0,6)\times (0,1)\), after setting the problem data to

$$\begin{aligned} \begin{array}{c} \mu (x, y)=0.24, \quad {\textbf{b}}(x, y)=[5, \sin (6x)]^T, \quad \sigma (x, y) = 0.1,\\ f(x,y)=10\chi _{E_1}(x, y)+10\chi _{E_2}(x, y), \end{array} \end{aligned}$$
(47)

with \(\chi _{\omega }\) the characteristic function associated with the generic region \(\omega \subset {\mathbb {R}}^2\), and \(E_1\) and \(E_2\) the elliptical regions in \(\varOmega \) given by \(\{ (x,y):(x-0.75)^2+0.4(y-0.25)^2 < 0.01\}\) and \(\{ (x,y):(x-0.75)^2+0.4(y-0.75)^2 <0.01 \}\), respectively. The ADR problem is completed with homogeneous Neumann data on \(\Gamma _N=\{ (x,y): x=6,\ 0\le y \le 1\}\), while a homogeneous Dirichlet condition is assigned on \(\Gamma _D=\partial \varOmega \setminus \Gamma _N\), so that the space V in (1) coincides with \(H^1_{\Gamma _D}(\varOmega )\).

From a modeling viewpoint, this setting can be adopted to simulate the propagation of a pollutant released by two localized sources within a straight channel, under the effect of a sinusoidal horizontal convection.

We parametrize the ADR problem with respect to the diffusivity \(\mu \). The offline phase is set up so that the coefficient \(\mu \) uniformly spans the range [0.2, 0.8], while all the other problem data remain the same as in (47). We hierarchically reduce 100 ADR problems, after discretizing the main dynamics with linear finite elements on a uniform partition of \(\varOmega _{1D}\) into 40 sub-intervals, while using 20 sinusoidal modal basis functions to approximate the transverse dynamics. To investigate the robustness to noise of the interpolation- and of the regression-HiPOD procedures, we carry out different offline phases, where the source term f is perturbed by white noise of increasing level, thus changing f into \(f+{\tilde{\eta }}\), with \({\tilde{\eta }}\sim {\mathcal {N}}(0, \eta )\) and \(\eta = 0.01\), 0.05, 0.1, 0.25. Independently of the selected noise, the online phase is used to recover the reference setting in (47), i.e., to reconstruct the HiMod solution for \(\mu ^* = 0.24\). The top panel in Fig. 1 shows the approximation yielded by the HiMod discretization adopted for the offline phase. Such a discretization is sufficiently accurate to capture the oscillatory dynamics induced by the sinusoidal field, together with the presence of the two localized sources in \(E_1\) and \(E_2\), and it represents the reference solution the HiPOD approximations will be compared with.

A single threshold tolerance, \(\epsilon \), is used to automatically select the number of POD modes to be preserved at the first and at the second stage of the directional method, which is equivalent to setting \(\epsilon _1=\epsilon _2=\epsilon \) in (18) (we refer the reader to [21] for a thorough investigation of the interplay between the tolerances \(\epsilon _1\) and \(\epsilon _2\)). Table 1 collects the distribution of the number \(\mu _j\) of left singular vectors retained at each finite element node, for different choices of the tolerance \(\epsilon \) (by rows) and of the noise \(\eta \) (by columns). As expected, \(\mu _j\) increases for larger and larger values of both \(\epsilon \) and \(\eta \). Indeed, in the former case, the procedure is intrinsically requested to retain more information about the variability of the offline dataset; in the latter case, the quality of the information carried by the offline solutions deteriorates with increasing noise, which leads the HiPOD procedure to retain more singular vectors than in the case of low (or no) noise, in order to capture the same amount of information from the offline data.

Fig. 1 Test case 1. HiMod reference solution (top panel); interpolation-HiPOD approximation for different noise levels: \(\eta = 0.01\), 0.1, 0.25 (second-fourth row)

Table 1 Test case 1

Figure 1 displays the interpolation-HiPOD approximation reconstructed from offline data with an increasing level of noise, i.e., \(\eta = 0.01\), \(\eta =0.1\), \(\eta = 0.25\) (the contour plot for \(\eta =0.05\) is omitted, since it is very similar to the one associated with \(\eta =0.1\)). As expected, the quality of the HiPOD solution deteriorates with increasing values of \(\eta \). In particular, the wake behind the sources is completely lost by the HiPOD approximation for \(\eta = 0.25\).

Figure 2 highlights the benefits obtained by replacing the interpolation step in the HiPOD online phase with a regression process. In particular, the two panels show the regression-HiPOD approximation when resorting to a cubic polynomial fitting and to a Gaussian process regression, for the largest level of noise analyzed in Fig. 1 (i.e., \(\eta =0.25\)). Both fitting models clearly outperform the interpolation-HiPOD. In particular, the Gaussian process generates a solution of better quality with respect to the one yielded by the cubic polynomial fitting.

Fig. 2 Test case 1. Regression-HiPOD approximation for a cubic polynomial fitting (top) and for a Gaussian process regression (bottom); the noise level is set to \(\eta = 0.25\)

Table 2 Test case 1

This trend is confirmed by the values in Table 2, which gathers the \(L^2(\varOmega )\)- and the \(H^1(\varOmega )\)-norm of the relative modeling error obtained when replacing the reference HiMod discretization in Fig. 1 (top) with the interpolation- rather than the regression-HiPOD approximation. To ensure a high accuracy of the HiPOD solutions, we have set the thresholds driving the selection of the POD bases in (18) very close to 1, picking \(\epsilon _1=\epsilon _2=\epsilon =0.9999\). The interpolation-HiPOD scheme is compared with the regression-HiPOD approach when both a cubic polynomial fitting and a Gaussian process are used, for the four levels of noise considered above. Although the general trend confirms that the error increases with the noise, the cubic polynomial fitting and the Gaussian process regression considerably outperform the interpolation-HiPOD, in particular for high noise levels.

Table 3 Test case 1

Finally, we quantify the perturbation bounds in (45) and (46) for the four considered levels of noise. Table 3 shows that the new bound allows us to gain up to three orders of magnitude with respect to the quantifier \({{\mathcal {P}}}_{B1}\), thus offering an effective tool to evaluate the noise propagation onto the response matrix U in (9).

4.2 Test Case 2

As a second reference setting, we hierarchically reduce the ADR problem in (1)-(2) identified by the data

$$\begin{aligned} \begin{array}{c} \mu (x, y)=0.24, \quad {\textbf{b}}(x, y)=[20, 2 \sin (6x)]^T, \quad \sigma (x, y) = 0.1,\\ f(x,y)=1000\chi _{R_1}(x, y) + 1000\chi _{R_2}(x, y), \end{array} \end{aligned}$$
(48)

on the same domain, \(\varOmega =(0,6)\times (0,1)\), as in Test case 1, with \(R_1=\{ (x,y): 1<x<2, \; 0<y<0.1 \}\) and \(R_2=\{ (x,y): 1<x<2, \; 0.9<y<1.0 \}\) two rectangular regions of interest. The problem is completed by homogeneous Neumann data on \(\Gamma _N=\{ (x,y): x=6, 0\le y \le 1\}\) and by a homogeneous Dirichlet condition on \(\Gamma _D=\partial \varOmega \setminus \Gamma _N\), so that we have \(V\equiv H^1_{\Gamma _D}(\varOmega )\) in (1).

We can adopt this configuration to model, for instance, the transport of a drug released by a medical stent applied to the walls of a cardiovascular vessel, under the effect of incompressible fluid convection.

The HiMod discretization adopted to build the reference solution, as well as to perform the offline phase of the HiPOD procedure, employs linear finite elements along \(\varOmega _{1D}\), after subdividing the supporting fiber into 60 uniform subintervals, and a modal basis consisting of 20 sinusoidal functions to capture the dynamics along the transverse direction. The top panel in Fig. 3 displays the contour plot of the HiMod reference solution. We clearly distinguish the drug release in the regions \(R_1\) and \(R_2\), together with the transport of the medicine along the pipe. The offline phase is driven by the diffusivity, which is identified with the parameter \(\alpha \). In particular, we uniformly cover the range [0.2, 0.8] with 100 samples, while keeping the same values as in (48) for all the other problem data.

The performance of the interpolation- and of the regression-HiPOD approaches in the presence of noise is analyzed by perturbing the forcing term in (2) with white noise of increasing level, \(\eta =0.01\), 0.05, 0.1, 0.25. For each choice of \(\eta \), we replicate the offline phase set above, before predicting, in the online phase, the HiMod approximation associated with the parameter \(\mu ^*=0.24\) (i.e., with the reference configuration).

The selection of the POD bases \(\{ {\varvec{\xi }}_j\}_{j=1}^L\) and \(\{ \textbf{r}_j^k\}_{k=1}^{\mu _j}\), for \(j=1, \ldots , L\), is driven by the variance-based criteria in (18), for a single choice of the threshold tolerance (i.e., for \(\epsilon _1=\epsilon _2=\epsilon \)). The trend of the number \(\mu _j\) of left singular vectors selected by the HiPOD approach at the finite element nodes for different values of \(\epsilon \) and \(\eta \) is very similar to the one in Table 1 and is, consequently, omitted for brevity.

Figure 3 shows the interpolation-HiPOD solution for the four noise levels \(\eta =0.01\), 0.05, 0.1, 0.25. As for Test case 1, the quality of the HiPOD approximation deteriorates very quickly when increasing the level of noise. In particular, for \(\eta =0.25\), the HiPOD solution is fully noisy, the problem dynamics being completely lost.

Fig. 3 Test case 2. HiMod reference solution (top panel); interpolation-HiPOD approximation for different noise levels: \(\eta = 0.01\), 0.05, 0.1, 0.25 (second-fifth row)

On the contrary, the regression-HiPOD discretization significantly improves on the approximation quality provided by the interpolation. Figure 4 shows such an improvement when resorting to a cubic polynomial fitting and to a Gaussian process regression. Analogously to Fig. 2, the Gaussian process regression yields a solution of better quality than the cubic polynomial fitting.

These qualitative considerations are confirmed by the values in Table 4, which gathers the \(L^2(\varOmega )\)- and the \(H^1(\varOmega )\)-norm of the relative modeling error associated with both the interpolation- and the regression-HiPOD approximations, for the considered noise levels. The results corroborate what was already remarked for the first test case, namely that the cubic polynomial fitting and the Gaussian process regression outperform the interpolation-HiPOD for high noise levels.

Fig. 4 Test case 2. Regression-HiPOD approximation for a cubic polynomial fitting (top) and for a Gaussian process regression (bottom); the noise level is set to \(\eta = 0.25\)

Table 4 Test case 2

As a last check, we compute the quantifiers \({{\mathcal {P}}}_{B1}\) and \({{\mathcal {P}}}_{B2}\) of the noise propagation in the response matrix defined in (45) and (46), respectively (see Table 5). Also for this test case, the bound in (45) is relatively large, and the discrepancy between \({{\mathcal {P}}}_{B1}\) and \({{\mathcal {P}}}_{B2}\) grows with the level of noise.

Table 5 Test case 2

5 Conclusions and Future Developments

In this work we present a first-of-its-kind approach to make the directional HiPOD method robust when the offline data is noisy. To this aim, we modify the interpolation-HiPOD proposed in [21] by replacing the interpolation techniques with ML regression, which gives rise to what we name the regression-HiPOD method.

Numerical results, although preliminary, showcase the performance of the new HiPOD approach in ideal situations, where both noisy and clean data are available. This allows for a practical validation of the quality of the reconstruction yielded by the interpolation-HiPOD against the regression-HiPOD.

In particular, when the level of noise in the data is large, the regression-HiPOD outperforms the interpolation-HiPOD in terms of accuracy, gaining up to an order of magnitude in the \(L^2(\varOmega )\)- and \(H^1(\varOmega )\)-norm of the relative modeling error. Moreover, the regression-HiPOD succeeds in accurately reproducing the HiMod solution also in the presence of localized abrupt dynamics (where the interpolation-HiPOD plainly fails), as in the second test case, where the forcing term is localized in narrow regions close to the boundary.

Additionally, we provide a new upper bound that estimates the effect of the noise level on the deformation of the subspace spanned by the left singular vectors of the response matrix. The new quantity we propose is more practical to compute and provides a more meaningful estimate with respect to some upper bounds available in the literature, as confirmed by the values in Tables 3 and 5.

Concerning possible future developments, we plan to include uncertainty quantification in the analysis, to extend the directional regression-HiPOD approach to handle multiple parameters simultaneously, and to model vector PDEs, such as the incompressible Stokes and Navier-Stokes equations, with a view to haemodynamics modeling [7].