1 Introduction

Functional Data Analysis (FDA, Ramsay 2005) is an area of statistics developed in the recent years to deal with infinite-dimensional data, such as functions and curves. In many applications, functional data are also geo-referenced, i.e. observed at specific locations within the spatial domain of interest. In these cases, the spatial association among functional data has to be modeled. To deal with functional data with a spatial dependence, a number of geostatistical methods for modeling and prediction of scalar data (Cressie 2015) have been extended to functional data, leading to functional Ordinary Kriging and functional Universal Kriging predictors (Caballero et al. 2013; Menafoglio et al. 2013, 2016). These techniques belong to the more general framework of Object Oriented Spatial Statistics (O2S2, Menafoglio and Secchi 2017, 2019), which allows for the analysis of general types of object data (e.g., functional, distributional, or Riemannian data) by considering them as points of an appropriate mathematical space (e.g., Hilbert space or Riemannian manifold), through a geometrical and topological approach, see, for instance, Menafoglio et al. (2014).

In many processes over spatial domains, not only data are observed, but also prior information about the phenomenon under study is available in terms of a physical model, often formulated through Partial Differential Equations (PDEs). These physical models can be used to predict a functional response at any location of the domain, the spatio-temporal evolution of the phenomenon being precisely modeled through the governing physical laws, such as mass or energy conservation. Of course, one may expect a discrepancy between these predictions and the observations, due to many factors such as over-simplified model assumptions, uncertain model parameters, numerical discretization errors and aleatory variability (Quarteroni and Valli 1994; Smith 2013). Still, the physical model can provide meaningful information and, when taken into account, can potentially lead to better predictions than a pure data-driven model.

In the recent years, a number of authors presented approaches which exploit physical models to improve the modeling of spatial dependent data. A notable example is the so-called regression with differential regularization (Arnone et al. 2019; Azzimonti et al. 2015; Bernardi et al. 2017), which models spatially dependent data by means of a spatial regression penalized with a differential operator, which in turn expresses the physical information on the problem under study. In this case, the differential operator is used to support predictions in regions sparsely observed; the obtained predictions are typically biased towards the physical model solution. Another approach is illustrated in Chen and Baker (2019); Rodat et al. (2018); Yang et al. (2018, 2019), where the physical model is used to improve estimations of the spatial dependence among the (scalar) data, especially when few observations are available. In this case, the spatial covariance is estimated from multiple physical model realizations, thus avoiding the fit of a parametric model to empirical variograms or covariograms. Constantinescu and Anitescu (2013) propose a way to derive valid covariance models which incorporates physical relations between the variables in a multivariate setting. In this broad framework, none of the works presented in the literature specifically deals with problems where the response variables are functional objects, sequentially observed over time.

As a first element of innovation, we here propose an original approach, that we term Physics-based Residual Kriging (Phy-RK) for functional data, which incorporates the physical model for the phenomenon in a functional geostatistical setting. Having collected the observation of the process at a set of locations, we decouple the observed functional process in a physically-driven term and in a stochastic residual, and accordingly model the latter by means of functional Universal Kriging. We show that this method, beside being unbiased, can lead to substantial improvements, in terms of prediction error, when compared to the sole physical model or to the purely data-driven Kriging approach. Furthermore, we show that the information from the physical model can be effectively used also in the residuals modeling, through the inclusion of covariates derived from the physical model solution.

As a key additional element of novelty, we formulate the Phy-RK approach for sequential problems, namely for problems where sets of functional observations are sequentially observed along a number of time intervals. Here, the experimental design proceeds incrementally, including at each time interval a new set of observed locations, which are added to the locations observed at previous intervals. In this case, our methodology aims to address problems where (i) the sequential design itself implies a dynamic change in the process generating the data, and, consequently, (ii) repeated observations at the same locations are generated by differing random fields when collected at different time intervals. Due to the intrinsic dynamical nature of the problem, we develop an incremental formulation of the Phy-RK predictor which not only accounts for the residuals between the physics-based drift and the latest observations (first-order residuals), but also for the residuals between the previous predictors and the actual observations (higher-order residuals), allowing for a dynamic correction of Kriging predictions.

Starting from the work of Kennedy and O’Hagan (2001), a similar approach has been developed under the name of model calibration. In this framework, the parameters of a computational simulation (physical model) are estimated (calibrated), by jointly modeling its residual with a Gaussian random field. Both Bayesian (Kennedy and O’Hagan 2001) and frequentist (Tuo and Wu 2015) paradigms have been developed. In our Phy-RK approach, we adopt a different point of view, assuming that a physical model is already available, with parameters calibrated independently of the residual modelization. Furthermore, we here adopt the trace-Universal Kriging (Menafoglio et al. 2013) predictor for the residual modeling, avoiding a basis expansion approach for functional data, as in Bayarri et al. (2007); Higdon et al. (2008), which may result in very high-dimensional analyses when the basis dimension increases. Another element of novelty with respect to this literature stream relies on the sequential analysis of the prediction residuals, which is key to provide an effective modeling framework for evolving random fields. Indeed, our framework allows one to deal with problems where observations become available in a sequential manner, as in Wang et al. (2019), but, more importantly, it can also be used when it is precisely the random field generating the data that evolves along the sequential sampling of the data. The latter is the core problem which we illustrate in this work.

The Phy-RK predictor has a broad applicability whenever the phenomenon under study exhibits spatial dependence and prior information is available through a physical model, for instance in environmental sciences, and more specifically in meteorology and pollution modeling (Webster and Oliver 2007), and in public health problems (Diggle and Giorgi 2019). In this work, we embed the formulation of Phy-RK within the problem of predicting production rates, measured in STB/day, of producer wells operating in a mature conventional reservoir. Here, we aim at forecasting production rates of wells yet to be drilled at a set of locations in the reservoir, based on the observed past production of the wells which were active in previous time intervals. Production forecast is necessary to quantify investments, e.g. number and type of new wells, and to optimize field operations.

Unlike unconventional shale gas reservoirs, in which a pure geostatistical method has been successfully applied (Menafoglio et al. 2015; Xi and Morgan 2019), the application of such techniques for production forecast of wells yet to be drilled in a conventional reservoir has not been investigated so far. The main reason is that drilling a new well perturbs the pressure of the whole reservoir, changing the random field realization observed through the production rates of previously operating wells. Applying a pure geostatistical method, based on functional Kriging, would not appropriately model this dynamical nature of the process. For instance, if more than one wells were open in the successive time interval, a purely geostatistical approach would not take into account their interaction (e.g., if two wells were drilled close to each other, one would expect lower productions, due to their interaction, than placing a single well). This phenomenon is somehow negligible in an unconventional reservoir, as in this type of systems the perturbation happens at a local scale due to their peculiar rock properties. For these reasons, the standard approach to production forecast in conventional systems is the reservoir numerical simulation (Aziz and Settari 1979), where the reservoir domain is discretized and the governing equations of fluid flow in porous media are solved. Surrogate models can be developed allowing one to avoid setting many parameters, such as porosity and permeability of each cell of a fine reservoir discretization, and long execution times even on modern computational infrastructures. Some examples are the Capacitance Resistance Models (CRM, Wanderley de Holanda et al. 2018), the Interwell Numerical SImulation Model (INSIM, Zhao et al. 2015) and the recently developed FlowNet (Kiærr et al. 2020). In this context, Phy-RK could exploit, in a data-driven framework, a surrogate model, i.e. a physical model which is characterized by fewer parameters and is less computationally demanding than a full-physics 3D reservoir simulation. Furthermore, our Phy-RK methodology overcomes the above-mentioned limitations due to the static nature of a pure geostatistical approach, as the wells drilling schedule is taken into full account by the physical model.

We present two applications of Phy-RK to the problem of forecasting production rates. Both concern single phase reservoirs; for the first one, production is supported by fluid injection, whereas, for the second one, by rock compaction. In both cases, wells are incrementally drilled in the reservoir according to a schedule which consists of eight time steps. At each step, the aim is to predict the production rates of the new wells drilled in that step as well as the future production rates of those already operating, based on the production data observed up to the previous step at the wells already operating. We shall show that the Phy-RK approach leads to substantially lower prediction errors than a pure geostatistical method, which will thus be shown to be inappropriate for the prediction of production rates in a conventional reservoir.

The work is organized as follows. Section 2 introduces the mathematical formulation of Phy-RK predictor, with a focus on its sequential formulation. In Sect. 3, we adapt the general approach of Phy-RK to the prediction of production rates in a mature reservoir. Section 4 presents a first application of Phy-RK for the prediction of liquid production rates in a single-phase reservoir supported by liquid injection. Section 5 presents a second application of Phy-RK for the prediction of liquid production rates supported by rock compaction. Finally, Sect. 5.3 outlines conclusions and considerations for future development and applications.

2 Physics-based Residual Kriging predictor for functional data

2.1 Formulation of physics-based Residual Kriging

Consider a probability space \((\Omega , {\mathcal {F}}, {\mathbb {P}})\) and a (functional) random object \({\mathcal {Y}}: \Omega \rightarrow {\mathcal {H}}\), where \({\mathcal {H}}\) is a separable Hilbert space, endowed with the norm \(\Vert \cdot \Vert\) induced by the scalar product \(\langle \cdot , \cdot \rangle\). We define the random field

$$\begin{aligned} \{ {\mathcal {Y}}_{{\varvec{s}}}, {\varvec{s}} \in D \subseteq {\mathbb {R}}^d\}, \end{aligned}$$

where \({\varvec{s}}\) indicates the spatial location, included in the spatial domain D, associated with the corresponding random function \({\mathcal {Y}}_{{\varvec{s}}}\). We assume that, for each \({\varvec{s}}\), \({\mathcal {Y}}_{{\varvec{s}}} \in L^2(\Omega ; {\mathcal {H}})\), that is \(\int _{\Omega } \Vert {\mathcal {Y}}_{{\varvec{s}}}\Vert ^2 {\mathbb {P}}(d\omega ) = {\mathbb {E}}\left[ \Vert {\mathcal {Y}}_{{\varvec{s}}}\Vert ^2\right] < \infty\). Our goal is the prediction of the functional variable at location \({\varvec{s}}_0\), having observed the random field at locations \(\{{\varvec{s}}_1, \dots , {\varvec{s}}_n\}\).

We suppose that the phenomenon under investigation can be described by means of a deterministic model, able to approximately predict \({\mathcal {Y}}_{{\varvec{s}}}\). We call these predictions \({\mathcal {F}}_{{\varvec{s}}}({\varvec{\theta }})\), where \({\varvec{\theta }}\in \Theta \subset {\mathbb {R}}^m\) are the model parameters, and we decompose the functional variables \({\mathcal {Y}}_{{\varvec{s}}}\) as

$$\begin{aligned} {\mathcal {Y}}_{{\varvec{s}}} = {\mathcal {F}}_{{\varvec{s}}}({\varvec{\theta }}) + {\mathcal {X}}_{{\varvec{s}}}, \end{aligned}$$

being \({\mathcal {X}}_{{\varvec{s}}}\) the stochastic residual from the deterministic model prediction.

In this work, we focus our attention on physics-based models, mathematically formulated by means of Partial Differential Equations (PDEs), which express physical principles, such as mass or energy conservation, into the mathematical relationship between partial derivatives of functions (for an introduction see, e.g., Evans 2010; Salsa 2015). In particular, we formulate the physical model as the differential problem

$$\begin{aligned} L({\varvec{\theta }}) u({\varvec{\theta }}) = g({\varvec{\theta }}) \quad \text {in}\,\, D, \end{aligned}$$
(1)

where \(u({\varvec{\theta }})\) is the solution, \(L({\varvec{\theta }})\) is a differential operator and \(g({\varvec{\theta }})\) is the forcing term, which all depend on the vector of parameters \({\varvec{\theta }}\). The determination of these parameters is out of the scope of this work, although our method can partially correct their misspecification (as discussed in Ssect. 4.4). We assume that problem (1), coupled with appropriate boundary conditions on the boundary of the domain D, \(\partial D\), and, if time-dependent, with an initial condition, is well-posed. In particular, we require that problem (1) admits a unique weak solution \(u({\varvec{\theta }}) \in V\) for each \({\varvec{\theta }}\in \Theta\), where V is an appropriate Hilbert space. The physical model prediction is thus given by

$$\begin{aligned} {\mathcal {F}}_{{\varvec{s}}}({\varvec{\theta }}) = F_s u({\varvec{\theta }}), \end{aligned}$$
(2)

where \(F_s: V \rightarrow {\mathcal {H}}\) is a continuous bounded operator which maps the PDE solution into the physical model prediction at location \({\varvec{s}} \in D\). Although we restrict ourselves to PDE models, this is not a limitation of our approach and more general models can be considered.

We combine the deterministic model (2) with a geostatistical model for the residual random field \(\{ {\mathcal {X}}_{{\varvec{s}}}, {\varvec{s}} \in D \subseteq {\mathbb {R}}^d\}.\) Indeed, the residual random field might exhibit spatial dependence, which results from the spatial structure that was already present in the random field \({\mathcal {Y}}_{{\varvec{s}}}\) but not completely captured by the physical model. This can be due, for instance, to simplified assumptions of the physical model, to numerical discretization errors in the PDE solution or to the presence of additional influent variables not included in the physical model. We point out that, in the residual part, physics can still play a role, which however is modeled following a geostatistical approach.

In this work, we rely on O2S2, with particular reference to Universal Kriging (UK) for functional observations (Menafoglio et al. 2013). This geostatistical framework allows one to predict the (functional) residual at unsampled locations within the domain, by modeling the spatial dependence of the residual field, then building, on this basis, the Universal Kriging predictor. We thus define the expected value \(m_{{\varvec{s}}}\) of the residual random field as

$$\begin{aligned} m_{{\varvec{s}}} = \int _{\Omega } {\mathcal {X}}_{{\varvec{s}}}(\omega ){\mathbb {P}}(d\omega ), \quad {\varvec{s}} \in D, \end{aligned}$$

the integral being defined in the Bochner sense, and the global covariance function (also called trace-covariogram) \(C: D \times D \rightarrow {\mathbb {R}}\), as

$$\begin{aligned} C({\varvec{s}}_i, {\varvec{s}}_j) = \text {Cov}({\mathcal {X}}_{{\varvec{s}}_i}, {\mathcal {X}}_{{\varvec{s}}_j}) = {\mathbb {E}}\left[ \langle {\mathcal {X}}_{{\varvec{s}}_i} - m_{{\varvec{s}}_i}, {\mathcal {X}}_{{\varvec{s}}_j} - m_{{\varvec{s}}_j}\rangle \right] . \end{aligned}$$

Note that the covariance function C allows one to quantify the spatial dependence between two elements of the field corresponding to two locations in the spatial domain. Recall that, in a functional setting (Menafoglio et al. 2013), a random field \(\{{\mathcal {X}}_{{\varvec{s}}}, {\varvec{s}} \in D \subseteq {\mathbb {R}}^d\}\) is said to be globally second-order stationary if \({\mathbb {E}}[{\mathcal {X}}_{{\varvec{s}}}] = m\) for all \({\varvec{s}} \in D\) and \(\text {Cov}({\mathcal {X}}_{{\varvec{s}}_i}, {\mathcal {X}}_{{\varvec{s}}_j}) = {\mathbb {E}}[\langle {\mathcal {X}}_{{\varvec{s}}_i} - m_, {\mathcal {X}}_{{\varvec{s}}_j} - m \rangle ] = C({\varvec{h}}_{i,j})\) for all \({\varvec{s}}_i, {\varvec{s}}_j \in D\), with \({\varvec{h}}_{i,j} = {\varvec{s}}_i -{\varvec{s}}_j\) (i.e. the separating vector between two locations). Furthermore, a second-order stationary random field is isotropic if \(\text {Cov}({\mathcal {X}}_{{\varvec{s}}_i}, {\mathcal {X}}_{{\varvec{s}}_j}) = C(h_{i,j})\), where \(h_{i,j} = \Vert {\varvec{h}}_{i,j} \Vert\) is the distance between \({\varvec{s}}_i\) and \({\varvec{s}}_j\) in D. Although the isotropy assumption can be easily weakened, we stick to it for ease of notation.

We assume that the random field \(\{ {\mathcal {X}}_{{\varvec{s}}}, {\varvec{s}} \in D \subseteq {\mathbb {R}}^d\}\) can be decomposed as the sum of a non-stationary mean and a stationary residual, i.e.

$$\begin{aligned} {\mathcal {X}}_{{\varvec{s}}} = m_{{\varvec{s}}} + \delta _{{\varvec{s}}}, \end{aligned}$$
(3)

where \(m_{{\varvec{s}}}\) is the mean term (a.k.a. drift) and \(\delta _{{\varvec{s}}}\) is a zero-mean, second-order stationary and isotropic residual random field. For the mean term, we assume a linear model

$$\begin{aligned} m_{{\varvec{s}}} = \sum _{i=0}^L \beta _l f_l({\varvec{s}}), \end{aligned}$$
(4)

where \(f_0\) is the intercept (i.e, \(f_0({\varvec{s}}) = 1\) for all \({\varvec{s}} \in D\)), \(f_1({\varvec{s}}), \dots , f_L({\varvec{s}}) \in {\mathbb {R}}\) are known spatially dependent covariates and \(\beta _0(\cdot ), \dots , \beta _L(\cdot )\) are functional parameters in \({\mathcal {H}}\), which do not depend on the spatial location. The model (3) could be generalized by considering a non-stationary residual random field \(\delta _{{\varvec{s}}}\). However, the estimation of the covariance function in this case would be impractical in presence of few observations. A possible solution is represented by the Physics-informed Kriging (Yang et al. 2018), where the covariance function is estimated from stochastic simulations of a physical model.

In practice, having collected n observations \(\{{\mathcal {X}}_{{\varvec{s}}_1}, \dots , {\mathcal {X}}_{{\varvec{s}}_n}\}\) of the residual random field at locations \(\{{\varvec{s}}_1, \dots , {\varvec{s}}_n\}\)—obtained as differences between the observed \(\{{\mathcal {Y}}_{{\varvec{s}}_1}, \dots ,{\mathcal {Y}}_{{\varvec{s}}_n}\}\) and the purely physics-based predictions \(\{{\mathcal {F}}_{{\varvec{s}}_1}, \dots ,{\mathcal {F}}_{{\varvec{s}}_n}\}\) – our goal is the prediction of the variable \({\mathcal {Y}}_{{\varvec{s}}_0}\) at the unsampled location \({\varvec{s}}_0\) in D. It is computed as the sum of the physical model prediction \({\mathcal {F}}_{{\varvec{s}}_0}\) and the residual \(\hat{{\mathcal {X}}}_{{\varvec{s}}_0}\) predicted by Universal Kriging, which is the best linear unbiased predictor, of the form:

$$\begin{aligned} \hat{{\mathcal {X}}}_{{\varvec{s}}_0} = \sum _{i=1}^n {\hat{\lambda }}_i {\mathcal {X}}_{{\varvec{s}}_i}, \end{aligned}$$
(5)

where the optimal weights \(({\hat{\lambda }}_1, \dots , {\hat{\lambda }}_n)\) solve the optimization problem:

$$\begin{aligned} \left( {\hat{\lambda }}_1, \dots , {\hat{\lambda }}_n\right)&= \mathop {\mathrm {arg\,min}}\limits _{\lambda _1, \cdots , \lambda _n \in {\mathbb {R}}} \text {Var}\left( \sum _{i=1}^n \lambda _i \,{\mathcal {X}}_{{\varvec{s}}_i}-{\mathcal {X}}_{{\varvec{s}}_0}\right) \nonumber \\&\quad \text {s.t.} \quad {\mathbb {E}}\left[ \sum _{i=1}^n \lambda _i \,{\mathcal {X}}_{{\varvec{s}}_i}\right] = m_{{\varvec{s}}_0}. \end{aligned}$$
(6)

Thus, the weights are chosen by minimizing the variance of the prediction error, subject to the (uniform) unbiasedness constraint. Under the assumption that the covariance function C is known, the optimization problem (6) can be equivalently formulated in terms of the following linear system

$$\begin{aligned} & \begin{pmatrix} C(0) & \cdots &C(h_{1,n})&f_0({\varvec{s}}_1)&\cdots &f_L({\varvec{s}}_1)\\ \vdots &\ddots &\vdots &\vdots &\vdots &\vdots \\ C(h_{n,1})&\cdots &C(0)&f_0({\varvec{s}}_n)&\cdots &f_L({\varvec{s}}_n)\\ f_0({\varvec{s}}_1)&\cdots &f_0({\varvec{s}}_n)&0&\cdots &0\\ \vdots &\ddots &\vdots &\vdots &\vdots &\vdots \\ f_L({\varvec{s}}_1)&\cdots &f_L({\varvec{s}}_n)&0&\cdots &0\\ \end{pmatrix} \begin{pmatrix} \lambda _1\\ \vdots \\ \lambda _n\\ \mu _0\\ \vdots \\ \mu _L\\ \end{pmatrix} \\&\quad = \begin{pmatrix} C(h_{0,1})\\ \vdots \\ C(h_{0,n})\\ f_0({\varvec{s}}_0)\\ \vdots \\ f_L({\varvec{s}}_0)\\ \end{pmatrix} ,\end{aligned}$$
(7)

where \((\mu _0, \cdots , \mu _L)\) are the Lagrange multipliers associated with the L linear constraints derived from the unbiasedness constraint in (6), see Menafoglio et al. (2013) for further details. Solving the linear system, we obtain the optimal weights and, thus, the corresponding prediction \(\hat{{\mathcal {X}}}_{{\varvec{s}}_0}\). Eventually, we compute the Physics-based Residual Kriging (Phy-RK) prediction for the functional variable \({\mathcal {Y}}_{{\varvec{s}}_0}\) as

$$\begin{aligned} \hat{{\mathcal {Y}}}_{{\varvec{s}}_0} = {\mathcal {F}}_{{\varvec{s}}_0}({\varvec{\theta }}) + \hat{{\mathcal {X}}}_{{\varvec{s}}_0}. \end{aligned}$$
(8)

In applications, one usually does not know the trace-covariogram C and thus needs to estimate it from the observations. This is done by estimating the trace-semivariogram \(\gamma (h_{i,j})=C(0) - C(h_{i,j})\), in two steps. Firstly, a method-of-moments estimator is used (i.e., a binned trace-variogram, see, e.g., Menafoglio et al. 2013), then a valid parametric variogram model (e.g., spherical, Matérn, see Cressie 2015) is fitted by optimizing its parameters in, e.g., a least-squares sense. This procedure is iterated until convergence, as one needs to compute the residuals \(\delta _{{\varvec{s}}}\) having estimated the coefficients \(\beta _0, \dots , \beta _L\) through Generalized Least Squares, as explained in Menafoglio et al. (2013). Finally, the trace-covariogram is easily obtained from the trace-semivariogram similarly as in classical geostatistics. Note that, in the setting of our work, the spatial dependence is estimated from the observations and not from multiple realizations of the physical model, as in Chen and Baker (2019); Rodat et al. (2018); Yang et al. (2018), because it is referred to the residuals from the physics model predictions (2). This is compatible with the simplified nature of our physical model (1), and allows us to reduce the overall computational burden of the procedure, as one avoids to repeatedly solve the PDE used to model the phenomenon under study.

2.2 Residual Kriging for sequential predictions

In this subsection, we extend the general Physics-based Residual Kriging method presented in Sect. 2.1 for tackling the problem of sequential prediction. In this context, let us consider the whole time domain \([t_0, \,t_{N_a} = T]\) and subdivide it into \(N_a\) activation time intervals, indicating the subintervals with \(a_i=[t_{i-1}, t_{i})\), for \(i=1,\dots ,N_a\), \(\cup _{i=1}^{N_a} a_i = [t_0, T]\). We assume that, in each interval \(a_i\), a realization of the functional-valued random field \(\{ {\mathcal {Y}}_{{\varvec{s}}, a_i}, {\varvec{s}} \in D \subseteq {\mathbb {R}}^d\}\), which depends on the interval, has been sampled at a set of locations \(W_i=\left\{ {\varvec{s}}^i_j\right\} _{j=1}^{n_i}\) in the spatial domain D. From now on, we omit the apex i of the locations, implicitly assuming that they belong to the set \(W_i\) specified by the activation interval \(a_i\). Furthermore, each random field is characterized by a covariogram \(C_{a_i}\), which can vary among different intervals. This mathematical setting describes, for instance, the prediction of production rates in a conventional reservoir. Indeed, we observe the production rates of currently operating wells in the time interval \(a_i\), but when additional wells are drilled, the whole reservoir is perturbed, leading to a possibly different random field in the next interval \(a_{i+1}\).

Given a location \({\varvec{s}}_0\) sampled during the interval \(a_i\) with \(i\ge 2\), our goal is to predict the corresponding functional observation, given the functional data sampled in the interval \(a_{i-1}\). We embed the general Phy-RK predictor (8) in this setting, by defining the predictor

$$\begin{aligned} \hat{{\mathcal {Y}}}_{{\varvec{s}}_0, a_i}= {\mathcal {F}}_{{\varvec{s}}_0, a_i}({\varvec{\theta }}) + \hat{{\mathcal {X}}}_{{\varvec{s}}_0, a_i}, \end{aligned}$$
(9)

which is obtained as the sum of the physics-based prediction \({\mathcal {F}}_{{\varvec{s}}_0, a_i}({\varvec{\theta }})\) in the interval \(a_i\) with wells configuration \(W_i\), and the residual predicted by functional Universal Kriging (5). In particular, the residual predictor has the following form

$$\begin{aligned} \hat{{\mathcal {X}}}_{{\varvec{s}}_0, a_i} = \sum _{j: {\varvec{s}}_j \in W_{i-1}} {\hat{\lambda }}_{j, a_i}{\mathcal {X}}_{{\varvec{s}}_j, a_{i-1}}, \end{aligned}$$
(10)

where the optimal weights \({\hat{\lambda }}_{j, a_i}\) are found by solving problem (6) based on the information available at the latest step (i.e., \({\mathcal {X}}_{{\varvec{s}}_j,a_{j-1}}, {\varvec{s}}_j\in W_{i-1}\)). Note that, in general, the intervals \(a_i\) and \(a_{i-1}\) may have different lengths. This implies that, if interval \(a_{i-1}\) is longer than interval \(a_i\), the prediction \(\hat{{\mathcal {X}}}_{{\varvec{s}}_0, a_i}\) will be defined on the whole interval \(a_i\). If this is not the case, the prediction \(\hat{{\mathcal {X}}}_{{\varvec{s}}_0, a_i}\) will be defined only on the the initial part of the interval \(a_i\) (i.e., on an interval domain whose length coincides with that of \(a_{i-1}\)). Moreover, we point out that the residuals at the locations that were observed in \(a_{i-1}\) are predicted precisely as the observed residuals in the previous time interval, i.e.,

$$\begin{aligned} \hat{{\mathcal {X}}}_{{\varvec{s}}_j, a_i} = {\mathcal {X}}_{{\varvec{s}}_j, a_{i-1}} \quad {\varvec{s}}_j \in W_{i-1}, \end{aligned}$$

due to the interpolating property of Kriging. However, note that, in general, the Phy-RK prediction \(\hat{{\mathcal {Y}}}_{{\varvec{s}}_j, a_i}\) will not coincide with the response \({\mathcal {Y}}_{{\varvec{s}}_j, a_{i-1}}\) observed in the previous interval, as the physics-driven prediction \({\mathcal {F}}_{{\varvec{s}}_0, a_i}({\varvec{\theta }})\) evolves along the intervals. In practice, to build predictor (10), we would need to estimate the covariogram \(C_{a_i}\) from the residuals of interval \(a_i\), which however are not available at the beginning of interval \(a_i\). Therefore, we employ the estimate obtained with residuals observed in interval \(a_{i-1}\), implicitly assuming that the covariogram does not abruptly vary between consecutive intervals. In fact, the viability of using the predictive model (9) in a sequential framework itself depends on the stability of the process over time—both in terms of the physics-based model and residuals field.

2.3 Physics-based Residual Kriging with sequential update

If the residuals among consecutive activation intervals exhibit a strong variation, due for instance to the strong evolutionary nature of the considered problem (as the one illustrated in Sect. 5), predictor (9) might be unable to adapt to the dynamic of the process itself. In this case, we may introduce additional terms, dynamically updated, to further correct the predictive model based on the misfit between observations and predictions in the previous activation interval. We illustrate the construction of such corrective terms in the first activation time intervals, the following intervals representing just a straightforward generalization.

Predictor (9) can be used to produce predictions starting from the second interval \(a_2\), as one needs to observe the residuals in \(a_1\) to be able to predict them in \(a_2\). If we now move to the interval \(a_3\), we can predict two residuals: (i) the residuals from the physical model \({\mathcal {X}}^{(1)}_{{\varvec{s}}_j, a_2}\), and (ii) the residuals from the predictive model used to carry out the forecast at the previous step, i.e, the one formulated in (9). These latter residuals, denoted as \({\mathcal {X}}^{(2)}_{{\varvec{s}}_j, a_2}\), are informative on the possible misfit of the latest available predictive model which may represent repeatable effects potentially observable in successive time instants. In fact, the second-order residuals \({\mathcal {X}}^{(2)}_{{\varvec{s}}_j, a_2}\) can be used to update the predictions in the third activation interval, correcting them by the observed misfits in the second interval.

Thus, for third activation interval, we can build a corrected predictor

$$\begin{aligned} \hat{{\mathcal {Y}}}_{{\varvec{s}}_0, a_3}= {\mathcal {F}}_{{\varvec{s}}_0, a_3} + \hat{{\mathcal {X}}}^{(1)}_{{\varvec{s}}_0, a_3} + \hat{{\mathcal {X}}}^{(2)}_{{\varvec{s}}_0, a_3}, \end{aligned}$$

where the two predicted residuals \(\hat{{\mathcal {X}}}^{(1)}_{{\varvec{s}}_0, a_3}\) and \(\hat{{\mathcal {X}}}^{(2)}_{{\varvec{s}}_0, a_3}\) are obtained through functional Kriging of the latest available observations of first- and second-order residuals, i.e.,

$$\begin{aligned} \hat{{\mathcal {X}}}^{(1)}_{{\varvec{s}}_0, a_3}&= \sum _{j: {\varvec{s}}_j \in W_{2}} {\hat{\lambda }}^{(1)}_{j, a_3}{\mathcal {X}}^{(1)}_{{\varvec{s}}_j, a_2}, \\ \hat{{\mathcal {X}}}^{(2)}_{{\varvec{s}}_0, a_3}&= \sum _{j: {\varvec{s}}_j \in W_{2}} {\hat{\lambda }}^{(2)}_{j, a_3}{\mathcal {X}}^{(2)}_{{\varvec{s}}_j, a_2}, \end{aligned}$$

with \({\mathcal {X}}^{(1)}_{{\varvec{s}}_j, a_2} = {\mathcal {Y}}_{{\varvec{s}}_j, a_2} - {\mathcal {F}}_{{\varvec{s}}_j, a_2}\) and \({\mathcal {X}}^{(2)}_{{\varvec{s}}_j, a_2} = {\mathcal {Y}}_{{\varvec{s}}_j, a_2} - {\mathcal {F}}_{{\varvec{s}}_j, a_2}- \hat{{\mathcal {X}}}^{(1)}_{{\varvec{s}}_j, a_2}\). In practice, one can decide how complex the predictive model should be (i.e., how many residuals to include): (i) no residuals (Phy-RK-0), i.e. the prediction is determined by the physical model only (the only possible predictive model at step \(a_1\)), (ii) one residual (Phy-RK-1), correcting the prediction of the physical model with first-order residuals (the complete predictive model at step \(a_2\)) or (iii) two residuals (Phy-RK-2), further correcting the predictive model in \(a_2\) with second-order residuals. This set of choices is depicted in Fig. 1, where these possibilities at a generic location \({\varvec{s}}_0\) are shown. In the first interval, the only possible prediction is \({\mathcal {F}}_{{\varvec{s}}_0, a_1}\), i.e., the physical model prediction; no residuals can be used (Phy-RK-0, green line). At the end of the first interval, having measured the functional observation in \({\varvec{s}}_0\), \({\mathcal {Y}}_{{\varvec{s}}_0, a_1}\), one can note, for instance, that the physical model actually overestimates the response in \(a_1\), producing a (positive) residual (represented as green arrows). On this basis, one can choose whether to correct the predictive model based on the physics by including the first order residuals. This would yield a corrected prediction (Phy-RK-1, blue line). In the third interval, one can choose between the physical model (Phy-RK-0, green line), one residual (Phy-RK-1, blue line) and two residuals (Phy-RK-2, dark yellow line) predictions, based on the observed misfit between the observations and the two possible predictive models at the previous step (Phy-RK-0, or Phy-RK-1). These residuals are represented as colored arrows (green for Phy-RK-0, blue for Phy-RK-1).

It is then clear that, at each interval, an additional residual is observed. Indeed, at the end of the i-th interval, one observes residuals up to order i, although up to (\(i-1\)) residuals could be used for the prediction in the same interval, because the residuals prediction at interval \(a_i\) was based only on the residuals available at the end of \(a_{i-1}\). Note also that the residuals used for prediction at the i-th interval are those available at the latest interval (\(i-1\)), and not those at previous intervals. Indeed, \(\{{\mathcal {X}}_{{\varvec{s}}_1}, \dots , {\mathcal {X}}_{{\varvec{s}}_n}\}\) are here interpreted as the residuals of the predictive models which can be used to perform prediction, rather than the residuals of the actual previous predictions. This is consistent with the dynamic modeling of the system, according to which previous observations of the residuals become soon obsolete as the schedule proceeds, and thus, poorly informative due to the phenomenon dynamics.

Iterating the same argument in the following intervals by adding higher-order residual terms as soon as they become available, the predictor at the i-th step can be defined as

$$\begin{aligned} \hat{{\mathcal {Y}}}_{{\varvec{s}}_0, a_i}= {\mathcal {F}}_{{\varvec{s}}_0, a_i} + \sum _{k=1}^{K_i} \hat{{\mathcal {X}}}^{(k)}_{{\varvec{s}}_0, a_i}, \end{aligned}$$
(11)

where \(K_i \le i - 1\) is the number of residual terms modeled in the i-th activation interval. Note that, if no residual terms (hereafter denoted by \(K_i = 0\)) are taken into account, the prediction coincides with that generated by the sole physical model. Each residual is predicted as

$$\begin{aligned} \hat{{\mathcal {X}}}^{(k)}_{{\varvec{s}}_0, a_i} = \sum _{j: {\varvec{s}}_j \in W_{i-1}} {\hat{\lambda }}^{(k)}_{j, a_i} {\mathcal {X}}^{(k)}_{{\varvec{s}}_j, a_{i-1}}, \end{aligned}$$

where the observed residuals are given by

$$\begin{aligned} {\mathcal {X}}^{(k)}_{{\varvec{s}}_j, a_{i-1}} = {\mathcal {Y}}_{{\varvec{s}}_j, a_{i-1}}- {\mathcal {F}}_{{\varvec{s}}_j, a_{i-1}} - \sum _{l=1}^{k-1} \hat{{\mathcal {X}}}^{(l)}_{{\varvec{s}}_j, a_{i-1}}. \end{aligned}$$

The optimal weights \({\hat{\lambda }}^{(k)}_{j, a_i}\) are found by solving the Kriging system (7), in which the covariogram \(C_{a_i}\) is estimated from the residuals observed in the previous interval. Thus, it can differ among different time intervals and different residual terms.

The parameter \(K_i\) is chosen by selecting the number of residuals which produces the lowest average error in the previous interval. This procedure allows us to avoid overfitting, as, in a sequential framework, the choice of \(K_i\) in \(a_i\) is based on a test set disjoint from the available observations (i.e., the observations in the interval \(a_{i-1}\)). Thus, \(K_i\) might vary among the intervals, as we further discuss in Sects. 4 and 5.

Fig. 1
figure 1

Graphical illustration of model (11) in the first three activation time intervals. Arrows represent the observed residuals in each time interval, computed with respect to the model of the corresponding color

3 Prediction of production rates in a mature reservoir

In this work, we apply the Physics-based Residual Kriging methodology illustrated in Sect. 2 for the prediction of liquid production rates of wells operating in a mature reservoir. In particular, having observed the production rates, which are time-dependent functional data, of some operating wells during a certain time interval, our goal is to forecast the production rates of existing and newly drilled wells in the subsequent time interval.

Referring to the mathematical formulation presented in Sect. 2.3, we thus indicate with \({\mathcal {Y}}_{{\varvec{s}}, a_i}\) the production rate over time of a well placed at location \({\varvec{s}} \in D\) operating in interval \(a_i\), and with \({\mathcal {F}}_{{\varvec{s}}, a_i}({\varvec{\theta }})\) the production rate predicted by a physical model suitable for the specific application, as those described in Sects. 4.2 and 5.2. Besides the physical parameters, the vector \({\varvec{\theta }}\) includes the wells drilling schedule. We assume that wells are open or closed in \(N_a\) time instants, \((t_0, \dots , t_{N_a-1})\), and, within any time interval \(a_i=[t_{i-1}, t_{i})\), for \(i=1,\dots ,N_a\), the well configuration does not change. \(W_i\) is the set of locations of the active producers in the interval \(a_i\).

In the context of a mature field, historical data are abundant, making possible to exploit spatial dependence among production rates in the prediction step. We present two applications in Sects. 4 and 5. In both cases, we aim to predict the production rates of a single-phase reservoir. However, in the first case the production is driven by injection, whereas in the second case the production is driven by rock compaction.

3.1 Governing equations of single-phase flow in a reservoir

We present the general equation for single-phase in a porous medium, which provides the basis for the physical models used in the two test cases described in Sects. 4 and 5. A petroleum reservoir is a sub-surface region containing hydrocarbons trapped in porous rocks. Although a reservoir usually contains water, oil and, depending on the temperature and pressure conditions, gas, in this work we restrict our analysis to single-phase isothermal flow conditions where production occurs because fluid is injected in the reservoir or because a combination of fluid expansion and rock compaction provides the necessary energy, see Dake (1978) for a review of these mechanisms. Neglecting poromechanics coupling, a single phase fluids flow in the reservoir is modeled according to the mass conservation PDE

$$\begin{aligned} -\nabla \cdot (\rho {\varvec{u}}) - {\tilde{m}} = \frac{\partial }{\partial t}(\phi \rho ), \end{aligned}$$

where \(\rho =\rho (p)\) is the mass density of the fluid, \(\phi =\phi (p)\) is the porosity, p is the pressure, \(\mathbf {u}\) is the Darcy’s velocity which can be related to fluid pressure, according to Bear (2013), by the Darcy’s equation

$$\begin{aligned} {\varvec{u}} = - \frac{1}{\mu } {\varvec{k}} \cdot \nabla p, \end{aligned}$$
(12)

where \(\mu =\mu (p)\) is the fluid viscosity and \(\mathbf {k}\) is the permeability tensor, which is a local rock property that measures the ability of the fluid to flow through it. Combining the two equations we get the pressure equation

$$\begin{aligned} \nabla \cdot \left( \frac{\rho }{\mu } {\varvec{k}} \cdot \nabla p \right) - {\tilde{m}} = \frac{\partial }{\partial t}(\phi \rho ). \end{aligned}$$
(13)

Solving Equation (13) for pressure requires the definition of a thermodynamic model, \(\rho =\rho (p)\), and a compaction model, \(\phi =\phi (p)\). These closure models depend on rock and fluid characteristics and are, thus, case-specific. Then, the time evolution of the fluid can be simulated provided that initial and boundary conditions are defined. In this work, we assume, as initial conditions, the equilibrium state of the reservoir before production and, as boundary conditions, that our reservoir is completely sealed from the rest of the subsurface. Furthermore, the mass source rate \({\tilde{m}}\) can be written as the sum of the contributions of the \(N_w\) wells, i.e. \({\tilde{m}}=\sum _{j=1}^{N_w} {\tilde{m}}_{j}(t)\delta ({\varvec{s}}-{\varvec{s}}^w_j)\), where \(\delta ({\varvec{s}})\) is the Dirac’s function, and \({\varvec{s}}^w_j \in D\), \({\tilde{m}}_j(t)\) are the positions and the mass rate produced/injected at time t at the j-th well. We assume that, in the development of the field, new wells can be drilled or existing wells can be closed in the reservoir.

The solution of equation (13) can be numerically computed under specific initial and boundary conditions and proper closure relation using a general purpose numerical simulator, such as OPM (Flemisch et al. 2011) or ECLIPSE (Schlumberger 2019), which essentially follows the techniques described in Aziz and Settari (1979); Lie (2019); Peaceman (1977): finite volume discretization of mass conservation equations on hexaedrons, fully implicit time-marching scheme for average fluid properties (cell pressure values) and well models based on the Peaceman’s formulation (Peaceman et al. 1983) which links inflow/outflow rates, cell pressure and bottom-hole fluid pressure \(p_{wf}\).

4 Single phase flow driven by injection

In this first test case, we consider a single-phase flow in a reservoir, in which the production is driven by fluid injection. Albeit being not realistic in the context of hydrocarbon recovery, as the liquid production is usually not driven by the injection of the same fluid, this situation can appear in the context of groundwater aquifer management (Jakeman et al. 2016) and it allows us to show the effectiveness of our approach in a relatively simple test case.

4.1 Dataset description

In order to ensure full disclosability, we analyze the production data produced by a synthetic reservoir simulated using OPM. The reservoir simulator OPM is employed only for the data generation and it does not play any role in the predictive model. Hereafter, we describe the characteristics of the synthetic reservoir in terms of geometry, fluid properties and well schedule. We consider a reservoir of dimensions 25600ft, 25600ft and 50ft along x-axis, y-axis and z-axis respectively, discretized with \(128 \times 128 \times 1\) cells, whose dimensions are 200ft, 200ft and 50ft, respectively. We generate the reservoir porosity \(\phi _0\) as the logistic transformation of a realization of a random field with isotropic spherical variogram with range \(R=4000\) ft, sill \(\sigma ^2= 0.1\) and zero nugget, which was linearly scaled in the range [0.1, 0.3]. We build the permeability tensor \({\varvec{k}}\) as diagonal with components \(k_x = k_y = k\), where k, expressed in milliDarcy, is obtained as \(\log k = \log 40 + 5(\phi _0-0.1)\).

The resulting porosity and permeability fields are reported in the supplementary material in Figure S1. In this case the compaction model is simply \(\phi =\phi _0 [1+c_{rock}(p-p_{rock})]\) while the water density is \(\rho =\rho _0 [1+c_{w}(p-p_{w})]\). More specifically, \(c_{rock}=9.8141 \cdot 10^{-7}\) psi\(^{-1}\), \(c_{w}=2.74 \cdot 10^{-6}\) psi\(^{-1}\), \(p_{rock}=5801.51\) psi and \(p_{w}=3118.3\) psi. Fluid viscosity \(\mu\) is set to \(\mu = 0.39851\) cP. A total number of 256 injectors and 225 producers are drilled in the reservoir in \(N_a=8\) activation time intervals of equal length. During each activation instant \(t_i=0,\dots ,7\), a group of injectors and producers is drilled, adding them to the already operating wells. The wells are placed according to a five-spot pattern, and their activation schedule is shown in Fig. 2. We impose a bottom hole pressure of 3000 psi for producers and of 5000 psi for injectors. The reservoir was implemented in OPM, obtaining the liquid production rates shown in Fig.  2.

Fig. 2
figure 2

Simulated production rates and schedule for the reservoir described in Sect. 4.1. Top panel: blue lines represent the production rates corresponding to newly drilled wells in each activation interval, gray lines correspond to wells already operating. Bottom panel: blue circles represent the newly drilled producers, red crosses the newly drilled injectors and grey symbols the already operating wells

4.2 Physical model

We now describe the physical model which shall be employed for this test case, analogously to Aziz and Settari (1979); Peaceman (1977). Even though fluid and rock in the reference model simulated using OPM are compressible, in the logic of our method we implement an approximate solution where fluid and rock are incompressible, so that \(\rho\) is constant and \(\phi\) does not play any role. Thus, equation (13) reduces to

$$\begin{aligned} {\left\{ \begin{array}{ll} \nabla \cdot \left( \frac{1}{\mu } {\varvec{k}} \cdot \nabla p \right) = {\tilde{m}}/\rho = q \quad & \text {in} \,\,D \\ \partial _n p = 0 \quad & \text {on} \,\,\partial D \end{array}\right. }, \end{aligned}$$
(14)

which can be seen as a conservation equation for the fluid volume at reservoir conditions. The boundary condition ensures zero net flow through the reservoir boundaries.

Equation (14) is solved by means of a finite volume discretization technique (Aziz and Settari 1979). We employ a two-dimensional uniform grid, which consists of rectangular blocks of dimensions \((\varDelta x, \varDelta y)\) and we indicate with H the reservoir thickness. We use the same spatial discretization as the reservoir simulator described in Sect. 4.1, thus \((\varDelta x, \varDelta y) = (200 ft, 200ft)\) and \(H=50ft\). The numerical method results in solving a linear system of the form

$$\begin{aligned} A {\varvec{p}} = {\varvec{b}}, \end{aligned}$$
(15)

where A is a pentadiagonal matrix, \({\varvec{p}}\) contains the unknown pressure numerical approximation in each block and \({\varvec{b}}\) is the right-hand side which is zero everywhere except for the elements corresponding to blocks containing a well. Details are given in the supplementary material, Section S1. The linear system (15) gives, in each activation interval, an approximation \({\varvec{p}}\) of the reservoir pressure p. Then, the prediction \({\mathcal {F}}_{{\varvec{s}}_0, a_i}({\varvec{\theta }})\) is given by (14), where, with reference to equation (2), \(u({\varvec{\theta }})\) is approximated by the vector \({\varvec{p}}\) and \(F_{{\varvec{s}}}\) is the Peaceman’s well model (Peaceman et al. 1983) applied to the approximated pressure \({\varvec{p}}\) in the well location. The model parameters \({\varvec{\theta }}\) are the permeability tensor \({\varvec{k}}\), the fluid viscosity \(\mu\), the active wells locations \({\varvec{s}}^w_j\), the wells connection factors \(T_w\) and the bottom hole pressures \(p^{\text {bhp}}_w\). In this test case, physical model predictions \({\mathcal {F}}_{{\varvec{s}}_0, a_i}\) are constant in the whole activation interval, being equation (14) time-independent.

4.3 Results and discussion

We now apply Phy-RK for the prediction of the liquid production rates simulated as described in Sect. 4.1, employing the physical model described in Sect. 4.2. For each activation phase, we compute the physical model predictions solving equation (14). The residuals are then predicted based only on the residuals of the previous activation interval according to model (11). We point out that, for each interval, we ignore the initial part of the data, i.e. the part which connects zero to the first observed value in the interval, as it is due only to the linear interpolation we perform on the simulated data to obtain a uniform time sampling.

We measure the relative error E between the observed data \({\mathcal {Y}}_{{\varvec{s}}_0, a_i}\) and the prediction \(\hat{{\mathcal {Y}}}_{{\varvec{s}}_0, a_i}\) in the activation interval \(a_i\) as

$$\begin{aligned} E_i = \frac{\int _{a_i} |{\mathcal {Y}}_{{\varvec{s}}_0, a_i} - \hat{{\mathcal {Y}}}_{{\varvec{s}}_0, a_i}|}{\int _{a_i} |{\mathcal {Y}}_{{\varvec{s}}_0, a_i}|}. \end{aligned}$$

The relative error \(E_i\) is dimensionless and expresses the cumulative absolute difference between the observed and the predicted rates, normalized by the total cumulative observed production.

We compare four approaches derived from the general predictor (11). The first one is the pure Ordinary Kriging (OK), in which functional Kriging, with only the intercept in the drift linear model (4), is employed, setting to zero the physical model predictions. The second one is the pure physical model (Phy), in which the predictions are \({\mathcal {F}}_{{\varvec{s}}_0, a_i}\), setting to zero the predicted residuals. Moreover, two versions of the Physics-based Residual Kriging are tested, in which the method described in Sect. 2.2 is applied with two different drift terms respectively: for the Physics-based Residual Ordinary Kriging (Phy-ROK) we use only the intercept as regressor, thus \(m_{{\varvec{s}}} = \beta _0\), whereas for the Physics-based Residual Universal Kriging (Phy-RUK) the drift \(m_{{\varvec{s}}} = \beta _0+\beta _1 p_{{\varvec{s}}}\), \(p_{{\varvec{s}}}\) being the numerical solution of the physical model (14) in the current interval \(a_i\) evaluated at location \({\varvec{s}}\). Note that the approaches can be compared starting from the second activation interval. As we observe the production rates in following intervals, we could use Phy-RK with increasing number of residual terms. Selection of the number \(K_i\) of residuals involved in predictor (11) for interval \(a_i\) is done by looking at the best predictor for the previous interval \(a_{i-1}\). This leads to \(K_i=1\) in all the intervals, for both Phy-ROK and Phy-RUK (see Fig. 3).

Rates predicted by Phy-RUK are shown in Figure S3 in the supplementary material, exhibiting a great agreement with the observations. Figure 4a displays the boxplots of the relative errors for each activation phase, comparing the different approaches. We note that, in this example, Phy has a lower prediction error than OK. However, combining the two approaches in the Phy-RK leads to remarkably lower errors. In particular, we observe that Phy-RUK, which uses the pressure as regressor, leads to consistently lower errors, showing that it can effectively exploit some information from the physical model in the geostatistical part of the model. Furthermore, we point out that the errors tend to be lower in the last activation intervals, when functional Kriging can take advantage from a higher number of observations. Note that, thanks to the sequential nature of our problem, we do not need cross-validation to select, among the four options, the model to employ in each interval. Indeed, we can choose the one that gives the lowest average relative error in the previous interval, assuming that there are no strong variations in the process that generates the data (both the physics and the stochastic process of the residuals) between consecutive time intervals. According to this criterion, Phy-RUK results to be the best model in all the intervals.

Fig. 3
figure 3

Relative prediction error boxplots produced by Phy-ROK (top panel) and Phy-RUK (bottom panel) with different orders of residual terms, for the case described in Sect. 4. The boxplots with red outlines are the ones of the models selected for their corresponding interval on the base of the lowest mean relative error in the previous time interval

4.4 Robustness to the compressibility coefficient

We now want to investigate the robustness of our approach to changes in the fluid compressibility \(c_w\), generating a new set of production data with OPM—as described in Sect. 4.1—with a higher fluid compressibility. This analysis can be used to evaluate the ability of the method to correct a more and more approximated physics in the simplified modeling. We here discuss three scenarios, in which the data are simulated multiplying the fluid compressibility by a factor of 10, 100 and 1000, with a baseline compressibility of \(2.74 \cdot 10^{-6}\) psi\(^{-1}\).

Figure 4 shows the relative errors boxplots varying the water compressibility (the other cases are reported in the supplementary material, Figure S4). We observe that the physical models predictions \({\mathcal {F}}_{{\varvec{s}}_0, a_i}\) are unreliable when the water compressibility is misspecified. However when the predictions \({\mathcal {F}}_{{\varvec{s}}_0, a_i}\) are employed in the framework of the Phy-ROK and Phy-RUK, they still significantly contribute to decrease the errors. Furthermore, we note that the errors increase when the compressibility increases. In correspondence of \(c_w = 2.74 \cdot 10^{-3}\) psi\(^{-1}\), the OK approach produces errors which are comparable to Physics-based Residual Kriging. This suggests that, when the physical model becomes too inaccurate, there is no value-added in considering the term \({\mathcal {F}}_{{\varvec{s}}_0, a_i}\) in model (9), but the residual modelization can partially correct the predictions. Concerning this aspect, using a spatial regression model with differential penalization (SR-PDE, Arnone et al. 2019), with an inaccurate physics would lead to worse predictions than a regression model without physics penalization. Indeed, in such a framework the functional data would be modeled as

$$\begin{aligned} {\mathcal {Y}}_{{\varvec{s}}_j} = f^0_{{\varvec{s}}_j} + \varepsilon _j, \end{aligned}$$
(16)

where \(f^0\) is a spatial field and \(\varepsilon\) a zero-mean residual, independent of the other residuals. The field \(f^0\) is then estimated minimizing the following functional

$$\begin{aligned} J_{a_i}(f) = \sum _{j=1}^{n_i} \int _{a_i} \left( f_{{\varvec{s}}_j} - {\mathcal {Y}}_{{\varvec{s}}_j}\right) ^2 + \lambda \int _{a_i} \int _{D} \left( \partial f/\partial t + Lf-u\right) ^2, \end{aligned}$$
(17)

where \(\lambda\) is a smoothing parameter, and \(\partial f/\partial t + Lf=u\) is the time-dependent PDE that expresses the prior knowledge on the phenomenon under study. When the physics is misspecified in (16)-(17), the field estimated via SR-PDE is biased towards it, leading to worse predictions than a pure regression model. Instead, we showed that Phy-RK may benefit also from such an imprecise physical model, thanks to the geostatistical modeling of the residual field \({\mathcal {X}}_{{\varvec{s}}}\).

Fig. 4
figure 4

Relative errors boxplots of the predictions for the case described in Sect. 4 produced by OK, Phy, Phy-ROK and Phy-RUK varying the fluid compressibility \(c_w\). Note that the y-axis is represented on a log scale

4.5 Permeability field degradation

Another important aspect to investigate is the partial knowledge of the parameters that play a role in the reservoir dynamics. We now assume that the permeability field is only partially known and we analyze two different scenarios. In the first one, we assume that the permeability field is corrupted by a spatially dependent noise, whereas in the second one, the permeability field is known only at the active well locations and the whole permeability field is obtained using scalar Kriging. In both the cases, Phy-ROK and Phy-RUK prove to be the best predictors, as discussed in details in the supplementary Section S2.

5 Single-phase flow driven by rock compaction

In the second test case, we deal with a single-phase flow in a reservoir which can be developed under primary depletion because production is supported firstly by rock compaction and then by fluid expansion, see Dake (1978). This problem idealizes typical production behavior in many real fields, where, for very long periods of time, production is solely due to the capability of the rock to compact and then provide energy to the fluid, see for instance the Ekofisk field (Sulak et al. 1991), the Valhall Field (Sulak et al. 1991) and a large part of the turbidite reservoirs in the Gulf of Mexico, (Morgenthaler et al. 2012).

5.1 Dataset description

Also for this test case, we consider data generated from the simulation of a synthetic reservoir, hereafter described. We rely on the ECLIPSE simulator, which fully implements rock compaction models, on the contrary to OPM simulator. As before, the ECLIPSE simulator is used only to produce the observed rates and not to generate the predictions. We consider the same reservoir illustrated in Sect. 4.1 in terms of geometry and petrophysical properties. The rock is considered to be compressible, with a porosity that depends on the pressure as \(\phi (p) = \phi _0 \eta (p)\), where \(\phi _0\) is the porosity at the reference pressure 7386 psi and \(\eta (p)\) is a pressure-dependent porosity multiplier, reported in Table 1. The fluid compressibility is set to \(3.13 \cdot 10^{-4}\) psi\(^{-1}\) and an initial reservoir pressure of 7000 psi is prescribed. During \(N_a = 8\) activation intervals, 100 producers are incrementally drilled in the reservoir, according to Fig. 5, and they operate at a constant bottom hole pressure of 3000 psi. We implemented the reservoir setting in ECLIPSE, getting the observations of the production rates depicted in Fig. 5.

Table 1 Porosity multiplier for different values of reservoir pressure
Fig. 5
figure 5

Simulated production rates and schedule for the reservoir described in Sect. 5.1. Top panel: blue lines represent the rates corresponding to newly drilled wells in each activation interval, gray lines correspond to wells already operating. Bottom panel: blue circles represent the newly drilled producers and grey circles the already operating ones

5.2 Physical model

We use a physical model which takes into account the rock compression, but in a simplified manner, whereas the fluid compression is not modeled. Starting from equation (13), we assume a constant fluid density \(\rho\) and a linear relation between porosity and pressure, with \(c_{rock}\) constant in the considered time interval, \(\phi =\phi _0 [1+c_{rock}(p-p_{rock})]\), where \(p_{rock}\) is the reference pressure, obtaining the following conservation equation

$$\begin{aligned} \nabla \cdot \left( \frac{1}{\mu } {\varvec{k}} \cdot \nabla p \right) = \phi _0 c_{rock}\frac{\partial p}{\partial t} + q. \end{aligned}$$
(18)

Then, discretizing equation (18), on a Cartesian grid, analogously to Sect. 4.2, and adding the implicit discretization of the temporal derivative of time step \(\varDelta t\), we end up with the following linear system, where the unknown is the discretized pressure at time \(t+\varDelta t\)

$$\begin{aligned} (A + D) {\varvec{p}}^{t+\varDelta t} = D {\varvec{p}}^{t} + {\varvec{b}}. \end{aligned}$$

A and the vector \({\varvec{b}}\) are defined according to (15), whereas D is a diagonal matrix, whose elements are \(D_{ii} = V_{i,j}\phi _0 c_{rock}/\varDelta t\). Thus, we need to solve a linear system at each time step, starting from the initial pressure \(p_0^{a_i}\). The spatial discretization \((\varDelta x, \varDelta y)\) is set to (200ft, 200ft), with a reservoir thickness \(H=50ft\). The time step \(\varDelta t\) is set to 28.95 days (which means we need 25 steps to cover each activation interval \(a_i\)) and we prescribe an initial condition \(p_0^{a_1}=7000\) psi for the first time interval. In order to obtain a physical model which produces meaningful predictions, we consider a constant rock compressibility in each time interval \(a_i\) equal to the sum of the fluid compressibility and the derivative of the function \(\eta (p)\) evaluated at the initial pressure \(p_0^{a_i}\) of each interval.

5.3 Results and discussion

For this problem, we apply Physics-based Residual Kriging analogously as in the first application, described in Sect. 4.3. We compare the same four approaches, namely, OK, Phy, Phy-ROK and Phy-RUK. In the case of Phy-RUK, the pressure computed solving equation (18), averaged over the i-th time interval, is used as covariate for the drift term. As in the first test case, we select the optimal number of residuals \(K_i\) in each interval as the one that produced the lowest average relative error in the previous interval, according to Figure S5 in the supplementary material. A major difference is in the resulting choice of the number of residual terms \(K_i\). Indeed, in this case, it is often convenient to select more than one residual term (see Table 2). Note that selected \(K_i\) is often coherent between the Phy-ROK and Phy-RUK approaches, except for slight discrepancies in the fourth and the sixth intervals. In the following, we assume that, in each time interval, the number of employed residual terms is chosen according to Table 2.

In many intervals, the Phy-RK method leads to much lower errors than a pure data-driven or pure-physical approach (see Fig. 6a and Table 2). In fact, the Phy-RUK proves to be the best predictor in the first intervals, whereas in the last ones the best is Phy-ROK. This is due to the degradation of the physical model predictions starting from the fourth interval, that implies also a worse reservoir pressure approximation, impacting the Phy-RUK performances. Note also that the chosen method for prediction (marked as “Selected model” in Fig. 6a) is always optimal, except for the fourth interval, where the best model would be Phy-ROK but Phy-RUK is selected instead. In correspondence of this interval, we observe a change in the goodness of the physical model predictions, that consequently impacts the Phy-RK predictions, leading to similar errors between Phy, Phy-ROK and Phy-RUK in intervals 4 and 5. However, starting from the sixth interval, the gain of the Phy-ROK method over Phy becomes more and more substantial.

To conclude, the Phy-RK method leads to a large improvement over the OK method in all the intervals. The Phy predictor is comparable to Phy-RK only in intervals 4 and 5, with much higher errors in the others. In Fig. 6b we show the predictions computed by the selected best models in each interval for this test case, noting a great agreement between predictions and observations.

Table 2 Optimal residual terms number \(K_i\) and mean and standard deviation of the relative error \(E_i\) in each interval for OK, Phy, Phy-ROK and Phy-RUK
Fig. 6
figure 6

a Relative errors boxplots of the predictions produced by OK, Phy, Phy-ROK and Phy-RUK. b Observed rates and rates predicted by the selected models for the case described in Sect. 5

6 Conclusions

In this work, we presented the Physics-based Residual Kriging method, a novel approach for geostatistical problems in presence of prior information expressed by a physical model. As further extension, we formulated it for sequential problems, where the Physics-based Residual Kriging predictor can be incrementally enriched by adding residual terms at each activation time interval. We applied this approach for the prediction of production rates in a mature reservoir. With two examples, we showed that this approach leads to remarkably lower prediction errors when compared to a pure geostatistical approach or to a pure physical model. Furthermore, we tested the robustness of our approach to changes or partial knowledge of the physical model parameters. Our study shows that a Physics-based Residual Kriging approach is indeed beneficial also when an inaccurate physical model is employed in its formulation. For all these reasons, Phy-RK represents a valuable approach to geostatistical problems, whenever a (possibly approximated) physical model is available.

As future research, we plan to apply the Phy-RK methodology with sequential update to more realistic reservoirs, where two or three phases flows make the underlying physics more complex. In these contexts, one might avoid the burden of a 3D full-physics simulation, by employing a surrogate model, e.g. INSIM (Zhao et al. 2015) or FlowNet (Kiærr et al. 2020), as the physics model term within Phy-RK approach.